My intern, Amanda, has been experimenting, with little success, with the Word=>TEI conversion module that's part of OxGarage. We have a couple of encoding projects that rely on Word documents as the source documents (one further complicated by the use of "Track Changes"), and I thought it was high time to explore options other than "copy and paste" to facilitate their encoding workflow.
At this point, I am not sure what other troubleshooting or trial-and-error approaches to explore, so I appeal to you for your wisdom and help.
Here's the short version of what she's done so far:
1. She converted a docx file straight to TEI. Problems: headings and lists were not recognized, empty comment tags appeared, and there were stray empty elements.
2. She revisited the Word docx and explicitly applied Word styles to the various structural elements. We were sure that in so doing, lists, headings, etc. would transform properly. Only the headings did; everything else still had problems.
3. She then took the document to OpenOffice and used the TEI plugin converter. Similar problems.
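One way to double-check the style step above is to look inside the docx itself and see which paragraph styles Word actually recorded. The following is only a minimal Python sketch, not part of OxGarage or the TEI stylesheets: it assumes the file is a standard docx (a zip archive containing `word/document.xml`), and the function name `paragraph_style_counts` and the file name `sample.docx` are made up for illustration.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def paragraph_style_counts(docx_file):
    """Count the paragraph style IDs in a .docx (path or file-like object).

    Hypothetical helper for illustration; paragraphs with no explicit
    style are counted under "(no explicit style)".
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    counts = Counter()
    for para in root.iter(W + "p"):
        # The style, if any, is stored as <w:pPr><w:pStyle w:val="..."/></w:pPr>
        style = para.find(W + "pPr/" + W + "pStyle")
        if style is not None:
            counts[style.get(W + "val")] += 1
        else:
            counts["(no explicit style)"] += 1
    return counts


if __name__ == "__main__":
    # "sample.docx" is a placeholder file name
    path = sys.argv[1] if len(sys.argv) > 1 else "sample.docx"
    for style_id, n in paragraph_style_counts(path).most_common():
        print(style_id, n)
```

If list items turn out to carry a generic style ID, or none at all, that may help explain why a converter does not recognize them as lists, though this is a guess rather than a confirmed diagnosis.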
I dug around to see if I could access the underlying XSL(s), but I am not sure where to go. I came here: <http://www.tei-c.org/release/doc/tei-xsl-common/>, but the documentation link for "default conversion from docx" is broken (http://www.tei-c.org/release/doc/tei-xsl-common/profiles/default/docx/from.html).
I should say that these projects, aside from the track changes, which I imagine can be mapped to additions, deletions and notes (what we do manually), are fairly simple in structure: divisions, headings, paragraphs, lists, tables and the rare figure.
We are also operating with limited technical expertise (hark back to Julia's original post on 2/28/2011, http://tei-l.970651.n3.nabble.com/simple-instructions-for-converting-Word-to-TEI-td2596960.html), so please bear with us.
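To illustrate the mapping of tracked changes mentioned above, here is a rough Python sketch that pulls the tracked insertions and deletions out of a docx and labels them "add" and "del". It is an illustration under assumptions, not how OxGarage works: it relies only on the standard docx layout (`w:ins` and `w:del` elements in `word/document.xml`, with deleted text in `w:delText`), it ignores comments, and the function name `tracked_changes` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Tracked-change element -> (label, element that holds its text).
# The "add"/"del" labels are an assumed mapping for illustration.
KINDS = {
    W + "ins": ("add", W + "t"),
    W + "del": ("del", W + "delText"),
}


def tracked_changes(docx_file):
    """Return (label, author, text) tuples in document order.

    Hypothetical helper; accepts a path or a file-like object.
    Change markers that carry no text (for example on paragraph
    marks) are skipped.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    changes = []
    for element in root.iter():
        if element.tag not in KINDS:
            continue
        label, text_tag = KINDS[element.tag]
        text = "".join(t.text or "" for t in element.iter(text_tag))
        if text:
            changes.append((label, element.get(W + "author", ""), text))
    return changes
```

Calling `tracked_changes` on a docx path returns a list that could be compared against the additions and deletions encoded by hand.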
| Michelle Dalmau, Digital Projects & Usability Librarian
| Indiana University Digital Library Program
| Herman B Wells Library
| 1320 East 10th Street, W501
| Bloomington, Indiana 47405
| (812) 855-1261