I'm interested in both, though it looks like my old file primarily (maybe entirely) relies on the automatic breaks inserted by the software. Darn.
Depending on the advice on conversion, we'll have to decide whether to spend our time making changes to the Word document (such as inserting manual page breaks) or just adding the appropriate <pb> encoding in the XML file post-conversion.
From: TEI (Text Encoding Initiative) public discussion list [mailto:[log in to unmask]] On Behalf Of Kevin Hawkins
Sent: Wednesday, April 03, 2013 1:21 PM
To: [log in to unmask]
Subject: Re: Word to TEI: capturing page breaks?
On 2:59 PM, Sebastian Rahtz wrote:
> On 2 Apr 2013, at 20:08, Dana Dorman <[log in to unmask]>
>> Is there a simple way to retain MS Word page breaks in transformations from .doc files to TEI P5 XML?
> I see no problem in converting Word page breaks to <pb/> in general;
> getting them out of ODT should be easy too, if needed.
There are different types of page breaks in word processors, so I'd like to clarify for everyone involved.
There's the kind that forces a break at that point, which you insert manually. This has various subtypes: some start a new "section" (important for the size of margins etc.), whereas others continue in the same section.
But then there's also the kind of break that just happens to occur because that's where the text flows. If you edit the text, it reflows, and the break occurs at a different point.
Dana, which are you interested in? And Sebastian, which do you see no problem with?
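For anyone who wants to check which kinds of break a given file actually contains, here is a rough sketch in Python. It assumes the .doc has first been re-saved as .docx and that word/document.xml has been pulled out of the resulting zip; it does not read the binary .doc format. In WordprocessingML a manual page break is a w:br with w:type="page", a section break is a w:sectPr inside a paragraph's w:pPr, and w:lastRenderedPageBreak marks where the text happened to flow onto a new page the last time Word laid the document out. The function name classify_breaks and the SAMPLE fragment are made up for illustration and are not part of any existing converter.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml of a .docx file
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Section types that start a new page; "continuous" (and "nextColumn") do not
PAGE_STARTING = {"nextPage", "evenPage", "oddPage"}


def classify_breaks(document_xml: str) -> list[str]:
    """Hypothetical helper: list the kinds of page break found,
    in the order their elements appear in the XML."""
    root = ET.fromstring(document_xml)
    kinds = []
    for el in root.iter():
        if el.tag == f"{{{W}}}br" and el.get(f"{{{W}}}type") == "page":
            # Forced break inserted by hand, same section
            kinds.append("manual")
        elif el.tag == f"{{{W}}}lastRenderedPageBreak":
            # Where the text flowed onto a new page at the last layout
            kinds.append("automatic")
        elif el.tag == f"{{{W}}}pPr":
            # A sectPr inside paragraph properties ends a section there;
            # the body-level sectPr at the end of the file is not a break
            sect = el.find("w:sectPr", NS)
            if sect is not None:
                kind = sect.find("w:type", NS)
                val = kind.get(f"{{{W}}}val") if kind is not None else "nextPage"
                if val in PAGE_STARTING:
                    kinds.append("section")
    return kinds


# Made-up fragment with one break of each kind
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>One</w:t></w:r><w:r><w:br w:type="page"/></w:r></w:p>
<w:p><w:pPr><w:sectPr><w:type w:val="nextPage"/></w:sectPr></w:pPr>
<w:r><w:t>Two</w:t></w:r></w:p>
<w:p><w:r><w:lastRenderedPageBreak/><w:t>Three</w:t></w:r></w:p>
<w:sectPr/>
</w:body></w:document>"""

if __name__ == "__main__":
    print(classify_breaks(SAMPLE))
```

If the list comes back as nothing but "automatic", the file relies on text flow alone, and those positions are only as good as the last time Word repaginated the document.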