For processing purposes, of course, a sequentially ordered unique identifier for each page break is useful. On several projects I've worked on, this has been done with an @xml:id built from a four-character identifier, an abbreviation for page break, and a sequential number independent of anything printed or written on the page (e.g., xml:id="DIST-pb-0209"). When the numbering is regular, this alone is enough to generate labels automatically (XSLT with the XPath substring functions, etc.). For leaves/pages with irregular numbering, one could use @n or @rend to record the 'erroneous' label so that it can be processed differently (XSLT: <xsl:when test="@n">, etc.). In other words, only pages with non-sequential or repeated numbering would need the extra attribute:

<pb xml:id="DIST-pb-0288"/>
<pb xml:id="DIST-pb-0289"/>
<pb xml:id="DIST-pb-0290" n="209"/>
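
The label logic described above (use @n when it is present, otherwise derive the label from the sequential part of @xml:id) could be sketched roughly as follows. This is a Python illustration rather than the XSLT used on any actual project; the function name page_label, the un-namespaced sample wrapper, and the assumption that the printed label equals the sequence number with leading zeros removed are all mine.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def page_label(pb):
    """Return a display label for a <pb/> element (hypothetical helper).

    If @n is present, it records the label actually printed on the page
    and wins. Otherwise the label is taken from the numeric suffix of
    @xml:id, assuming the pattern PREFIX-pb-NNNN.
    """
    n = pb.get("n")
    if n is not None:
        return n
    suffix = pb.get(XML_ID).rsplit("-", 1)[-1]
    # Assumption: the regular label is the sequence number without padding.
    return str(int(suffix))


# Sample wrapper without the TEI namespace, to keep the sketch short.
sample = """<div>
  <pb xml:id="DIST-pb-0288"/>
  <pb xml:id="DIST-pb-0289"/>
  <pb xml:id="DIST-pb-0290" n="209"/>
</div>"""

root = ET.fromstring(sample)
labels = [page_label(pb) for pb in root.iter("pb")]
print(labels)  # ['288', '289', '209']
```

In a real TEI file the elements would sit in the TEI namespace, so the lookup for pb would need to be namespace-qualified; and if the sequence number is offset from the printed numbering (front matter, etc.), the derivation step would need adjusting accordingly.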

I wouldn't think this is an especially unusual situation, nor always the result of error: several kinds of books come to mind that might contain repeating page labels (omnibus editions of multi-volume series, works in two independently numbered parts, etc.).


On Fri, Jan 16, 2009 at 3:14 PM, Martin Holmes wrote:
Hi folks,

We have a set of printed books in which page-numbering is frequently erratic; numbers are omitted and repeated, and sometimes the order of digits in the page number is wrong.

I'd be glad to know how anyone else has handled marking up this problem. One approach we took initially for digit-ordering was to do this:

<pb n="146" rend="164" />

where what should have been 146 was printed as 164. But on another volume, I've found that page numbers 18 and 19 were repeated, meaning that everything subsequent to that is "wrong"; that's made me reconsider what I mean by "wrong" in this context, and whether a page number might just be better viewed as a label rather than a necessarily unique identifier for a page.

Martin Holmes
University of Victoria Humanities Computing and Media Centre
Half-Baked Software, Inc.