On Fri, 16 Jan 2009, Martin Holmes wrote:

> I'd be glad to know how anyone else has handled marking up this problem. One 
> approach we took initially for digit-ordering was to do this:
> <pb n="146" rend="164" />
> where what should have been 146 was printed as 164. But on another volume, 
> I've found that page numbers 18 and 19 were repeated, meaning that everything 
> subsequent to that is "wrong"; that's made me reconsider what I mean by 
> "wrong" in this context, and whether a page number might just be better 
> viewed as a label rather than a necessarily unique identifier for a page.

As usual, we take a simple and radical approach. In almost all
of our current work*, we use <pb @n to capture the page number
as printed in the source, slightly normalized (e.g., "(4)" and
"--4--" are both captured as <pb n="4"). This has 
the drawback, of course, of forcing us to capture literal data
in an attribute, but is otherwise congenial to our modest
editorial ambitions. There are many, many cases, perhaps even
a majority of books, in which the notion of a 'correct' page
number is highly problematic, and the prospect of supplying
one far too ambitious. In any case, @n is for us a record of
the label that the printer placed on the page, not an identifier,
and certainly not a unique one.
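
A minimal sketch of how the repeated page numbers in the quoted
example might look under this practice. The paragraph content is a
placeholder, not taken from any actual text:

```xml
<!-- Fragment only: @n records the label as printed, lightly normalized -->
<pb n="18"/>
<p>...</p>
<pb n="19"/>
<p>...</p>
<!-- The printer repeated 18 and 19; nothing is "corrected" in @n -->
<pb n="18"/>
<p>...</p>
<pb n="19"/>
<p>...</p>
<pb n="20"/>
```

Since @n is only a label here, the duplicate values are harmless;
anything that needs a unique handle on a page has to get it from
somewhere else.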

Further reflections...:

-- <pb for us is not a page but (as it were) a repeatable
    event or action of turning the page. This implies that
    <pb cannot easily be used to supply a unique identifier to
    the page. In the course of reading, a single page may be
    turned to more than once, so using <pb to identify pages
    would require some contrived ID numbering to avoid duplication.

-- <fw is perhaps the correct way to record the page
    number if you want to apply markup to the number
    itself (e.g. sic/corr). We don't use <fw much, but
    we could and maybe should.

-- several different sequences are potentially involved
    in any set of pages: the actual sequence of pages in the
    printed source; the intended sequence of the pages in the
    printed source (which may well differ, due to misbinding,
    etc.); the sequence of image surrogates (if any); the
    nominal 'page number' order; and the intended *reading*
    order. We capture the intended reading order simply by
    arranging the <pbs in this order (though in some implementations
    we add a @seq attribute); the order of images implicitly
    by means of their identifiers (recorded with home-brewed
    @ref or latter-day @facs); and ignore the others.

    I don't think @rend is right for any of these.
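
The points above can be sketched roughly as follows. This is an
illustration under stated assumptions, not a record of actual
practice: the @type and @place values on <fw, the @seq numbers, and
the image file names in @facs are all hypothetical, and @seq is a
local attribute rather than part of the TEI scheme.

```xml
<!-- Misprinted page number: @n keeps the printed label, while the
     correction is marked up on the number itself inside <fw> -->
<pb n="164"/>
<fw type="pageNum" place="top"><choice><sic>164</sic><corr>146</corr></choice></fw>

<!-- Reading order is simply document order. @seq (local, optional)
     makes it explicit; @facs points to the page image. -->
<pb n="18" seq="20" facs="img0020.tif"/>
<pb n="19" seq="21" facs="img0021.tif"/>
```

In this arrangement @n, @seq and @facs each carry one of the
sequences listed above, and none of them is pressed into service as
a unique identifier for the page.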


* in ancient times (say ten years ago), we resorted to some
informal markup in recording @n values, viz., placing supplied
or corrected values in brackets (<pb n="12[5]">). I do not
say this proudly.

Paul Schaffner
316-C Hatcher Library N, Univ. of Michigan, Ann Arbor MI 48109-1205