Thank you all for your attempts at shedding light on this one. I
realize that, instead of moaning about the state of world affairs in
general, I should have described my problem more specifically.
Michael Beddow writes:
> I can see a problem as far as correspondence and related issues goes
> (especially postscripts) but I don't see one in this example. If the second
> lot of stuff truly doesn't belong to the previous or the following div, then
> it appears to belong in a div of its own, so why not give it one? This
> looks to me more like a problem of document analysis rather than a tagset
Quite so, so I will try to be more verbose this time. The problem is
haunting me in different guises, but the root cause is always quite similar:
Many premodern Chinese texts started their life being written on
scrolls made from sheets of paper that had been glued together before
the writing process started. Since the length of a scroll was more or
less fixed by convention (and the convenience of the reader), longer
texts needed multiple scrolls to be written on. For that reason,
every scroll would have a "running header" with the title of the text,
the number of the scroll and usually some additional information, for
example the original author, commentator (in the case of a text with
commentary) and so on. The end of the scroll might have a "trailer",
indicating the title and number of the scroll. After this trailer,
some texts have additional material appended that does not belong to
the running text of the original.
With the advent of printing on woodblocks in the tenth century A.D.,
sheets were no longer glued together before the writing occurred;
rather, they were folded and stitched together to form little
booklets. The numbering scheme and running headers / trailers were
nevertheless kept as in the scrolls, and they are kept as part of the
text even in modern reprints.
The challenge now is to find an appropriate way to encode this content
within the TEI framework. Since the "scroll boundaries" are features of
the medium that carried the text, <milestone>-like handling would be
an obvious candidate, which leaves the logical structuring along
<div>s for the content. One problem arises if there is some
additional commentary or note, or other text in between the divisions
of the content as in the following example, but there is also the
general problem of finding a good way to deal with the opener/closer
Here is an example with bogus English in place of the Chinese.
<milestone unit="juan" n="1"/>
<p type="opener"><title>example text</title> juan 1</p>
<!-- end of scroll one -->
<p type="closer"><title>example text</title> juan 1</p>
<p>this text has been copied by ...</p>
<p type="opener"><title>example text</title> juan 2</p>
<milestone unit="juan" n="2"/>
As can be seen, the header/trailer lines have for the time being been
tagged as paragraphs and are distinguished from the running text with
an attribute value, which is clearly not ideal but is what we currently
use in the data-capture phase. Any better suggestions are most welcome.
At some point, following a suggestion by Lou Burnard, we tried using
<fw> for this purpose, but this turned out to be of little help, at
least with its current content model.
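For comparison, here is a rough sketch of how I read Michael's
suggestion of giving the in-between material a div of its own. The
type values "section" and "colophon" are purely my assumptions for
illustration, not established usage, and the opener/closer lines are
left tagged exactly as in the example above.

```xml
<!-- hypothetical sketch: the type values are assumptions -->
<div type="section">
  <milestone unit="juan" n="1"/>
  <p type="opener"><title>example text</title> juan 1</p>
  <!-- running text of scroll one -->
  <p type="closer"><title>example text</title> juan 1</p>
</div>
<!-- the appended material gets a div of its own -->
<div type="colophon">
  <p>this text has been copied by ...</p>
</div>
<div type="section">
  <p type="opener"><title>example text</title> juan 2</p>
  <milestone unit="juan" n="2"/>
  <!-- running text of scroll two -->
</div>
```

This keeps the appended note out of both neighbouring divisions, but
it only works as long as the logical divisions of the content happen
to coincide with the scroll boundaries, and it does nothing for the
header/trailer lines themselves.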
I hope this message provides enough detail to usefully continue
this interesting debate.
All the best,
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN