David Sewell wrote:
> I'm in the middle of editing TEI-tagged transcriptions of correspondence
> from the late 18th - early 19th centuries. I have a couple of questions
> about the handling of "typographical" features in the manuscript.
> 1. The editors have used the <emph> tag for all spans of text that are
> underlined, superscripted, or whatever, in the letters. I'm aware that
> TEI P4 distinguishes between <emph> as "linguistically emphatic" versus
> <hi> as "typographically 'emphasized'."
My personal line on this one is that in such circumstances "<hi>" is always
right (since the typographical/calligraphical highlight is indisputably
there) even though it may not be perfect (since it flattens the distinction
between locations where the glyphs/strokes attempt to signal "linguistic"
emphasis and those where they don't). <emph> by contrast is pretty likely to
be wrong in at least some cases (assuming the declared intention of using
that element is indeed to mark "linguistic emphasis") unless it is being
applied on the attributed editorial responsibility of someone with
expert and close knowledge of the text, with the express intention of being
able to process out such features. (Or unless, in a very different context,
a recording is being transcribed where there are no "<hi>"-lights to be
encoded in the first place.)
> 2. I'm finding this kind of construction in the tagging:
> Boston, 27<emph rend="underline"><emph
> It parses, since <emph> is allowed to be recursive (likewise <hi>). Is
> this acceptable practice, or is it better to do something like
> Boston, 27<emph rend="underlined_superscript">th</emph> August
The recursive approach means that you may end up having to encode multiple
rendition types in different ways on different elements (since rend, being
global, can crop up anywhere, including on elements that can't nest in this
way). Some projects have extended the DTD to implement something like the
HyTime "rendition ladders", but it is possible to use a "laddering"
convention without any extension by agreeing on a separator to concatenate
the rendition components into a single attribute value which a processor can
then tokenise out.
Sebastian's XSLT does this out of the box for rendering to HTML or FOP. Just
tell it what your separator character is in the parameterisation sheet, and
after that it will split the value of each rend attribute on that separator
and apply each of the rendition values in the resulting list of
tokens in turn. Very neat indeed.
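
For anyone processing the files outside those stylesheets, the tokenising step is easy to sketch. The following is only an illustration in Python, not a description of how Sebastian's XSLT is written: the underscore separator, the function name rend_tokens and the sample fragment (using <hi>, as suggested above) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical separator: whatever character the project has agreed on.
SEPARATOR = "_"


def rend_tokens(element, separator=SEPARATOR):
    """Return the rendition components held in an element's rend attribute.

    An absent or empty rend attribute yields an empty list.
    """
    value = element.get("rend", "")
    return [token for token in value.split(separator) if token]


# Illustrative fragment only, modelled on the dateline in the question.
sample = '<p>Boston, 27<hi rend="underlined_superscript">th</hi> August</p>'

for element in ET.fromstring(sample).iter():
    tokens = rend_tokens(element)
    if tokens:
        # A real processor would apply each rendition in turn here.
        print(element.tag, tokens)
```

The point of the convention is that the same function works on any element that carries rend, whether or not that element is allowed to nest inside itself.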