I would only add that, far from being anomalous, the problem of
polysemous glyphs is extremely common. That is, one frequently
encounters (semantic) characters represented by glyphs
that also (or usually) represent a different character or characters.
The 'con-' / '-us' overlap is common enough that we tend to
regard them as the same glyph (with some formal variants
belonging to only one or the other), albeit not the same character
(brevigraph). But the problem shows up nearly every day; even
while writing this, I was interrupted by a question from one
of my editors about a book that routinely uses the usual
glyph for 'quod' where one would expect the glyph for '-que'.
And there are many, many more examples, especially, but not
exclusively, with brevigraphs. "Rx" (recipe) in place of
the "-rum" glyph or the "response" glyph, or vice versa,
is a common one, for example.
Our general policy when the use is systematic is to simply
extend the permissible formal range of the character, i.e.
we capture according to sense, not form, rather than employ
TEI's admirably nifty mechanism for capturing both.
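For anyone who wants to see what "capturing both" buys you, here is a
minimal Python sketch of a processor that resolves <g> pointers using
either the Standard or the Orthographic mapping. It is an illustration
only, under several assumptions: the <doc> wrapper, the function names
resolve and render, and the omission of the TEI namespace are all
hypothetical simplifications, and the declarations are a trimmed copy of
the <charDecl> quoted below.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, namespace-free fragment; the Unicode mappings are written
# as numeric character references so they stay legible.
SAMPLE = """<doc>
  <charDecl>
    <char xml:id="combOvrLn">
      <mapping type="Unicode">&#x0305;</mapping>
    </char>
    <char xml:id="combDblMacr">
      <mapping type="Unicode">&#x035E;</mapping>
    </char>
    <glyph xml:id="varCombOvrLn">
      <mapping type="Standard"><g ref="#combOvrLn"/></mapping>
      <mapping type="Orthographic"><g ref="#combDblMacr"/></mapping>
    </glyph>
  </charDecl>
  <p>cora<g ref="#varCombOvrLn"/>, in<g ref="#varCombOvrLn"/>otescat</p>
</doc>"""


def resolve(ref, decls, prefer):
    """Follow a '#id' pointer down to a Unicode string."""
    decl = decls[ref.lstrip("#")]
    mappings = {m.get("type"): m for m in decl.findall("mapping")}
    if "Unicode" in mappings:
        # A <char>: its Unicode mapping is the answer.
        return mappings["Unicode"].text or ""
    # A <glyph>: follow the <g> inside the preferred mapping.
    inner = mappings[prefer].find("g")
    return resolve(inner.get("ref"), decls, prefer)


def render(xml_text, prefer="Standard"):
    """Flatten the <p>, replacing each empty <g> by its resolved character."""
    root = ET.fromstring(xml_text)
    decls = {el.get(XML_ID): el for el in root.find("charDecl")}
    para = root.find("p")
    out = [para.text or ""]
    for child in para:
        if child.tag == "g":
            out.append(resolve(child.get("ref"), decls, prefer))
        out.append(child.tail or "")
    return "".join(out)


by_sense = render(SAMPLE, "Standard")      # overline, U+0305
by_form = render(SAMPLE, "Orthographic")   # double macron, U+035E
```

The same transcription yields the sense-based reading or the form-based
one depending on a single argument, which is what capturing both makes
possible and what our capture-by-sense policy gives up.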
The problem, as Syd's followup question suggests, is in
establishing when the use is systematic: at what point does one simply
regard the "con- / -us" glyph as chronically ambiguous, as
one routinely does with (say) l/1 (el/one) or O/0 (letter-o/zero)?
By a crude taxonomy: some of the substituted glyphs appear to be mistakes;
some appear to be deliberate authorial or compositorial expedients; and
some are common enough to be thought of as traditional or (almost) standard.
But it is often hard to tell the difference between 'typos' (the first
category), 'hijacked glyphs' (the second category), and 'standard
variants' (the third). "con-" and "Rx" belong to the third, I think;
"quod" for "que" and "Greek-ou-ligature" for "Welsh vocalic w" to
the second; and "ct-ligature" for ampersand to the first (or maybe
the second).
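Purely as a bookkeeping aid, the three categories and the examples just
given can be written down as a small Python table. The names
Substitution, EXAMPLES and by_category are hypothetical, and the
assignments are only my tentative ones from above (the ct-ligature case
could equally be filed under the second category).

```python
from enum import Enum


class Substitution(Enum):
    TYPO = "apparent mistake"
    HIJACKED = "deliberate authorial or compositorial expedient"
    STANDARD = "traditional or (almost) standard variant"


# (glyph as printed, sense intended) -> tentative category
EXAMPLES = {
    ("con-", "-us"): Substitution.STANDARD,
    ("Rx", "-rum"): Substitution.STANDARD,
    ("quod", "-que"): Substitution.HIJACKED,
    ("Greek ou ligature", "Welsh vocalic w"): Substitution.HIJACKED,
    ("ct ligature", "ampersand"): Substitution.TYPO,  # or maybe HIJACKED
}


def by_category(examples):
    """Group the (glyph, sense) pairs under their category."""
    grouped = {category: [] for category in Substitution}
    for pair, category in examples.items():
        grouped[category].append(pair)
    return grouped


for category, pairs in by_category(EXAMPLES).items():
    print(category.name, pairs)
```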
Further complications arise from Unicode's own inconsistency with
respect to the meaning of 'character' (some of the code-points are
much more semantically weighted than others, and some more
formally ambiguous than others); ignorance of meaning on the
part of the transcriber (if all you know is the form, then
the form is all you can capture); cases in which glyphs are
used with no meaning at all, or with hidden meaning (e.g. in ciphers);
and all sorts of diachronic complexities (e.g. at what point
does "z" cease to be a formal variant of 'yogh' or of the brevigraph
for words in '-et' and deserve to be captured simply as
"z"? When do you start capturing "Dalziel" and "viz." with 'z'
instead of 'Dal{yogh}iel' and 'vi{brevigraph-et}'?)
I'm sure I don't know!
pfs
On Thu, 29 Apr 2010, Syd Bauman wrote:
> A hasty response, as I have to go soon ...
>
>
> My instinct is that you've got this almost exactly spot on the best
> way to do it. The one hitch is that I would not put the U+035E in the
> document as the content of <g>. I'd put either U+0305, or nothing:
>
> ...cora<g ref="#varCombOvrLn"/>, in<g ref="#varCombOvrLn"/>otescat...
>
> The nothing case leaves it completely up to the processor to decide
> whether to resolve the <g> with the standard U+0305 mapping or the
> orthographic U+035E mapping. If you use U+0305 as the content, then
> (in theory) a processor that did not know how to decipher <charDecl>s
> could just ignore the <g> tags and (by leaving only the content
> of the <g> element) would end up with the logically "correct"
> character.
>
>
> Is the "con" glyph consistently used for "us", or is this just an
> anomalous error:
> <choice>
> <sic>ꝯ</sic>
> <corr>ꝰ</corr>
> </choice>
>
>
>> I have a question about how to represent glyphs and characters in
>> P5 when the abstract character seems to be one thing, but its glyph
>> in the document is visually identical to another character. I have
>> two examples.
>>
>> First, in my document (a 16th-century alchemical work in Latin) the
>> combining overline representing a nasal contraction regularly
>> extends over both the preceding and succeeding characters, even if
>> the succeeding character is a space or punctuation. You can see two
>> examples at
>> https://mywebspace.wisc.edu/pcgorman/web/tei/OverbarExample.jpg
>>
>> The Unicode character x035E, COMBINING DOUBLE MACRON, very closely
>> matches the visual presentation of this character in this work, but
>> over the past years I've followed the
>> 'nasal-stroke-is-an-overline-not-a-macron' discussion. In that
>> light, one might see this as a variant glyph of the combining
>> overline. Taking that as a hypothesis, would this be an appropriate
>> way to mark up the document:
>>
>> <charDecl>
>> <char xml:id="combOvrLn">
>> <charName>COMBINING OVERLINE</charName>
>> <mapping type="Unicode">̅</mapping>
>> </char>
>> <char xml:id="combDblMacr">
>> <charName>COMBINING DOUBLE MACRON</charName>
>> <mapping type="Unicode">͞</mapping>
>> </char>
>> <glyph xml:id="varCombOvrLn">
>> <mapping type="Standard">
>> <g ref="#combOvrLn"/>
>> </mapping>
>> <mapping type="Orthographic">
>> <g ref="#combDblMacr"/>
>> </mapping>
>> </glyph>
>> </charDecl>
>> ...
>> <!-- in the document -->
>> ...cora<g ref="#varCombOvrLn">͞</g>, in<g
>> ref="#varCombOvrLn">͞</g>otescat...
>>
>> Note that in the document, the <g> contains the double macron but
>> references the <glyph>.
>>
>> My second example is also in the image above. In this document, the
>> abbreviation for "us" is represented by a character which looks
>> like the conventional glyph used for "con", and in fact the same
>> glyph is used for "con" elsewhere in the document. Unicode xA770
>> (MODIFIER LETTER US) is elevated, but in this document is always on
>> the baseline, as is xA76F (LATIN SMALL LETTER CON). Could I handle
>> this in the same way as the overline above, by using the visually
>> appropriate character but encoding it as a variant glyph of the
>> logically appropriate one? In this case, MUFI 3.0 defines a
>> character in the PUA for a baseline "us", but the glyph it provides
>> is still less a match for the glyph in my document and "con".
>>
>> Of course, where the glyph is used for "con" I'd just reference the
>> appropriate character directly.
>
>
>
--------------------------------------------------------------------
Paul Schaffner | http://www.umich.edu/~pfs/
316-C Hatcher Library N, Univ. of Michigan, Ann Arbor MI 48109-1190
--------------------------------------------------------------------