Paul F. Schaffner wrote:
> I can certainly imagine representing characters (or even pieces of
> characters!) with elements, though I didn't know that P5 contained
> such a scheme.
See the proposals from the TEI Working Group on Character Encoding
issues at http://www.tei-c.org/Activities/CE, in particular CE/FASC-wd.pdf.
> At the simplest end, I guess one could use simple empty
> elements <char n="abquod" codepoint="U+2345"/>; at the other end, one
> could create a whole inventory of features:
Something like that is proposed.
> <char codepoint="U+2345">
> <desc>lower case q with loop through descender</desc>
> <expan type="typical" lang="lat">quod</expan>
> I obviously have no idea what P5 does, but does it fall somewhere
> between those two extremes?
It has both actually! You define properties for your character or glyph
in the header and then reference them from the document.
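
A minimal sketch of that two-step pattern, for illustration only. The element names (charDecl, glyph, desc, mapping, g) follow the gaiji module as later published in P5 and may not match the working draft cited above. The identifier "abquod" is borrowed from your example, the mapping type value is just a plausible label, and the surrounding header and text structure is abbreviated.

```xml
<!-- In the header: declare the glyph once and attach its properties.
     Names follow the published P5 gaiji module; the draft may differ. -->
<encodingDesc>
  <charDecl>
    <glyph xml:id="abquod">
      <desc>lower case q with loop through descender</desc>
      <!-- "standardized" is an illustrative type value -->
      <mapping type="standardized">quod</mapping>
    </glyph>
  </charDecl>
</encodingDesc>

<!-- In the text: point back at the declaration wherever the glyph occurs. -->
<p>... <g ref="#abquod">quod</g> ...</p>
```

The properties live in one place, so correcting the description or the expansion means editing the declaration rather than every occurrence.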
>>can you say why entities are better than elements?
> Aside from being familiar, shorter, easier to read, and easier to
> maintain, the only thing that springs to mind is that elements cannot
> appear in attribute values: <note n="&thistle;">
> - Fonts typically only display forms, not meanings, so they are
> not good at editing homoglyphs (is that a word?)--glyphs that
> look identical but represent different characters with different
> meanings. &dram; and &yogh; (as well as several identical-looking
> abbreviation symbols) are again an example; or consider
> &trine;, &fire;, and ▵, all of which appear as upright
> open triangles. The first is an astrological symbol for 120-degree
> separation; the second an alchemical symbol for fire; the third
> an upright triangle with no specific meaning. One does not
> even need to reach so far for examples: hard vs. soft hyphens;
> superscript o vs. mathematical degree sign; etc. Character capture
> is a kind of translation of image into text: a kind
> of textual description. To turn that immediately back into
> image seems almost perverse. I already know what the character
> looks like: I want the transcription to tell me what it *is*.
I think this mistakes the purpose of *character* encoding. The fact that
the same glyph (the upright open triangle) can have different meanings
doesn't make it any less of an upright open triangle. If it were
represented by the characters "tr" you wouldn't claim that one "tr" was
somehow a different sequence of letters from another. Instead you'd
think about ways of representing the interpretation you wished to place
on each instance.
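
One way that might look, as a sketch rather than a recommendation: the character stays the same in both places and the interpretation is carried by separate markup. This assumes TEI-style interp elements and the ana attribute; the ids "trine" and "fire" are hypothetical, and any other interpretive mechanism would serve equally well.

```xml
<!-- The interpretations, declared once (ids are hypothetical) -->
<interp xml:id="trine">astrological symbol for 120-degree separation</interp>
<interp xml:id="fire">alchemical symbol for fire</interp>

<!-- The same character in both places; only the pointer differs -->
<seg ana="#trine">▵</seg>
<seg ana="#fire">▵</seg>
```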