> I can certainly imagine representing characters (or even pieces of
> characters!) with elements, though I didn't know that P5 contained
> such a scheme.
This is worrying. It means that the TEI is not doing its
job right! I shall pursue this. In the meantime, see
http://www.tei-c.org/Activities/CE/ for a draft of
the relevant chapter. The P5 test releases of DTDs and Schemas
implement all this, of course.
> At the simplest end, I guess one could use simple empty
> elements <char n="abquod" codepoint="U+2345"/>; at the other end, one
> could create a whole inventory of features:
Indeed. That's the sort of thing that is in the new module.
> Aside from being familiar, shorter, easier to read, and easier to
> maintain, the only thing that springs to mind is that elements cannot
> appear in attribute values: <note n="&thistle;">
The working group on character encoding worried about this
sort of thing a lot. They came down on the side of
using elements wherever possible.
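
A minimal sketch of the constraint being discussed, assuming Python's
standard xml.etree.ElementTree parser. The replacement text "(thistle)"
is made up for illustration; only the entity and element names come from
the examples quoted above.

```python
import xml.etree.ElementTree as ET

# An entity reference is allowed inside an attribute value.
# The replacement text "(thistle)" is a hypothetical stand-in.
ENTITY_IN_ATTR = """<!DOCTYPE note [
  <!ENTITY thistle "(thistle)">
]>
<note n="&thistle;"/>"""

# An element inside an attribute value is not well-formed XML.
ELEMENT_IN_ATTR = """<note n="<char n='abquod'/>"/>"""

attr_value = ET.fromstring(ENTITY_IN_ATTR).get("n")

try:
    ET.fromstring(ELEMENT_IN_ATTR)
    element_in_attr_ok = True
except ET.ParseError:
    element_in_attr_ok = False

print(attr_value, element_in_attr_ok)
```

So anything that has to live in an attribute needs some indirection
(a pointer or key) once characters are represented as elements.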
> be served by a different format. So long as the format is transparent
> and the information convertible, which format one uses for what surely
> depends more on the local mix of habit, workflow, technical and
> intellectual resources, and pure whim than on anything intrinsic.
Yes, but elements and entities are not equal citizens. Entities
are a transient effect between the input text and the XML parser,
and leave no record in the infoset. So they do not survive
an identity transform, for instance. Since they are tied to the
DTD, they are not even guaranteed to be available unless they
are declared in the DOCTYPE of each instance document.
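
To make that concrete, here is a rough sketch using Python's standard
xml.etree.ElementTree as a stand-in for "parse, then write back out".
It is not an XSLT identity transform, and the replacement text
"(thistle)" is invented for the example, but the effect on the entity
reference should be the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the entity name is taken from the example above,
# its replacement text "(thistle)" is made up for illustration.
WITH_DOCTYPE = """<!DOCTYPE p [
  <!ENTITY thistle "(thistle)">
]>
<p>&thistle; and <char n="abquod"/></p>"""

root = ET.fromstring(WITH_DOCTYPE)
roundtrip = ET.tostring(root, encoding="unicode")

# The parser has already expanded the reference; only its text remains.
entity_survived = "&thistle;" in roundtrip
# The element is part of the parsed tree and is written back out.
element_survived = root.find("char") is not None

# Without the declaration, the same reference cannot be parsed at all.
try:
    ET.fromstring("<p>&thistle;</p>")
    parsed_without_doctype = True
except ET.ParseError:
    parsed_without_doctype = False

print(roundtrip)
print(entity_survived, element_survived, parsed_without_doctype)
```

The element comes through the round trip; the entity reference is
replaced by its expansion and cannot be recovered from the output.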
> image seems almost perverse. I already know what the character
> looks like: I want the transcription to tell me what it *is*.
Indeed, it's a fair point. I'd still use markup if I were capturing
this sort of complex stuff.
> - If you're me, Fonts are hard to make and harder to maintain in an
> environment when characters are added weekly. Unless, of course,
> one uses the font for unicode characters and elements for
> non-Unicode characters.
When you meet a new character and assign it a new position in
your private use area, how do you record what it looks like?
This is all good stuff; it's a matter of sadness to me
that it is happening _after_ the character encoding workgroup
has more or less finished its work. Let's hope that they
knew of all the concerns you raise.