Greg Murray (responding to Maja Bärenfänger) wrote:

> For this to work, you'll need to:
>
>  * Change <!DOCTYPE TEI.2 ... to <!DOCTYPE teiCorpus.2 ...
>
> * If the individual TEI.2 documents like 44-08-22.xml have XML
>    declarations and/or DOCTYPE declarations, remove them


We're agreed on these points (plus the addition of <!ENTITY % TEI.corpus
"INCLUDE" > to the internal subset of the master document).

But in the light of various encoding-clash hassles in which I've been
entangled recently, it might be worth pointing out that external general
entities, while they should indeed not have an XML declaration (though some
parsers seem to turn a blind eye if they do), may in some cases definitely
need a text declaration. (This is an XML issue, not a distinctively TEI
one.)

For those who haven't encountered such beasts recently, text declarations
can look uncannily like XML declarations. Here's one:

    <?xml version="1.0" encoding="EUC-JP"?>

For anyone forgivably asking how that's supposed to be different from an XML
declaration, two things:

 1) the version information is optional;
 2) the encoding information is mandatory.

(In an XML declaration it is the other way round: the version information
is mandatory and the encoding information is optional.)
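
To make the difference concrete, here is a rough Python sketch that tests a
declaration string against simplified versions of the XMLDecl and TextDecl
productions of XML 1.0. The function name classify and the regular
expressions are my own illustration, not part of any parser's API, and they
ignore edge cases a real parser must handle.

```python
import re


def _attr(name, value):
    # Pseudo-attribute: whitespace, name, '=', then a value in double or single quotes.
    return r"\s+" + name + r"\s*=\s*(?:\"" + value + r"\"|'" + value + r"')"


VERSION = _attr("version", r"1\.[0-9]+")
ENCODING = _attr("encoding", r"[A-Za-z][A-Za-z0-9._-]*")
STANDALONE = _attr("standalone", r"(?:yes|no)")

# XML declaration: version required, encoding and standalone optional.
XML_DECL = re.compile(
    r"<\?xml" + VERSION + "(?:" + ENCODING + ")?(?:" + STANDALONE + r")?\s*\?>"
)
# Text declaration: version optional, encoding required, no standalone.
TEXT_DECL = re.compile(r"<\?xml" + "(?:" + VERSION + ")?" + ENCODING + r"\s*\?>")


def classify(decl):
    """Return the set of roles this declaration string could legally play."""
    roles = set()
    if XML_DECL.fullmatch(decl):
        roles.add("xml declaration")
    if TEXT_DECL.fullmatch(decl):
        roles.add("text declaration")
    return roles
```

A declaration carrying both version and encoding, like the EUC-JP one above,
satisfies both patterns, which is exactly why the two are so easy to confuse.
One with only an encoding can be nothing but a text declaration.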

Additionally, a text declaration may head up a piece of text which is not a
self-contained well-formed XML document, but is nevertheless a valid general
entity (since such entities are exempted from the well-formedness constraint
of having a single enclosing document element).

The purpose of a text declaration is to enable the parser to take
appropriate action if the external entity uses a different encoding from
the container document. That means that any entities intended as generic
boilerplate constituents for a wide variety of documents should always have
a (correct!) text declaration. That way, they will be usable no matter what
the encoding of the document into which they are included. Similarly,
precisely in the context of a corpus where more than one encoding may have
been employed, it is wise to take precautions against hard-to-trace encoding
muddles by furnishing each entity with an appropriate text declaration.

One final caveat though. Encoding declarations, whether in XML or text
declarations, are indeed simply that: declarations, not magic incantations.
They do not change the encoding of the text they precede. If that encoding
is other than what you declare it to be, your last state will most certainly
be worse than your first.
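
One cheap sanity check, sketched below in Python, is to read the encoding
name out of the declaration and see whether the bytes of the entity actually
decode under it. The function names are hypothetical, the sketch assumes an
ASCII-compatible encoding (so not UTF-16), and a successful decode does not
prove the declaration right: a single-byte encoding such as ISO-8859-1 will
decode almost anything. A failed decode, however, does prove it wrong.

```python
import re

# Looks for encoding="..." inside a declaration at the very start of the data.
ENCODING_DECL = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def declared_encoding(data):
    """Return the encoding name declared at the start of data, or None."""
    match = ENCODING_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


def declaration_is_plausible(data):
    """True if data declares an encoding and decodes cleanly under it."""
    encoding = declared_encoding(data)
    if encoding is None:
        return False
    try:
        data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown encoding name, or bytes that are illegal in that encoding.
        return False
    return True
```

Running declaration_is_plausible over the raw bytes of each entity file
before assembling the corpus should flag, for instance, a file that claims
EUC-JP but was in fact saved as UTF-8.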

Michael Beddow