Michael Beddow remarks:
> That means that any entities intended as generic boilerplate
> constituents for a wide variety of documents should always have a
> (correct!) text declaration. That way, they will be usable no matter
> what the encoding of the document into which they are included.
> Similarly, precisely in the context of a corpus where more than one
> encoding may have been employed, it is wise to take precautions against
> hard-to-trace encoding muddles by furnishing each entity with an
> appropriate text declaration.
>
A further aspect of this which has always perplexed me is that (unless I'm
mistaken) the encoding of an entity which embeds another one does *not*
become the default for the embedded entity. In other words, if I have a
corpus of non-UTF-8 encoded entities, it is not enough simply to stick an
appropriate encoding declaration on the outermost entity which embeds all
the others: if they don't have their own declarations, they will default to
UTF-8 and things will go Horribly Wrong.
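
To make the failure concrete, here is a minimal sketch in Python. It is not
a real XML parser: the helper names (declared_encoding, decode_entity) are
made up for illustration, it only handles ASCII-compatible encodings, and it
ignores byte order marks and UTF-16 altogether. It simply assumes the rule
described above, namely that an external entity without its own text
declaration is read as UTF-8 whatever the embedding entity declared.

```python
import re

# Matches a text declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>.
# In a text declaration the version is optional but the encoding is required.
# Assumption: the entity is in an ASCII-compatible encoding, so the
# declaration itself can be read byte-for-byte.
TEXT_DECL = re.compile(
    rb'^<\?xml(?:\s+version\s*=\s*(?:"[^"]*"|\'[^\']*\'))?'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']\s*\?>'
)


def declared_encoding(entity_bytes):
    """Return the encoding named in the entity's own text declaration, or None."""
    match = TEXT_DECL.match(entity_bytes)
    return match.group(1).decode("ascii") if match else None


def decode_entity(entity_bytes):
    """Decode an entity using its own declaration, falling back to UTF-8.

    The encoding of the embedding entity is deliberately not consulted.
    """
    encoding = declared_encoding(entity_bytes) or "utf-8"
    return entity_bytes.decode(encoding)


# A Latin-1 encoded entity body, first without and then with a text declaration.
body = "<p>déjà vu</p>".encode("iso-8859-1")
with_decl = b'<?xml encoding="ISO-8859-1"?>' + body

print(declared_encoding(with_decl))  # the entity's own declaration is found
print(declared_encoding(body))       # no declaration, so UTF-8 is assumed

try:
    decode_entity(body)
except UnicodeDecodeError as err:
    # Latin-1 bytes are not valid UTF-8, so the undeclared entity breaks.
    print("undeclared entity failed:", err.reason)
```

Running the same check over every entity file in a corpus would be one way
to find the ones that are missing a declaration before a parser trips over
them.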