On Wed, 21 Mar 2001 10:17:14 -0500, Syd Bauman wrote:
>> ... we are not sure about how to encode or indicate the language of
>> each work. We consider that a good place to do it can be the
>> attribute id= [of] the TEXT element.
>The lang= attribute of the TEXT element would be *much* better.
>Besides being the place where this information is supposed to go,
>this frees up id= for what it is for -- a unique identifier. Even
>though your files are separate now, it is not at all unreasonable
>to think that in the future you might want to assemble them all
>together as one big corpus (e.g., TEICORPUS) file. If you have a
>project-wide unique id= on each TEXT ahead of time, this will
>facilitate the task. At the WWP each encoded text has a reference
>number. The physical version in the file cabinet is referred to
>by "OT" (for "office text") followed by the 5-digit reference
>number; the corresponding TEI file has an id= of "TR" (for
>"transcription") followed by the same 5-digit reference number.
>This allows for an easy way to find the physical copy associated
>with each file and vice-versa, too.
The only problem with what you propose is that the file then fails to
validate against the DTD: the error says that the *id* attribute should be
filled in. So I tried every combination. With only the *lang* attribute
filled in, I get an ERROR. With both attributes (*lang* and *id*) filled in,
the file validates. With only the *id* attribute filled in, it also
validates. So I concluded that *id* was the correct one.
Please tell me whether this is expected or whether I made a mistake.
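
To make the three attempts described above concrete, here is a minimal sketch of the TEXT start-tags involved. The attribute values are placeholders chosen only for illustration ("es" and "text001" are assumptions, not the values actually used), and the element content is elided.

```xml
<!-- Placeholder values only: "es" and "text001" are hypothetical. -->

<!-- Attempt 1: only lang filled in; reported as an error -->
<text lang="es"> ... </text>

<!-- Attempt 2: both lang and id filled in; validates -->
<text id="text001" lang="es"> ... </text>

<!-- Attempt 3: only id filled in; validates -->
<text id="text001"> ... </text>
```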
Thank you
Manuel Sánchez