[Sorry for the delay in answering your mail, it's been a busy week]
Syd Bauman wrote:
>>I've started encoding old english poems using the TEI standard, and
>>I wondered about the <seg> element: my lines look like this
>> <seg type="hemistichA">Hwæt! Ic þysne sang</seg>
>> <seg type="hemistichB">siðgeomor fand</seg>
>>Can I use TEILite as a DTD? It seems to me that <seg> has a different,
>>more limited meaning in TEI Lite.
> I can see where you'd get that impression from reading parts of TEI
> U5, but I don't think it is what was intended or a proper conclusion
> to draw. I think the semantics of <seg> in Lite are precisely the
> same as in other views of the TEI scheme: <seg> means whatever you
> say its type= (and subtype=) attributes mean. To quote P4:
> The <gi>seg</gi> element may be used at the encoder's discretion
> to mark any segments of the text of interest for processing.
Compare that to this paragraph from TEI Lite:
A more general purpose segmentation element, the <seg> has already been
introduced for use in identifying otherwise unmarked targets of cross
references and hypertext links.
I was under the impression that for every <seg> element there has
to be a pointer of some kind; thanks for clarifying the issue. As I've
decided to add a full critical apparatus to my texts, I won't be using
TEI Lite anyway.
> Certainly the sub-structure of a metrical line may well be of
> interest for processing.
> That said, presuming that making use of <caesura> or part= of <l> is
> insufficient for your purposes, I think it may make more sense to use
> your own view of the TEI DTDs and create an <hs> (for "hemistich")
> element as syntactic sugar for <seg type="hemistich">. I don't
Remember that I qualified myself as a newbie :)
I understand that my <seg type="hemistichA/B"> is not the most efficient
solution; any advice on how to improve it is more than welcome. If the only
sensible way is to create my own view of the TEI DTD I will investigate
in that direction.
BTW, I've been looking for TEI-based sample encodings on the web, to
read and study as examples, but I've found very little (many of the
links in the TEI projects page seem to have changed). Can some fellow
encoder point me to Old/Middle English (or even classical) texts encoded
using TEI P4?
> know enough about classical poetry to say myself, but if a line of
> poetry may be divided into more than two, you might want a more
> general-purpose name.
The important distinction is that between the first and the second half
of the Germanic long verse (see below).
>  P4:2002-03, page 939. http://www.tei-c.org/P4X/ref-SEG.html. I'll
> take this opportunity to point out that I'm not very fond of the
> first example. Without seeing the associated TEI Header it's
> impossible to know, but it seems as though the two <seg>s have
> different meanings (perhaps "question" and "answer"), in which
> case they really oughta have type= attributes to specify which is
> which. I am almost of the opinion that type= should be required
> on <seg>. (I am of the opinion it should be required on <div> et
>  Why do you have "hemistichA" and "hemistichB"? Is order of
> occurrence within the <l> insufficient?
Unfortunately, it is: a single Germanic verse can be split into two
hemistichs, or half-verses, which are tied together by alliteration.
Alliteration follows different rules in hemistich A vs. B, which means
that if I want to perform searches relating to alliteration I had better
distinguish between the two. Furthermore, it could be interesting to
investigate formulas as they occur in hemistichs A and B.
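To make the kind of search I have in mind a bit more concrete, here is a minimal Python sketch (standard library only) that groups half-lines by the value of their type= attribute. The <lg>/<l> wrapper, the n= values, the function name hemistichs_by_type, and the sample fragment (the Maldon lines from my signature, in plain ASCII) are all my own assumptions for illustration; this is not part of any TEI tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: each long line is an <l> holding two <seg> halves.
SAMPLE = """<lg>
  <l n="312">
    <seg type="hemistichA">Hige sceal the heardra,</seg>
    <seg type="hemistichB">heorte the cenre,</seg>
  </l>
  <l n="313">
    <seg type="hemistichA">mod sceal the mare,</seg>
    <seg type="hemistichB">the ure maegen litlath.</seg>
  </l>
</lg>"""


def hemistichs_by_type(xml_text):
    """Return {type value: [(line number, text), ...]} for each <seg> child of an <l>."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for line in root.iter("l"):
        for seg in line.findall("seg"):
            # itertext() also picks up text inside any nested elements.
            text = "".join(seg.itertext()).strip()
            grouped[seg.get("type")].append((line.get("n"), text))
    return dict(grouped)


if __name__ == "__main__":
    for seg_type, halves in hemistichs_by_type(SAMPLE).items():
        print(seg_type, halves)
```

With the two halves kept apart like this, an alliteration check could apply one rule to the hemistichA list and another to the hemistichB list without having to rely on element order.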
> P.S. I'm a little concerned about either your character encoding or
> my (lack of) understanding of Unicode encoding formats. The
> following is under the brash assumption that you are using XML
> and thus ISO 10646 aka Unicode encodings. If this is not the
> case, I'm probably wasting my time. Anyway, I noticed that the
> third character of hemistichA is an actual 0xE6 (as opposed to
> "&aelig;", "&#xE6;", "&#230;", or some such). None of the
> Unicode encoding formats with which I'm familiar would encode a
> U+E6 (LATIN SMALL LETTER AE (ash)) as =E6 (i.e., 1110 0110). In
> UTF-16 it would be =00=E6 (i.e., 0000 0000 1110 0110 presuming
> big-endian order). In UCS-4 it would be =00=00=00=E6 (i.e., 0000
> 0000 0000 0000 0000 0000 1110 0110; this may be why so few use
> UCS-4). In UTF-8 it would be =C3=A6 (i.e., 1100 0011 1010
> 0110). Am I missing something here? Did some mail
> gateway munge it? Or does your editor just do the wrong thing?
> Mine does, which is why I noticed it in the first place. :-)
I'm sorry to tell you that you've wasted some of your time :) I'm still
using a non-Unicode-compliant editor; this is an experimental phase
before I start the real work.
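For anyone who wants to check what a given encoding produces for this character, a minimal Python sketch follows (standard library only; the helper name hex_bytes is just an illustrative choice). A lone E6 byte is consistent with ISO 8859-1 (Latin-1), where the ash is 0xE6.

```python
# U+00E6 LATIN SMALL LETTER AE, the third character of "Hwæt".
ASH = "\u00e6"


def hex_bytes(char, encoding):
    """Encode one character and return its bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in char.encode(encoding))


if __name__ == "__main__":
    # Compare a single-byte legacy encoding with three Unicode encoding forms.
    for name in ("latin-1", "utf-8", "utf-16-be", "utf-32-be"):
        print(f"{name:10} {hex_bytes(ASH, name)}")
```

Run as a script, this prints E6 for latin-1, C3 A6 for utf-8, 00 E6 for utf-16-be, and 00 00 00 E6 for utf-32-be.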
Many thanks for your comments.
Roberto Rosselli Del Turco
Dipartimento di Scienze del Linguaggio
Universita' di Torino
Then spoke the thunder DA
Datta: what have we given? (TSE)
Hige sceal the heardra, heorte the cenre,
mod sceal the mare, the ure maegen litlath. (Maldon 312-3)