I don't see a conflict between <w> and <choice>, and I think the "choice" (HA!) is really about whether you are trying to encode that there is one word and that word is inscribed in one of two ways, or that there is a choice between two words with the same meaning and different inscriptions.  If it's the former, I'd say put the choice inside the word; if the latter, put the word(s) inside the choice.

On Jun 11, 2015 5:50 PM, "Piotr Bański" wrote:
Dear Torsten,

Like I said in an earlier message, to me, the choice between (b) and (c) depends on some project-internal assumptions -- if for some reason it would be easier for your processing or visualisation tools to be presented with a continuous stream of <w>s and, at the same time, the usage of <choice> would be heavily restricted, then one could consider (c). Choice (b) requires adjustments to how you count <w>s or how you point at them for visualisation (and there are a few more hidden assumptions here).
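
A minimal sketch of the counting adjustment mentioned for (b), in Python with the standard library. The two fragments are hypothetical: they follow only the option labels ("<w> inside" and "<w> outside" the <choice>), and the abbreviation "cons." is invented for illustration. The function names are likewise assumptions, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments; element content is invented for illustration.
OPTION_B = ("<l><w>de</w> <choice><abbr><w>cons.</w></abbr>"
            "<expan><w>consilio</w></expan></choice></l>")
OPTION_C = ("<l><w>de</w> <w><choice><abbr>cons.</abbr>"
            "<expan>consilio</expan></choice></w></l>")

def naive_count(root):
    """Count every <w> element, regardless of where it sits."""
    return len(list(root.iter("w")))

def count_one_reading(el, prefer=("expan", "corr", "reg")):
    """Count <w>s while following only one alternative of each <choice>."""
    n = 1 if el.tag == "w" else 0
    children = list(el)
    if el.tag == "choice" and children:
        # Follow the preferred alternative, or the first one as a fallback.
        chosen = next((c for c in children if c.tag in prefer), children[0])
        children = [chosen]
    return n + sum(count_one_reading(c, prefer) for c in children)

for name, fragment in (("b", OPTION_B), ("c", OPTION_C)):
    root = ET.fromstring(fragment)
    print(name, naive_count(root), count_one_reading(root))
```

Under these assumptions the naive count differs between the two encodings of the same two-word line, while the count that follows a single reading does not.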

As far as semantic equivalence between <w> and <choice> is concerned, from the "lexical" point of view there is none. From the "constructional" point of view, in those cases that you have selected, there might be equivalence on the bottom-up route, but from the top-down perspective (and, as usual, depending on some other assumptions), you might need an extra test for each occurrence of <choice> to see if it's really meant to stand for <w>, which altogether might increase processing time and complexity.
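
The "extra test" could look roughly like the following Python sketch. The heuristic (every alternative holds exactly one whitespace-free token) is an assumption made for illustration, not a rule from the TEI Guidelines, and the sample paragraph is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one explicit <w>, one word-level <choice>,
# and one <choice> spanning several words.
SAMPLE = (
    "<p>"
    "<w>de</w> "
    "<choice><abbr>cons.</abbr><expan>consilio</expan></choice> "
    "<choice><sic>in the the</sic><corr>in the</corr></choice>"
    "</p>"
)

def stands_for_single_word(choice):
    """Heuristic: each alternative contains exactly one whitespace-free token."""
    alternatives = list(choice)
    if not alternatives:
        return False
    return all(len("".join(alt.itertext()).split()) == 1 for alt in alternatives)

def count_words(root):
    """Return (explicit <w> count, <choice> elements that act as a word)."""
    explicit = len(list(root.iter("w")))
    implicit = sum(1 for c in root.iter("choice") if stands_for_single_word(c))
    return explicit, implicit

print(count_words(ET.fromstring(SAMPLE)))  # (1, 1)
```

Every <choice> has to be inspected this way before it can be treated as a word, which is the added processing cost referred to above.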

Yet another bunch of assumptions concerns the way in which the markup is to be constructed, and by whom. It may sometimes be easier to have encoding guidelines in which every instance of a word is to be wrapped in <w> tags, and <choice> is used for (... | ...) at whichever level.

HTH and best regards,


On 11/06/15 16:45, Torsten Schassan wrote:
Dear Magdalena, dear Piotr,

thanks for your answers and your thoughts.

The reason why I put "(sometimes)" in the subject line was exactly that
in those cases where <choice> contains just that single "word" I do see
the semantic equivalence. Nonetheless, I think that <choice> doesn't
target the sub-word level but rather words or multiple words. Thus,
I'm surprised that both of you would consider going for (c); I thought
(b) would be more natural.

To answer your other questions:

- Will your <w>s carry IDs? And if so, for what purpose(s)?

<w> might carry IDs, but its main purpose is to deal with the <lb/>
according to the needs of the reader: either to show line breaks and the
separator or to suppress both. Another typical application in our
digital library is to attach to it the coordinates of the word on the
page in order to highlight a search result.
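
The two reader views can be sketched in Python as follows, using the <w>con=<lb/>silio</w> example from this thread. The function name `render` and its parameters are hypothetical, and the sketch only handles a flat <w> whose sole child elements are <lb/>; the actual publication pipeline described here is XSLT.

```python
import xml.etree.ElementTree as ET

WORD = ET.fromstring("<w>con=<lb/>silio</w>")

def render(w, show_breaks, separator="="):
    """Render a flat <w>: keep separator and break, or suppress both."""
    parts = [w.text or ""]
    for child in w:
        if child.tag == "lb":
            if show_breaks:
                parts.append("\n")
            elif parts[-1].endswith(separator):
                # Suppress the separator together with the line break.
                parts[-1] = parts[-1][:-len(separator)]
        # Text following the child element belongs to the word as well.
        parts.append(child.tail or "")
    return "".join(parts)

print(repr(render(WORD, show_breaks=True)))   # 'con=\nsilio'
print(repr(render(WORD, show_breaks=False)))  # 'consilio'
```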

- Do you envision using <choice> for anything other than hyphenated words?

Yes, we use it for all possible editorial pairs (abbr+expan, sic+corr,
orig+reg); it is relatively rare for one of these to come with an
additional line break. Thus we've got "a stack" of incidents that the
XSLT has to take care of during publication.

Best, Torsten

Am 11.06.2015 um 16:13 schrieb Magdalena Turska:
Dear Torsten,

I think the third option is probably the most typical situation - the
abbreviation is at the word level so you can wrap the <choice> in a <w>.

Why you want to mark up words is a whole different story. You said "in
our editions we usually wrap words (tokens) that go across lines in <w>,
e.g. <w>con=<lb/>silio</w>". Are they the only words you mark with <w>? If
so, why do they deserve this special treatment? I think only answering
these questions would allow one to judge one way of encoding "better" than the
other.

On 11 June 2015 at 14:28, Torsten Schassan wrote:

Dear all,

in our editions we usually wrap words (tokens) that go across lines in
<w>, e.g. <w>con=<lb/>silio</w>.

Now, that word is abbreviated and that fact would be represented using

Would you say <choice> works on the same level as <w> thus only one of
them is needed, or not? Indeed, <w> is part of model.segLike while
<choice> can contain larger portions of text thus belonging to
model.linePart and model.pPart.editorial.

Which encoding option would you consider to be best?

a: mutual exclusiveness
either just <w>con=<lb/>silio</w>

b: <w> inside

c: <w> outside

Curious, best, Torsten

Torsten Schassan
Digitale Editionen
Abteilung Handschriften und Sondersammlungen
Herzog August Bibliothek, Postfach 1364, D-38299 Wolfenbuettel
Tel.: +49-5331-808-130 (Fax -165), schassan {at}