Dear all,


The following arose in connection with a medieval Hebrew manuscript, but strikes me as occurring more broadly, and I'd be interested in how people have addressed issues like this.


1. The first issue is a scribe's idiosyncratic use of diacritics (vocalization points, to be precise). One diacritic mark looks much like a macron. When two adjacent characters both take this mark, the scribe writes a single longer mark that spans both letters. Using macrons and the Unicode combining overline, this is analogous to:

a̅e for āē


One question is whether one should address the markup for this "grammatically" or "orthographically".

In the first case, I would do something like:

(a) <seg type="doubleRafe"><c rend="doubleRafe">ā</c><c rend="doubleRafe">ē</c></seg>


In the second, I would define a glyph called "doubleRafe" and refer to it along the lines of:

(b) a<g ref="#doubleRafe"/>e


This captures the way the MS is written, but one would need special processing to figure out if "e" is to be treated as marked.
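To make the "special processing" concrete, here is a minimal Python sketch of what I have in mind. It rests on several assumptions of my own: the fragment is wrapped in a hypothetical <w> element, no TEI namespace is declared, and the glyph is taken to mark the letter immediately before it and the letter immediately after it. The function name marked_letters is purely illustrative.

```python
import xml.etree.ElementTree as ET


def marked_letters(fragment, glyph_ref="#doubleRafe"):
    """Return (letter, is_marked) pairs for a fragment encoded as in (b).

    Assumption: <g ref="#doubleRafe"/> marks the letter just before it
    and the letter just after it.
    """
    # Wrap the fragment in a hypothetical <w> element so it parses as XML.
    root = ET.fromstring(f"<w>{fragment}</w>")
    letters = [[ch, False] for ch in (root.text or "")]

    for child in root:
        tail = child.tail or ""
        if child.tag == "g" and child.get("ref") == glyph_ref:
            # Mark the letter that precedes the glyph, if there is one.
            if letters:
                letters[-1][1] = True
            # Mark only the first letter that follows the glyph.
            for i, ch in enumerate(tail):
                letters.append([ch, i == 0])
        else:
            # Any other element: keep its text and tail, unmarked.
            for ch in "".join(child.itertext()) + tail:
                letters.append([ch, False])

    return [(ch, marked) for ch, marked in letters]


if __name__ == "__main__":
    print(marked_letters('a<g ref="#doubleRafe"/>e'))
```

The point is only that in (b) the marking of "e" has to be inferred from position, whereas in (a) it is stated outright on each <c>.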


2. One broader issue here is whether this class of diacriticals should be considered part of the "main" character (and thus bound with it in a <c> element) or characters on their own. I encountered this as a specific markup problem when the pointing (in my case, done by a second hand) is damaged or unclear while the consonantal text is not.
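At the level of character encoding, Unicode already treats such marks as separable: under canonical decomposition (NFD), ā becomes a base letter followed by a combining macron, and Hebrew points are combining characters to begin with. The following Python sketch, using only the standard unicodedata module, splits a string into base letters and their attached marks. The function name split_marks is my own, and this illustrates the encoding only, not any TEI practice.

```python
import unicodedata


def split_marks(text):
    """Split text into (base, [combining marks]) pairs using NFD."""
    pairs = []
    for ch in unicodedata.normalize("NFD", text):
        # A non-zero combining class means the character attaches
        # to the preceding base letter.
        if unicodedata.combining(ch) and pairs:
            pairs[-1][1].append(ch)
        else:
            pairs.append((ch, []))
    return pairs


if __name__ == "__main__":
    # U+0101 and U+0113 are the precomposed forms of a and e with macron.
    for base, marks in split_marks("\u0101\u0113"):
        print(base, [f"U+{ord(m):04X}" for m in marks])
```

Seen this way, the consonant and its pointing are distinct code points, which is part of why binding them inside a single <c> feels like a choice rather than a given.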


In example (a) above, say the macron over a is damaged. This becomes:

(c) <seg type="doubleRafe"><c rend="doubleRafe">a<damage>◌̄</damage></c><c rend="doubleRafe">ē</c></seg>


However, this use of <damage> within <c> is prohibited, presumably because a character is assumed to be an indivisible entity that cannot be partly damaged.


I'd be curious to know if anyone on the list has encountered issues such as these and how you have dealt with them.


All best,