I am working on a project that involves annotating individual characters
in manuscript transcriptions and aligning them with image zones.
We are going to use <g> to encode ligatures, exactly as shown in an
example in the Guidelines.
At a certain stage of the project, the transcriptions will need to be
tokenised at word and character level using <w> and <c> tags.
Although we have not yet seen such a case in the corpus, it is possible
that a ligature joins the last letter of one word to the first letter of
the next, e.g. don&ctlig;u = "donc tu" in modern French.
In the tokenised transcription, I would like to do something like this:
<w>don<g ref="#ctlig" part="I">c</g></w> <w><g ref="#ctlig"
but, unlike segLike elements (including <c>), <g> is not a member of
att.fragmentable.
Of course, I could use the <c> element instead of <g>, but then I would
lose the semantics of <g> and the ref attribute.
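For illustration, that <c>-based fallback might look roughly like the
following. This is only a sketch: it assumes that part is available on
<c> through att.segLike, and that the values I (initial) and F (final)
are the appropriate ones for the two halves of the ligature. Nothing in
it records that the two fragments belong to the ct ligature, which is
exactly the information I would prefer to keep.
<w>don<c part="I">c</c></w> <w><c part="F">t</c>u</w>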
So, my question is whether there is a particular reason for <g> not
being a member of att.fragmentable, and, if not, whether it is worth
submitting a feature request.
Otherwise, I would be very grateful for any alternative encoding
proposal for "cross-word" glyphs.