Thanks Fabio, James, Lou and Piotr for interesting replies!

Lou is absolutely right that the underlying problem is the overlap 
between the linguistic structure and the orthographic surface.

Another example I could give is


where an abbreviation mark (combining horizontal bar) expands to a 
letter in one word and a letter in another. How should this be tokenized?

Piotr would say that stand-off is my friend but I believe that it is in 
all cases necessary to know how to "inject" the stand-off markup.

My initial question was in fact much more pragmatic. I was looking for a 
way to indicate that one and the same glyph expands to two characters in 
two different words.

The @next & @prev solution seemed like overkill to me, as it implies 
assigning a unique identifier to each <g> (which I did not intend to do 
at that stage of encoding), and att.fragmentable appeared to be a 
convenient solution.
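
For reference, the @next/@prev approach that Fabio suggested would look 
roughly like the sketch below. The xml:id values "ct1a" and "ct1b" are 
made up for illustration; #ctlig is the glyph reference from my original 
example.

```xml
<!-- Sketch only: ct1a and ct1b are hypothetical identifiers -->
<w>don<g xml:id="ct1a" ref="#ctlig" next="#ct1b">c</g></w>
<w><g xml:id="ct1b" ref="#ctlig" prev="#ct1a">t</g>u</w>
```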

However, it did not occur to me that the glyph might be de-composed, as 
Lou suggested. So the right solution would probably be to use @sameAs 
(which implies @xml:id) to make it clear that one and the same glyph 
belongs to two words... if I am getting this right.
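
A minimal sketch of the @sameAs idea, assuming the linking attributes 
are available on <g> in the schema in use. The identifier "ct1" is 
hypothetical, and whether each <g> should contain only the letter that 
belongs to its word remains an open question.

```xml
<!-- Sketch only: ct1 is a hypothetical identifier -->
<w>don<g xml:id="ct1" ref="#ctlig">c</g></w>
<!-- The second <g> is declared to be the same glyph as the first -->
<w><g sameAs="#ct1">t</g>u</w>
```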



On 13/02/2015 15:14, Piotr Bański wrote:
> Things pronounced bonkers one day need not stay in that category as 
> the research progresses, and the TEI does wear a few pioneer's badges.
> You may have just helped Alexey to lay foundations for a new section 
> of ch.5 -- how about "Glyph deconstruction: free variation vs. 
> contextual variants". And since it's Alexey who's involved, we might 
> see it begin to happen already at the next TEI-MM... just sayin'.
> If this is taken seriously, then the semantics of @ref would simply 
> need to be properly defined, to fit the context (or some kind of grid 
> mapping would have to be devised). Nice.
> Best,
>   P.
> On 13/02/15 12:21, Lou Burnard wrote:
>> I suggest that the underlying problem here is nothing to do with <g>s,
>> fragmentable or otherwise, but rather with the desire to have a
>> tokenisation which is not well-structured with respect to the
>> orthographic structure. As such it's no different from such problems as
>> how to encode things like "it's" or "isn't" in English.
>> I must say I don't like the idea of fragmentary <g>s, if only because I
>> don't know whether the @ref attribute is then supposed to point to the
>> whole (reconstituted) glyph, as you have in your example, or rather
>> whether it's supposed to point to a partial glyph. If that sounds
>> bonkers, consider someone trying to encode as separate glyphs the
>> strokes that constitute a single Chinese character.
>> On 13/02/15 10:35, James Cummings wrote:
>>> Hi Alexey, Fabio,
>>> I think I'd do as Fabio suggests. To answer your underlying question,
>>> I guess that no one had considered that <g> might be fragmented in
>>> this way. I certainly hadn't. If you find lots of examples of this,
>>> they could be used as evidence for a clear feature request.
>>> -James
>>> On 13/02/15 08:29, Fabio Ciotti wrote:
>>>> Dear Alexey.
>>>> you have @next and @prev for coreferencing the two parts of the
>>>> ligature (or what I consider a misuse of @corresp), and they are
>>>> available in <g>.
>>>> Fabio
>>>> 2015-02-11 17:42 GMT+01:00 Lavrentev Alexey:
>>>>> Dear all,
>>>>> I am working on a project that involves annotation and alignment
>>>>> with image zones of individual characters in manuscript
>>>>> transcriptions
>>>>> (
>>>>> We are going to use <g> to encode ligatures, exactly as shown in an
>>>>> example in the Guidelines
>>>>> (
>>>>> At a certain stage of the project, the transcriptions will need to be
>>>>> tokenised at word and character level using <w> and <c> tags.
>>>>> Although we have not yet seen such a case in the corpus, it is
>>>>> possible that a ligature joins the last letter of a word with the
>>>>> first letter of another, e.g. don&ctlig;u = "donc tu" in modern
>>>>> French.
>>>>> In the tokenized transcription, I would like to do something like
>>>>> this:
>>>>> <w>don<g ref="#ctlig" part="I">c</g></w>
>>>>> <w><g ref="#ctlig" part="F">t</g>u</w>
>>>>> but, unlike segLike elements (including <c>), <g> is not a member of
>>>>> the att.fragmentable class.
>>>>> Of course, I could use the <c> element instead of <g>, but then I
>>>>> would lose the semantics of <g> and the ref attribute.
>>>>> So, my question is whether there is a particular reason for <g> not
>>>>> being a member of att.fragmentable, and, if not, whether it is worth
>>>>> submitting a feature request.
>>>>> Otherwise, I would be very grateful for any alternative encoding
>>>>> proposal for "cross-word" glyphs.
>>>>> Best,
>>>>> Alexei