Dear Alexey,

I'm happy to see more on this topic; it's quite intriguing.

For one thing, I am not so sure that stand-off can help much here. Not 
without marking the <am> as having a special double-faced status, and 
even then it wouldn't be a 100% optimal solution, I'm afraid. It's great 
to see such examples! :-)

For much the same reason, the @sameAs approach would run into a 
similar predicament -- if I understand you correctly, one end of the 
"sameAs" relation would have a little bit tacked on that is *not* 
"sameAs", wouldn't it?

I think at this point I might consider "zooming out" of this fused 
representation and, rather than trying to tag the individual parts, 
ascribe two lemmas to a single orthographic segment.
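
A rough sketch of what I mean, using Alexey's example. The assumptions 
are mine: that @lemma on <w> may carry both lemmas, and that "+" is an 
acceptable separator (it is a made-up convention, not something the 
Guidelines prescribe):

   <w lemma="que+elle">
      <choice>
         <abbr>q<am>&#x305;</am>lle</abbr>
         <expan>q<ex>u'e</ex>lle</expan>
      </choice>
   </w>

The orthographic segment stays intact, and the abbreviation mark never 
has to be split between two tokens.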

Or else (or additionally): I'd be lemmatizing the <expan> in such cases. 
(And would adopt a necessary extra condition for the visualization).
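
Something along these lines, perhaps. This is only a sketch: splitting 
the <ex> in two is my own assumption about how the expansion could be 
cut at the word boundary, and the lemma values are illustrative:

   <choice>
      <abbr>q<am>&#x305;</am>lle</abbr>
      <expan><w lemma="que">q<ex>u'</ex></w><w lemma="elle"><ex>e</ex>lle</w></expan>
   </choice>

Here the <abbr> side is left untokenized, which is why the visualization 
would need to know that the words are only to be found in the <expan>.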

I'm looking forward to learning about what you are going to end up with!

This feels very much like a case worth mentioning outright in the 
Guidelines, or at least worth putting in a footnote.



On 17/02/15 10:23, Lavrentev Alexey wrote:
> Thanks Fabio, James, Lou and Piotr for interesting replies!
> Lou is absolutely right that the underlying problem is the overlap
> between the linguistic structure and the orthographic surface.
> Another example I could give is
> <choice>
>    <abbr>q<am>&#x305;</am>lle</abbr>
>    <expan>q<ex>u'e</ex>lle</expan>
> </choice>
> where an abbreviation mark (combining horizontal bar) expands to a
> letter in one word and a letter in another. How should this be tokenized?
> Piotr would say that stand-off is my friend but I believe that it is in
> all cases necessary to know how to "inject" the stand-off markup.
> My initial question was in fact much more pragmatic. I was looking for a
> way to indicate that one and the same glyph expands to two characters in
> two different words.
> The @next & @prev solution seemed overkill to me, as it implies
> assigning a unique identifier to each <g> (which I did not intend to do
> at that stage of encoding), and att.fragmentable appeared to be a
> convenient solution.
> However, it did not occur to me that the glyph might be de-composed, as
> Lou suggested. So the right solution would probably be to use @sameAs
> (which implies @xml:id) to make it clear that one and the same glyph
> belongs to two words... if I am getting it right.
> Best,
> Alexey
> Le 13/02/2015 15:14, Piotr Bański a écrit :
>> Things pronounced bonkers one day need not stay in that category as
>> the research progresses, and the TEI does wear a few pioneer's badges.
>> You may have just helped Alexey to lay foundations for a new section
>> of ch.5 -- how about "Glyph deconstruction: free variation vs.
>> contextual variants". And since it's Alexey who's involved, we might
>> see it begin to happen already at the next TEI-MM... just sayin'.
>> If this is taken seriously, then the semantics of @ref would simply
>> need to be properly defined, to fit the context (or some kind of grid
>> mapping would have to be devised). Nice.
>> Best,
>>   P.
>> On 13/02/15 12:21, Lou Burnard wrote:
>>> I suggest that the underlying problem here is nothing to do with <g>s,
>>> fragmentable or otherwise, but rather with the desire to have a
>>> tokenisation which is not well-structured with respect to the
>>> orthographic structure. As such it's no different from such problems as
>>> how to encode things like "it's" or "isn't" in English.
>>> I must say I don't like the idea of fragmentary <g>s, if only because I
>>> don't know whether the @ref attribute is then supposed to point to the
>>> whole (reconstituted) glyph, as you have in your example, or rather
>>> whether it's supposed to point to a partial glyph. If that sounds
>>> bonkers, consider someone trying to encode as separate glyphs the
>>> strokes that constitute a single Chinese character.
>>> On 13/02/15 10:35, James Cummings wrote:
>>>> Hi Alexey, Fabio,
>>>> I think I'd do as Fabio suggests. To answer your underlying question,
>>>> I guess that no one had considered that <g> might be fragmented in
>>>> this way. I certainly hadn't. If you find lots of examples of this,
>>>> they could be used as evidence for a clear feature request.
>>>> -James
>>>> On 13/02/15 08:29, Fabio Ciotti wrote:
>>>>> Dear Alexey.
>>>>> you have @next and @prev for coreferencing the two parts of the
>>>>> ligature (or what I consider a misuse of @corresp), and they are
>>>>> available in <g>.
>>>>> Fabio
>>>>> 2015-02-11 17:42 GMT+01:00 Lavrentev Alexey:
>>>>>> Dear all,
>>>>>> I am working on a project that involves annotation and alignment
>>>>>> with image
>>>>>> zones of individual characters in manuscript transcriptions
>>>>>> (
>>>>>> We are going to use <g> to encode ligatures, exactly as shown in an
>>>>>> example
>>>>>> in the Guidelines
>>>>>> (
>>>>>> At a certain stage of the project, the transcriptions will need to be
>>>>>> tokenised at word and character level using <w> and <c> tags.
>>>>>> Although we have not yet seen such a case in the corpus, it is
>>>>>> possible that
>>>>>> a ligature joins the last letter of a word with the first letter of
>>>>>> another,
>>>>>> e.g. don&ctlig;u = "donc tu" in modern French.
>>>>>> In the tokenized transcription, I would like to do something like
>>>>>> this
>>>>>> <w>don<g ref="#ctlig" part="I">c</g></w> <w><g ref="#ctlig"
>>>>>> part="F">t</g>u</w>
>>>>>> but, unlike segLike elements (including <c>), <g> is not a member of
>>>>>> att.fragmentable class.
>>>>>> Of course, I could use the <c> element instead of <g>, but then I
>>>>>> would lose
>>>>>> the semantics of <g> and the ref attribute.
>>>>>> So, my question is whether there is a particular reason for <g> not
>>>>>> being
>>>>>> a member of att.fragmentable, and, if not, whether it is worth
>>>>>> submitting a
>>>>>> feature request.
>>>>>> Otherwise, I would be very grateful for any alternative encoding
>>>>>> proposal
>>>>>> for "cross-word" glyphs.
>>>>>> Best,
>>>>>> Alexei