What I suggested is different from the position your argument is directed against. I argued that, since we cannot count on every text node originating in the source, we should mark explicitly the ones that do, and I used <contents> for this. The elements, for sure, do not originate in the source, but an empty element, like the one you mention, has no text node, so it does not really enter into the equation. In your example,
<contents>...blah di blah blah</contents><note>This is rubbish (Ed.)</note><contents> blah di blah</contents>
would not include the note (the note is made by the editor of the TEI document), whereas
<contents>...blah di blah blah</contents><note><contents>This is rubbish (Ed.)</contents></note><contents> blah di blah</contents>
would (the note is in the digitised document).
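To make the rule concrete, here is a minimal sketch of how a processor might pull out only the source text under this proposal. It assumes Python's standard xml.etree.ElementTree; the helper names (source_text, _collect) and the wrapping <root> element are hypothetical and exist only for illustration. A text node counts as source only if its parent element is <contents>.

```python
import xml.etree.ElementTree as ET


def _collect(el, parts):
    # A text node belongs to the source only when its parent is <contents>.
    marked = el.tag == "contents"
    if marked and el.text:
        parts.append(el.text)
    for child in el:
        _collect(child, parts)
        # child.tail is a text node whose parent is el, not child.
        if marked and child.tail:
            parts.append(child.tail)


def source_text(fragment):
    # Wrap the fragment in a hypothetical <root> so it parses as one document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    parts = []
    _collect(root, parts)
    return "".join(parts)


# First example: the note is written by the editor of the TEI document.
editor_note = (
    "<contents>...blah di blah blah</contents>"
    "<note>This is rubbish (Ed.)</note>"
    "<contents> blah di blah</contents>"
)

# Second example: the note is present in the digitised document.
source_note = (
    "<contents>...blah di blah blah</contents>"
    "<note><contents>This is rubbish (Ed.)</contents></note>"
    "<contents> blah di blah</contents>"
)

print(source_text(editor_note))
print(source_text(source_note))
```

The first call leaves the note out and the second keeps it, without the processor needing to know anything about <note> itself.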
As I wrote, there would probably be different kinds of <contents> - aside from being a hot potato, my idea is also a raw potato! And "contents" is probably not the right term ….
On Aug 26, 2011, at 5:34 PM, Martin Holmes wrote:
> On 11-08-26 06:13 AM, Doug Reside wrote:
>> This is precisely why tool development should be so central to the
>> TEI. Until the community _as a whole_ tries to use the markup, we
>> won't know the best way to encode.
>> I've been irritated by text nodes that come from an editor rather than
>> the source before, but I agree that placing this sort of text in
>> attributes is probably not the way to do this either. I like the
>> namespacing idea, as it means I can first process out anything in the
>> editorial namespace before doing indexing or any other sort of
>> processing. However, I wonder if leaving such editorial commentary to
>> standoff annotations indexed by xpath isn't a better solution.
> You still have to put something in the text to anchor the annotations to. For instance, if you want to do this:
> ...blah di blah blah<note>This is rubbish (Ed.)</note> blah di blah
> then you have to introduce something like this:
> ...blah di blah blah<anchor xml:id="ed_note_1"/> blah di blah
> to anchor your external annotation to. So once again we have elements which don't contain original source text (although in this case the element is empty).
>> On Fri, Aug 26, 2011 at 8:54 AM, Martin Holmes wrote:
>>> On 11-08-26 05:42 AM, Sebastian Rahtz wrote:
>>>>> One way to do this would be with namespaces. Elements deemed to contain
>>>>> only transcription could be in a separate namespace from elements
>>>>> containing interpretive data or metadata.
>>>> it's not impossible we could allow any attribute to also appear as a child
>>>> element in a separate namespace. But I suspect a total rewrite with the
>>>> idea of considerably lessening use of attributes would be cleaner.
>>> I thought what he was basically saying was that _if_ this separation were
>>> - elements (in <text> at least) contain transcribed content
>>> - attributes contain editorial/interpretive/metadata content
>>> then indexing and searching the original text would be much simpler.
>>> However, this would make it impossible to use helpful markup inside
>>> editorial interpolations, and there are other issues, such as supplied
>>> <abbr>Brd</abbr> <--- original content
>>> <expan>Board</expan> <--- supplied, but should be indexed anyway
>>> The use of distinct namespaces would solve this problem.
>>>> I like Jens' thinking, but it's a whole big can of worms to open...
>>>> Sebastian Rahtz
>>>> Head of Information and Support Group
>>>> Oxford University Computing Services
>>>> 13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
>>>> Sólo le pido a Dios
>>>> que el futuro no me sea indiferente
> Martin Holmes
> University of Victoria Humanities Computing and Media Centre