Hi, Mike— Your project looks fascinating! I’m going to try to address your questions in reverse order:

I believe the problem you’re seeing with the <seg> element is a problem with XML well-formedness: You’ve nested your opening and closing tags for seg inside two different paragraphs to produce what we call “tangled tags”—like so: <p><seg></p><p></seg></p>.  To the XML parser, this disrupts the XML hierarchy: an element set inside a <p> is expected to open and close inside the <p> element. 

Very well, you might say, how about if I change the hierarchy, and set <seg> outside of those <p> elements: <seg><p></p><p></p></seg>? Well, the TEI schema will fire an error here because seg isn’t allowed to contain <p> children. This is a pretty common issue in our community, and there are a variety of ways to deal with it: it’s a problem of how to write good XML markup that accommodates overlapping hierarchies. It’s going to take some planning. I might be tempted in this case, if you’re really liking <seg>, to work with it like so:

<p>…<seg xml:id="a1" next="#a2">…</seg></p>
<p><seg xml:id="a2" prev="#a1">…</seg>…</p>

Here I’m using an @xml:id to set unique identifiers on each seg, and I’m using @next and @prev to point to the members of a series that span multiple paragraphs in the document. That’s one way of approaching the problem, but there will certainly be others.
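
Applied to the passage you sent, that might look something like the sketch below. The xml:id values (qb1, qb2) are placeholders I’ve made up, each “…” stands for the text you already have, and I’m assuming you’d want to repeat your @function and @type on both pieces:

```xml
<!-- Sketch only: qb1 and qb2 are invented identifiers; "…" elides your existing text. -->
<!-- First piece: opens and closes inside the first paragraph, points forward. -->
<p><seg xml:id="qb1" next="#qb2" function="modul" type="qualitiesBuddha">The Tathāgata was handsome and charismatic, … and clear like a lake.</seg></p>

<!-- Second piece: opens and closes inside the second paragraph, points back. -->
<p><seg xml:id="qb2" prev="#qb1" function="modul" type="qualitiesBuddha">His body was adorned with the thirty-two marks of a great being, … and great splendor.</seg></p>
```

Each seg now opens and closes inside its own <p>, so the file stays well-formed, and the pointers record that the two pieces belong to one passage.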

Now, as for <interp>, your use of this has a certain logic but isn’t consistent with the TEI Guidelines’ explanation and examples, where the element isn’t used to mark up the base text. Instead, <interp> belongs to a small family of elements (with <spanGrp>, <span>, and more) for handling what we call “stand-off annotation”: analytical notes, drawn from a set vocabulary, that you attach to a base text rather than wrap around it. This is a little difficult to explain, so first take a look at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/AI.html#AISP
and then let’s look at some examples: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/examples-spanGrp.html
Do you see how that’s being used?

I’d say we don’t want <interp> here, unless you want to come up with a use of <spanGrp> as a means of handling your annotations. That’s worth considering, too!
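
If you did want to try the stand-off route, one possible shape (only a sketch, not something lifted from the Guidelines) is to drop empty <anchor> elements at the start and end of a passage and describe the passage in a <span> that points to them with @from and @to. The xml:id values, the type on <spanGrp>, and the wording inside <span> are all invented for illustration, and where the <spanGrp> should sit in your file is something you’d need to settle against your own schema:

```xml
<!-- Sketch only: qb-start and qb-end are invented identifiers; "…" elides your existing text. -->
<p><anchor xml:id="qb-start"/>The Tathāgata was handsome and charismatic, … and clear like a lake.</p>
<p>His body was adorned with the thirty-two marks of a great being, … and great splendor.<anchor xml:id="qb-end"/></p>

<!-- The annotation lives apart from the text and points at the two anchors. -->
<spanGrp type="passages">
  <span type="qualitiesBuddha" from="#qb-start" to="#qb-end">qualities of the Buddha</span>
</spanGrp>
```

Because the anchors are empty elements, nothing has to open in one paragraph and close in another.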

But I think you might simply continue using the <seg> element in place of how you’re using <interp>. Use @type to define a fixed set of categories for seg, one for each kind of thing you want to point out. Presumably your seg elements contain long spans of text holding information of various kinds that you want to highlight. What are those various kinds of information? You could come up with @type and @subtype categories to apply to that element.
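
For instance, your <interp> example could be recast along these lines. Splitting placeMain into type="place" and subtype="main" is just my guess at how your categories might break down (likewise reading audienceGen as a general audience), so adjust the values to your own scheme:

```xml
<!-- Sketch only: the type and subtype values are guesses at your categories. -->
In the <seg type="place" subtype="main">Country of the Bhargas, on Mount Śuśumāra in a fearsome forest of wild animals</seg> with a <seg type="audience" subtype="general">great saṅgha of about 500 monks, eminent śrāvaka-elders who possessed clairvoyance</seg>.
```

Keeping the broad category in @type and the refinement in @subtype would let you pull out every place or every audience at once, and narrow it down later if you need to.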

I hope this resolves the problem and gives you some ideas!


Elisa Beshero-Bondar, PhD
Director, Center for the Digital Text | Associate Professor of English
University of Pittsburgh at Greensburg | Humanities Division
150 Finoli Drive
Greensburg, PA  15601  USA
Development site: http://newtfire.org

On Apr 28, 2017, at 3:41 AM, Mike Engle wrote:

Hi all

I'm involved in a project in which we're marking up a large body of literature and we want to mark various passages in the texts as significant, in essence labeling them as "this" or "that". We took a look through the TEI Guidelines and decided to use <seg> and <interp> to mark the various passages, with different attributes to specify what type of passage each one is.

For example, with <interp>:

In the <interp type="placeMain">Country of the Bhargas, on Mount Śuśumāra in a fearsome forest of wild animals</interp> with a <interp type="audienceGen">great saṅgha of about 500 monks, eminent śrāvaka-elders who possessed clairvoyance</interp>.

And with <seg>:

<seg function="modul" type="pastLifeWho">if you wonder whether the brahmin boy
                        Bhadraśuddha was then at that time someone else, or you are of two minds
                        about it, or doubtful, do not see him so. Why? Because the bodhisattva
                        mahāsattva Maitreya himself was then at that time the brahmin boy

Two questions:

1)  Since this is going to be a very long-term and labor-intensive project, I wanted to check whether the community in general felt this was a reasonable way to mark these passages, and whether there are any suggestions for other ways to do this which might work better. Can anyone suggest any other elements that might be useful for this kind of thing?

2)  We have a problem of going across elements (breaking the nesting so to speak).  For instance, a passage might start in the middle of one paragraph and finish halfway through another paragraph, and when we mark it the tag begins in one paragraph and closes in the next.  The schema doesn't like this one bit, and I'm wondering what is the best way to handle this.  For example:

<p><seg function="modul" type="qualitiesBuddha">The Tathāgata was handsome and charismatic,
                        controlled in his faculties and in his mind. He had attained excellence in
                        control and calm abiding, and superiority in control and calm abiding. He
                        guarded his faculties, elephant-like in control of his passions, and
                        was radiant, unsullied, and clear like a lake.</p> 

<p>His body was adorned with the
                        thirty-two marks of a great being, and with the eighty minor marks, like the
                        blossoming flower of a royal sal tree, and towering like Mount Meru, the
                        king of mountains. His face was as calm as the sphere of the moon, and       
                        radiantly clear and brilliant like the sphere of
                        the sun. His body was proportioned like a nyagrodha tree, blazing with light
                        and great splendor.</seg></p>

Any help is greatly appreciated. Forgive the formatting.