On 11-08-22 10:32 AM, Doug Reside wrote:
> Martin Holmes writes:
>
>> If you say in your introduction that 75% of an author's sentences have two
>> or more clauses in them, and you provide your XML encoding to support the
>> claim, I can check it in seconds using XSLT. If you don't provide an
>> encoding (or some other apparently trustworthy dataset), I'm forced to take
>> your word for it.
>
> But surely this is a great case for interoperable markup. You say 75%
> of your author's sentences have 2 or more clauses in them. I then run
> for each getElementsByTagName("sentence")
> {getElementsByTagName("clause").size()} and check to see if you're
> right, and then do a list of all your clauses to make sure I agree
> with what you're calling a clause. Point is, I shouldn't have to use
> YOUR software to check your stats. I shouldn't have to guess at how
> you've labeled clauses. The markup should be interoperable, even if
> the actual encoding choices spark some debate. For this case, in
> particular, though, I think that natural language processing would
> probably automatically reveal the same results about as accurately and
> probably a good deal more quickly.
That's exactly right; and I'd argue that in this case, interoperability
would work fine. Either the encoder has used <s>, <cl> etc. exactly as
recommended and expected; or if not, they've used <encodingDesc> to
explain how they _have_ used it, and why they've departed from the
guidelines. After all, as John pointed out, I can read your XML and see
precisely what you've done.
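To show what "check it in seconds" can look like in practice, here is a minimal sketch in Python (rather than XSLT). It assumes the encoder used TEI <s> for sentences and <cl> for clauses in the standard TEI namespace, and it counts every descendant <cl> as a clause, including nested ones. A project's <encodingDesc> might define clauses differently. The function name share_multi_clause and the sample text are hypothetical illustrations, not anyone's actual project code.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed here, since the encoding under discussion is TEI.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def share_multi_clause(xml_text: str, min_clauses: int = 2) -> float:
    """Fraction of <s> elements containing at least `min_clauses` <cl> descendants.

    Hypothetical helper: every descendant <cl> is counted, so nested clauses
    each count once. Returns 0.0 if the document has no <s> elements.
    """
    root = ET.fromstring(xml_text)
    sentences = root.findall(".//tei:s", TEI_NS)
    if not sentences:
        return 0.0
    multi = sum(
        1 for s in sentences if len(s.findall(".//tei:cl", TEI_NS)) >= min_clauses
    )
    return multi / len(sentences)


# Invented sample: four sentences, two of which have two or more clauses.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<s><cl>I came</cl> <cl>and I saw</cl></s>
<s><cl>It rained</cl></s>
<s><cl>We left</cl> <cl>because</cl> <cl>it was late</cl></s>
<s>Fine.</s>
</p></body></text></TEI>"""

print(share_multi_clause(sample))  # 0.5
```

If the reader disagrees with what the encoder calls a clause, the same tree can be walked to list every <cl> for inspection, which is the other half of the check described above.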
> As to John Walsh's argument that XML is human-readable, I certainly
> wouldn't want to have to read raw XML as my primary way of engaging
> with a text. And clearly stylesheets exist because few others want to
> either.
>
> There may be a very few edge cases where XML encoding is in fact the
> best way to interpret a text, but I think these are very few (and
> actually vanishingly few as better annotation software provides much
> of the same functionality that you get from non-interoperable XML).
>
> Moreover, I really can't believe granting agencies and member
> institutions fund the organization so that a few scholars (maybe 50 or
> so) can conduct their close readings in a tongue even more obscure
> than Latin. If the TEI actually wants this to be the primary use
> case for its standard, then I strongly encourage the leadership to
> continue to say this, and say it even louder, more clearly, and more
> publicly. I suspect if those who control the purse strings at
> supporting organizations understand what is meant by this claim, it
> will mean a rapid drying up of funding for the TEI, but at least, it
> will be an honest death.
50 scholars? The Digital Humanities Summer Institute alone trains dozens
of encoders every year. In the room where I'm sitting right now there
are five people working on XML texts (different projects), and this is
the summer when most of our folks are away. I have no idea how many
people actually use TEI, or how we could know, but it certainly isn't 50.
Cheers,
Martin
--
Martin Holmes
University of Victoria Humanities Computing and Media Centre