Dear Gioele, Andreas,
thanks for your comments and the references. I guess the most honest
answer as to why we decided to use TEI rather than Akoma Ntoso is that
we know and love TEI :). But somewhat more objectively, a) CLARIN
centres offer a lot of different types of corpora, and it would be nice
(we can always dream) if they all could be encoded to a common schema,
rather than a different one for each type of text, b) such corpora are
typically linguistically annotated (PoS tagging, lemmatisation, NER,
maybe syntax) and, as far as I know, AN does not make provisions for
such annotation, whereas TEI of course does, and, maybe, c) the
parliamentary transcripts for many countries still have to be obtained
by scraping them from the web, say in HTML or PDF, meaning that only
very basic structure can be automatically inserted into the document; I
understand that AN does make provisions to encode only core elements,
still, even that might be too much to expect from such conversions -
however, I could be wrong here. But this is not to say that AN is
irrelevant to our proposal and I completely agree that it would be great
to have cross-walks between the two. At the very least it would mean
simple import of already encoded AN materials into TEI.
Also, the workshop is very much meant as a forum to gather opinions on
the suitability of the TEI proposal and how - and if! - to develop it
further, and we are looking forward to participants who may have
diverging views on how to go about it. Already we know that a part of
the community is very much in favour of using RDF to encode
parliamentary data, which I see as much more problematic than TEI vs. AN
(and so was very happy to read the recent mails on this list by
Christian Chiarcos and others on TEI vs. RDF).
As for our work, some is already mentioned in the call, but to summarise:
- SlovParl: http://lrec-conf.org/workshops/lrec2018/W2/pdf/4_W2.pdf,
http://hdl.handle.net/11356/1208 (+ http://hdl.handle.net/11356/1209)
These two also make a nice contrast, the first was carefully checked and
hand corrected, the second a somewhat sloppy conversion from a DB dump.
And, of course, it would be nice to see both of you at the workshop!
Andreas Wagner wrote on 13/02/2019 at 13:57:
> Hi Tomaž, hi all,
> I second Gioele's cheers about parliamentary documents coming into the
> focus of interdisciplinary research initiatives. And, like him, I
> would also be very interested in learning more about the background:
> the workshop announcement on the list was the first time that I became
> aware of the ParlaCLARIN workshop and CLARIN's corresponding priorities.
> Recently, I have presented some observations about a comparison of TEI
> and Akoma Ntoso for (historical) research purposes, and while I had
> come to the conclusion that, in many cases, TEI probably is a better
> suited standard for (historical) research projects, e.g. because of
> the tooling that's available, I had imagined that the effort that is
> going to be spent in providing for interoperability in this area
> would most likely be allocated to developing mechanisms for easy AKN
> to TEI conversion, and this seems to play only a minor part in your
> agenda. A part of my results also was that TEI provides for many of
> the needs out-of-the-box, so I wonder if the aim of the envisaged
> "teiParla" is to be understood more in terms of best practices or in
> those of formally adding or subsetting "normal" TEI.
> Are there online resources you can point us to, or can you maybe
> elaborate a bit?
> Thanks in advance, and best wishes,
>  https://eadh2018.exordo.com/programme/presentation/25 - I'm afraid
> that the final paper still has to be written. The slides are here: