Hi all,

First off, sincere apologies for my late response, which does not imply 
any lack of appreciation at all for your kind replies. It's nice to see 
such a wide variety of "input workflows" for TEI projects in only a 
couple of reactions!

Concerning the "non-Word perspectives": thanks, Laura, for putting 
CWRC-Writer on my radar! The demo at http://208.75.74.217 is looking 
impressive (with very elegant in-browser solutions for XML editing 
operations and a potential for intuitive and rich in-browser TEI 
markup), and the documentation of the Github repository/ies suggests it 
should be possible to self-host the system. I'll keep an eye on the 
project, and hope to find the time to experiment. Ditto for Ben's wealth 
of links and projects, which I'm still digesting, many thanks! 
Hypothes.is looks like a wonderful tool; it's great to see a clever 
example of how to use it in a project. From your description of Mashbill 
and FromThePage, these seem spot-on and make me want to try them all at 
once. It's amazing and refreshing to see such great annotation-oriented 
projects integrating annotations and identification/description of named 
entities. In the current project, we're still in a pilot phase for a 
limited corpus, so if we can find better ways for the input workflow, 
this is definitely worth investigating. For one thing, even though the 
project budget is limited, we do have access to a dedicated server, so 
there is some hope here...

Last but not least, thanks, Omar, for sharing your code and thoughts on 
the Word approach in your project. Your "discovery" method seems clever, 
though probably only usable in an even more tightly controlled input 
scenario than ours. I realize your words of caution hold for any 
approach trying to get structured output from a completely loose input 
environment like a text processor.

All this said, I guess what I'm looking for in the short run is an 
easy/intuitive way to create an external data source for identifying and 
describing persons, places, etc. (which in a TEI workflow would 
typically take the shape of person, place, ... listings in separate TEI 
files), whose records can be linked to from Word transcriptions. A 
collaborative spreadsheet would probably be sufficient for representing 
the data, but falls short w.r.t. ease of input and ways to query and 
view individual records. Still investigating!
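
To make the spreadsheet idea a bit more concrete, here is a minimal 
sketch of how a CSV export of such a sheet could be turned into a TEI 
person listing. The column names (id, name, note) and the sample records 
are purely hypothetical, and this is only one possible mapping, not a 
tested workflow:

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def rows_to_list_person(rows):
    """Build a TEI <listPerson> element from spreadsheet rows.

    Each row is a dict with the hypothetical columns 'id', 'name', 'note'.
    """
    # Serialize TEI elements without a prefix (default namespace).
    ET.register_namespace("", TEI_NS)
    list_person = ET.Element(f"{{{TEI_NS}}}listPerson")
    for row in rows:
        person = ET.SubElement(list_person, f"{{{TEI_NS}}}person")
        person.set(f"{{{XML_NS}}}id", row["id"])
        name = ET.SubElement(person, f"{{{TEI_NS}}}persName")
        name.text = row["name"]
        # Only emit a <note> when the cell is not empty.
        if row.get("note"):
            note = ET.SubElement(person, f"{{{TEI_NS}}}note")
            note.text = row["note"]
    return list_person


# Hypothetical sample data standing in for a CSV export of the sheet.
sample = "id,name,note\nP001,Jane Example,Hypothetical record\nP002,John Example,\n"
rows = list(csv.DictReader(io.StringIO(sample)))
list_person = rows_to_list_person(rows)
xml_text = ET.tostring(list_person, encoding="unicode")
print(xml_text)
```

A transcriber working in Word would then only need to know the value in 
the id column (e.g. P001) to point at a record; how that pointer gets 
encoded in the Word file is a separate question.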

Kind regards,

Ron


On 14/11/2017 17:59, Ben Brumfield wrote:
> Dear Ron,
>
> Let me offer three projects that did low-cost entity mark-up in my own experience which might be helpful.
>
> The Civil War Governors of Kentucky Digital Documentary Edition had transcribed correspondence in TEI-XML created using a combination of DocTracker and Oxygen.  These were available for further mark-up in an early access website[1] (based on Omeka, with the documents lightly converted into HTML) as well as in XML source residing in a Github repository.  Their goal was to mark up the people, places, organizations and geographic features mentioned within the documents, to identify and document those entities, and to connect the references within the documents to the entries on the entities themselves.
>
> We had graduate students use Hypothes.is to mark up the entities within each document on the Early Access site.  We then wrote an open-source system [2][3] to programmatically ingest the Hypothes.is annotations and present them for identification and documentation.  We are in the very last stages of the project now, publishing the entities, their biographies and bibliographies, the links between documents and entities, and the network visualization of entities and their relationships.  You can see more detail about the project at the presentation we gave at DH2017 this year.[4]
>
> I don't know enough about your project's resources, but Hypothes.is was an easy, inexpensive way to do the mark-up itself, and if installing Mashbill (and modifying it to remove the CWGK-specific code) is too much, you might ask your users to put URIs for entities hosted elsewhere into the annotation bodies.
>
> ======
>
> FromThePage[5] is an open-source[6] collaborative transcription and annotation platform I developed to do almost exactly what your project is attempting.  Users are presented with document facsimiles on a webpage and transcribe them into a data-entry box next to the facsimile image.  The mark-up allowed is limited compared with the richness of TEI, but the system is optimized for entity tagging, identification, documentation and indexing.  Users use wiki-links to mark up entities mentioned within a transcript, specifying a canonical name for the entity and the verbatim text within the document referring to it[7], as [[canonical name|verbatim text]] (e.g. [[Sally Smith Jones (1756-1823)|Rev. Jones wife]]).  When users save a page containing linked subjects, a database record for the subject is created if it does not already exist, and an index entry is created linking the page to the subject.  All these are visible as HTML links[8] and are transformed into rs and person tags in the TEI export.
>
> I am biased, of course, but I'd think this platform solves the use cases you've described.  I'm not sure how you'd convince your transcribers to move from Word to the web, however.
>
> ======
>
> Another option might be to cut-and-paste the existing transcripts into MediaWiki sites like pbworks or wikia.  The transcripts could be linked to articles about subjects using wiki-links (as in FromThePage).  This would be pretty low cost, but the big challenge there would be in getting the data back out again, so you'd want to figure that out first.   You'd also face the challenge of getting your users to start using the web.
>
> Best of luck,
>
> Ben
>
> [1] Early Access publication: http://discovery.civilwargovernors.org/
> [2] CWGK description of Mashbill: http://discovery.civilwargovernors.org/mashbill
> [3] Source code for Mashbill: https://github.com/CivilWarGovernorsOfKentucky/Mashbill
> [4] "Beyond Coocurrence: Network Visualization in the Civil War Governors of Kentucky Digital Documentary Edition" http://manuscripttranscription.blogspot.com/2017/08/beyond-coocurrence.html
> [5] Commercially hosted version: https://fromthepage.com/
> [6] Source code for FromThePage: https://github.com/benwbrum/fromthepage
> [7] More detail on wiki-links at "Wiki-links in FromThePage": http://manuscripttranscription.blogspot.com/2014/03/wiki-links-in-fromthepage.html
> [8] See links at https://fromthepage.com/yaquinalights/1871-1900-yaquina-head-lighthouse-letter-books/vol-439-cook-appt-1875/display/17170
>
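
The wiki-link convention described in the quoted message, 
[[canonical name|verbatim text]], can be illustrated with a short 
sketch. This is not FromThePage's own code or its actual export format; 
the function name and the choice of an rs element with a key attribute 
are assumptions made purely for illustration:

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches [[canonical name|verbatim text]]; neither part may contain brackets.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]]+)\]\]")


def wikilinks_to_rs(text):
    """Replace wiki-links with TEI-style <rs> elements (hypothetical mapping)."""
    parts = []
    pos = 0
    for match in WIKI_LINK.finditer(text):
        # Copy the plain text before the link, escaping XML special characters.
        parts.append(escape(text[pos:match.start()]))
        canonical = match.group(1).strip()
        verbatim = match.group(2).strip()
        parts.append(f"<rs key={quoteattr(canonical)}>{escape(verbatim)}</rs>")
        pos = match.end()
    # Copy whatever follows the last link.
    parts.append(escape(text[pos:]))
    return "".join(parts)


result = wikilinks_to_rs(
    "Met [[Sally Smith Jones (1756-1823)|Rev. Jones wife]] today."
)
print(result)
```

The canonical name ends up in the attribute and the verbatim reading 
stays in the running text, which is the same split between 
identification and transcription that the wiki-link itself expresses.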