First off, sincere apologies for my late response, which does not imply
any lack of appreciation at all for your kind replies. It's nice to see
such a wide variety of "input workflows" for TEI projects in only a
couple of reactions!
Concerning the "non-Word perspectives": thanks, Laura, for putting
CWRC-Writer on my radar! The demo at http://22.214.171.124 is looking
impressive (with very elegant in-browser solutions for XML editing
operations and a potential for intuitive and rich in-browser TEI
markup), and the documentation of the GitHub repository/ies suggests it
should be possible to self-host the system. I'll keep an eye on the
project, and hope to find the time to experiment. Ditto for Ben's wealth
of links and projects, which I'm still digesting, many thanks!
Hypothes.is looks like a wonderful tool; it's great to see a clever
example of how to use it in a project. From your description of Mashbill
and FromThePage, these seem spot-on and make me want to try them all at
once. It's amazing and refreshing to see such great annotation-oriented
projects integrating annotations and identification/description of named
entities. In the current project, we're still in a pilot phase for a
limited corpus, so if we can find better ways for the input workflow,
this is definitely worth investigating. For one thing, even though the
project budget is limited, we do have access to a dedicated server, so
there is some hope here...
Last but not least, thanks, Omar, for sharing your code and thoughts on
the Word approach in your project. Your "discovery" method seems clever,
though probably only usable in an even more tightly controlled input
scenario than ours. I realize your words of caution hold for any
approach trying to get structured output from a completely loose input
environment like a text processor.
All this said, I guess what I'm looking for in the short run is an
easy/intuitive way to create an external data source for identifying and
describing persons, places, etc. (which in a TEI workflow would
typically take the shape of person, place, ... listings in separate TEI
files), whose records can be linked to from Word transcriptions. A
collaborative spreadsheet would probably be sufficient for representing
the data, but falls short w.r.t. ease of input and ways to query and
view individual records. Still investigating!
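
As a minimal sketch of the spreadsheet route: assuming the collaborative
spreadsheet can be exported as CSV, each row could be turned into a TEI
person record whose xml:id serves as the link target from a
transcription. The column names (id, name, note) and the sample rows are
hypothetical, and a real person listing would likely need more structure
than this.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements without a prefix (TEI as the default namespace).
ET.register_namespace("", TEI_NS)


def rows_to_list_person(rows):
    """Build a TEI <listPerson> element from an iterable of dict rows.

    Each row is expected to have the (hypothetical) keys
    'id', 'name' and, optionally, 'note'.
    """
    list_person = ET.Element(f"{{{TEI_NS}}}listPerson")
    for row in rows:
        person = ET.SubElement(list_person, f"{{{TEI_NS}}}person")
        person.set(f"{{{XML_NS}}}id", row["id"])
        pers_name = ET.SubElement(person, f"{{{TEI_NS}}}persName")
        pers_name.text = row["name"]
        # Only emit a <note> when the spreadsheet cell is not empty.
        if row.get("note"):
            note = ET.SubElement(person, f"{{{TEI_NS}}}note")
            note.text = row["note"]
    return list_person


# Stand-in for a CSV export of the shared spreadsheet (invented data).
SAMPLE_CSV = """id,name,note
p001,Jane Example,Hypothetical correspondent
p002,John Sample,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
list_person = rows_to_list_person(rows)
xml_text = ET.tostring(list_person, encoding="unicode")
print(xml_text)
```

This only covers the export side; ease of input and ways to query and
view individual records remain open.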
On 14/11/2017 17:59, Ben Brumfield wrote:
> Dear Ron,
> Let me offer three projects that did low-cost entity mark-up in my own experience which might be helpful.
> The Civil War Governors of Kentucky Digital Documentary Edition had transcribed correspondence in TEI-XML created using a combination of DocTracker and Oxygen. These were available for further mark-up in an early access website (based on Omeka, with the documents lightly converted into HTML) as well as in XML source residing in a Github repository. Their goal was to mark up the people, places, organizations and geographic features mentioned within the documents, to identify and document those entities, and to connect the references within the documents to the entries on the entities themselves.
> We had graduate students use Hypothes.is to mark up the entities within each document on the Early Access site. We then wrote an open-source system to programmatically ingest the Hypothes.is annotations and present them for identification and documentation. We are in the very last stages of the project now, publishing the entities, their biographies and bibliographies, the links between documents and entities, and the network visualization of entities and their relationships. You can see more detail about the project at the presentation we gave at DH2017 this year.
> I don't know enough about your project's resources, but Hypothes.is was an easy, inexpensive way to do the mark-up itself, and if installing Mashbill (and modifying it to remove the CWGK-specific code) is too much, you might ask your users to put URIs for entities hosted elsewhere into the annotation bodies.
> FromThePage is an open-source collaborative transcription and annotation platform I developed to do almost exactly what your project is attempting. Users are presented with document facsimiles on a webpage and transcribe them into a data-entry box next to the facsimile image. The mark-up allowed is limited compared with the richness of TEI, but the system is optimized for entity tagging, identification, documentation and indexing. Users use wiki-links to mark up entities mentioned within a transcript, specifying a canonical name for the entity and the verbatim text within the document referring to it, as [[canonical name|verbatim text]] (e.g. [[Sally Smith Jones (1756-1823)|Rev. Jones wife]]). When users save a page containing linked subjects, a database record for the subject is created if it does not already exist, and an index entry created linking the page to the subject. All these are visible as HTML links and are transformed into rs and person tags in the TEI export.
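
The wiki-link convention described above can be illustrated with a small
sketch. This is not FromThePage's actual export code: the function names,
the slug-style identifiers and the use of a ref attribute on rs are
assumptions made purely for illustration, and only the piped form
[[canonical name|verbatim text]] is handled.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches [[canonical name|verbatim text]] as described in the message.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def slugify(name):
    """Derive a simple identifier from a canonical name (hypothetical scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def wiki_links_to_rs(text):
    """Replace wiki-links with TEI-style <rs> elements.

    Returns the converted text and a dict mapping each generated
    identifier to its canonical name, standing in for the subject index.
    """
    subjects = {}
    parts = []
    last = 0
    for match in WIKI_LINK.finditer(text):
        # Escape the plain text between links so the result stays XML-safe.
        parts.append(escape(text[last:match.start()]))
        canonical = match.group(1).strip()
        verbatim = match.group(2).strip()
        key = slugify(canonical)
        subjects[key] = canonical
        parts.append(f"<rs ref={quoteattr('#' + key)}>{escape(verbatim)}</rs>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts), subjects


sample = "Dined with [[Sally Smith Jones (1756-1823)|Rev. Jones wife]] today."
tei, subjects = wiki_links_to_rs(sample)
print(tei)
print(subjects)
```

The returned dictionary plays the role of the subject records: one entry
per canonical name, however many times it is referenced.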
> I am biased, of course, but I'd think this platform solves the use cases you've described. I'm not sure how you'd convince your transcribers to move from Word to the web, however.
> Another option might be to cut-and-paste the existing transcripts into MediaWiki sites like pbworks or wikia. The transcripts could be linked to articles about subjects using wiki-links (as in FromThePage). This would be pretty low cost, but the big challenge there would be in getting the data back out again, so you'd want to figure that out first. You'd also face the challenge of getting your users to start using the web.
> Best of luck,
>  Early Access publication: http://discovery.civilwargovernors.org/
>  CWGK description of Mashbill: http://discovery.civilwargovernors.org/mashbill
>  Source code for Mashbill: https://github.com/CivilWarGovernorsOfKentucky/Mashbill
>  "Beyond Coocurrence: Network Visualization in the Civil War Governors of Kentucky Digital Documentary Edition" http://manuscripttranscription.blogspot.com/2017/08/beyond-coocurrence.html
>  Commercially hosted version: https://fromthepage.com/
>  Source code for FromThePage: https://github.com/benwbrum/fromthepage
>  More detail on wiki-links at "Wiki-links in FromThePage": http://manuscripttranscription.blogspot.com/2014/03/wiki-links-in-fromthepage.html
>  See links at https://fromthepage.com/yaquinalights/1871-1900-yaquina-head-lighthouse-letter-books/vol-439-cook-appt-1875/display/17170