For a project of ours (https://mecmua.acdh.oeaw.ac.at) I used an
approach where I (ab)used Word's comment function for annotation data
and character styles for the type of entity. The DOCX documents were
processed using a customized variant of the TEI XSL stylesheets
(https://github.com/simar0at/TEI-Stylesheets/tree/mecmua) in oXygen XML.
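For anyone who wants to try a similar setup outside oXygen, here is a minimal Python sketch. It is not the XSL-based workflow described above. It assumes a standard DOCX package, where comments are stored in word/comments.xml and anchored in word/document.xml by w:commentRangeStart/w:commentRangeEnd, and it collects the text and character-style IDs of the runs that each comment spans. Note that w:rStyle holds the style ID, which can differ from the display name shown in Word.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

# WordprocessingML namespace used in word/document.xml and word/comments.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def q(tag):
    """Return the namespace-qualified name ElementTree uses for a w: tag or attribute."""
    return f"{{{W}}}{tag}"


def read_comments(zf):
    """Map comment id -> comment text (paragraphs joined by newlines)."""
    if "word/comments.xml" not in zf.namelist():
        return {}
    root = ET.fromstring(zf.read("word/comments.xml"))
    comments = {}
    for c in root.findall("w:comment", NS):
        paragraphs = ("".join(t.text or "" for t in p.iter(q("t")))
                      for p in c.findall("w:p", NS))
        comments[c.get(q("id"))] = "\n".join(paragraphs)
    return comments


def extract_annotations(path):
    """Return one record per comment: annotated text, run styles, and comment text."""
    with zipfile.ZipFile(path) as zf:
        comments = read_comments(zf)
        body = ET.fromstring(zf.read("word/document.xml"))
    open_ids = set()  # comments whose range we are currently inside
    spans = defaultdict(lambda: {"text": [], "styles": set()})
    for el in body.iter():  # document order, so ranges may cross paragraphs
        if el.tag == q("commentRangeStart"):
            open_ids.add(el.get(q("id")))
        elif el.tag == q("commentRangeEnd"):
            open_ids.discard(el.get(q("id")))
        elif el.tag == q("r") and open_ids:
            style = el.find("w:rPr/w:rStyle", NS)  # character style ID, if any
            text = "".join(t.text or "" for t in el.findall("w:t", NS))
            for cid in open_ids:
                spans[cid]["text"].append(text)
                if style is not None:
                    spans[cid]["styles"].add(style.get(q("val")))
    return [
        {"id": cid, "text": "".join(s["text"]), "styles": sorted(s["styles"]),
         "comment": comments.get(cid, "")}
        for cid, s in spans.items()
    ]
```

Each record pairs the entity type (the character style) with the annotation data (the comment), which is the information the XSL transformation has to bring together as well.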
There was no central data source for the entities, only the rule that
an entity should be annotated with all available data at its first occurrence
in the document and, at every further occurrence, with as much
information as is needed to distinguish it from other entities with the same name.
This approach worked to some extent for two domain experts annotating in Word,
but in the end a lot of proofreading of the annotations was necessary
because the annotators made mistakes that were hard to spot.
In hindsight, I would have invested more time in customizing Word for
stricter input checking and suggestions.
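Input checking of this kind could, for instance, enforce the first-occurrence rule once the comments have been extracted. Below is a minimal Python sketch; the "key: value; key: value" comment syntax, the REQUIRED field table and the Mention record are hypothetical conventions for illustration, not the ones used in the project.

```python
from dataclasses import dataclass, field

# Hypothetical convention: fields the first occurrence of each entity type must carry
REQUIRED = {
    "Person": {"birth", "death"},
    "Place": {"country"},
}


@dataclass
class Mention:
    name: str   # annotated text
    etype: str  # entity type, e.g. taken from the character style
    data: dict = field(default_factory=dict)  # parsed comment


def parse_comment(text):
    """Parse a hypothetical 'key: value; key: value' comment into a dict."""
    result = {}
    for part in text.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            result[key.strip().lower()] = value.strip()
    return result


def check(mentions):
    """Return human-readable problems, following the first-occurrence rule."""
    problems = []
    seen = {}  # (type, name) -> data of entities already introduced
    for i, m in enumerate(mentions):
        key = (m.etype, m.name)
        # an earlier entity matches if it agrees with every field given here
        matches = [d for d in seen.get(key, [])
                   if all(d.get(k) == v for k, v in m.data.items())]
        if len(matches) == 1:
            continue  # unambiguous reference to a known entity
        if len(matches) > 1:
            problems.append(f"mention {i} ({m.name}): ambiguous, add distinguishing data")
            continue
        # no match: treat it as the first occurrence of a new entity
        missing = REQUIRED.get(m.etype, set()) - set(m.data)
        if missing:
            problems.append(f"mention {i} ({m.name}): first occurrence lacks {sorted(missing)}")
        seen.setdefault(key, []).append(m.data)
    return problems
```

A later mention without any data passes as long as only one entity of that name has been introduced; once a namesake appears, the checker asks for distinguishing data. A typo in a later mention's data makes it look like a new entity, so it is reported as an incomplete first occurrence rather than silently accepted.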
If you find this useful, please take it as inspiration.
On 09.11.2017 at 16:22, Ron Van den Branden wrote:
> A digital edition project we're assisting in is trying to engage a
> limited group of volunteers for transcribing letters, creating basic
> annotations, and identifying named entities in the texts. The project
> context is quite challenging, since budget is limited, the project is
> building on previous efforts which had produced basic Word
> transcriptions, and the volunteers are domain experts without any
> desire to extend their human-computer interaction beyond basic office
> software. Hence, we've tried to accommodate this in a word
> processor-based workflow for the "volunteer phase", after which the
> docs will be transformed and the TEI life of these texts begins.
> Against this backdrop, I'm looking for a way to enable the volunteers
> to identify named entities in the texts. Transcription-wise, I think
> this is feasible in a word processor by identifying them as hyperlinks
> and having the URL point at least to a unique ID code, or ideally to a
> valid URL where the available information can be viewed. Either the ID
> code or a field in each record can then contain type information that
> could help in the transformation to TEI. Of course, this requires an
> external data source to link to, and this is still a major concern.
> Given the fact that a lot of these names won't feature in any existing
> resources, and that they will probably require specific information
> tailored to this edition anyhow, it seems to make most sense to
> construct a project-specific resource which can hold the required
> information for the different entities (persons, organizations,
> places, titles,...), of course providing space for pointing to
> existing resources when available. Ideally, the volunteers would be
> able to look up whether an entry exists already, copy the ID/URL and
> use it in the transcription; or create a new one for people, places,
> ... that haven't been described yet. Also, if needed, it should be
> possible to edit existing descriptions if e.g. more information
> becomes available along the way (of course, without touching the
> original ID/URL). On top of that, querying and entering new
> information should ideally be as intuitive as possible for the
> volunteers. Summing up, the main requirements would be:
> - collaborative: volunteers should be able to create and/or modify
>   entries when needed
> - intuitive input/query form
> - ability to import existing data + export (to CSV or XLSX)
> Since a basic spreadsheet could be sufficient to store this
> information (e.g. different sheets per name type, with name-specific
> information fields in separate columns), I've been looking into Google
> Sheets, but I'm not sure whether it allows viewing individual "records"
> (i.e. rows in a sheet), and whether the forms component offers everything
> needed to query/create/edit records.
> I realize this is probably a terribly basic and peripheral question
> which I've long hesitated to ask here, but how do others do this
> (after all, it's such a basic component of any edition project)? We've
> been advised to look in the direction of MediaWiki, but that seems too
> unstructured: existing information is hard to navigate, and entering new
> information is quite complex.
> As might be clear at this point, (non-XML) databases are not my field
> of expertise, we don't have any IT-departmental back-up, and I'm a bit
> at a loss. Are there any known lightweight (and preferably free)
> solutions available for facilitating this task? Or what would be the
> most sensible direction to look into?
> Many thanks for any advice,
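The hyperlink-based identification described in the quoted message could be turned into TEI roughly as follows. This is a minimal Python sketch with several assumptions: it reads only w:hyperlink elements (not HYPERLINK field codes), and both the ID scheme, where the URL ends in a three-letter type prefix plus digits such as per0042, and the PREFIX_TO_TEI mapping are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical ID scheme: the URL ends in a three-letter type prefix plus digits,
# e.g. https://example.org/entities/per0042
PREFIX_TO_TEI = {"per": "persName", "org": "orgName", "pla": "placeName", "tit": "title"}
ID_PATTERN = re.compile(r"([a-z]{3})\d+$")


def hyperlinks(path):
    """Yield (anchor text, target URL) for every w:hyperlink element in a DOCX."""
    with zipfile.ZipFile(path) as zf:
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        body = ET.fromstring(zf.read("word/document.xml"))
    # relationship id (e.g. rId5) -> target URL
    targets = {rel.get("Id"): rel.get("Target")
               for rel in rels.iter(f"{{{PKG}}}Relationship")}
    for link in body.iter(f"{{{W}}}hyperlink"):
        rid = link.get(f"{{{R}}}id")
        if rid in targets:  # internal bookmarks use w:anchor instead of r:id
            text = "".join(t.text or "" for t in link.iter(f"{{{W}}}t"))
            yield text, targets[rid]


def to_tei(text, url):
    """Turn one hyperlink into a TEI name element; the ID prefix selects the element."""
    match = ID_PATTERN.search(url.rstrip("/"))
    element = PREFIX_TO_TEI.get(match.group(1), "name") if match else "name"
    return f"<{element} ref={quoteattr(url)}>{escape(text)}</{element}>"
```

Unknown or missing prefixes fall back to a generic name element, so an unexpected ID produces a reviewable element instead of being silently dropped.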