LISTSERV 16.5 - TEI-L Archives

TEI-L@LISTSERV.BROWN.EDU

Subject:

Crowdsourcing Transcription Failures

From:

Ben Brumfield <[log in to unmask]>

Reply-To:

Ben Brumfield <[log in to unmask]>

Date:

Wed, 24 Feb 2016 09:17:24 -0600

Øyvind makes an excellent point. Opportunities for failure abound,
especially in projects (like crowdsourced transcription) that require
such a large investment into digitization and software development up
front, but which are often justified by eventual cost savings. I
suspect that many failures--especially the ones which fail due to
inability to find motivated volunteers--are costly failures indeed and
are not adequately publicized. Nevertheless, I think I can come up
with some failure stories.

It's only fair to start with a personal example. In 2012, I started
work with Free UK Genealogy. Because this organization had been
running crowdsourced transcription projects off-line for almost two
decades, our goal was to rewrite the entire technical stack. We began
building new transcription tools to replace the existing system of
spreadsheets and emails. However, the existing volunteers (who were
comfortable with the existing system) rebelled at the idea of
replacing the tools they were familiar with for no obvious benefit to
them. Within a year we had to table the effort and instead focus on
the public-facing website/search engine. This proved very popular,
and now we feel that we have more support for the transcription tool
project. Nevertheless, we're only revisiting the transcription tool
now, four years after the re-write was started.

Successful projects also experience instructive failures. One of the
papers published by the Bentham team (
http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html#p36)
talked about their efforts at outreach and marketing to find
volunteers for the project. Several mechanisms (Google AdWords, the
formal press release) proved to be wastes of effort. Similarly,
outreach to secondary and university students yielded no results.
(And let me express my thanks to the Bentham team for their candor,
which has been a constant within their publications.)

Another failure inherent to success is the cost of supporting
transcription tools if they are adopted by other institutions. I know
from talking with Matt Butler at the University of Iowa that the
success of DIYHistory has placed an enormous burden of external
product support on UIowa's IT staff, who are funded, after all, to
support their own institution and not third parties. I've personally
had to build diagnostic code, pro bono, so that institutions running my
open-source software could debug problems which turned out to be in
their campus firewall. I am not sure that the current
institutional funding systems for digital humanities have figured out
how to cover software support costs, and certainly the open source
community struggles with this as well (cf.
http://www.mikeperham.com/2015/11/23/how-to-charge-for-your-open-source/
), so this may be a special case of a general problem.

Another potential for failure is a mismatch between the project's call
for participation and the project's other messaging. At the MLA this
year, Aaron Pratt and Brett Hirsch pointed to a lack of significant
public commentary on the Devonshire Manuscript Social Edition,
suggesting that positioning a text as already "authoritative" actually
discourages social engagement. Similarly, projects may choose the
wrong tool for their data -- UNL's yearbook transcription project came
in for some criticism on Twitter for running on Scripto (which supports
full-page, plain-text transcripts) instead of a structured-data tool
like the Zooniverse platforms.

Subscribers to the TEI list will be all too familiar with poor choices
of encoding standards. Many crowdsourcing projects are not informed by
the experience of scholarly editors. When volunteers or professionals
from other disciplines are faced with questions about textual
interpretation, they generally rely on mark-up informed either by
print conventions or by HTML. In some cases the print conventions do
a huge disservice to the text as, for example, when genealogists
transcribe surnames in all capital letters, following an old
typesetting convention from their literature and thereby obliterating
potentially important data in the source materials.

Although not entirely transcription-focused, one of the most
spectacular examples of failure in crowdsourcing is the Manuscript
Fragments Project, which was based at the Harry Ransom Center. In this
case, the crowdsourcing project was run by an employee largely on his
own initiative but with some institutional support. When that
employee's contract was not renewed, the entire site was pulled down.
(See https://micahcapstone.wordpress.com/2015/05/25/why-the-medieval-fragments-project-nolonger-exists-and-when-crowdsourcing-doesnt-work/
)

Finally, success and failure are not easy to define. To return to a
personal example, I ran a crowdsourced transcription project on
FromThePage as a pilot for a local institution. I have one
super-volunteer who will jump onto any manuscripts dealing with early
19th century Texas, and this material was exactly that. This volunteer
transcribed all the institution's materials in a matter of days. After
that, the institution decided against moving forward on a full
project, explaining that they were disappointed to have attracted only
one volunteer. Their goal was outreach to the community, not
productivity, and they would have preferred ten users transcribe one
page each over one user transcribing one hundred pages, even though
the product would be much smaller.

In addition to failures, I think that it's also worth thinking about
threats to the success of crowdsourced transcription projects. In my
opinion, the primary threat--the existential threat--comes from the
decline of desktop computing and its replacement by mobile devices
without keyboards. The mobile world makes image and audio capture
much easier than we've ever seen before. This is a very good thing for
the previously-difficult problem of scanning and uploading images of
documents to a shareable server. (In fact a mobile uploader has already
been used with FromThePage, much to my surprise.)

Unfortunately, to the extent that documents must be transcribed
diplomatically, keyboard input really is necessary. That is especially
true when we ask users to encode texts using markup like TEI. We may
see some application of dictation software, but that will really only
work for highly regular source material written with standard
orthography in popular, modern languages. This transformation in
hardware may have as much impact on the kinds of digitization we do as
the 1923 copyright cut-off does.

I'd love to hear more stories of failure.

Ben Brumfield
http://manuscripttranscription.blogspot.com/
