Dear Matthew,

For German, there is the German Text Archive, which provides complete dumps of its texts; see:

You can also download a complete zip file of the "Digital Library" hosted at (a huge collection of German-language literature): or use the OAI-PMH interface of the hosting repository.
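
For anyone who has not used OAI-PMH before, here is a minimal Python sketch of harvesting a repository. It is only an illustration, not the hosting repository's own client. The base URL is not given in this message, so it is left as a parameter. The sketch assumes the mandatory `oai_dc` format; whether the repository also offers TEI can be checked with the `ListMetadataFormats` verb. The loop sends `ListRecords` requests and follows each `resumptionToken` until the repository returns an empty one.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

Record = Tuple[str, Optional[ET.Element]]


def build_list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                           token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_bytes: bytes) -> Tuple[List[Record], Optional[str]]:
    """Return (identifier, metadata element) pairs and the next token (None on the last page)."""
    root = ET.fromstring(xml_bytes)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        if error.get("code") == "noRecordsMatch":
            return [], None  # an empty result set is reported as an error code
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    records: List[Record] = []
    for rec in root.iterfind("oai:ListRecords/oai:record", OAI_NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=OAI_NS)
        # Deleted records carry only a header, so the metadata element may be None.
        records.append((identifier.strip(), rec.find("oai:metadata", OAI_NS)))
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, token or None


def harvest_records(base_url: str, metadata_prefix: str = "oai_dc") -> Iterator[Record]:
    """Yield every record from base_url (the repository's OAI-PMH endpoint, assumed)."""
    token: Optional[str] = None
    while True:
        url = build_list_records_url(base_url, metadata_prefix, token)
        with urllib.request.urlopen(url) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if token is None:
            break
```

The sketch does not implement flow control (HTTP 503 with a Retry-After header). A real harvester should respect it.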

There are many more out there, for example Paul Fievre's repository on GitHub containing French theatre plays:
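
Once a repository like this has been cloned (or the zip file of the Digital Library unpacked), the raw TEI can be processed locally. The Python sketch below walks a folder and extracts each document's title from `teiHeader/fileDesc/titleStmt/title`. It assumes TEI P5 files. Because some collections omit the TEI namespace, it also tries un-namespaced element names. The `collect_titles` helper is illustrative and is not part of any of the collections mentioned here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tei_title(path: Path) -> Optional[str]:
    """Return the first titleStmt title of a TEI file, with whitespace normalized."""
    root = ET.parse(path).getroot()
    # Try the TEI P5 namespace first, then plain (un-namespaced) element names.
    for ns in (f"{{{TEI_NS}}}", ""):
        title = root.find(f".//{ns}teiHeader/{ns}fileDesc/{ns}titleStmt/{ns}title")
        if title is not None:
            return " ".join("".join(title.itertext()).split())
    return None


def collect_titles(folder: Path) -> Dict[str, Optional[str]]:
    """Map each *.xml file under folder (as a relative POSIX path) to its TEI title."""
    titles: Dict[str, Optional[str]] = {}
    for path in sorted(folder.rglob("*.xml")):
        try:
            titles[path.relative_to(folder).as_posix()] = tei_title(path)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
    return titles
```

For a hypothetical local clone in a folder named `plays`, calling `collect_titles(Path("plays"))` returns one entry per well-formed XML file.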


On 12/19/2016 06:13 PM, Lavin, Matthew J wrote:

Apologies for any duplicates received due to cross-posting.


I am collecting links to publicly accessible, computable TEI files (or files in other similar markup, such as SGML or LMNL). To be included, archives/collections/datasets/corpora must meet one of two criteria:


Bulk download of raw XML (not transformed to HTML)

XML fully accessible via a predictable URL structure (for example, the Walt Whitman Archive, which has a “raw XML” link on every transformed HTML page)


Please note that I am not interested in sample XML, only collections with some kind of curatorial or scholarly focus. Thank you all for any leads!


Matthew Lavin

Clinical Assistant Professor of English and Director of Digital Media Lab

University of Pittsburgh


Mathias Göbel
Dept. Research & Development

Georg-August-Universität Göttingen
Niedersächsische Staats- und Universitätsbibliothek Göttingen
D-37070 Göttingen

Papendiek 14 (historic building, room 2.408)
+49 551 39-20184 (Tel.)
+49 551 39-33856 (Fax.)
