A couple of years ago I gathered some URLs into a project for testing.

https://github.com/stuartyeates/sampler/tree/master/TEI

Cheers
Stuart

On Wednesday, December 21, 2016, MLH <[log in to unmask]> wrote:

Hi Matthew,


Greta Franzini's digital scholarly editions app links to 16 resources with downloadable TEI, but I'm afraid it doesn't say whether they offer bulk download or predictable URLs. Still, it may be worth a look.

https://dig-ed-cat.acdh.oeaw.ac.at/browsing/editions/?name=&institution__name=&manager__name=&url=&scholarly=&digital=&edition=&writing_support=&begin_date=&end_date=&audience=&philological_statement=&textual_variance=&value_witnesses=&tei_transcription=&download=1&images=&zoom_images=&image_manipulation=&text_image=&source_translation=&glossary=&indices=&search=&advanced_search=&cc_license=&open_source=&infrastructure=&key_or_ocr=&print_friendly=&api=&amount=&Filter=Filter


Matthew


From: TEI (Text Encoding Initiative) public discussion list <[log in to unmask]> on behalf of TEI-L automatic digest system <[log in to unmask]>
Sent: 20 December 2016 05:00
To: [log in to unmask]
Subject: TEI-L Digest - 18 Dec 2016 to 19 Dec 2016 (#2016-244)
 
There are 4 messages totaling 541 lines in this issue.

Topics of the day:

  1. seeking links to TEI corpora (3)
  2. Don't upgrade your Oxygen plugin yet!

----------------------------------------------------------------------

Date:    Mon, 19 Dec 2016 17:13:29 +0000
From:    "Lavin, Matthew J" <[log in to unmask]>
Subject: seeking links to TEI corpora

Apologies for any duplicates received due to cross-posting.

I am collecting links to publicly accessible, computable TEI files (or files in other similar markup, such as SGML or LMNL). To be included, archives/collections/datasets/corpora must meet one of two criteria:

1. Bulk download of raw XML (not transformed HTML)
2. XML fully accessible via a predictable URL structure (for example, the Walt Whitman Archive, which has a “raw xml” link on every transformed HTML page)

Please note that I am not interested in sample XML, only collections with some kind of curatorial or scholarly focus. Thank you all for any leads!

Matthew Lavin
Clinical Assistant Professor of English and Director of Digital Media Lab
University of Pittsburgh

------------------------------

Date:    Mon, 19 Dec 2016 09:33:09 -0800
From:    Martin Holmes <[log in to unmask]>
Subject: Don't upgrade your Oxygen plugin yet!

Hi all,

We've found a problem with the latest release of the TEI Oxygen plugin,
so if you have it installed (instead of the regular TEI framework
bundled with Oxygen), please don't update it to the new release. We're
working on the problem.

Cheers,
Martin

------------------------------

Date:    Mon, 19 Dec 2016 18:05:19 +0000
From:    "Dalmau, Michelle Denise" <[log in to unmask]>
Subject: Re: seeking links to TEI corpora

Dear Matthew,

The IU Libraries provide XML downloads (at the item-level) for the following TEI P5 collections:

Wright American Fiction: http://dlib.indiana.edu/collections/wright/
Lyle H. Wright, a librarian at the Huntington Library in San Marino, CA, created a bibliography of American fiction from the years 1851–1875, published as American ...

Victorian Women Writers Project: http://www.dlib.indiana.edu/collections/vwwp/
The Victorian Women Writers Project (VWWP) began in 1995 at Indiana University and is primarily concerned with the exposure of lesser-known British women writers of ...

Brevier Legislative Reports: http://www.dlib.indiana.edu/collections/law/brevier/

We have two additional projects in TEI P4 with XML download:
Indiana Authors and Their Books: http://dlib.indiana.edu/collections/inauthors
Indiana Authors and Their Books is an LSTA-funded project based on the digitization and encoding of the 3-volume reference work, Indiana Authors and Their Books ...

Indiana Magazine of History: https://scholarworks.iu.edu/journals/index.php/imh (XML download in the View Text link per article)
Published continuously since 1905, the Indiana Magazine of History is one of the nation's oldest historical journals. Since 1913, the IMH has been edited and ...


You could also grab most of these files via GitHub:
https://github.com/iulibdcs/tei_text  (caveat: the repo needs to be refreshed; that's on our to-do list)
tei_text - Free-for-all repository of TEI and plain text files for you (to do cool stuff) provided by the Digital Collections Services group at the Indiana University ...


This is probably not what you are after, but we provide EAD XML access to IU finding aids as well:
http://dlib.indiana.edu/collections/findingaids/
Welcome to Archives Online at Indiana University. This site is a portal for accessing descriptions of Special Collections and Archives - ones chiefly containing ...


—Michelle
-----
Michelle Dalmau
Head, Digital Collections Services
-----
Indiana University Libraries
Herman B Wells Library
1320 East 10th Street, Rm W501
Bloomington, Indiana 47405
-----
Web:  http://michelledalmau.com
Twitter:  @mdalmau


------------------------------

Date:    Mon, 19 Dec 2016 10:08:46 -0800
From:    Matthew Davis <[log in to unmask]>
Subject: Re: seeking links to TEI corpora

Dear Matthew,

I don’t know that it’s what you’re looking for (it is still early days, there’s still a lot to transcribe and input, and I’m one person doing all the work), but I think my archive of Lydgate works may meet your criteria. There’s a link to download the XML for each transformed HTML page, and the raw XML files are stored in an XML folder, organized by work.

The link is www.minorworksoflydgate.net. Much of it is still behind a password, as I’m hoping to have a peer review done on it, but the items in the Clopton chantry chapel (http://www.minorworksoflydgate.net/Quis_Dabit/Clopton/ww_qd_1.html and http://www.minorworksoflydgate.net/Testament/Clopton/sw_test_1.html) are readily accessible, since the transcriptions will be published in January. If it’s what you’re looking for, send me a message off-list and I’ll give you the login credentials for the other items.
Welcome to the virtual archive of the minor works of the fifteenth-century poet, John Lydgate. The goals of this archive are twofold: first, it is an ...


There’s also a section on the site, “About the Archive,” that articulates some of my thinking about site design, the decisions I made while encoding, etc.

All the best,
—Matt



------------------------------

End of TEI-L Digest - 18 Dec 2016 to 19 Dec 2016 (#2016-244)
************************************************************


--
...let us be heard from red core to black sky