The NLM DTDs (superseded by JATS) are three tag sets, each designed for 
a different purpose.  While JATS is generally used by publishers to 
encode material they are currently publishing (born-digital material), 
the "Archiving and Interchange" tag set is more flexible, allowing you 
to encode previously published material that might not follow the 
rigid rules you would enforce for material you are currently 
publishing.  Thus, it shouldn't necessarily be ruled out for encoding a 
collection of preexisting texts.

Furthermore, there's a customization of JATS called BITS designed for 
books, so you could use JATS and BITS together to encode a collection 
that includes both articles and books.

DITA is different from the others you cite in that it's designed for 
chunks of content that you want to reuse and remix across various 
documents.  It's a level of abstraction above the common use of XML for 
documents (describing the structure and meaning of the text).

So, given that you're encoding previously published material (therefore 
ruling out DocBook), your choice of encoding language really does come 
down to:

a) what kind of texts you have
b) whether you are interested in representing only what was on the 
printed page or more complex artifacts such as annotations made on the page
c) what you plan to do with these texts (that is, are there tools for 
working with your documents that expect a certain encoding format, or 
organizations you will give the texts to that expect a certain 
format?)

Regardless of which XML format you choose, you can use XSLT or XQuery to 
work with the documents.
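
To illustrate why the choice of vocabulary doesn't lock you out of 
processing, here is a rough sketch in Python (standing in for XSLT or 
XQuery, using only the standard library) that pulls the title out of a 
JATS article and a BITS book with the same function.  The sample 
fragments, the TITLE_PATHS table, and extract_title are made up for 
illustration; the fragments are well-formed but not complete, valid 
documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed samples (not valid against the real DTDs).
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>On Tag Sets</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

BITS_SAMPLE = """<book>
  <book-meta>
    <book-title-group>
      <book-title>A Book of Tag Sets</book-title>
    </book-title-group>
  </book-meta>
</book>"""

# Only the path to the title differs between the two vocabularies.
TITLE_PATHS = {
    "jats": "./front/article-meta/title-group/article-title",
    "bits": "./book-meta/book-title-group/book-title",
}


def extract_title(xml_text, vocabulary):
    """Return the title text for a document in the given vocabulary, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(TITLE_PATHS[vocabulary])
    if node is None:
        return None
    # itertext() also collects text inside inline markup such as <italic>.
    return "".join(node.itertext()).strip()


if __name__ == "__main__":
    print(extract_title(JATS_SAMPLE, "jats"))
    print(extract_title(BITS_SAMPLE, "bits"))
```

The point is only that the per-format knowledge reduces to a handful of 
paths; in XSLT or XQuery the same idea would be a template or path 
expression per vocabulary.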

--Kevin