The NLM DTDs (superseded by JATS) are three tag sets, each designed for
a different purpose. While JATS is generally used by publishers to
encode material they are currently publishing (born-digital material),
the "Archiving and Interchange" tag set is more flexible: it allows you
to encode previously published material that might not follow the
rigid rules you would enforce for material you are currently
publishing. Thus, it shouldn't necessarily be ruled out for encoding a
collection of preexisting texts.
Furthermore, there's a customization of JATS called BITS designed for
books, so you could use JATS and BITS together to encode a collection
that includes both articles and books.
DITA is different from the others you cite in that it's designed for
cases where you have chunks of content that you want to reuse and remix
across various documents. It's a level of abstraction above the common
use of XML for documents (describing the structure and meaning of the
text).
So, given that you're encoding previously published material (therefore
ruling out DocBook), your choice of encoding language really does come
down to:
a) what kind of texts you have
b) whether you are interested in representing only what was on the
printed page or also more complex features, such as annotations made on
the page
c) what you plan to do with these texts (that is, are there tools for
working with your documents that expect a certain encoding format, or
other organizations you will give the texts to that expect a certain
format?)
Regardless of which XML format you choose, you can use XSLT or XQuery to
work with the documents.
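
To make that last point concrete, here is a minimal sketch of pulling titles
out of a mixed collection. It is an illustration under several assumptions,
not a recommended pipeline: it uses Python's standard-library ElementTree
(which supports only a small XPath subset) as a stand-in for XSLT or XQuery,
the two sample documents are hand-written and heavily simplified, and the
TITLE_PATHS mapping and extract_title function are hypothetical names
invented for this example.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written samples; real JATS/BITS files carry far more metadata.
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>On Encoding Older Texts</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

BITS_SAMPLE = """<book>
  <book-meta>
    <book-title-group>
      <book-title>A Collected Edition</book-title>
    </book-title-group>
  </book-meta>
</book>"""

# Hypothetical mapping: root element name -> path to the main title.
TITLE_PATHS = {
    "article": "./front/article-meta/title-group/article-title",
    "book": "./book-meta/book-title-group/book-title",
}


def extract_title(xml_text):
    """Return the main title of a document, or None if it is not recognized."""
    root = ET.fromstring(xml_text)
    path = TITLE_PATHS.get(root.tag)
    if path is None:
        return None  # a vocabulary this sketch does not know about
    node = root.find(path)
    if node is None:
        return None  # known vocabulary, but no title at the expected path
    # itertext() also picks up text inside inline markup such as <italic>.
    return "".join(node.itertext()).strip()


titles = [extract_title(doc) for doc in (JATS_SAMPLE, BITS_SAMPLE)]
print(titles)
```

The only format-specific part is the path to the title; the surrounding
logic stays the same whichever vocabulary you settle on, and the same would
hold for an equivalent XSLT template or XQuery expression.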