Sheau-Hwang Chang wrote:
>
> I would like to know if a DTD has been written for encoding newspapers.
Is this for encoding existing content, or new content?
Collins Dictionaries have considerable experience of encoding electronic
newspaper sources for our "Bank of English" corpus, which is now well
over 400 million words. The range of encoding methods in the sources we
get never ceases to amaze (and horrify) me. The most civilized I've seen
is from TAZ, the German "alternative" newspaper, which used HTML 3.2
with additional metadata encoded in "structured comments". Somewhere on
their website (http://www.taz.de/) they have (or had) details of their
data structure, which might be a good place to start for rolling your
own DTD.
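If you go the HTML-plus-comments route, pulling the metadata back out is straightforward with an ordinary HTML parser. Below is a rough Python sketch using the standard library's html.parser. The "META key: value" comment syntax is a hypothetical convention made up for illustration, not TAZ's actual format, so check their published description before relying on any of this.

```python
from html.parser import HTMLParser


class StructuredCommentParser(HTMLParser):
    """Collect metadata from comments of the HYPOTHETICAL form
    <!-- META key: value -->, ignoring all other comments."""

    PREFIX = "META "  # assumed marker, not a real newspaper's convention

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        text = data.strip()
        if not text.startswith(self.PREFIX):
            return  # ordinary comment, not metadata
        # split on the first colon only, so values may contain colons
        key, sep, value = text[len(self.PREFIX):].partition(":")
        if sep:
            self.metadata[key.strip()] = value.strip()


def extract_metadata(html_text):
    """Return a dict of key/value pairs found in structured comments."""
    parser = StructuredCommentParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


if __name__ == "__main__":
    # made-up sample page, for illustration only
    sample = """<HTML><HEAD><TITLE>Example</TITLE>
<!-- META date: 1999-01-01 -->
<!-- META section: example -->
<!-- layout note, not metadata -->
</HEAD><BODY><P>Article text.</P></BODY></HTML>"""
    print(extract_metadata(sample))
```

Keeping the metadata in comments means the pages still render in any browser, while a converter like this can map the key/value pairs onto elements in your own DTD.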
The worst encoding I've seen was data we got from a Spanish newspaper,
which came as a huge bundle of CDs -- filled with 300dpi TIFF and JPEG
images of pages. The data were platform-independent, but not in any
useful sense...
Stewart
--
Stewart C. Russell Senior Analyst Programmer
Collins Dictionaries
use Disclaimer; my $opinion; Bishopbriggs, Scotland