> > 1. Has anyone experience of using a data service to key-in ancient
> > texts rather than undertaking it in-house? Are there any issues to be
> > especially aware of?
> well, the TLG was originally done by keyboarders in Asia, wasn't it?
The story I heard was that it was more effective than asking Greek
keyboarders to do it, as they would correct the texts to modern Greek.
Don't remember who I heard it from, though.
> > 2. Standard practice seems to be to transliterate the Greek using the
> > encoding scheme. Is this still the case? I get the impression that
> > entities (for classical Greek) are not yet well supported, but I would
> > be interested to hear about other possible solutions.
Whether numeric or (heaven forbid!) named entities are "Unicode" is a
semantic issue; one would prefer to use Unicode with UTF encodings, but
Unicode is not an encoding or group of encodings; it is sort of a
standardized reference set of code-point / character correspondences which
can be used (and has been used) to construct encodings or entity lists. I
myself probably have to shoulder some of the blame for the confusion that
entities are Unicode, as it is promulgated on earlier drafts of my
still-in-draft Unicode polytonic Greek help pages, mainly because the use of
such entities is easier for the creation of static web pages in Windows 9x
(the original audience for those pages). Entities are a means of
representing Unicode characters in non-Unicode-compatible encodings (such as
ISO8859-1, or even ASCII). True Unicode encodings include UTF-8, UTF-16 and
its variants, and UTF-32, none of which are natively supported in Windows 9x
(though you can display characters from them using Internet Explorer, and
use them in Office 2000) or Mac OS9 (though through the use of ATSUI Unicode
awareness has been incorporated into many OS9 applications). Unicode is the
native character set of Windows 2000 (and Windows NT, but it works better in
Windows 2000) and the forthcoming Windows XP (which use UTF-16 internally and
can read and write UTF-8), and I believe of Mac OS X.
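
To make the distinction between entities and encodings concrete, here is a minimal Python sketch. It is my own illustration rather than anything taken from the help pages mentioned above, and the helper name to_entities is hypothetical. It takes one polytonic character and shows it first as a numeric character reference (plain ASCII that merely names the code point) and then as bytes in three Unicode encoding forms.

```python
import html

def to_entities(text):
    """Represent every non-ASCII character as a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA (alpha, smooth breathing, acute)
alpha = "\u1f04"

# The entity is ordinary ASCII text that refers to code point U+1F04.
entity = to_entities(alpha)
print(entity)                          # &#7940;
print(html.unescape(entity) == alpha)  # True: a browser resolves it the same way

# The encoding forms store the same code point directly as bytes.
for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    data = alpha.encode(codec)
    print(codec, len(data), data.hex())
```

The same code point comes out as three bytes in UTF-8, two in UTF-16 and four in UTF-32, while the entity form costs seven ASCII characters but survives any ISO8859-1 or ASCII pipeline.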
Yes, the use of beta code has been the standard practice, in part because
most of the development work has been done with non-Unicode aware operating
systems (earlier Unices, for instance). But if practicable, storage in some
encoding (probably UTF-8) of the Normalization Form D of Unicode would be
preferable. The World Wide Web Consortium's working draft for the character
model recommendation at http://www.w3.org/TR/1999/WD-charmod-19991129/
recommends Normalization Form C (especially see section 4.1, W3C Text
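Normalization); however, this would I think dramatically increase the
difficulty of creating a search engine, since each accented letter becomes a
separate precomposed character rather than a base letter plus combining
marks. If the text is stored in Form D, one can easily convert it to
Normalization Form C (so-called "precomposed characters") for display.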
There are a number of freeware and shareware and even a few professional
fonts which can handle this; Jeffrey Rusten's review of Palatino Linotype
(now included in Windows 2000) in the Bryn Mawr Classical Review is an
excellent introduction, if now out of date; he did not realize that PL would
not support combining diacriticals (nor does his own shareware font support
them).
This is not the same practice as used by Perseus, as explained by Anne
Mahoney; Perseus stores in Beta Code and then converts on the fly to any of
a number of proprietary font encodings (including the two GreekKeys
encodings, which are very popular with Macintosh-toting classicists). This
was the best solution at the time (as Beta Code storage was the best solution
for TLG back in the 70's, and Beta Code to GreekKeys conversion was the best
solution for the average TLG user with a Mac in 1991, and as Beta Code in
email remains even today, given the limitations of many popular mailers); but
one would like to think that two years down the road the pure Unicode
solution will be better.
At any rate, if you do decide to use Beta Code instead, it's easy enough to
write a script to convert that to UTF-8 at a later date (it has been done at
least six times that I know of). And one can easily enough renormalize NFC
Unicode to NFD or back, if standards-compliance requirements change.
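
As a rough illustration of the store-in-NFD, display-in-NFC idea and of why decomposed storage helps searching, here is a short Python sketch using the standard unicodedata module. The function names are my own, and stripping every combining mark is just one possible search strategy, assumed here for the sake of the example.

```python
import unicodedata

def to_nfd(text):
    """Decompose: base letters followed by combining diacriticals."""
    return unicodedata.normalize("NFD", text)

def to_nfc(text):
    """Compose: precomposed characters wherever Unicode defines them."""
    return unicodedata.normalize("NFC", text)

def strip_marks(text):
    """Drop combining marks so a search can ignore accents and breathings."""
    return "".join(c for c in to_nfd(text) if not unicodedata.combining(c))

# "anthropos" with a precomposed first letter (U+1F04), as NFC would store it
word = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"

stored = to_nfd(word)            # what would go into the archive
print(len(word), len(stored))    # 8 10: alpha gained two combining marks
print(to_nfc(stored) == word)    # True: the round trip is lossless

# An unaccented query matches once the marks are ignored.
query = "\u03b1\u03bd\u03b8\u03c1\u03c9\u03c0"
print(query in strip_marks(stored))   # True
```

With decomposed text the search side only has to skip one class of characters; with precomposed text it would need a table of every accented form of every vowel.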
If you decide to use Unicode, UTF-8 is the best bet: it will be natively
supported by the two current commercial consumer (i.e., non-open-source)
operating systems by January 1 (I mean Windows XP and - if I remember
correctly - Mac OS X; as I said above, Windows 2000 also has native
support), and it is easily handled in many Linux applications (Yudit and
Mozilla, for instance). Your service would hopefully have access to Windows
2000. Also, if you do decide to use Unicode, make the investment in a copy
of *The Unicode Standard 3.0* - the book is US$50, and worth every penny; it
includes a CD-ROM with a number of example scripts (including a
normalization/renormalization script) in both Java and C.
> > 3. For delivery to the end-user (after processing the XML) we are
> > considering using some form of dynamic fonts to represent Greek on the
> > screen (or failing that, suggesting the download of a font like SPionic
> > SGreek). Again, I would be interested in hearing of other mechanisms for
> > displaying ancient Greek via the Web, particularly solutions which are
> > server-side.
It is easy enough to run betacode through a Perl script that will transcode
it to Unicode. See http://www.methymna.com/unicode/g-script.cgi ; and see
its PHP inspiration, Sean Redmond's Greek Font Converter at
http://www.jiffycomp.com/smr/unicode/convert.php3. The one at Methymna is
merely a bunch of regular expressions that convert Beta Code into UTF-8,
though I have to say that it is (1) unfinished and (2) untested. If you are
converting this server-side, it might be worth taking a look at the open
source Mozilla character
handling code; they manage a nice trick of checking various fonts for the
correct characters or combinations to display anything in Greek (so that as
far as I've been able to determine one can use Mozilla 0.9 in any compatible
operating system to read any Greek, with only the standard Microsoft Core
Web fonts installed). There is a partial list of Unicode fonts that can
handle polytonic Greek at http://www.methymna.com/unicode/3-fonts.html and
of relevant links at http://www.methymna.com/unicode/9-references.html ; all
of this material is in a draft state (much less finished than implied by the
version number) and so should not be taken as reliable.
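
For a sense of how small such a transcoder can be, here is a minimal sketch in Python. It is not the Methymna Perl script or Sean Redmond's PHP converter; the names LETTERS, MARKS, TOKEN and beta_to_unicode are hypothetical, and it assumes only the core of Beta Code: the 24 letters, `*` for capitals, the breathing, accent, diaeresis and iota subscript signs, and word-final sigma. Everything else (digamma, numbered sigma forms, punctuation and formatting escapes) is left untouched.

```python
import re
import unicodedata

# Beta Code letter -> Greek lowercase letter (input case is ignored).
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic sign -> Unicode combining mark.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}

# Optional "*" plus leading marks (capitals), a letter, then trailing marks.
TOKEN = re.compile(r"(?:(\*)([()/\\=+|]*))?([a-zA-Z])([()/\\=+|]*)")

def beta_to_unicode(beta):
    """Convert a Beta Code string to NFC Unicode (sketch, core subset only)."""
    def convert(match):
        star, pre, letter, post = match.groups()
        base = LETTERS.get(letter.lower())
        if base is None:              # letters outside the subset stay as-is
            return match.group(0)
        if star:
            base = base.upper()
        marks = "".join(MARKS[sign] for sign in (pre or "") + post)
        return base + marks
    text = TOKEN.sub(convert, beta)
    # A sigma not followed by another letter or mark is word-final.
    text = re.sub(r"σ(?![\w\u0300-\u036f])", "ς", text)
    return unicodedata.normalize("NFC", text)

print(beta_to_unicode("a)/nqrwpos"))
print(beta_to_unicode("*)aqh=nai").encode("utf-8"))
```

Building the decomposed sequence first and normalizing at the end means the converter never needs a table of precomposed characters; swapping "NFC" for "NFD" in the last line gives the storage form instead of the display form.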
> Of course, you can always create PDF on the fly, if you want to be
> *sure* of getting decent fonts.
UGGH. Please, whatever you do, do not use PDF. User experience with PDF is
far less satisfying than with XML (even XHTML!) display (one must wait for
the plug-in to load, pages are usually loaded one at a time, one has to deal
with page breaks). PDF is great for anything that has to be printed, but
for web display is simply irritating.