I can take the first one:
On Tue, Oct 26, 2004 at 11:39:30PM -0400, Linda E. Patrik wrote:
> 1) Has anyone had experience doing TEI encoding on Tibetan texts or another
> foreign language with an odd alphabet, where the text is written in a script
> software that uses multiple fonts?
My answer to this is: convert [it] to Unicode.
I've done quite a bit of work on non-TEI xml/sgml South Asian
resources for the Digital South Asia Library http://dsal.uchicago.edu/.
I've gotten bilingual dictionaries in Roman, Indic and Perso-Arabic
alphabets, in encodings ranging from ISCII to crazy unmappable
transliterations, and I've converted them all to UTF-8 for processing.
Even if the entire Tibetan computer world uses Sambhota, it's better
to work in Unicode and convert back to Sambhota when you deliver your
text. Some people
say that they love Unicode in principle but that the world's not ready
for it yet; I love Unicode in practice because the rest of the world
is increasingly ready for it.
Unicode will make your life easier for text-processing in the short
term, and in the long term you'll save yourself a lot of work once the
world *is* ready for it, and everyone will praise you for your wise
foresight. And from a glance at Sambhota encoding it appears
straightforward enough that a home-brewed transcoder will do the trick
fairly painlessly.
But I can say that because I haven't done a lot of
transliterating/transcoding lately -- though when I did, it was
reasonably painful.
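To give an idea of what I mean by a home-brewed transcoder, here is a
minimal sketch in Python (not Perl, and not taken from any existing
tool). The mapping table is invented purely for illustration: the keys
are NOT real Sambhota values, and only the Unicode code points on the
right-hand side are real. The only technique shown is greedy
longest-match lookup that fails loudly on anything unmapped.

```python
# Hypothetical legacy-to-Unicode table. The keys are made up for this
# sketch; a real table would come from the legacy font's documentation.
LEGACY_TO_UNICODE = {
    "k": "\u0f40",   # TIBETAN LETTER KA
    "kh": "\u0f41",  # TIBETAN LETTER KHA
    "g": "\u0f42",   # TIBETAN LETTER GA
    " ": "\u0f0b",   # TIBETAN MARK INTERSYLLABIC TSHEG
}


def transcode(text, table=LEGACY_TO_UNICODE):
    """Convert text using greedy longest-match lookup in the table.

    Raises ValueError on input that has no mapping, so that gaps in
    the table are noticed instead of silently passed through.
    """
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "kh" wins over "k".
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} at offset {i}")
    return "".join(out)


# Encode to UTF-8 only at the very end, when writing the result out.
utf8_bytes = transcode("kh g k").encode("utf-8")
```

Failing on unmapped input is a deliberate choice here: with a
home-brewed table the gaps are where the pain is, and it's better to
find them early than to discover stray legacy characters in the
output later.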
I've found
http://thdltools.sourceforge.net/TibetanFormatConverterDesign.html
which appears to describe something that doesn't exist yet.
http://iris.lib.virginia.edu/tibet/tools/conv.html mentions BabelPad,
http://www.babelstone.co.uk/Software/BabelPad.html (the BabelPad link
on that page doesn't work), which can convert Extended Wylie to Sambhota, so
if you can get your Sambhota to give you Wylie you may be able to do
your conversion in two hops. But I've never done Tibetan and haven't
tried any of this software.
I've packaged the most recent milestone of my own indic-transcoding Perl
module here:
http://valla.uchicago.edu/Obliterator-current.tar.gz
(Obliterator = "Ob"-ject-based trans-"literator"). It's licensed
under the redistributor's choice of the GNU GPL or the Perl Artistic
License. It's cranky, and stupid in some places, but it has worked
for my needs of transcoding mixed roman and ISCII to UTF-8 (though for
most uses I must recommend IBM's ICU/uconv for this purpose instead of
Obliterator). As it stands, it won't help you with Sambhota in the least,
but there it is. Contact me offlist (and be patient) if you want to
go through its vagaries.
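For the curious, the general shape of the mixed roman-plus-legacy
problem looks roughly like the Python sketch below. This is not how
Obliterator works internally, and the byte values in the table are
placeholders, not real ISCII assignments; real ISCII also has
multi-byte sequences that a one-byte lookup like this ignores. The
assumption is simply that 7-bit bytes are plain ASCII and everything
with the high bit set belongs to the legacy encoding.

```python
# Placeholder table for the high-bit half of the byte range. The byte
# values are invented for this sketch; only the code points are real.
HIGH_BYTE_TO_UNICODE = {
    0xA1: "\u0915",  # DEVANAGARI LETTER KA
    0xA2: "\u092e",  # DEVANAGARI LETTER MA
    0xA3: "\u0932",  # DEVANAGARI LETTER LA
}


def mixed_to_utf8(data, table=HIGH_BYTE_TO_UNICODE):
    """Convert mixed ASCII / legacy high-byte data to UTF-8 bytes."""
    out = []
    for offset, byte in enumerate(data):
        if byte < 0x80:
            # Roman text: 7-bit ASCII passes through unchanged.
            out.append(chr(byte))
        elif byte in table:
            out.append(table[byte])
        else:
            raise ValueError(f"unmapped byte 0x{byte:02X} at offset {offset}")
    return "".join(out).encode("utf-8")


converted = mixed_to_utf8(b"kamal: \xa1\xa2\xa3")
```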
I hope this helps,
O.