> Date:         Fri, 9 Nov 2001 14:00:16 -0500
> From: John Cowan <[log in to unmask]>
>
> Lars Henrik Mathiesen wrote:
> > Ah, I forgot that we have an expert on Unicode on board. There's a few
> > little things I'd like to know, John --- you can answer in private
> > email if you want to spare the rest of the list:
>
> I'll reply publicly so that the answers get archived, since others
> may well care in future.

Thanks for the answers, John.

I think I should clarify that I want to produce pure Unicode as
output, which can be cut and pasted from whatever medium you get the
script result in, to your own document in another application.

Markup is all well and good, but does HTML work if I want to paste
into yudit?

Adding glyphs and ligatures to a font is not viable either, if it
means that people will have to download 25 MB to view a web page.

So I'm trying to limit myself to stuff that will be legible in any
conforming font and (level 3) implementation.

I think that means that I will have to use the full size arrows for
upstep and downstep, and maybe have a user option to produce either
contour symbols or diacritics for tone --- and in general adding zero
width spaces to carry any diacritics beyond the first, in effect
turning them into modifier letters.
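A minimal sketch of the zero-width-space idea, in Python and assuming
that "diacritic" means any character of general category Mn or Me. The
name spread_marks is made up for illustration. Whether a given renderer
then draws the extra marks legibly is not something the code can
guarantee, and the result is no longer canonically equivalent to the
input.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

def is_mark(ch: str) -> bool:
    # Assumption: nonspacing and enclosing marks count as diacritics.
    return unicodedata.category(ch) in ("Mn", "Me")

def spread_marks(text: str) -> str:
    """Put a ZWSP before every mark after the first in a run of marks."""
    out = []
    marks_in_run = 0
    for ch in text:
        if is_mark(ch):
            marks_in_run += 1
            if marks_in_run > 1:
                # Give this mark its own invisible base character.
                out.append(ZWSP)
        else:
            marks_in_run = 0
        out.append(ch)
    return "".join(out)
```

With this, "a" + U+0303 + U+0301 becomes "a" + U+0303 + ZWSP + U+0301,
while a base letter with a single mark is left untouched.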

But it's a real problem that Arial Unicode seems to have a bad glyph
for the double inverted breve (U+0361), in that it makes it hard to
produce correct Unicode that is also displayed correctly in the one
existing implementation.

I didn't check how Word renders the combining ligature halves at
U+FE20 and U+FE21, but using them will at the very least require some
reordering of characters... hmmm... or maybe not, if I use a table for
all the _x combos, including those with tie bars. What constraints on
the order of combining characters do I have to observe to make sure
the ligature halves will mesh? (According to the Unicode standard,
that is --- I'm not expecting Word's font engine to do anything beyond
overprinting).
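One part of the ordering question can be checked mechanically: which
sequences survive canonical reordering. The sketch below uses Python's
unicodedata module and covers only that part. The ligature halves
(COMBINING LIGATURE LEFT HALF and RIGHT HALF, U+FE20 and U+FE21 in the
Combining Half Marks block) have combining class 230, so normalization
keeps them in place relative to other class-230 marks but moves a mark
of a lower non-zero class, such as a dot below, in front of them. It
says nothing about how a font makes the halves mesh. The helper names
are made up for illustration.

```python
import unicodedata

LIG_LEFT = "\ufe20"   # COMBINING LIGATURE LEFT HALF
LIG_RIGHT = "\ufe21"  # COMBINING LIGATURE RIGHT HALF
TIE_BAR = "\u0361"    # COMBINING DOUBLE INVERTED BREVE
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT

def classes(text: str) -> list[int]:
    """Canonical combining class of each character (0 for base letters)."""
    return [unicodedata.combining(ch) for ch in text]

def survives_nfd(text: str) -> bool:
    """True if NFD normalization leaves the sequence unchanged."""
    return unicodedata.normalize("NFD", text) == text

# Same class (230): the order is kept as written.
same_class = "t" + LIG_LEFT + ACUTE
# Lower class (220) after 230: normalization swaps the two marks.
mixed_class = "t" + LIG_LEFT + DOT_BELOW

print(classes("t" + TIE_BAR + "s"))
print(survives_nfd(same_class), survives_nfd(mixed_class))
```

So a table of precomposed combos would at least want each entry stored
in an order that normalization does not disturb.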

> No general-purpose font like AU can serve all purposes equally.
> Hell, you shouldn't use the same O WITH ACUTE glyph for both Spanish and
> Polish -- the typical compromise isn't steeply inclined enough for
> Polish.

Well, part of the problem is that the IPA-specific characters seem to
be found only in the 25 MB complete-Unicode monster fonts.

I would be happy if there were a font freely available that covered
just the chars needed for IPA, in a unified design. Including the
50-odd glyphs needed from ASCII, Latin-1, Extended-A/B, Superscript,
Arrows, Combining Half Marks, wherever, and having some ligaturing
info and extra glyphs, perhaps.
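To get a rough idea of what such a font would have to cover, one could
tally which Unicode blocks a transcription actually draws on. This is
only a sketch: the block table is a hand-picked subset chosen for
illustration (the ranges themselves are the standard ones), and
block_of and tally_blocks are made-up names.

```python
from collections import Counter

# Hand-picked subset of Unicode blocks relevant to IPA text.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x2070, 0x209F, "Superscripts and Subscripts"),
    (0x2190, 0x21FF, "Arrows"),
    (0xFE20, 0xFE2F, "Combining Half Marks"),
]

def block_of(ch: str) -> str:
    """Name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"

def tally_blocks(text: str) -> Counter:
    """Count how many characters of text fall into each listed block."""
    return Counter(block_of(ch) for ch in text)
```

Running tally_blocks over a corpus of transcriptions would show which
blocks, and how many distinct characters from each, the font needs.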

This may seem inconsistent with what I wrote above --- but such a font
would be small enough to download if needed, and small enough that it
might be distributed with standard software. And while font selection
is just as unportable between applications as markup is, it would not
be something that happened dynamically in the IPA text --- it would be
a sort of template that each user of the system would develop once.

And, note well, this font would only be needed to get a pretty
display. The goal is still to be as correct as possible in any
conforming Unicode font/implementation.

Lars Mathiesen (U of Copenhagen CS Dep) <[log in to unmask]> (Humour NOT marked)