TEI-L Archives

TEI-L@LISTSERV.BROWN.EDU


TEI-L  February 1991

Subject: Unicode 1.0
From: [log in to unmask]
Reply-To: Text Encoding Initiative public discussion list <[log in to unmask]>
Date: Tue, 5 Feb 91 10:56:20 GMT
Content-Type: text/plain

I recently requested a copy of the draft specification of the Unicode 1.0
character encoding.  Although I have not been able to give it all the time
I would have liked, even a brief look raises a number of comments.  I'm
grateful for the opportunity to feed them into the general discussion (via
TEI, HUMANIST and the Unicode team themselves: [log in to unmask]).
 
(a)  There are a number of significant typos; is anyone keeping a master
record of these?
 
(b)  Robin Cover <ZRCC1001@SMUVM1> has asked why there are not separate
encodings for Hebrew SIN and SHIN.  They are certainly at least as distinct
as, say, LATIN E followed by ACUTE and LATIN E ACUTE.  I take it that the
latter case has two encodings because of previous ISO encodings; but since
those are in any case ASCII encodings (and Unicode is intended as a
replacement for ASCII), how relevant is that?
The question also raises a more fundamental problem in my mind.  There
are a number of situations where a glyph (or conglomerate of glyphs) can
reasonably be encoded in alternative ways; HYPHEN (U+2010 = U+002D) would be
a case in point.  We are told that some of these redundancies are there so
that natural pairing can be used "if desired" (page 6).  However, these coded
pairs are not provided consistently (eg CAPITAL DOTTED I).  But what worries
me is that two encodings of an identical text may thus turn out to be very
different, and for anyone using computer comparison of texts this could be
quite problematic.  So, in contrast to those who complained that, eg, separate
codings for GREEK ALPHA+GRAVE are not available, I would voice the opposite
disquiet: the encodings are too comprehensive.  If ALL accentuation were
coded separately, I think comparison of texts would be easier.
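The comparison problem described above is what the Unicode standard later addressed with normalization forms, which postdate this draft. A minimal sketch, assuming Python's standard `unicodedata` module: the precomposed and decomposed forms of e-acute compare unequal as raw strings but equal once both are fully decomposed (NFD), which is close to coding all accentuation separately. The helper name `same_text` is illustrative only. Note that normalization does not unify every redundancy mentioned: HYPHEN (U+2010) and HYPHEN-MINUS (U+002D) remain distinct.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

def same_text(a: str, b: str) -> bool:
    """Illustrative helper: compare two strings after full canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(precomposed == decomposed)            # False: different code point sequences
print(same_text(precomposed, decomposed))   # True: identical once accents are split off
print(same_text("\u2010", "-"))             # False: HYPHEN is not equivalent to HYPHEN-MINUS
```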
 
The ordering of the accents would then of course be important, and I don't
think the algorithm given (centre-out) is terribly helpful: which mark is
nearest the centre in GREEK ROUGH BREATHING+ACUTE+IOTA SUBSCRIPT?
Wouldn't an additional algorithm (clockwise starting at twelve o'clock)
be useful?
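For comparison, the standard as later published uses neither geometric rule: each combining mark carries a canonical combining class, and normalization sorts adjacent marks by class while keeping marks of equal class in their typed order. A sketch with the example above, assuming Python's `unicodedata`: rough breathing (U+0314) and acute (U+0301) are class 230 ("above"), iota subscript (U+0345) is class 240, so the iota subscript always ends up after the two above-marks, however it was entered.

```python
import unicodedata

ALPHA = "\u03b1"     # GREEK SMALL LETTER ALPHA
ROUGH = "\u0314"     # COMBINING REVERSED COMMA ABOVE (rough breathing)
ACUTE = "\u0301"     # COMBINING ACUTE ACCENT
IOTA_SUB = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI (iota subscript)

# Show each mark's canonical combining class.
for mark in (ROUGH, ACUTE, IOTA_SUB):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.combining(mark))

# Two typings that differ only in where the iota subscript was entered.
typed_a = ALPHA + IOTA_SUB + ROUGH + ACUTE
typed_b = ALPHA + ROUGH + ACUTE + IOTA_SUB

nfd_a = unicodedata.normalize("NFD", typed_a)
nfd_b = unicodedata.normalize("NFD", typed_b)
# Canonical ordering moves the class-240 mark after the class-230 marks.
print(nfd_a == nfd_b)  # True
```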
 
(c) While we're on Greek, I couldn't find a Greek semicolon (raised dot).
Maybe I just didn't look hard enough, but full punctuation would be useful.
But see my comment (e) below.
I also failed to locate LATIN CAPITAL LETTER WYNN.
 
(d)  In general I approve of the policy that by adding the special Coptic
forms to the Greek alphabet one can generate Coptic text, with hard copy
produced by choosing an appropriate font.  (And mutatis mutandis for
other languages.)  However, there are some drawbacks to this policy; I
foresee the following problems:
  (i)  It may be necessary to indicate to someone (if only the compositor)
where to change font.  Could a code for change-of-language be incorporated?
  (ii) In some Greek texts it may be important to indicate where ligatures
are used; there seems to be no way in this encoding to distinguish between
GREEK KAPPA + GREEK ALPHA + GREEK IOTA on the one hand and the ligature
which stood for "kai" on the other.  I am sometimes in the position of
needing to say (as indeed the authors of the manual did) something like
"There are three possible forms of LATIN SMALL LETTER G CEDILLA (U+0123)
and they look like ..."  How could I encode my ellipsis?  Could the whole
of the manual as printed be sensibly encoded in Unicode?  Oddly, there are
some forms which are exclusively graphic variants (ie one would not find
them together in a "natural" text) which do attract separate codings;
GREEK SMALL LETTER SCRIPT THETA, for instance.  Perhaps consistency is
unattainable, but to me it is a desideratum.
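The later standard partly handles such graphic variants as compatibility characters. A sketch, assuming Python's `unicodedata`: the script theta, now named GREEK THETA SYMBOL (U+03D1), is untouched by canonical normalization (NFC) but folds to the ordinary theta under compatibility normalization (NFKC), so a comparison that should ignore graphic variants can normalize with NFKC first.

```python
import unicodedata

script_theta = "\u03d1"  # GREEK THETA SYMBOL (the draft's "script theta")
theta = "\u03b8"         # GREEK SMALL LETTER THETA

print(unicodedata.name(script_theta))
# Canonical normalization keeps the graphic variant distinct...
print(unicodedata.normalize("NFC", script_theta) == theta)   # False
# ...while compatibility normalization folds it to the plain letter.
print(unicodedata.normalize("NFKC", script_theta) == theta)  # True
```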
 
(e)  The encoding of special numerals seemed odd.  As well as a select
group of fractions (thirds, quarters and eighths, I think) there is the
top half of fractional 1/nnn (U+215F).  How is its use envisaged?  Wouldn't
a generalised "fractional line" be better (let's call it U+nnnn), so that
<number string1>nnnn<number string2> is interpreted as a fraction?
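Current Unicode provides a FRACTION SLASH (U+2044) that plays the role of the proposed "fractional line": the compatibility decompositions of the vulgar fractions use it, so NFKD turns U+2153 into "1", U+2044, "3", and U+215F (FRACTION NUMERATOR ONE) into "1", U+2044, which can then be followed by denominator digits. A minimal sketch of the rule proposed above, assuming Python's `unicodedata` and `fractions` modules; `parse_fractions` is a hypothetical helper, not part of any standard.

```python
import re
import unicodedata
from fractions import Fraction

FRACTION_SLASH = "\u2044"

# <digits> FRACTION SLASH <digits>, read as numerator and denominator.
_FRACTION = re.compile(r"(\d+)\u2044(\d+)")

def parse_fractions(text: str) -> list[Fraction]:
    """Hypothetical helper: return every <digits>U+2044<digits> run as a Fraction.

    NFKD is applied first so precomposed vulgar fractions (e.g. U+2153) and
    U+215F followed by digits expand into the same digit/slash/digit pattern.
    """
    expanded = unicodedata.normalize("NFKD", text)
    return [Fraction(int(num), int(den)) for num, den in _FRACTION.findall(expanded)]

print(parse_fractions("\u2153 cup"))                 # [Fraction(1, 3)]
print(parse_fractions("\u215f" + "7"))               # [Fraction(1, 7)]
print(parse_fractions("22" + FRACTION_SLASH + "7"))  # [Fraction(22, 7)]
```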
 
Similarly, Roman 12 (XII) is encoded as U+216B, but 13 (XIII) must
(presumably) be U+2169 U+2162.  Why not a single code for "roman numbers
follow here:" (or just use ROMAN CAPITAL LETTER X &c)?
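In later versions of the standard, the Roman numeral characters have compatibility decompositions to ordinary Latin letters, so NFKC turns U+216B into "XII" and U+2169 U+2162 into "XIII", and one routine can read both. A sketch assuming Python's `unicodedata`; `roman_to_int` is a hypothetical helper using the usual subtractive rule.

```python
import unicodedata

_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(text: str) -> int:
    """Hypothetical helper: fold Roman numeral characters to Latin letters (NFKC),
    then evaluate with the usual subtractive rule (IV = 4, IX = 9, ...)."""
    letters = unicodedata.normalize("NFKC", text).upper()
    total = 0
    for i, ch in enumerate(letters):
        value = _VALUES[ch]
        # A smaller numeral before a larger one is subtracted (e.g. the I in IX).
        if i + 1 < len(letters) and value < _VALUES[letters[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("\u216b"))        # 12 (ROMAN NUMERAL TWELVE)
print(roman_to_int("\u2169\u2162"))  # 13 (ROMAN NUMERAL TEN + ROMAN NUMERAL THREE)
print(roman_to_int("XIII"))          # 13 (plain Latin letters)
```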
 
If codes for general *modes* like "Greek font", "roman numeral" and "fraction"
were included, many ambiguities and problems could be reduced.  My
Greek semicolon, for instance, could be "GREEK FONT + ;".
 
This contribution could be better thought out, but it was this or nothing.
If the latter seems preferable, please discard!
 
Sincerely,
Douglas de Lacey.
