
Don Blaheta scripsit:

> Enter Unicode.  Rather than restrict itself to 8 bits, the Unicode
> consortium decided to make a 16-bit standard.  This gave them 65,536
> character values to play with; finally, they could create one character
> set to include every character in every script currently in use, and
> several that aren't.
>
> Of course, this isn't without its problems.  One-byte codes are *very*
> entrenched in the computer world, and there is a lot of extant code that
> assumes that characters are only one byte long.

Enter UTF-8.  This is a method for encoding Unicode whereby the 128
ASCII characters continue to be represented by the single byte values
0-127, and sequences of 2, 3, or 4 bytes in the range 128-253 (254 and
255 are never used) represent all the other Unicode characters.  This
means that programs that understand only ASCII still work, and the
other characters can often just be "passed through" without being
understood.  Not perfect, but it helps.
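For illustration, here is a minimal Python sketch of the encoding described above.  It assumes the current UTF-8 definition (RFC 3629), which caps code points at U+10FFFF and sequences at 4 bytes; under that definition the bytes of a multibyte sequence fall in 128-244, inside the range given above.  The name `utf8_encode` is a hypothetical helper; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (RFC 3629 rules; hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # ASCII: a single byte 0-127, exactly as before
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "))

    # "Passing through": an ASCII-only program can split on b"," without
    # decoding, since every byte of a multibyte sequence is >= 128 and so
    # can never be mistaken for the ASCII comma.
    data = "caf\u00e9,na\u00efve,\u20ac5".encode("utf-8")
    fields = data.split(b",")
    assert [f.decode("utf-8") for f in fields] == ["caf\u00e9", "na\u00efve", "\u20ac5"]
```

Run as a script, this prints `41` for U+0041, `c3 a9` for U+00E9, `e2 82 ac` for U+20AC and `f0 9f 98 80` for U+1F600: one byte for ASCII, and longer sequences made only of bytes 128 and above for everything else.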

--
John Cowan
       I am a member of a civilization. --David Brin