On Wed, Jan 20, 2010 at 3:01 PM, Alex Fink <[log in to unmask]> wrote:
> There's something basic I'm missing about the design of ISO 639 and ISO
> 15924 and whatever else is out there.  Why in the world do they use tiny
> fixed-length letter codes?

Tiny in the case of ISO 639 because it's a very old standard, and the
revisions have maintained backward compatibility; fixed-length because
fixed-length codes are much more efficient to store and manipulate
computationally.

> To me that seems an unduly limiting constraint

It's not that limiting.  Three basic Latin letters get you
26^3 = 17,576 possible language codes.
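
To make the arithmetic concrete, here's a minimal sketch in C that
counts the codes available at a given length. The function name
code_space is made up for illustration, and it assumes only the 26
basic Latin letters in a single case.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct codes of exactly `letters` basic Latin letters,
 * i.e. 26 raised to that power.  code_space is an illustrative name,
 * not something taken from any ISO standard or library. */
static uint64_t code_space(unsigned letters)
{
    /* 26^13 still fits in 64 bits; 26^14 would overflow. */
    assert(letters <= 13);

    uint64_t count = 1;
    for (unsigned i = 0; i < letters; i++)
        count *= 26;
    return count;
}
```

code_space(2) is 676, code_space(3) is 17,576 and code_space(4) is
456,976, so each extra letter multiplies the space by 26.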

> that's completely irrelevant to anything in modern computing

Slight exaggeration.  Fixed-width fields are still more efficient -
show me an RDBMS that doesn't slow down by an order of magnitude when
dealing with CLOBs vs fixed-width columns... and while most modern
programming languages deal with variable-length (and in many cases,
virtually unlimited-length) strings natively, a lot of stuff still
gets written in C, ya know.  Like the implementations of those modern
languages. :)

Now, you could make a case for *larger* fixed- (or at least maximum-)
length fields, but another advantage of <= four-letter codes is that
they can be manipulated as integers on 32-bit systems.  Very efficient
indeed.
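
Here's a rough sketch in C of what I mean, assuming 8-bit ASCII
letters.  The names pack_code and unpack_code are invented for this
example, and putting the first letter in the high byte is just one
possible layout.  A code of up to four letters fits in a single
uint32_t, so comparing two codes is one integer comparison instead of
a strcmp().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack up to four ASCII letters into one 32-bit value, first letter
 * in the most significant byte.  Shorter codes are padded with zero
 * bytes; anything past the fourth character is ignored.
 * (pack_code is an illustrative helper, not a standard API.) */
static uint32_t pack_code(const char *code)
{
    assert(code != NULL);

    size_t len = strlen(code);
    uint32_t packed = 0;
    for (size_t i = 0; i < 4; i++) {
        unsigned char c = (i < len) ? (unsigned char)code[i] : 0;
        packed = (packed << 8) | c;
    }
    return packed;
}

/* Reverse of pack_code: write the letters back out as a
 * NUL-terminated string.  `out` must have room for 5 bytes. */
static void unpack_code(uint32_t packed, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((packed >> (24 - 8 * i)) & 0xFF);
    out[4] = '\0';
}
```

With this layout, pack_code("eng") == pack_code("eng") is a plain
integer test, and the packed values sort in the same order as the
original strings would.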

> (And scripts get four-letter codes but
> languages three: what, does one expect there to be 26 times more scripts
> than languages that people would want to localise into!?)

That's just because the script standard is much newer than the
language one (which originally only had two letters per language
instead of three, and which now has room for expansion into
four-letter codes when warranted).


-- 
Mark J. Reed <[log in to unmask]>