Ray Brown scripsit:

> How would you define 'character'?

Ah, *that* is so hard that I fear I'm unable!  (Lewis Carroll)

Thus speaks the Unicode Standard:

        (1) The smallest component of written language that has
        semantic value; refers to the abstract meaning and/or shape,
        rather than a specific shape (see also glyph), though in
        code tables some form of visual representation is essential
        for the reader's understanding.  (2) A unit of information
        used for the organization, control, or representation of
        textual data.

But in fact I think that characters are like points in Euclid: truly
undefined, and having some sort of definition (Euclid's was "that which
has no part") only in order to satisfy the desire to define everything.

> But I was commenting here on the term 'grapheme' which some people use. I
> am never quite certain what they regard as the 'smallest/basic unit of
> writing'.  I do recall somewhere an argument whether lower case {i}
> was one or two graphemes.
>
> Also in the "grapheme" terminology, the various forms of the character {a},
> including its upper case variant, are termed "allographs".   It appears
> from what you say above, that we would say an 'allograph' is a variant of
> a grapheme with its own distinctive glyph.  Or am I going wildly astray?
>
> I would greatly appreciate your definitions of these terms.

I think the point to be made about graphemes is that, like phonemes,
they are defined with respect to a particular orthographical convention.
"b" and "d" are distinct graphemes for the same reason that [b] and [d]
are distinct phonemes in English: it's easy to find minimal pairs ("bog"
and "dog", say).  With a little work, we can find minimal pairs for "p"
and "P" as well: "polish" and "Polish", e.g.
Thus the question of whether the dot on the "i" in Turkish is a separate
grapheme is resolved by Occam's Razor: we gain nothing by abstracting
it away, since either there are two graphemes "i" and dotless-i, or
two graphemes dotless-i and dot.  Better then to stick to the overt
level and recognize i and dotless-i.
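
For what it's worth, Unicode takes the same overt-level view: dotless-i
and capital I with dot are characters in their own right. A minimal Python
sketch, using only the standard unicodedata module, prints how the four
Turkish letters are encoded. The turkish_upper helper is hypothetical, my own
illustration rather than anything from a library, and it assumes the whole
input is Turkish text.

```python
import unicodedata

# The four Turkish "i" letters as Unicode encodes them:
# dotted i, dotless i, plain capital I, capital I with dot above.
LETTERS = ["i", "\u0131", "I", "\u0130"]


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using the Turkish i pairing.

    Assumption: the whole string is Turkish, so every dotted "i"
    should become capital I with dot above (U+0130).
    """
    # Map dotted i explicitly; str.upper() already maps dotless i
    # (U+0131) to plain "I" and leaves U+0130 unchanged.
    return text.replace("i", "\u0130").upper()


if __name__ == "__main__":
    for ch in LETTERS:
        print(describe(ch))
    # Dotted and dotless i stay distinct after uppercasing.
    print(turkish_upper("i\u0131"))
```

Python's own str.upper() is not locale-aware, which is why the dotted
"i" has to be mapped by hand in this sketch.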

> But if we can say that a-e ligature is a single letter in one language,
> but a ligature of two letters in another, then it seems to me that we
> can also say that, e.g. {ch} are two separate letters in English but a
> single composite letter in Welsh & Spanish.

Fair enough.
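
To make that concrete, here is a rough Python sketch of splitting a word
into letters under a per-language convention. The DIGRAPHS table and the
letters function are names I made up for illustration. The Welsh entry lists
the digraphs of the standard Welsh alphabet, and the greedy left-to-right
match is a simplifying assumption: it will get wrong those Welsh words
where "n" and "g" are separate letters.

```python
# Digraphs that count as single letters, keyed by language code.
# Illustrative table only: "cy" holds the Welsh alphabet's digraphs,
# "en" is empty because English treats every one of these as two letters.
DIGRAPHS = {
    "cy": ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th"),
    "en": (),
}


def letters(word: str, lang: str) -> list[str]:
    """Split a word into letters, greedily matching the language's digraphs."""
    digraphs = DIGRAPHS[lang]
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in digraphs:
            # A digraph is one composite letter in this language.
            out.append(pair)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


if __name__ == "__main__":
    print(letters("chwech", "cy"))  # Welsh convention
    print(letters("chwech", "en"))  # English convention
```

The same string comes out as four letters under the Welsh table and six
under the English one, which is all the claim amounts to.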

--
Schlingt dreifach einen Kreis vom dies! || John Cowan <[log in to unmask]>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)