My much-tried lang is facing orthographic changes on a scale not
seen since I switched from Latin-1 encoding to UTF-8 some years
ago. In many ways it will be even worse this time, and I need
some advice/ideas.

The orthography of Taruven is more of a transliteration than a
proper orthography, and sometimes the system breaks down. For
instance:

Problem #1

I've been using ¨ (umlaut) to mark roots that are written with a
symbol instead of with letters, for instance in the
negation-marker {ë-}. Until now, all such roots that I knew of
were single-syllable roots containing at least one vowel, but
there might be such roots that are multi-syllabic, or contain a
syllabic consonant, or worse: consist of a long vowel that is
therefore marked with a macron. This means, basically, that it
might be necessary to put an umlaut on every single existing
letter in the orthography. Now, there are methods in Unicode
(combining characters) to put any diacritic on any letter, but
it varies *a lot* how well this is supported.
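
To make the "any diacritic on any letter" point concrete, here is a
small Python sketch using only the standard unicodedata module. It
appends U+0308 COMBINING DIAERESIS to a letter and lets NFC
normalization pick a precomposed character where one exists. The
helper name with_umlaut and the sample letters are just
illustrations, not part of the Taruven system.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"

def with_umlaut(letter):
    """Add a diaeresis to `letter`, preferring a precomposed form (NFC)."""
    return unicodedata.normalize("NFC", letter + COMBINING_DIAERESIS)

# e: plain vowel, n: a possible syllabic consonant, U+0113: e with macron
for letter in ["e", "n", "\u0113"]:
    marked = with_umlaut(letter)
    # List the code points so it is visible whether a combining mark remains.
    print(letter, "->", marked, [f"U+{ord(c):04X}" for c in marked])
```

For {e} the result is the single precomposed character U+00EB, but
for {n} and for e-with-macron no precomposed form exists, so the
diaeresis stays a separate combining mark. Those are exactly the
cases where rendering depends on the font and the software.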

Fortunately I have come up with a work-around: use the style
used for transliterating mixed cuneiform: write the
single-symbol roots in uppercase.
http://en.wikipedia.org/wiki/Hittite_cuneiform#Determiners

So, ëbren "no car" becomes "Ebren" or maybe "E-bren".
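
A rough Python sketch of that respelling, under the assumption that
only the letter actually carrying the umlaut gets uppercased (the
post doesn't say how far a multi-letter symbol root would extend).
The function name umlaut_to_upper is made up for the example.

```python
import unicodedata

DIAERESIS = "\u0308"

def umlaut_to_upper(word):
    """Drop the diaeresis and uppercase the letter that carried it."""
    # Group each base letter with the combining marks that follow it.
    clusters = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and clusters:
            clusters[-1].append(ch)
        else:
            clusters.append([ch])
    out = []
    for base, *marks in clusters:
        if DIAERESIS in marks:
            marks = [m for m in marks if m != DIAERESIS]
            base = base.upper()
        # Other marks (e.g. a macron) are kept on the letter.
        out.append(base + "".join(marks))
    return unicodedata.normalize("NFC", "".join(out))

print(umlaut_to_upper("\u00ebbren"))  # ebren with e-diaeresis
```

Other diacritics survive, so a long vowel would keep its macron and
only lose the umlaut.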

BUT: one letter is already used in both uppercase and
lowercase. {h}, the glottal fricative/approximant, is not the
same letter or sound as {H}, the affricate /r/. {h} is
something of a problem itself, being ambiguous today.

Problem #2

A word in Taruven cannot end in a voiced stop. Final voiced stops
are therefore either aspirated (written with an {h}) or take a
protective e (now written as {e}). Unfortunately this means that
words that actually end with an aspirated voiced stop, or voiced
stop+e, can be mistaken for words with "protected" final voiced
stops. This matters because in a compound or when adding
suffixes, the "protection" is stripped away:

If the protection is aspiration, the {h} is always stripped
regardless of the form of the suffix:

    dagh "cave" + -en "plural" -> dagen "caves"
    bogh "to drown" + -ra "past tense" -> bogra "drowned"

If the protection is an {e}, it is only stripped in front of
vowels (any vowel):

    ige "short" + -ra "past tense" -> igera "was short"
    ige "short" + -a "comparative" -> iga "shorter"
    džeŋge "tourist" + -en "plural" -> džeŋgen "tourists"
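
The two stripping rules can be sketched in Python roughly as below.
This is only an illustration: the sets VOICED_STOPS and VOWELS are
my guesses from the examples (the full inventories aren't given
here), and the function attach blindly treats every final
voiced stop + {h} or + {e} as "protection", which is precisely the
ambiguity described above.

```python
# Assumed inventories, inferred from the examples only.
VOICED_STOPS = set("bdg")
VOWELS = set("aeiouy")

def attach(stem, suffix):
    """Attach a suffix, stripping the 'protection' of a final voiced stop."""
    suffix = suffix.lstrip("-")
    if len(stem) >= 2 and stem[-2] in VOICED_STOPS:
        if stem[-1] == "h":
            # Aspiration is always stripped.
            stem = stem[:-1]
        elif stem[-1] == "e" and suffix[:1] in VOWELS:
            # Protective e is stripped only in front of a vowel.
            stem = stem[:-1]
    return stem + suffix

for stem, suffix in [("dagh", "-en"), ("bogh", "-ra"),
                     ("ige", "-ra"), ("ige", "-a"), ("džeŋge", "-en")]:
    print(stem, "+", suffix, "->", attach(stem, suffix))
```

A word whose final {h} or {e} is real rather than protective would
be mangled by this function, since the spelling alone gives it no
way to tell the two apart.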

In the non-latin orthography, the {h} is marked with a diacritic
on the consonant while the {e} is marked with an {e} with the
same diacritic.

{e} is most common with words ending in {g}, but as you can see
from {dagh} and {ige}, both are possible.

A complicating factor is that {h} currently has four uses: 1) to
protect/aspirate a voiced stop as mentioned above (something
that, phonemically speaking, isn't necessary to show in a
phonemic orthography, but this is a transliteration), 2) to show
that a stop is aspirated (though I should really use a
superscript h {ʰ} for this), 3) to show that a sound/syllable has
breathy voice and 4) to show the stand-alone consonant /h/.

Now, there are plenty of neat symbols in Unicode of course, but
they can't be mixed willy-nilly, since font and keyboard support
tends to come in blocks. It's tricky enough that I'm using {ỳ}
and {ȳ} already :)
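
To see how scattered the current letters already are, here is a
small Python sketch that reports the Unicode block of a few of
them. Python's standard library has no block lookup, so the BLOCKS
table is a hand-typed excerpt covering only the ranges relevant
here, and block_of is a made-up helper.

```python
import unicodedata

# Partial, hand-copied excerpt of Unicode block ranges.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]

def block_of(ch):
    """Return the block name for a single character, or 'other'."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"

# e-diaeresis, eng, z-caron, y-macron, y-grave, superscript h
for ch in "\u00eb\u014b\u017e\u0233\u1ef3\u02b0":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", block_of(ch))
```

{ȳ} sits in Latin Extended-B while {ỳ} sits in Latin Extended
Additional, so even those two letters depend on two different
blocks being covered by a font.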


t.