On Sat, 2 May 2009, Risto Kupsala wrote:

>> [...]

> I am sure that many people have thought about the same thing before.
> There are many constructed languages that use a small base vocabulary
> and a lot of compounds. In my opinion, we should use them as models.
>
> Even if only one percent of compounds are usable, it doesn't matter.
> If there are 400 basic words, then there would be 400*400*0.01 = 1600
> two-syllable compounds and 400*400*400*0.01 = 640000 three-syllable
> compounds. That would be enough.
>
> However, efficiency can be improved by selecting more productive root
> words. If ten percent of compounds were usable, there would be
> 400*400*0.1 = 16000 two-syllable words, which is more than enough for
> most situations.
>
>> [...]
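
As a quick check of the arithmetic above, here is a minimal sketch in
Python (the 400-root lexicon and the 1%/10% usability rates are the
quoted figures, taken purely as illustrative assumptions):

    # Combinatorics of compounding, using the figures quoted above.
    roots = 400
    for rate in (0.01, 0.10):       # assumed share of usable compounds
        two = roots ** 2 * rate     # usable two-root compounds
        three = roots ** 3 * rate   # usable three-root compounds
        print(f"{rate:.0%}: {two:,.0f} two-root, {three:,.0f} three-root")

It prints 1,600 and 640,000 at one percent, and 16,000 and 6,400,000 at
ten percent, matching the figures in the message.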

In any language which relies heavily on compounding, there is always
the problem of discriminating word boundaries.  In writing, this is
rarely a problem, as discrete words tend to be separated by spaces.
But in continuous speech streams, things are less clear.  The
Loglan/Lojban people (or at least the latter; I am unsure about the
former) constructed their languages so that a continuous speech stream
can always be unambiguously parsed into discrete words.  Not every
constructed language has this property; Sona and Suma come to mind as
examples.
The boundary problem may be particularly acute for constructed
languages with a simple word structure, such as V, CV, CVV, and VCV.
English, by contrast, allows complex structures (including one CCCVCCC
syllable/word that I can think of), so although the vocabulary learning
burden is higher (as is the pronunciation burden for foreign adult
speakers), the word boundary parsing issue is lessened.
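
To make the boundary problem concrete, here is a minimal sketch (the
CV/CVCV lexicon and the sample utterance are invented for illustration,
not drawn from any real language).  With simple word shapes, a
continuous stream often segments into known words in more than one way:

    # Enumerate every way a continuous stream splits into lexicon words.
    def segmentations(stream, lexicon, parse=()):
        if not stream:
            yield parse
            return
        for word in lexicon:
            if stream.startswith(word):
                yield from segmentations(stream[len(word):], lexicon,
                                         parse + (word,))

    lexicon = {"ma", "ta", "ni", "mata", "tani"}  # hypothetical roots
    for parse in segmentations("matani", lexicon):
        print(" + ".join(parse))

The stream "matani" yields three distinct parses (ma + ta + ni,
ma + tani, mata + ni).  A self-segmenting design in the Loglan/Lojban
style constrains word shapes so that only one such parse is ever
possible.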

-- 
Paul Bartlett