```On 4/18/06, Eldin Raigmore <[log in to unmask]> wrote:
> wrote:

> >>>A morpheme consists of one or more phonemes from
> >>>the beginning set, followed by one or more phonemes
> >>>from the following set.
> >>
> >> The rule; "one set of phonemes is the final set, and another set is the
> >> preceding set" would work also.
> >
> >I'm not sure what distinction you're making.  It sounds
> >like alternate terms to describe the same rule, not
> >a different rule.
>
> Well, if you have Initials and Medials, your morphemes could look like:
>
> I
> IM
> IMM
> IMMM
> IMMMM
> ...
>
> but if you have Finals and Medials, your morphemes could look like:
>
> F
> MF
> MMF
> MMMF
> MMMMF
> ...

OK.  Those are distinct.  But neither is what I was
talking about as a generalization of Tceqli's rule.
Tceqli allows *one or more* from each set, and requires
that every morpheme contain at least one phoneme from
each set.  In other words, A+B+.
Your rules are AB* and A*B; in one rule a morpheme
must have exactly one from the initial set and zero or more
from the final set, and in the other it must have exactly one from the
final set preceded by zero or more from the
initial set.

> >>Essentially, pick out a set that can only occur at a boundary; then,
> >>either the boundary is always just before any member of that set, or the
> >>boundary is always just after any member of that set.

After your more detailed examples I see this is different
from Tceqli.

> >This is slightly different, then.  Tceqli's rule allows
> >*one or more* from one set followed by one or more
> >from the other set.  You seem to propose
> >*exactly one* from one set followed by one or more from
> >the other, or one or more from the first followed
> >by *exactly one* from the second.
> >
> >For Tceqli, the boundary is always where a member of the
> >following set is followed by a member of the initial set.
>
> I see a problem with this.
>
> How do you know whether "IIMM" is a single morpheme, or "I" followed
> by "IMM"?

Because I is an illegal morpheme.  IM is the shortest
allowed.  And IIM, IMM, IIIM, IMMM, IIMMM etc.
are all allowed too.  (In Tceqli the maximum number
from the initial set is three because the initial set is made
up entirely of consonants, while the number taken from
the final set is open ended because it has a
mix of vowels and consonants.)

> Techniques available might include the following;

<snip>

Some of these are different than anything I've got on the
about wiki editing, or if not, I'll add them when I have time.

> 1. You could divide up the phonological segments into the following classes;
> a. Segments that can be the first segment of a morpheme, but can't be any
> non-first segment.
> b. Segments that can't be the first segment of a morpheme, but can be any
> non-first segment.
>
> Then the morphemes will look like a, ab, abb, abbb, abbbb, ... etc.
> Morpheme boundaries would occur just previous to each a.
>
>
> 2. You could divide up the phonological segments into the following classes;
> c. Segments that can be the last segment of a morpheme, but can't be any
> non-last segment.
> d. Segments that can't be the last segment of a morpheme, but can be any
> non-last segment.
>
> Then the morphemes will look like c, dc, ddc, dddc, ddddc, ... etc.
> Morpheme boundaries would occur just after each c.
>
>
> 3. If you require every morpheme to contain at least two segments, you
> could divide up the phonological segments into the following classes;
> e. Segments that can be the first or last segment of a morpheme, but can't
> be any non-first non-last segment.
> f. Segments that can't be the first nor last segment of a morpheme, but can
> be any non-first non-last segment.
>
> Then the morphemes will look like ee, efe, effe, efffe, effffe, ... etc.

> (Without the two-segment-minimum, ee might be "e, e" or might be "ee".)
> Morpheme boundaries would occur just after each fe and just before each ef,
> but a string of "ee" morphemes would have to be parsed globally; you
> couldn't tell how to parse it unless you had the whole thing.
>
> 4. Require the first segment of each morpheme to code the length of the
> morpheme.

Plan B, brz, and X-1 all do this.

> 5. Require the last segment of each morpheme to code the length of the
> morpheme.

Maybe theoretically possible, but seems rather perverse.
And maybe pointless.  It seems like you would have to
combine it with some other scheme that limits which
segment can occur where, or else you wouldn't know
where the last segment is so you can count backwards
from it to find the last whole morpheme....
And if you have that other method to show where
a morpheme ends, why add this method to count
the number of segments or syllables?  Redundancy
and error detection, I guess.

> An extreme case; For each possible morpheme length, let there be a set of
> segments that can occur in morphemes of that length, and only in morphemes
> of that length.  For each segment, let there be one and only one length of
> morphemes in which it can occur.  For lengths greater than one, let there
> be a set of segments which are all and only those that can be the first
> segment of such a morpheme, and a disjoint set of all and only those that
......

Henrik Theiling recently proposed doing something like
this at the syllable level rather than the phoneme level
-- there is a set of syllables that are monosyllabic words,
and a disjoint set that are the first syllable of disyllables,
another set disjoint from the first two that's the second
syllable of disyllables, and so forth.  I'm trying this
for the first round of my iterative engelang.

> 6. Another extreme technique;  Pick a number and assume all morphemes are
> exactly that long.  Again, you can't parse it with only local information,
> but you can parse it if you have the whole thing.

E.g. all one syllable or all two syllables.  But the
syllable shape would have to be such that there can
be no ambiguity about syllable boundaries.  E.g.
if you allow (C)V(C) syllables then the initial and final
consonant sets must be disjoint or else
CVCVC would be ambiguous re: CVC VC or CV CVC.
If all are CV(C) then you're probably OK even if the
final consonant set is a subset of the initial consonant
set; CVCVC could only be CV CVC, and CVCCVC
could only be CVC CVC.

> >
> >> My conlang, Livagian, has unambiguous syntax parsed
> >> incrementally with no lookahead, and it cuts the
> >>[snip]

> I'm interested in And Rosta's technique too.

Although it's not a self-segregating morphology
technique as such, I think it needs to be added to
our list on the wiki - not as one of the numbered
list of methods, but in an appendix or something.

--
Jim Henry
http://www.pobox.com/~jimhenry
```
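
The A+B+ rule discussed above (a boundary falls wherever a member of the following set is followed by a member of the initial set) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `segment` and the stand-in segments "I" and "M" are hypothetical, the two sets are assumed to be disjoint, and nothing here reflects Tceqli's actual phoneme inventory or its limit of three initial-set phonemes.

```python
def segment(text, initial, final):
    """Split text into morphemes of shape A+B+: one or more segments from
    the initial set followed by one or more from the final set.

    `initial` and `final` are hypothetical, disjoint collections of
    one-character segments (stand-ins for real phonemes).
    """
    initial, final = set(initial), set(final)
    if initial & final:
        raise ValueError("initial and final sets must be disjoint")
    unknown = set(text) - initial - final
    if unknown:
        raise ValueError(f"segments in neither set: {sorted(unknown)}")

    pieces, start = [], 0
    for i in range(1, len(text)):
        # A boundary falls exactly where a final-set segment is
        # immediately followed by an initial-set segment.
        if text[i - 1] in final and text[i] in initial:
            pieces.append(text[start:i])
            start = i
    pieces.append(text[start:])

    # Every piece must begin in the initial set and end in the final set;
    # otherwise the string contains an illegal morpheme such as a bare "I".
    for piece in pieces:
        if not piece or piece[0] not in initial or piece[-1] not in final:
            raise ValueError(f"illegal morpheme: {piece!r}")
    return pieces


print(segment("IIMM", "I", "M"))        # a single morpheme
print(segment("IMIIMMIMMM", "I", "M"))  # IM + IIMM + IMMM
```

Because a bare "I" is rejected as illegal, "IIMM" can only be read as one morpheme, and the boundary test needs just the current segment and the one before it, so the parse is purely local.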