On 4/18/06, Eldin Raigmore wrote:
> On Mon, 17 Apr 2006 14:53:09 -0400, Jim Henry wrote:
> >>>A morpheme consists of one or more phonemes from
> >>>the beginning set, followed by one or more phonemes
> >>>from the following set.
> >>
> >> The rule; "one set of phonemes is the final set, and another set is the
> >> preceding set" would work also.
> >
> >I'm not sure what distinction you're making. It sounds
> >like alternate terms to describe the same rule, not
> >a different rule.
>
> Well, if you have Initials and Medials, your morphemes could look like:
>
> I
> IM
> IMM
> IMMM
> IMMMM
> ...
>
> but if you have Finals and Medials, your morphemes could look like:
>
> F
> MF
> MMF
> MMMF
> MMMMF
> ...

OK. Those are distinct. But neither is what I was talking
about as a generalization of Tceqli's rule. Tceqli allows
*one or more* from each set, and requires that every morpheme
contain at least one phoneme from each set. In other words,
A+B+. Your rules are AB* and A*B; in one rule a morpheme
must have exactly one from the initial set and zero or more from
the final set, and in the other it must have exactly one from the
final set preceded by zero or more from the initial set.

> >>Essentially, pick out a set that can only occur at a boundary; then,
> >>either the boundary is always just before any member of that set, or the
> >>boundary is always just after any member of that set.

After your more detailed examples I see this is different from Tceqli.

> >This is slightly different, then. Tceqli's rule allow
> >*one or more* from one set followed by one or more
> >from the other set. You seem to propose
> >*exactly one* from one set followed by one or more from
> >the other, or one or more from the first followed
> >by *exactly one* from the second.
> >
> >For Tceqli, the boundary is always where a member of the
> >following set is followed by a member of the initial set.
>
> I see a problem with this.
>
> How do you know whether "IIMM" is a single morpheme, or "I" followed
> by "IMM"?

Because I is an illegal morpheme. IM is the shortest allowed.
And IIM, IMM, IIIM, IMMM, IIMMM etc. are all allowed too.
(In Tceqli the maximum number from the initial set is three,
because the initial set is made up entirely of consonants, while
the number taken from the final set is open-ended because
it has a mix of vowels and consonants.)

> Techniques available might include the following;

<snip>

Some of these are different from anything I've got on the
Conlang Wikia list. Please add them, if you know anything
about wiki editing, or if not, I'll add them when I have time.

> 1. You could divide up the phonological segments into the following classes;
> a. Segments that can be the first segment of a morpheme, but can't be any
> non-first segment.
> b. Segments that can't be the first segment of a morpheme, but can be any
> non-first segment.
>
> Then the morphemes will look like a, ab, abb, abbb, abbbb, ... etc.
> Morpheme boundaries would occur just previous to each a.
>
>
> 2. You could divide up the phonological segments into the following classes;
> c. Segments that can be the last segment of a morpheme, but can't be any
> non-last segment.
> d. Segments that can't be the last segment of a morpheme, but can be any
> non-last segment.
>
> Then the morphemes will look like c, dc, ddc, dddc, ddddc, ... etc.
> Morpheme boundaries would occur just after each c.
>
>
> 3. If you require every morpheme to contain at least two segments, you
> could divide up the phonological segments into the following classes;
> e. Segments that can be the first or last segment of a morpheme, but can't
> be any non-first not-last segment.
> f. Segments that can't be the first nor last segment of a morpheme, but can
> be any non-first non-last segment.
>
> Then the morphemes will look like ee, efe, effe, efffe, effffe, ... etc.
> (Without the two-segment-minimum, ee might be "e, e" or might be "ee".
> Morpheme boundaries would occur just after each fe and just before each ef,
> but a string of "ee" morphemes would have to be parsed globally; you
> couldn't tell how to parse it unless you had the whole thing.
>
> 4. Require the first segment of each morpheme to code the length of the
> morpheme.

Plan B, brz, and X-1 all do this.

> 5. Require the last segment of each morpheme to code the length of the
> morpheme.

Maybe theoretically possible, but seems rather perverse.
And maybe pointless. It seems like you would have to
combine it with some other scheme that limits which
segment can occur where, or else you wouldn't know where
the last segment is so you can count backwards from it
to find the last whole morpheme.... And if you have that
other method to show where a morpheme ends, why
add this method to count the number of segments or
syllables? Redundancy and error detection, I guess.

> An extreme case; For each possible morpheme length, let there be a set of
> segments that can occur in morphemes of that length, and only in morphemes
> of that length. For each segment, let there be one and only one length of
> morphemes in which it can occur. For lengths greater than one, let there
> be a set of segments which are all and only those that can be the first
> segment of such a morpheme, and a disjoint set of all and only those that
......

Henrik Theiling recently proposed doing something like
this at the syllable level rather than the phoneme level --
there is a set of syllables that are monosyllabic words,
and a disjoint set that are the first syllable of disyllables,
another set disjoint from the first two that's the second
syllable of disyllables, and so forth. I'm trying this
for the first round of my iterative engelang.

> 6. Another extreme technique; Pick a number and assume all morphemes are
> exactly that long. Again, you can't parse it with only local information,
> but you can parse it if you have the whole thing.

E.g.
all one syllable or all two syllables. But the syllable
shape would have to be such that there can be no
ambiguity about syllable boundaries. E.g. if you allow
(C)V(C) syllables then the initial and final consonant sets
must be disjoint, or else CVCVC would be ambiguous
between CVC VC and CV CVC. If all are CV(C) then you're
probably OK even if the final consonant set is a subset
of the initial consonant set; CVCVC could only be
CV CVC, and CVCCVC could only be CVC CVC.

> >On 4/17/06, And Rosta wrote:
> >
> >> My conlang, Livagian, has unambiguous syntax parsed
> >> incrementally with no lookahead, and it cuts the
> >>[snip]

> I'm interested in And Rosta's technique too.

Although it's not a self-segregating morphology technique as such,
I think it needs to be added to our list on the wiki - not
as one of the numbered list of methods, but in an appendix
or something.

-- 
Jim Henry
http://www.pobox.com/~jimhenry
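
P.S. For anyone who wants to experiment with the A+B+ rule discussed
above, here is a rough Python sketch of how the boundary test (cut
wherever a following-set segment is directly followed by an initial-set
segment) might be coded. The function name `segment` and the one-letter
stand-ins "I" and "M" are made up for illustration; they are not
Tceqli's actual phoneme inventory, and the sketch does not enforce
Tceqli's limit of three initial-set segments.

```python
def segment(text, initial, following):
    """Split text into morphemes of shape A+B+.

    `initial` (A) and `following` (B) are assumed to be disjoint sets
    of one-character segments; both names are illustrative only.
    """
    if set(initial) & set(following):
        raise ValueError("the two segment sets must be disjoint")

    morphemes = []
    start = 0
    for i, ch in enumerate(text):
        if ch not in initial and ch not in following:
            raise ValueError(f"unknown segment {ch!r} at position {i}")
        # A boundary falls exactly where a B segment is followed by an A segment.
        if i > 0 and text[i - 1] in following and ch in initial:
            morphemes.append(text[start:i])
            start = i
    if text:
        morphemes.append(text[start:])

    # Each piece is now a run of A's then B's; require at least one of each,
    # so a bare "I" (or a stray leading "M") is rejected.
    for m in morphemes:
        if m[0] not in initial or m[-1] not in following:
            raise ValueError(f"{m!r} is not of the form A+B+")
    return morphemes


if __name__ == "__main__":
    A, B = {"I"}, {"M"}
    print(segment("IIMM", A, B))     # a single morpheme, since "I" alone is illegal
    print(segment("IIMMIMM", A, B))  # cut at the M-to-I transition
```

Because every cut point is decided by looking at just two adjacent
segments, this needs no lookahead beyond the next segment, which is the
property that distinguishes it from the fixed-length scheme in item 6.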