Hi!

Jim Henry writes:
> > > el-  don -ej -o
> > >   0.95, 0.20, 0.0
> ....
> > > or thereabouts.  How would we combine these to
> > > get an overall opacity score for the word?
>
> > The total score should of course be the product of those values, since
> > from the core pieces, each level of opaqueness influences the
> > opaqueness of the whole by its morpheme boundary level.
>
> But multiplying the nonzero values would give a lower opacity
> score for "eldonejo" than for "eldoni", when in
> fact "eldonejo" is slightly more opaque than "eldoni".
> And if we multiply all values then any word that has at least
> one perfectly transparent morpheme boundary
> would get a perfectly-transparent opacity score of 0!

Oops!  That's not what I wanted.

> Maybe it would be better to multiply the
> _transparency_ scores rather than _opacity_ scores,
>
> (1 - n_0) * ( 1 - n_1) * (1 - n_2 )....
> in this case,
> (1 - 0.95) * ( 1 - 0.20 ) * ( 1 - 0 )
> = 0.05 * 0.80
> = 0.04 (transparency)
>
> and then subtract that from 1 to get its
> opacity score, = 0.96.

You are absolutely right, that's much more sensible.  I had actually
mixed up the two levels.

But we might agree that this type of math may be mainly for fun
anyway. :-)
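
For the record, the corrected calculation could be sketched like this
in Python.  The function name `opacity` is made up for illustration,
and the boundary values for "eldoni" (0.95, 0.0) are my assumption,
taken from the first and last values given for "eldonejo" above.

```python
from math import prod

def opacity(boundary_opacities):
    """Overall opacity: 1 minus the product of the boundary transparencies."""
    # Transparency of one boundary is (1 - opacity of that boundary).
    transparency = prod(1.0 - o for o in boundary_opacities)
    return 1.0 - transparency

# el- don -ej -o, with the boundary scores from this thread
eldonejo = opacity([0.95, 0.20, 0.0])   # about 0.96
# el- don -i, boundary scores assumed for illustration
eldoni = opacity([0.95, 0.0])           # about 0.95

print(round(eldonejo, 2), round(eldoni, 2))
```

With these numbers "eldonejo" comes out slightly more opaque than
"eldoni", and a perfectly transparent boundary no longer forces the
whole word to 0.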

> > > "eldoni" and "eldonejo" in the lexicon to inflate
> > > the count too much since the latter builds on
> > > the former and is almost transparent if you already
> > > know "eldoni". ...
> >
> > This is more tricky, yes.  In the lexicon of Þrjótrunn, I have an
> > operation that cuts off parts of an existing entry for construction of
> > a new one.  Maybe that would be feasible?
>
> Can you clarify further?

Well, it is currently a simple string operation -- not linguistically
founded, but still helpful for linguistics: you could chop off the
last three characters of 'eldonejo' and use the stub 'eldon' for
further operations.
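
Roughly, and only as a sketch (this is not the actual code of my
lexicon tool; the name `chop` is invented here), the operation amounts
to:

```python
def chop(entry, n):
    """Return `entry` without its last `n` characters (plain string cut)."""
    if n <= 0:
        return entry
    return entry[:-n]

stub = chop("eldonejo", 3)
print(stub)  # eldon
```

It knows nothing about morpheme boundaries, so you have to count the
characters yourself.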

> I think Alex Fink's suggestions were probably
> along the right lines, at least vis-a-vis lexicon
> counting: count only the outermost branching.

But when the results of previous branching steps are not part of the
lexicon, e.g. because two morphemes are added to form a new word while
adding only the first one leaves you with garbage, then it's not the
best way, I think.  However, I would propose to multiply the scores of
all boundaries not resulting in anything already in the lexicon so that
you get a recursive derivation tree.

E.g. if you have ABC in the lexicon already and want to add ABCDE and
if ABCD does not exist, then either assign the operation +DE one score
and use this for a lexicon entry, or multiply the scores of +D and +E.

What this will give you is a score for deriving this word from some
shorter word in the lexicon.  Yes, this is probably what you want for
lexicon counting.

And multiplying all scores will give you the score for deriving that
word from scratch, i.e., from roots only.
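
A minimal sketch of both variants in Python, under some assumptions of
mine: "multiplying the scores" is done on the transparency values as
agreed above, a word is a list of morphemes with one opacity value per
boundary, and the lexicon is just a set of strings.  All names and the
numbers for A..E are hypothetical.

```python
from math import prod

def opacity(boundary_opacities):
    """1 minus the product of the boundary transparencies."""
    return 1.0 - prod(1.0 - o for o in boundary_opacities)

def derivation_opacity(morphemes, boundary_opacities, lexicon):
    """Opacity of deriving the word from its longest prefix in the lexicon.

    boundary_opacities[i] belongs to the boundary between
    morphemes[i] and morphemes[i + 1].
    """
    # Try the longest proper prefix first, then shorter ones.
    for k in range(len(morphemes) - 1, 0, -1):
        if "".join(morphemes[:k]) in lexicon:
            # Only the boundaries added after that prefix count.
            return opacity(boundary_opacities[k - 1:])
    # Nothing found: derive from roots only, i.e. use all boundaries.
    return opacity(boundary_opacities)

morphemes = ["A", "B", "C", "D", "E"]
scores = [0.5, 0.2, 0.3, 0.1]   # A|B, B|C, C|D, D|E (made-up values)
lexicon = {"ABC"}               # ABCD is deliberately missing

from_lexicon = derivation_opacity(morphemes, scores, lexicon)  # about 0.37
from_scratch = opacity(scores)                                 # about 0.748
print(round(from_lexicon, 3), round(from_scratch, 3))
```

Here +D and +E are combined because ABCD is not an entry; if ABCD were
in the lexicon, only the boundary for +E would count.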

**Henrik