Jim Henry, On 21/07/2007 19:25:
> Cool.  What program are you using to measure the frequency of
> tokens?  Does it measure frequency of phrases as well?
> You can get such a script (in Perl) from my site:
> 
> http://www.pobox.com/~jimhenry/conlang/frequencies.pl
> 
> (I have a newer, better version than what is on my website,
> but I can't FTP-upload it from the hospital wireless network.
> I'll do that sometime after I get out.  Meanwhile I could email
> it to you if you want it.)
> 
> If you have something that will measure the frequency of
> wildcard phrases (e.g. how often two words occur with
> any word between them, or with any two words, or...)
> let me know.
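
On the wildcard phrases asked about above: a minimal sketch in Python, not the Perl script linked in the quoted message. It counts how often two words occur with exactly `gap` arbitrary words between them. The function name `gapped_pair_counts`, the crude `\w+` tokenization and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter


def gapped_pair_counts(text, gap):
    """Count (w1, w2) pairs that have exactly `gap` words between them.

    gap=0 gives ordinary adjacent pairs, gap=1 matches "w1 * w2",
    gap=2 matches "w1 * * w2", and so on.
    """
    # Crude tokenization (an assumption): lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    step = gap + 1
    counts = Counter()
    for i in range(len(tokens) - step):
        counts[(tokens[i], tokens[i + step])] += 1
    return counts


if __name__ == "__main__":
    sample = "give the dog food and give the cat food"
    for gap in (1, 2):
        print(gap, gapped_pair_counts(sample, gap).most_common(3))
```

Running it once per gap size and comparing the tables shows which frames stay frequent no matter what fills the slot.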

Ideally you'd derive your statistics not from strings of wordforms but from semanticosyntactic trees. Or both. E.g. you'd want to find the frequency of "give X food" (which might warrant a compressed form meaning "feed X"), regardless of the length of X.

I say "ideally" because it'd mean an awful lot of work, for results that would be very interesting yet surely still distressingly distant from perfection.
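
To make the tree idea concrete, here is a rough sketch in Python. It assumes the hard part, parsing text into trees, has already been done by some other means. The node format (a head word plus a dict of role-labelled children), the role names and the helper names `node` and `count_frames` are all invented for illustration.

```python
from collections import Counter


def node(head, **children):
    """Toy tree node: (head word, {role: child node})."""
    return (head, children)


def count_frames(trees, keep_roles):
    """Count (head, kept role fillers) frames over every node in every tree.

    Roles outside `keep_roles` are ignored, so the size and content of
    whatever fills them (the X in "give X food") does not affect the count.
    """
    counts = Counter()

    def walk(n):
        head, children = n
        kept = tuple(sorted((role, child[0])
                            for role, child in children.items()
                            if role in keep_roles))
        if kept:
            counts[(head, kept)] += 1
        for child in children.values():
            walk(child)

    for tree in trees:
        walk(tree)
    return counts


# Hand-built trees standing in for parser output (hypothetical data).
trees = [
    node("give", recipient=node("dog", mod=node("hungry")),
         theme=node("food")),
    node("give", recipient=node("child", rel=node("live", place=node("door"))),
         theme=node("food")),
    node("give", recipient=node("cat"), theme=node("water")),
]
frames = count_frames(trees, {"theme"})
```

Here "give ... food" is counted twice even though the two recipients differ greatly in length, which a count over flat word strings would miss.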


--And.