I am aware that phonotactics can be worked out by collecting a set of
words and deducing the syllable constraints from them. The trouble is,
one, I was trying to avoid that method, and two, I don't trust it (not
at the level I'm at as a "linguist"). It is a tried and true method
(I've read about it many times, and have even been advised to use it by
Mark Rosenfelder), but how would I know, having collected, say, 20
words, that I've collected enough to account for all possible syllable
constraints within a language?

For example: if I took the English words /false, brown, car, fix,
create, jump, yellow, activity, quirky, beautiful, emaciate, weep,
usage, grammar, kick, atrocious, harrow, victory, let/ and
/tremendous/, I could deduce (C)(C)V(V)(V)(C)(C), using the simple CV
formula so many people have argued against (but for this example, let
us pretend it's acceptable to everyone). However, that formula doesn't
account for the triple and quadruple consonant clusters of /strength/,
for instance. Thus I find this method a little too difficult for
myself, as I'm uncertain I'd be able to pick out an adequate survey of
words to represent all of the possible syllable constraints of a given
language.
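
One rough way to judge whether a word list is "big enough" is to watch
how many distinct word-initial and word-final consonant clusters it has
turned up so far. The Python sketch below is only a heuristic under
loud assumptions: it works on spelling rather than phonemes, it treats
only a, e, i, o, u as vowels (so "y" counts as a consonant), and the
function names edge_clusters and saturation_curve are made up for this
illustration.

```python
import re

# Assumption: spelling stands in for phonemes, and only these letters
# count as vowels ("y" is treated as a consonant).
VOWELS = "aeiou"


def edge_clusters(word):
    """Return (initial consonant run, final consonant run) of a word."""
    word = word.lower()
    onset = re.match(rf"[^{VOWELS}]*", word).group(0)
    coda = re.search(rf"[^{VOWELS}]*$", word).group(0)
    return onset, coda


def saturation_curve(words):
    """Cumulative number of distinct edge clusters after each word."""
    seen = set()
    curve = []
    for word in words:
        onset, coda = edge_clusters(word)
        seen.add(("onset", onset))
        seen.add(("coda", coda))
        curve.append(len(seen))
    return curve
```

If the curve is still rising at the end of the list, as it very likely
would be for a 20-word sample, the list is probably too small. Once it
stays flat over a long stretch of new words, rarer clusters such as the
ones in /strength/ have probably been seen, though a flat curve is a
hint and not a proof.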

So, if you could elaborate a bit on how I could avoid such oversights,
I would certainly be open to trying this method. Thank you.

J. M. DeSantis
Writer - Illustrator

On 5/24/2012 5:58 AM, BPJ wrote:
> On 2012-05-24 03:12, Dirk Elzinga wrote:
>> However, it took me about 5 minutes to find papers online on the
>> syllable structure of Welsh, Gaelic, German, and Latin. All I did was
>> to type <X syllable structure> into Google (where X is the name of
>> the language). So I wonder what you were doing that made it so
>> difficult for you to find out anything.
>> Dirk
>> On Wed, May 23, 2012 at 5:56 PM, J. M. DeSantis wrote:
>>> @Dirk: I'm not asking for anyone to do any research /FOR/ me. I'm
>>> merely asking, if anyone /has/ the information (knows it, off hand,
>>> or can easily find it) if they could kindly provide it. After months
>>> of searching, I haven't turned up /ANY/ such information, and I'm
>>> simply burnt out. If I was able to find the information on my own, I
>>> wouldn't be asking the List for it, thank you. I appreciate your
>>> understanding in this.
> Actually the footwork one needs to do is technically
> simple, if time-consuming and boring: get a
> reader/dictionary/wordlist, extract a largish sample
> and do the statistics. It's precisely because it's
> time-consuming and boring that everyone avoids doing it, and
> therefore such info is hard to come by. However, if you
> know a capable scripting language, are somewhat
> comfortable with regular expressions and can come by
> ready digitized texts (which should be easy for these
> languages) the work can be made quicker and more
> intellectually satisfying. I'm sure people on this list
> would not mind *helping* you with the programming part
> of the task.
> Once you have extracted the words you need to split
> words into syllables and syllables into phonemes with
> some regular expressions and then collect the
> statistics.
> In the old times they did this with pen, paper and a
> lot of time, and as always it's the person who needs
> the data (or his students, if s/he has any :-) who will
> have to do the footwork, and acquire the needed skills
> if s/he needs to.  The good thing is that you
> normally won't need a huge data set to get reliable
> results.
> The need to do various kinds of lexical and
> phonological statistics was actually the reason I
> learned Perl back when. In those days I had to create
> the 'encoding' and the fonts (and spend my own money
> on the fontmaking software!) as well as typing in all
> the data, in addition to the analysis tools. Nowadays
> at least the encoding and the fonts, and often even the
> data for more well-studied languages, are available
> online for free.
> /bpj
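
A minimal sketch of the kind of footwork BPJ describes above (pull the
words out of a digitized text, split them with regular expressions,
collect the statistics), written in Python. It is an illustration
under stated assumptions, not BPJ's actual tooling: it works on plain
a-z spelling rather than phonemes, it treats only a, e, i, o, u as
vowels, it does no real syllabification (it only splits each word into
alternating consonant and vowel runs), and the name
tally_consonant_runs is hypothetical.

```python
import re
from collections import Counter

# Assumption: spelling stands in for phonemes; only these are vowels.
VOWELS = "aeiou"

# Matches one maximal run of vowels or one maximal run of consonants.
RUN = re.compile(rf"[{VOWELS}]+|[^{VOWELS}]+")


def tally_consonant_runs(text):
    """Count consonant runs by position in the word.

    Returns a dict with Counters for word-initial, word-medial and
    word-final consonant runs found in the text.
    """
    stats = {"initial": Counter(), "medial": Counter(), "final": Counter()}
    for word in re.findall(r"[a-z]+", text.lower()):
        runs = RUN.findall(word)
        for i, run in enumerate(runs):
            if run[0] in VOWELS:
                continue  # skip vowel runs, only consonants are tallied
            if i == 0:
                position = "initial"
            elif i == len(runs) - 1:
                position = "final"
            else:
                position = "medial"
            stats[position][run] += 1
    return stats
```

Calling stats = tally_consonant_runs(text) on a long text and then
stats["initial"].most_common(20) gives a first look at which
word-initial clusters occur and how often. Medial runs still have to
be divided between the coda of one syllable and the onset of the next,
which is language-specific and is not attempted here.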