Here are the letters sorted by frequency of occurrence, most frequent
first. The first row is for the first letter of a sentence and the second
for the initial letter of a word.

T I A H S W B M O F N P C D E Y L R G J U V K Q Z X
T A O S I W H C B F P M D R E L N G U Y V J K Q Z X
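
For anyone who wants to check or re-derive the two rows, here is a minimal
Python sketch that sorts the counts Gary posted (quoted below). The helper
names parse_counts and rank_letters are mine, made up for illustration.
Note that K and V tie at 143 in the sentence-initial counts, so their
relative order is arbitrary; this sketch breaks ties alphabetically.

```python
# Counts copied from Gary Shannon's Brown Corpus figures quoted in this thread.
SENTENCE_INITIAL = """
A 4830 B 2412 C 981 D 941 E 868 F 1462 G 517 H 4653 I 7006
J 354 K 143 L 713 M 1836 N 1314 O 1735 P 1034 Q 41 R 666
S 3225 T 11928 U 299 V 143 W 3100 X 2 Y 859 Z 10
"""
WORD_INITIAL = """
A 110860 B 43975 C 47523 D 29485 E 23694 F 39649 G 16784 H 50015
I 61347 J 5104 K 4939 L 23095 M 38075 N 20476 O 70123 P 38482
Q 1912 R 25593 S 66437 T 148382 U 11410 V 6377 W 58433 X 36
Y 7689 Z 208
"""


def parse_counts(text):
    """Turn 'A 4830 B 2412 ...' into {'A': 4830, 'B': 2412, ...}."""
    tokens = text.split()
    return {letter: int(n) for letter, n in zip(tokens[0::2], tokens[1::2])}


def rank_letters(counts):
    """Return the letters ordered from most to least frequent.

    sorted() is stable and the input is alphabetical, so tied letters
    (K and V in the sentence-initial data) stay in alphabetical order.
    """
    return sorted(counts, key=lambda letter: -counts[letter])


sentence_rank = rank_letters(parse_counts(SENTENCE_INITIAL))
word_rank = rank_letters(parse_counts(WORD_INITIAL))

print(" ".join(sentence_rank))
print(" ".join(word_rank))
```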




2014/1/14 Daniel Bowman <[log in to unmask]>

> Gary -
>
> Did you calculate that yourself, or did you find the list somewhere?
> Also, would it be alright if I put this on my blog (
> glossarch.wordpress.com) and credited you?
>
> Danny
>
>
> 2014/1/14 Gary Shannon <[log in to unmask]>
>
>> Brown Corpus sentence-initial letter frequencies:
>>
>> A  4830
>> B  2412
>> C   981
>> D   941
>> E   868
>> F  1462
>> G   517
>> H  4653
>> I  7006
>> J   354
>> K   143
>> L   713
>> M  1836
>> N  1314
>> O  1735
>> P  1034
>> Q    41
>> R   666
>> S  3225
>> T 11928
>> U   299
>> V   143
>> W  3100
>> X     2
>> Y   859
>> Z    10
>>
>> Word-initial letter frequencies (not necessarily sentence-initial):
>>
>> A 110860
>> B  43975
>> C  47523
>> D  29485
>> E  23694
>> F  39649
>> G  16784
>> H  50015
>> I  61347
>> J   5104
>> K   4939
>> L  23095
>> M  38075
>> N  20476
>> O  70123
>> P  38482
>> Q   1912
>> R  25593
>> S  66437
>> T 148382
>> U  11410
>> V   6377
>> W  58433
>> X     36
>> Y   7689
>> Z    208
>>
>> --gary
>>
>> On Tue, Jan 14, 2014 at 9:03 AM, Pete Bleackley
>> <[log in to unmask]> wrote:
>> > staving Alex Fink:
>> >
>> > Does anyone happen to have data on what the frequency distribution of
>> > sentence-initial letters in English prose is (even better if it's
>> > subdivided by register)?  Failing that, does anyone have a favourite
>> > corpus from which it would be easy to extract this kind of thing?
>> >
>> >
>> > As I said on G+, Python's NLTK library should include suitable corpora.
>> >
>> > --
>> > Pete Bleackley
>>
>
>
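
Following up on Pete's NLTK suggestion, here is a rough sketch of how
counts like these could be pulled from a tokenized corpus. It is not
necessarily how Gary produced his numbers. The function names
initial_letter_counts and brown_sentences are hypothetical, and two choices
are my own assumptions: only tokens starting with an ASCII letter are
counted, and the "sentence-initial" letter is taken from the first such
token, so opening quotation marks are skipped. The Brown loader needs NLTK
installed and the corpus downloaded once with nltk.download("brown").

```python
from collections import Counter


def initial_letter_counts(sentences):
    """Count sentence-initial and word-initial letters.

    `sentences` is any iterable of token lists, e.g. what NLTK's
    brown.sents() returns. Assumption: tokens that do not start with an
    ASCII letter (punctuation, digits) are ignored, and the first
    letter-initial token of a sentence supplies its initial letter.
    """
    sentence_initial = Counter()
    word_initial = Counter()
    for sentence in sentences:
        seen_first = False
        for token in sentence:
            first = token[:1].upper()
            if not (first.isascii() and first.isalpha()):
                continue
            word_initial[first] += 1
            if not seen_first:
                sentence_initial[first] += 1
                seen_first = True
    return sentence_initial, word_initial


def brown_sentences(categories=None):
    """Load Brown Corpus sentences via NLTK (imported lazily).

    Passing e.g. categories="news" restricts the sample to one genre,
    which is one way to approximate a breakdown by register.
    """
    from nltk.corpus import brown
    return brown.sents(categories=categories)


# Toy data so the sketch runs without NLTK or the corpus installed.
sample = [
    ["The", "cat", "sat", "."],
    ["``", "It", "is", "late", "''", "."],
    ["A", "dog", "barked", "."],
]
sample_sentence_initial, sample_word_initial = initial_letter_counts(sample)
print(sample_sentence_initial.most_common())
print(sample_word_initial.most_common())
```

With the corpus available, initial_letter_counts(brown_sentences()) should
give the full tables. The totals may differ from the ones quoted above,
depending on how punctuation and tokenization are handled.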