
I just loaded the Brown Corpus into NoteTab and used the search/Count
Occurrences on each letter, and then on each letter preceded by a
space. I did this some years ago for cryptographic purposes. Consider
it public domain. You may copy it without restriction and no credit is
necessary. :-)

--gary
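For anyone who wants to redo the word-initial counts without NoteTab, here is a minimal Python sketch of the same idea. It counts every letter that either starts the text or follows whitespace, and it folds lowercase into uppercase. It assumes a plain-text copy of the corpus and treats any whitespace (space, tab, newline) as a separator rather than only a literal space, so its totals will not necessarily match the figures below. The name `word_initial_counts` is hypothetical.

```python
import re
from collections import Counter

# A letter NOT preceded by a non-whitespace character, i.e. one at the very start
# of the text or right after a space, tab or newline. This approximates searching
# for "a letter preceded by a space": letters after "(" or a quote mark are not counted.
WORD_INITIAL = re.compile(r"(?<!\S)[A-Za-z]")


def word_initial_counts(text: str) -> Counter:
    """Count word-initial letters, folding lowercase into uppercase (hypothetical helper)."""
    return Counter(m.group(0).upper() for m in WORD_INITIAL.finditer(text))
```

Usage, assuming the corpus has been saved as a hypothetical file `brown.txt`: `word_initial_counts(open("brown.txt", encoding="utf-8").read()).most_common()`. Like the NoteTab search, this is a purely textual heuristic: a word that follows an opening parenthesis or quote mark is skipped.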

On Tue, Jan 14, 2014 at 6:55 PM, Daniel Bowman <[log in to unmask]> wrote:
> Gary -
>
> Did you calculate that yourself, or did you find the list somewhere?
> Also, would it be alright if I put this on my blog (glossarch.wordpress.com)
> and credited you?
>
> Danny
>
>
> 2014/1/14 Gary Shannon <[log in to unmask]>
>
>> Brown Corpus sentence-initial letter frequencies:
>>
>> A  4830
>> B  2412
>> C   981
>> D   941
>> E   868
>> F  1462
>> G   517
>> H  4653
>> I  7006
>> J   354
>> K   143
>> L   713
>> M  1836
>> N  1314
>> O  1735
>> P  1034
>> Q    41
>> R   666
>> S  3225
>> T 11928
>> U   299
>> V   143
>> W  3100
>> X     2
>> Y   859
>> Z    10
>>
>> Word-initial letter frequencies (not necessarily sentence-initial):
>>
>> A 110860
>> B  43975
>> C  47523
>> D  29485
>> E  23694
>> F  39649
>> G  16784
>> H  50015
>> I  61347
>> J   5104
>> K   4939
>> L  23095
>> M  38075
>> N  20476
>> O  70123
>> P  38482
>> Q   1912
>> R  25593
>> S  66437
>> T 148382
>> U  11410
>> V   6377
>> W  58433
>> X     36
>> Y   7689
>> Z    208
>>
>> --gary
>>
>> On Tue, Jan 14, 2014 at 9:03 AM, Pete Bleackley
>> <[log in to unmask]> wrote:
>> > Quoting Alex Fink:
>> >
>> > Does anyone happen to have data on what the frequency distribution of
>> > sentence-initial letters in English prose is (even better if it's
>> > subdivided by register)?  Failing that, does anyone have a favourite
>> > corpus from which it would be easy to extract this kind of thing?
>> >
>> >
>> > As I said on G+, Python's NLTK library should include suitable corpora.
>> >
>> > --
>> > Pete Bleackley
>>
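Following Pete's NLTK suggestion, here is a sketch of pulling sentence-initial letters from NLTK's copy of the Brown Corpus. It assumes NLTK is installed and the corpus has been fetched with `nltk.download("brown")`. `brown.sents()` yields each sentence as a list of tokens, and its `categories` argument restricts it to Brown's genre sections (news, fiction, learned, and so on), which gives a rough split by register. Two choices are assumptions of this sketch: leading punctuation-only tokens are skipped, and sentences whose first real token starts with a digit are not counted. For that reason the totals will not necessarily reproduce Gary's figures. Both helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Union


def sentence_initial_counts(sentences: Iterable[Sequence[str]]) -> Counter:
    """Count the initial letter of each tokenized sentence (hypothetical helper).

    Leading tokens with no letters or digits (e.g. the `` quote token used in
    Brown) are skipped; a sentence whose first real token starts with a digit
    is not counted.
    """
    counts: Counter = Counter()
    for tokens in sentences:
        for tok in tokens:
            # First letter or digit in the token; None if it is pure punctuation.
            first = next((ch for ch in tok if ch.isalnum()), None)
            if first is None:
                continue  # skip punctuation-only tokens such as `` or (
            if first.isascii() and first.isalpha():
                counts[first.upper()] += 1
            break  # only the first real token of a sentence matters
    return counts


def brown_sentence_initials(categories: Optional[Union[str, List[str]]] = None) -> Counter:
    """Sentence-initial letter counts for all of Brown, or only the given categories."""
    # Imported lazily so the counting helper above works without NLTK installed.
    from nltk.corpus import brown

    return sentence_initial_counts(brown.sents(categories=categories))
```

For a per-register breakdown, one could loop over the genre labels, e.g. `from nltk.corpus import brown` followed by `{cat: brown_sentence_initials(cat) for cat in brown.categories()}`. Keeping the counting logic separate from the corpus access makes it easy to run the same heuristic over any other tokenized corpus.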