Gary -

Did you calculate that yourself, or did you find the list somewhere?
Also, would it be alright if I put this on my blog (glossarch.wordpress.com)
and credited you?

Danny


2014/1/14 Gary Shannon <[log in to unmask]>

> Brown Corpus sentence-initial letter frequencies:
>
> A  4830
> B  2412
> C   981
> D   941
> E   868
> F  1462
> G   517
> H  4653
> I  7006
> J   354
> K   143
> L   713
> M  1836
> N  1314
> O  1735
> P  1034
> Q    41
> R   666
> S  3225
> T 11928
> U   299
> V   143
> W  3100
> X     2
> Y   859
> Z    10
>
> Word-initial frequencies (not necessarily sentence-initial):
>
> A 110860
> B  43975
> C  47523
> D  29485
> E  23694
> F  39649
> G  16784
> H  50015
> I  61347
> J   5104
> K   4939
> L  23095
> M  38075
> N  20476
> O  70123
> P  38482
> Q   1912
> R  25593
> S  66437
> T 148382
> U  11410
> V   6377
> W  58433
> X     36
> Y   7689
> Z    208
>
> --gary
>
> On Tue, Jan 14, 2014 at 9:03 AM, Pete Bleackley
> <[log in to unmask]> wrote:
> > staving Alex Fink:
> >
> > Does anyone happen to have data on what the frequency distribution of
> > sentence-initial letters in English prose is (even better if it's
> > subdivided by register)?  Failing that, does anyone have a favourite
> > corpus from which it would be easy to extract this kind of thing?
> >
> >
> > As I said on G+, Python's NLTK library should include suitable corpora.
> >
> > --
> > Pete Bleackley
>
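
For anyone who wants to reproduce counts like these, here is a rough sketch along the lines Pete suggests, using the Brown Corpus reader that ships with NLTK. It is not necessarily how Gary produced his numbers. The helper names are made up for illustration, and two choices are assumptions: leading tokens that do not start with an ASCII letter (quote marks, digits) are skipped, and case is folded to upper. Different choices will shift the totals somewhat. The Brown genre categories are only a rough stand-in for register.

```python
from collections import Counter
import string


def sentence_initial_counts(sentences):
    """Count the initial letter of the first letter-initial token per sentence.

    `sentences` is any iterable of token lists. Tokens that do not start
    with an ASCII letter are skipped (an assumption, see the note above).
    """
    counts = Counter()
    for sent in sentences:
        for token in sent:
            first = token[:1].upper()
            if len(first) == 1 and first in string.ascii_uppercase:
                counts[first] += 1
                break  # only the first qualifying token of the sentence counts
    return counts


def word_initial_counts(words):
    """Count the initial letter of every token that starts with an ASCII letter."""
    counts = Counter()
    for token in words:
        first = token[:1].upper()
        if len(first) == 1 and first in string.ascii_uppercase:
            counts[first] += 1
    return counts


def brown_counts(categories=None):
    """Return (sentence-initial, word-initial) counts for the Brown Corpus.

    Requires NLTK and a one-time nltk.download("brown"). Pass e.g.
    categories="news" or categories=["fiction", "romance"] to restrict
    the counts to particular genres.
    """
    from nltk.corpus import brown  # imported here so the helpers work without NLTK

    return (
        sentence_initial_counts(brown.sents(categories=categories)),
        word_initial_counts(brown.words(categories=categories)),
    )
```

With the corpus installed, `brown_counts()` gives both tables for the whole corpus, and `brown_counts(categories="news")` gives them for a single genre.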