Hey now! I resent this bald faced insinuation that I have ever in me
born days phoneticized an English corpse!

Padraic




----- Original Message -----
> From: MorphemeAddict <[log in to unmask]>
> To: [log in to unmask]
> Cc: 
> Sent: Sunday, 22 December 2013, 19:34
> Subject: Re: enciphering English
> 
> Is there a corpus of phoneticized English (not necessarily spoken) similar
> to Brown's?
> 
> stevo
> 
> 
> 
> On Sun, Dec 22, 2013 at 3:10 PM, Jim Henry <[log in to unmask]> wrote:
> 
>>  On Sun, Dec 22, 2013 at 8:40 AM, Tristan <[log in to unmask]> wrote:
>>  >> Would English be more difficult to decipher (in a cryptogram, e.g.) if it
>>  >> were originally enciphered with the articles prefixed to the
>> 
>>  > Yes, but not very much. If the decipherer didn't know it was the case, it
>>  > would be slightly more effective. You can remove every space and not
>> 
>>  As an experiment, I stripped every instance of "the", "a" and "an" from a
>>  million-word etext (Macaulay's History of England) and computed
>>  before-and-after character frequencies.  (These are frequencies of all
>>  characters including space and punctuation, but only spaces and
>>  letters show up in the top 10.)
>> 
>>  ==> with articles <==
>>   984592        15.80%        (space)
>>   655099        10.51%        e
>>   473350        7.59%        t
>>   391179        6.28%        a
>>   379825        6.09%        o
>>   358633        5.75%        n
>>   336552        5.40%        i
>>   321708        5.16%        h
>>   319893        5.13%        s
>>   303685        4.87%        r
>> 
>>  ==> without articles <==
>>   984592        16.57%        (space)
>>   568683        9.57%        e
>>   386934        6.51%        t
>>   379825        6.39%        o
>>   364927        6.14%        a
>>   354965        5.97%        n
>>   336552        5.66%        i
>>   319893        5.38%        s
>>   303685        5.11%        r
>>   235292        3.96%        h
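
For anyone wanting to repeat the comparison, here is a minimal sketch of how such before-and-after tallies might be produced. It is not Jim's actual script. The whole-word, case-insensitive matching and the lowercasing before counting are assumptions, as are the names strip_articles, top_chars and report. Spaces are left in place when an article is removed, which is at least consistent with the space count being identical in both tables.

```python
import re
from collections import Counter

# Whole-word, case-insensitive match for the three articles.
# (Assumption: the original post does not say how matching was done.)
ARTICLES = re.compile(r"\b(?:the|an|a)\b", re.IGNORECASE)


def strip_articles(text):
    """Remove every standalone 'the', 'a' and 'an'; surrounding spaces stay."""
    return ARTICLES.sub("", text)


def top_chars(text, n=10):
    """Return (char, count, percent) for the n most frequent characters.

    Case is folded before counting (an assumption), and every character,
    including space and punctuation, is counted.
    """
    counts = Counter(text.lower())
    total = sum(counts.values())
    return [(ch, k, 100.0 * k / total) for ch, k in counts.most_common(n)]


def report(label, text):
    """Print a table in roughly the layout used in the post."""
    print(f"==> {label} <==")
    for ch, k, pct in top_chars(text):
        print(f"{k:8d}  {pct:6.2f}%  {ch!r}")


if __name__ == "__main__":
    # Tiny stand-in sample; the post used a million-word etext.
    sample = "The cat sat on a mat and ate an apple in the bath."
    report("with articles", sample)
    report("without articles", strip_articles(sample))
```

On a real etext you would read the file into a string and pass it to report() in place of the sample.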
>> 
>>  The relative frequencies of 'a' and 'o' reverse, but are very similar
>>  either way.  'h' drops two ranks, from eighth to tenth.  The relative
>>  ranks of the big 'e' and 't' are still the same.
>> 
>>  Replacing all instances of 'th' (in 'these', 'bath' etc.) with 'z' has a
>>  bigger impact on relative frequencies, dropping 't' and 'h' by several
>>  ranks:
>> 
>>   984592        16.21%        (space)
>>   655099        10.79%        e
>>   391179        6.44%        a
>>   379825        6.25%        o
>>   358633        5.90%        n
>>   336552        5.54%        i
>>   319893        5.27%        s
>>   314159        5.17%        t
>>   303685        5.00%        r
>>   222909        3.67%        d
>> 
>>  Still, not a huge impact on other relative letter frequencies or the
>>  difficulty of simple cryptograms.  And if you're using a cipher that's
>>  vulnerable to that level of attack for anything more serious than an
>>  espionage RPG, you're in trouble anyway.
>> 
>>  --
>>  Jim Henry
>>  http://www.pobox.com/~jimhenry/
>>  http://www.jimhenrymedicaltrust.org
>> 
>