ETAOIN SRHLDCU

Great article by Peter Norvig about his analysis of letter frequency using the Google Books data set. Looks like ETAOIN’s got a new last name.

On December 17th 2012, I got a nice letter from Mark Mayzner, a retired 85-year-old researcher who studied the frequency of letter combinations in English words in the early 1960s. His 1965 publication has been cited in hundreds of articles. Mayzner describes his work:

I culled a corpus of 20,000 words from a variety of sources, e.g., newspapers, magazines, books, etc. For each source selected, a starting place was chosen at random. In proceeding forward from this point, all three, four, five, six, and seven-letter words were recorded until a total of 200 words had been selected. This procedure was duplicated 100 times, each time with a different source, thus yielding a grand total of 20,000 words. This sample broke down as follows: three-letter words, 6,807 tokens, 187 types; four-letter words, 5,456 tokens, 641 types; five-letter words, 3,422 tokens, 856 types; six-letter words, 2,264 tokens, 868 types; seven-letter words, 2,051 tokens, 924 types. I then proceeded to construct tables that showed the frequency counts for three, four, five, six, and seven-letter words, but most importantly, broken down by word length and letter position, which had never been done before to my knowledge.

and he wonders if:

perhaps your group at Google might be interested in using the computing power that is now available to significantly expand and produce such tables as I constructed some 50 years ago, but now using the Google Corpus Data, not the tiny 20,000 word sample that I used.

The answer is: yes indeed, I am interested! And it will be a lot easier for me than it was for Mayzner. Working 60s-style, Mayzner had to gather his collection of text sources, then go through them and select individual words, punch them on Hollerith cards, and use a card-sorting machine.

Read the whole thing — it’s excellent!


Have an amazing project to share? The Electronics Show and Tell is every Wednesday at 7:30pm ET! To join, head over to YouTube and check out the show’s live chat and our Discord!

Join us every Wednesday night at 8pm ET for Ask an Engineer!

Join over 38,000+ makers on Adafruit’s Discord channels and be part of the community! http://adafru.it/discord

CircuitPython – The easiest way to program microcontrollers – CircuitPython.org


New Products – Adafruit Industries – Makers, hackers, artists, designers and engineers! — New Products 11/15/2024 Featuring Adafruit bq25185 USB / DC / Solar Charger with 3.3V Buck Board! (Video)

Python for Microcontrollers – Adafruit Daily — Python on Microcontrollers Newsletter: A New Arduino MicroPython Package Manager, How-Tos and Much More! #CircuitPython #Python #micropython @ThePSF @Raspberry_Pi

EYE on NPI – Adafruit Daily — EYE on NPI Maxim’s Himalaya uSLIC Step-Down Power Module #EyeOnNPI @maximintegrated @digikey

Adafruit IoT Monthly — The 2024 Recap Issue!

Maker Business – Adafruit Daily — Apple to build another chip at TSMC Arizona

Electronics – Adafruit Daily — SMT Tip – Stop moving around!

Get the only spam-free daily newsletter about wearables, running a "maker business", electronic tips and more! Subscribe at AdafruitDaily.com !


No Comments

No comments yet.

Sorry, the comment form is closed at this time.