0

ETAOIN SRHLDCU

Great article by Peter Norvig about his analysis of letter frequency using the Google Books data set. Looks like ETAOIN’s got a new last name.

On December 17th 2012, I got a nice letter from Mark Mayzner, a retired 85-year-old researcher who studied the frequency of letter combinations in English words in the early 1960s. His 1965 publication has been cited in hundreds of articles. Mayzner describes his work:

I culled a corpus of 20,000 words from a variety of sources, e.g., newspapers, magazines, books, etc. For each source selected, a starting place was chosen at random. In proceeding forward from this point, all three, four, five, six, and seven-letter words were recorded until a total of 200 words had been selected. This procedure was duplicated 100 times, each time with a different source, thus yielding a grand total of 20,000 words. This sample broke down as follows: three-letter words, 6,807 tokens, 187 types; four-letter words, 5,456 tokens, 641 types; five-letter words, 3,422 tokens, 856 types; six-letter words, 2,264 tokens, 868 types; seven-letter words, 2,051 tokens, 924 types. I then proceeded to construct tables that showed the frequency counts for three, four, five, six, and seven-letter words, but most importantly, broken down by word length and letter position, which had never been done before to my knowledge.

and he wonders if:

perhaps your group at Google might be interested in using the computing power that is now available to significantly expand and produce such tables as I constructed some 50 years ago, but now using the Google Corpus Data, not the tiny 20,000 word sample that I used.

The answer is: yes indeed, I am interested! And it will be a lot easier for me than it was for Mayzner. Working 60s-style, Mayzner had to gather his collection of text sources, then go through them and select individual words, punch them on Hollerith cards, and use a card-sorting machine.

Read the whole thing — it’s excellent!


Stop breadboarding and soldering – start making immediately! Adafruit’s Circuit Playground is jam-packed with LEDs, sensors, buttons, alligator clip pads and more. Build projects with Circuit Playground in a few minutes with the drag-and-drop MakeCode programming site, learn computer science using the CS Discoveries class on code.org, jump into CircuitPython to learn Python and hardware together, or even use Arduino IDE. Circuit Playground Express is the newest and best Circuit Playground board, with support for MakeCode, CircuitPython, and Arduino. It has a powerful processor, 10 NeoPixels, mini speaker, InfraRed receive and transmit, two buttons, a switch, 14 alligator clip pads, and lots of sensors: capacitive touch, IR proximity, temperature, light, motion and sound. A whole wide world of electronics and coding is waiting for you, and it fits in the palm of your hand.

Join 12,000+ makers on Adafruit’s Discord channels and be part of the community! http://adafru.it/discord

CircuitPython 2019!

Have an amazing project to share? The Electronics Show and Tell with Google Hangouts On-Air is every Wednesday at 7:30pm ET! To join, head over to YouTube and check out the show’s live chat – we’ll post the link there.

Join us every Wednesday night at 8pm ET for Ask an Engineer!

Follow Adafruit on Instagram for top secret new products, behinds the scenes and more https://www.instagram.com/adafruit/


Maker Business — SiFive is a startup to pay attention to. RISC-5 is here to stay.

Wearables — Swatch it up

Electronics — Code like everyone’s watching

Biohacking — Stroboscopic Visual Training

Python for Microcontrollers — CircuitPython takes flight! All aboard with datum, Bluefruit CPX, and more! #Python #Adafruit #CircuitPython #PythonHardware @circuitpython @micropython @ThePSF @Adafruit

Get the only spam-free daily newsletter about wearables, running a "maker business", electronic tips and more! Subscribe at AdafruitDaily.com !



No Comments

No comments yet.

Sorry, the comment form is closed at this time.