Typo, in v3 through v6, may be the most creative Unix program to have come out of Bell Labs. It served as a spell checker before spell(1), though it knew nothing about spelling beyond a list of the most common words in the language. This brainchild of Bob Morris would, in his words, work just as well in Urdu as in English.

The beautiful trick: gather trigram frequencies in the document, then print out a list of the individual words in increasing order of the likelihood that they came from the statistical source that those frequencies  characterize. Typos (as distinct from phonetic misspellings) generally floated toward the beginning of the list and so were easy to spot.

But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he invented the trick of "counting large integers in small registers". Roughly speaking, when you see a word whose current count is in the range 2^(n-1) to 2^n-1, you increment the count with probability 1/2^n, thus getting an approximation to lg n, which serves in estimating the entropy of each word.

This counting method merited a patent and is now recognized as the first of what is now an active subfield of theoretical computer science--memory-bounded streaming algorithms.

Doug

On Wed, Mar 1, 2023 at 6:00 PM Warner Losh <imp@bsdimp.com> wrote:
In looking at the first AUUGN today, I noticed the following at the end of a letter John Lions sent home when he spent a sabbatical at Bell Labs

image.png
I've seen the first patent, but not the second one... That's got to be a joke or inside joke, right? Anybody know anything else about it?