While poking around I discovered a modern go rewrite by Rob Pike
https://github.com/robpike/typo
On Wed, Mar 01, 2023 at 11:41:17PM -0500, Douglas McIlroy typed:
Typo, in v3 through v6, may be the most creative Unix
program to have come
out of Bell Labs. It served as a spell checker before spell(1), though it
knew nothing about spelling beyond a list of the most common words in the
language. This brainchild of Bob Morris would, in his words, work just as
well in Urdu as in English.
The beautiful trick: gather trigram frequencies in the document, then print
out a list of the individual words in increasing order of the likelihood
that they came from the statistical source that those frequencies
characterize. Typos (as distinct from phonetic misspellings) generally
floated toward the beginning of the list and so were easy to spot.
But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
invented the trick of "counting large integers in small registers". Roughly
speaking, when you see a word whose current count is in the range 2^(n-1)
to 2^n-1, you increment the count with probability 1/2^n, thus getting an
approximation to lg n, which serves in estimating the entropy of each word.
This counting method merited a patent and is now recognized as the first of
what is now an active subfield of theoretical computer
science--memory-bounded streaming algorithms.
--
Steve Mynott <steve.mynott(a)gmail.com>
rsa3072/629FBB91565E591955B5876A79CEFAA4450EBD50