On Thu, Jan 2, 2025, 7:51 AM Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
I am not aware that the compressed dictionary was used for anything.
Steve Johnson's first shell-script spelling-checker did make a pass
over a dictionary, but not Webster's second, which would have caused
lots of false negatives because it contains so many exotic small words
that could result from typos.

Where did the Webster's Second file come from? Did the Labs give the public-domain paper dictionary to the equivalent of a typing pool and have them enter it? Or did it come from elsewhere? Or something else? How was it checked for accuracy?

Warner


My production spell aggressively stripped
affixes and used hashing and other coding tricks to keep its
"dictionary" in the limited memory of a PDP-11. (The whole story is
told in https://www.cs.dartmouth.edu/~doug/spell.pdf and insightfully
described by Jon Bentley in
https://dl.acm.org/doi/pdf/10.1145/3532.315102.) When larger memory
became available, these heroics were replaced by basic common-prefix
coding patterned after Morris and Thompson, just as Arnold surmised.
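For readers unfamiliar with common-prefix (front) coding, here is a minimal sketch of the general idea, assuming a sorted word list. It is not the Morris and Thompson code or the format spell actually used. Each word is stored as a count of leading characters it shares with the previous word plus the remaining suffix. Decoding is a simple streaming loop, which is roughly the kind of "return a stream of words" routine Arnold asks about below. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(words: Iterable[str]) -> list[tuple[int, str]]:
    """Encode a sorted word list as (shared-prefix length, new suffix) pairs."""
    encoded = []
    prev = ""
    for w in words:
        k = common_prefix_len(prev, w)
        encoded.append((k, w[k:]))  # store only what differs from the previous word
        prev = w
    return encoded


def front_decode(pairs: Iterable[tuple[int, str]]) -> Iterator[str]:
    """Rebuild the words one at a time, keeping only the previous word in memory."""
    prev = ""
    for k, suffix in pairs:
        prev = prev[:k] + suffix
        yield prev


words = ["abacus", "abandon", "abandoned", "abase", "abash"]
pairs = front_encode(words)
print(pairs)  # → [(0, 'abacus'), (3, 'ndon'), (7, 'ed'), (3, 'se'), (4, 'h')]
print(list(front_decode(pairs)) == words)  # → True
```

Because a sorted dictionary has long runs of shared prefixes, most entries shrink to a small count and a short suffix. A real on-disk format would also pack the count into a byte or a few bits rather than storing a Python tuple.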

On Thu, Jan 2, 2025 at 7:41 AM <arnold@skeeve.com> wrote:
>
> Hi.
>
> The paper on compressing the dictionary was interesting. In the day
> of 20 meg disks, compressing a ~ 2.5 meg file down to ~ .5 meg is
> a big savings.
>
> Was the compressed dictionary put into use? I could imagine that
> spell(1) at least would have needed some library routines to return
> a stream of words from it.
>
> Just wondering.  Thanks,
>
> Arnold