[TUHS] Re: Was the compressed dictionary used?

2 Jan 2025

The BSDs since 4.4lite have added a lot of missing words, but few
corrections. From FreeBSD:
Capitalized Transvaal, fixed 'stock certificate' to have a 't' and
preconsoidate -> preconsolidate
Ahtena, freen, unknowen and structurelessness were removed
corelate (etc)  and freend were removed as typos and only thinly supported
variants.
Not bad for 50 years of nit-pickers poring over the file.
Warner
On Thu, Jan 2, 2025 at 10:20 AM Douglas McIlroy <douglas.mcilroy(a)dartmouth.edu> wrote:
...
 The word list of Webster's 2nd came from an Air Force project along
 with several other files, including a medical dictionary and an
 alphabetical list of tetragrams found in Web2--something one would
 expect to create for oneself nowadays. The files were freely
 distributed with no strings attached. We have not noticed any
 mistakes. The list includes 76205 entries that contain blanks or
 hyphens; these were omitted from the pinhead exercise.
 Doug
 On Thu, Jan 2, 2025 at 10:13 AM Warner Losh <imp(a)bsdimp.com> wrote:

 On Thu, Jan 2, 2025, 7:51 AM Douglas McIlroy <douglas.mcilroy(a)dartmouth.edu> wrote:

 I am not aware that the compressed dictionary was used for anything.
 Steve Johnson's first shell-script spelling-checker did make a pass
 over a dictionary, but not Webster's second, which would have caused
 lots of false negatives because it contains so many exotic small words
 that could result from typos. 
 Where did the Webster's Second file come from? Did the labs give the
 public domain paper dictionary to the equivalent of a typing pool and
 have them enter it? Did it come from elsewhere? Or something else? How
 was it checked for accuracy?

 Warner
> My production spell aggressively stripped
> affixes and used hashing and other coding tricks to keep its
> "dictionary" in the limited memory of a PDP-11. (The whole story is
> told in https://www.cs.dartmouth.edu/~doug/spell.pdf and insightfully
> described by Jon Bentley in
> https://dl.acm.org/doi/pdf/10.1145/3532.315102.) When larger memory
> became available, these heroics were replaced by basic common-prefix
> coding patterned after Morris and Thompson, just as Arnold surmised.
>
> On Thu, Jan 2, 2025 at 7:41 AM <arnold(a)skeeve.com> wrote:
> >
> > Hi.
> >
> > The paper on compressing the dictionary was interesting. In the day
> > of 20 meg disks, compressing a ~ 2.5 meg file down to ~ .5 meg is
> > a big savings.
> >
> > Was the compressed dictionary put into use? I could imagine that
> > spell(1) at least would have needed some library routines to return
> > a stream of words from it.
> >
> > Just wondering.  Thanks,
> >
> > Arnold 
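
The "basic common-prefix coding" mentioned in the thread, and Arnold's point
about needing a routine that returns a stream of words, can be illustrated
with a small sketch. This is not the Morris and Thompson scheme and not the
code spell(1) used; it is generic front coding of a sorted word list, written
in Python for illustration. The function names are hypothetical, and storing
each entry as a (prefix length, suffix) pair is an assumption made here.

```python
def common_prefix_len(a, b):
    """Return the number of leading characters that a and b share."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(words):
    """Encode a sorted word list as (shared_prefix_length, suffix) pairs.

    Each word is stored relative to the word before it, so long runs of
    words with a common stem cost only a few characters apiece.
    """
    prev = ""
    pairs = []
    for word in words:
        k = common_prefix_len(prev, word)
        pairs.append((k, word[k:]))
        prev = word
    return pairs


def front_decode(pairs):
    """Yield the original words one at a time from the encoded pairs.

    Being a generator, this gives a caller a stream of words without
    expanding the whole list in memory first.
    """
    prev = ""
    for k, suffix in pairs:
        word = prev[:k] + suffix
        yield word
        prev = word
```

A caller would run front_encode once over the sorted list and then iterate
over front_decode(pairs) to read the words back in order. The savings depend
on how much adjacent entries overlap, which is why the input has to be sorted.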
