It appears that Grant Taylor via TUHS <gtaylor(a)tnetconsulting.net> said:
On 1/2/25 6:40 AM, arnold(a)skeeve.com wrote:
The paper on compressing the dictionary was interesting. In the day
of 20 meg disks, compressing a ~ 2.5 meg file down to ~ .5 meg is a
big savings.
It's even more important when sending data across the wire.
Was the compressed dictionary put into use? I could imagine that
spell(1) at least would have needed some library routines to return
a stream of words from it.
I couldn't help but think about the DNS on wire compression format which
will re-use part of the existing query name to de-duplicate later parts
of the same query name.
I know it's not the same, but it felt un-ignorably close in both purpose
and method.
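For anyone who hasn't looked at the DNS scheme mentioned above, here is a rough sketch of the idea as described in RFC 1035: a name is a sequence of length-prefixed labels, and a two-octet pointer (top two bits set, 14-bit offset) can stand in for a suffix that already appeared earlier. The function name encode_names is made up for illustration, and offsets here are relative to a bare buffer rather than to the start of a real DNS message with its 12-octet header.

```python
def encode_names(names):
    """Encode domain names back to back, replacing any suffix that was
    already written with a 2-octet pointer (hypothetical helper)."""
    msg = bytearray()
    seen = {}  # tuple of lowercased labels -> offset where that suffix starts
    for name in names:
        labels = name.rstrip(".").split(".")
        for i in range(len(labels)):
            suffix = tuple(label.lower() for label in labels[i:])
            if suffix in seen:
                # Pointer: top two bits 11, low 14 bits are the offset.
                msg += (0xC000 | seen[suffix]).to_bytes(2, "big")
                break
            if len(msg) < 0x4000:  # only offsets that fit in 14 bits
                seen[suffix] = len(msg)
            raw = labels[i].encode("ascii")
            msg.append(len(raw))   # length octet
            msg += raw             # label text
        else:
            msg.append(0)          # root label ends an uncompressed name
    return bytes(msg)


if __name__ == "__main__":
    out = encode_names(["www.example.com", "mail.example.com"])
    print(out.hex(" "))
```

In the second name only "mail" is spelled out; the rest is a pointer back to where "example.com" starts in the first name.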
Lempel and Ziv published the LZ77 paper in 1977 (hence the name) which uses
back pointers into a sliding window of text. Later tweaks brought us LZ78
and compress and gzip.
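To make "back pointers into a sliding window" concrete, here is a toy sketch in the LZ77 style. It emits (offset, length, next character) triples and uses a brute-force search over a small window. The function names and the window size are my own choices for illustration, not anything taken from the paper or from a real compressor.

```python
def lz77_compress(data, window=32):
    """Return a list of (offset, length, next_char) triples."""
    out = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Try every start position inside the sliding window.
        for j in range(max(0, i - window), i):
            k = 0
            # Stop one short of the end so a literal always remains;
            # the match may overlap the current position.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples):
    """Rebuild the text by copying from the already-decoded output."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])  # one char at a time, so overlap works
        buf.append(ch)
    return "".join(buf)


if __name__ == "__main__":
    text = "abracadabra abracadabra"
    triples = lz77_compress(text)
    print(triples)
    print(lz77_decompress(triples) == text)
```

Real implementations pack the triples into bits and search the window with hashing rather than a linear scan.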
There are really only two ways to compress data: use a variable-length coding scheme with
the shortest codes for the most common tokens, or a dictionary that uses pointers to
repeated strings. Huffman invented the former in 1951, Lempel and Ziv the latter in
1977, although as we've seen people did special purpose versions of the dictionary
approach like this one. Modern schemes use combinations of both.
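For the variable-length side, a minimal sketch of Huffman's construction: repeatedly merge the two least frequent subtrees, prefixing a 0 to the codes in one and a 1 to the codes in the other. The function name huffman_codes and the choice of characters as tokens are assumptions for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return {symbol: bit string}; more frequent symbols get shorter codes."""
    freq = Counter(text)
    # Heap entries: (weight, unique tiebreak, {symbol: code so far}).
    heap = [(w, n, {sym: ""}) for n, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # two lightest subtrees
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2] if heap else {}


if __name__ == "__main__":
    codes = huffman_codes("aaaaaaaabbbc")
    for sym in sorted(codes):
        print(sym, codes[sym])
```

A format that combines both approaches, as gzip's deflate does, runs a coder along these lines over the literals and back pointers produced by the dictionary stage.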
The DNS data formats were invented in about 1982 but I have no idea whether
Mockapetris was familiar with LZ. I suppose I could ask him.
R's,
John