Thanks Dan, this message is exactly what I was trying to express. To
piggyback on some of your ideas, one limitation on getting at the
representation is the simplicity of the shell. If you look back at
The Mother of All Demos, or environments like the LispM, Project
Oberon, the Alto, even the BLIT, it seems to me like there may be ways
to harmonize the underlying representation with its use (or its
implementation, or the user's mental model?). What is it that makes
the Unix shell and pipelines so desirable and unchanged?
One thing I notice is that as my millennial peers learn, adopt, and
use Unix, whether as software or as doctrine, they treat it as a
fixture. It seems like older generations have some of this as well,
but we can make computers do whatever we want; there are no rules,
only conventions.
I'm not trying to convince anyone of anything; there is mounting
science that we basically never change our minds on anything. The
conversation was useful for solidifying my own views, and maybe for
prompting people who have done this for a long time to express their
views on the basics for shared consideration.
Regards,
Kevin
On Mon, Jan 13, 2020 at 4:48 PM Dan Cross <crossd(a)gmail.com> wrote:
[Resending as this got squashed a few days ago. Jon, sorry for the duplicate. Again.]
On Sun, Jan 12, 2020 at 4:38 PM Jon Steinhart <jon(a)fourwinds.com> wrote:
[snip]
So I think that the point that you're trying to make, correct me if I'm wrong,
is that if lists just knew how long they were, you could just ask, and that it
would be more efficient.
What I understood was that, by translating into a lowest-common-denominator format like
text, one loses much of the semantic information implicit in a richer representation. In
particular, much of the internal knowledge (like type information...) is lost during
translation and presentation. Put another way, with text as usually used by the standard
suite of Unix tools, type information is implicit, rather than explicit. I took this to be
less an issue of efficiency and more of expressiveness.
It is, perhaps, important to remember that Unix works so well because of heavy use of
convention: to take Doug's example, the total number of commands might be easy to
find with `wc` because one assumes each command is presented on a separate line, with no
gaudy header or footer information or extraneous explanatory text.
This sort of convention, where each logical "record" is a line by itself, is
pervasive on Unix systems, but is not guaranteed. In some sense, those representations are
fragile: a change in output might break something else downstream in the pipeline, whereas
a representation that captures more semantic meaning is more robust in the face of change
but, as in Doug's example, often harder to use. The Lisp Machine had all sorts of
cool information in the image and a good Lisp hacker familiar with the machine's
structures could write programs to extract and present that information. But doing so
wasn't trivial in the way that '| wc -l' in response to a casual query is.
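To make the contrast concrete, the casual query might look something like this
(a sketch; the directory and the one-command-per-line assumption about ls
output are mine):

    # Count the commands in a directory: ls prints one name per line,
    # so wc -l gives the total. The one-record-per-line convention
    # does all the work here.
    ls /bin | wc -l

    # The same convention makes nearby variations just as cheap, e.g.
    # counting only the commands whose names begin with "a":
    ls /bin | grep -c '^a'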
While that may be true, it sort of assumes that
this is something so common that
the extra overhead for line counting should be part of every list. And it doesn't
address the issue that while maybe you want a line count I may want a character
count or a count of all lines that begin with the letter A. Limiting this example
to just line numbers ignores the fact that different people might want different
information that can't all be predicted in advance and built into every program.
This, I think, illustrates an important point: Unix conventions worked well enough in
practice that many interesting tasks were not just tractable, but easy and in some cases
trivial. Combining programs was easy via pipelines. Harder stuff involving more elaborate
data formats was possible, but, well, harder and required more involved programming. By
contrast, the Lisp machine could do the hard stuff, but the simple stuff also required
non-trivial programming.
The SQL database point was similarly interesting: having written programs to talk to
relational databases, yes, one can do powerful things: but the amount of programming
required is significant at a minimum and often substantial.
It also seems to me that the root problem here is that the data in the original
example was in an emacs-specific format instead of the default UNIX text file
format.
The beauty of UNIX is that with a common file format one can create tools that
process data in different ways and that operate on all data. Yes, it's not as
efficient as creating a custom tool for a particular purpose, but is much better
for casual use. One can always create a special purpose tool if a particular
use becomes so prevalent that the extra efficiency is worthwhile. If you're not
familiar with it, find a copy of the Communications of the ACM issue where Knuth
presented a clever search algorithm (if I remember correctly) and McIlroy did a
critique. One of the things that Doug pointed out was that while Don's code was
more efficient, by creating a new pile of special-purpose code he introduced bugs.
The flip side is that one often loses information in the conversion to text: yes, there
are structured data formats with text serializations that can preserve the lost
information, but consuming and processing those with the standard Unix tools can be messy.
Seemingly trivial changes in text, like reversing the order of two fields, can break
programs that consume that data. Data must be suitable for pipelining (e.g., perhaps
free-form text must be free of newlines or something). These are all limitations. Where I
think the argument went awry is in not recognizing that very often those problems, while
real, are at least tractable.
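A tiny illustration of that fragility (the field layout here is hypothetical,
chosen only for the example):

    # A producer emits "name uid" pairs, one per line; a consumer
    # extracts the uid by position:
    printf 'alice 1001\nbob 1002\n' | awk '{print $2}'

    # If the producer later swaps the fields to "uid name", the
    # consumer silently starts printing names instead of uids: the
    # type of each field is only a convention, not part of the data.
    printf '1001 alice\n1002 bob\n' | awk '{print $2}'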
Many people have claimed, incorrectly in my
opinion, that this model fails in the
modern era because it only works on text data. They change the subject when I
point out that ImageMagick works on binary data. And, there are now stream
processing utilities for JSON data and such that show that the UNIX model still
works IF you understand it and know how to use it.
Certainly. I think you hit the nail on the head with the proviso that one must
_understand_ the Unix model and how to use it. If one does so, it's very powerful
indeed, and it really is applicable more often than not. But it is not a panacea (not that
anyone suggested it is). As an example, how do I apply an unmodified `grep` to arbitrary
JSON data (which may span more than one line)? Perhaps there is a way (I can imagine a
'record2line' program that consumes a single JSON object and emits it as a
syntactically valid one-liner...) but I can also imagine all sorts of ways that might go
wrong.
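For what it's worth, one plausible shape for such a step, assuming something
like jq is at hand (jq -c re-emits each input JSON value as a single compact
line; the file name is made up):

    # Flatten each possibly multi-line JSON object onto one line, then
    # fall back to the ordinary line-oriented tools:
    jq -c '.' objects.json | grep 'some-pattern'

But the caveat stands: everything downstream is again trusting a line-oriented
convention, and grep is now matching serialized syntax rather than the
structure itself.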
- Dan C.