On Sun, Jan 12, 2020 at 1:45 PM Jon Steinhart
<jon(a)fourwinds.com> wrote:
Kevin Bowling writes:
I honestly can't tell if this is genius level snark :) in case you're
sincere we generally go to great lengths to build up data types and
structures (in C lingo) when programming only to tear those useful
attributes off often at inopportune times. Basically type
systems/type safety have been too expensive or too difficult to use
through history.
Think of sitting at an SQL prompt as a counterpoint. You can pretty
easily get at the underlying representation and relationships of the
data and the output is just a side effect. Not saying SQL is the
ultimate answer, just that most people have a bit of experience with
it and UNIX so can mentally compare the two for themselves and see the
pros and cons to preserving the underlying representations.
Regards,
Kevin
On Sun, Jan 12, 2020 at 1:34 PM Jon Steinhart <jon(a)fourwinds.com> wrote:
>
> Kevin Bowling writes:
> > This is kind of illustrative of the '60s acid trip that perpetuates in
> > programming "Everything's a string maaaaan". The output is seen
> > as truth because the representation is for some reason too hard to get at
> > or too hard to cascade through the system.
> >
> > There's a total comedy of work going on in the unix way of a wc
> > pipeline versus calling a length function on a list. Nonetheless, the
> > unix pipeline was and is often magnitude easier for a single user to
> > get at. This kind of thing is amusing and endearing to me about our
> > profession in modern day.
> >
> > Regards,
> > Kevin
>
> Can you please elaborate? I read your post, and while I can see that it
> contains English words I can't make any sense out of what you said.
>
> Thanks,
> Jon
I wasn't being snarky. You said
"The output is seen as truth because the representation is for some
reason too hard to get at or too hard to cascade through the system."
I honestly have no idea what that means.
If the SQL prompt example did not clarify it, you are welcome to go one on
one if this is something you are curious about. I think I've
explained the point I was making adequately for a general audience.
Likewise,
"There's a total comedy of work going on in the unix way of
a wc pipeline versus calling a length function on a list."
I just don't know what you mean.
Reason through what happens in a shell pipeline, the more detail the
better. A quick nudge is fork/exec, what happens in the kernel, what
happens in page tables, what happens at the buffered output, tty layer
etc. Very few people actually understand all these steps even at a
general level in modern systems.
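
As a rough illustration of the steps listed above, here is a minimal Python
sketch (not from the original exchange) that builds a two-stage pipeline by
hand. It assumes a POSIX system with wc on the PATH; the producer command and
the variable names are made up for the example. Each Popen call starts a new
process, and the data travels between them through a kernel pipe as plain bytes.

```python
import subprocess
import sys

# Stage 1: a child process that writes three lines to its standard output.
# (Hypothetical producer; any command that prints lines would do.)
producer = subprocess.Popen(
    [sys.executable, "-c", "print('milk'); print('eggs'); print('bread')"],
    stdout=subprocess.PIPE,
)

# Stage 2: a second child process whose standard input is the read end
# of the pipe that stage 1 writes into. Assumes wc is installed.
consumer = subprocess.Popen(
    ["wc", "-l"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)

# The parent closes its copy of the pipe so that only the consumer holds
# the read end; otherwise end-of-file handling can go wrong.
producer.stdout.close()

output, _ = consumer.communicate()
producer.wait()

# The result comes back as text and has to be parsed into a number again.
line_count = int(output.decode().strip())
print(line_count)
```

Note that the final step converts a string back into an integer, which is the
kind of tearing down and rebuilding of representation under discussion.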
If you had a grocery list on a piece of paper would you
a) count the lines or read it directly off the last line number on the
paper if it is numbered
b) copy each character letter by letter to a new piece of equipment
(say, a word processor), until you encounter a special character that
happens to be represented as a space on the screen, increment a
counter, repeat until you reach another special character, output the
result and then destroy and throw away both the list and the word
processor equipment.
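
The two options can be sketched in Python as follows. This is an illustration
added for comparison, not code from the thread; grocery_list and both function
names are hypothetical. Option (a) asks a structure for a length it already
knows. Option (b) is roughly what a line-counting tool has to do once the list
has been flattened into a byte stream, leaving out the process and kernel work.

```python
import io

grocery_list = ["milk", "eggs", "bread"]


def count_known_length(items):
    # (a) The list object already records how many items it holds.
    return len(items)


def count_by_scanning(stream, chunk_size=4096):
    # (b) Read the flattened bytes chunk by chunk and count newline
    # characters, since the stream itself carries no length information.
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += chunk.count(b"\n")
    return count


# Flatten the list to text, one item per line, as it would look in a pipe.
flattened = "".join(item + "\n" for item in grocery_list).encode("utf-8")

print(count_known_length(grocery_list))
print(count_by_scanning(io.BytesIO(flattened)))
```

Both calls give the same number for this input; the difference is in how much
work is done and in whether the structure survives the trip.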
This kind of thing doesn't really matter in the small or at all for
performance because computers are fast. But most programming bugs in
the large eventually boil down to some kind of misunderstanding where
the representation was lost and recast in a way that does not make
sense.
Regards,
Kevin
OK, I have trouble correlating this with your original post but I think
that I understand it well enough to comment.
I agree that it is a problem that very few people understand what's going on
inside anything today from a toaster to a computer. On the computer end of
things this concerns me a lot and improving the quality of education in this
area is one of my main late-in-life missions. I'm under the illusion that
I've helped some based on comments that I've received from people who have
tracked me down and let me know how much the information in my book helped them.
On to your example...
If I had a grocery list on a piece of paper I would count the lines because I
don't number my grocery lists. I'm going to guess that few people do. So, I
would count the lines in my head and remember the result. This is pretty much
equivalent to what happens when something is piped into wc.
I don't see much difference between a and b in your example. That's because
when I count up the number of lines in the list, I am making a temporary copy
of the list in my head and then forgetting what was on the list (which may
account for the late night trip to the grocery store a couple of days ago).
So I think that the point that you're trying to make, correct me if I'm wrong,
is that if lists just knew how long they were you could just ask and that it
would be more efficient.
While that may be true, it sort of assumes that this is something so common that
the extra overhead for line counting should be part of every list. And it doesn't
address the issue that while maybe you want a line count I may want a character
count or a count of all lines that begin with the letter A. Limiting this example
to just line numbers ignores the fact that different people might want different
information that can't all be predicted in advance and built into every program.
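
A small Python sketch of that point, added here as an illustration with made-up
sample data: three different questions are asked of the same plain text, and
none of them had to be anticipated by whatever produced the text. The shell
equivalents named in the comments are approximate.

```python
# Hypothetical sample input: a plain text file's contents, one item per line.
text = "Apples\nbread\nAvocados\nmilk\n"

lines = text.splitlines()

# Roughly: wc -l
line_count = len(lines)

# Roughly: wc -c (the sample is ASCII, so characters and bytes agree)
char_count = len(text)

# Roughly: grep -c '^A'
lines_starting_with_a = sum(1 for line in lines if line.startswith("A"))

print(line_count, char_count, lines_starting_with_a)
```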
It also seems to me that the root problem here is that the data in the original
example was in an emacs-specific format instead of the default UNIX text file
format.
The beauty of UNIX is that with a common file format one can create tools that
process data in different ways that then operate on all data. Yes, it's not as
efficient as creating a custom tool for a particular purpose, but is much better
for casual use. One can always create a special purpose tool if a particular
use becomes so prevalent that the extra efficiency is worthwhile. If you're not
familiar with it, find a copy of the Communications of the ACM issue where Knuth
presented a clever search algorithm (if I remember correctly) and McIlroy did a
critique. One of the things that Doug pointed out was that while Don's code was
more efficient, by creating a new pile of special-purpose code he introduced bugs.
Many people have claimed, incorrectly in my opinion, that this model fails in the
modern era because it only works on text data. They change the subject when I
point out that ImageMagick works on binary data. And, there are now stream
processing utilities for JSON data and such that show that the UNIX model still
works IF you understand it and know how to use it.
I don't agree with your closing comment about "most programming bugs". Do you
have any data to support this or is it just an opinion? My opinion is that most
programming bugs today result from total incompetence as one can pretty much get
a computer science degree today without ever learning that programs run on
computers or what a computer is. That's something I'm trying to change, but it's
probably a lost cause. A long topic, and not necessarily appropriate for this list.
Jon