Another problem with arrangements of small UNIX commands in pipelines is
that any particular arrangement in use suffers from reliability and usability
problems:
1. No way to test the whole, since in general each application has a unique
structure with a potentially different choice of components. (A shell
program executes whatever commands are on the system, not those it might
have been tested with.)
2. No comprehensive error reporting (at best, reporting from individual
commands), and
3. No way to provide support.
On a much smaller scale, imagine a component stereo setup that is
delivering bad sound. You have a turntable, an arm, a cartridge, a pre-amp,
an amp, speakers, and cables and wires, typically from seven or more
different manufacturers. Not one of them would be able to help you with
support. The dealer would, if you bought the whole lot from them. Or you
could pay a consultant. This is one reason why so-called console stereos
were popular in the 1960s, even though they generally delivered inferior
sound.
This isn't a criticism of sorting with UNIX commands; it's a broader
criticism of the UNIX software tools approach for serious application
development.
Of course, one could build a single system out of components, and package
it all together as a tested and supported product. That's exactly what
object-oriented programming does, and very successfully.
Marc
On Sat, Jan 18, 2025 at 8:50 AM Paul Winalski <paul.winalski(a)gmail.com>
wrote:
On Sat, Jan 18, 2025 at 10:17 AM Larry McVoy
<lm(a)mcvoy.com> wrote:
On Sat, Jan 18, 2025 at 04:51:15PM +0200,
Diomidis Spinellis wrote:
But I can't stop thinking that, in common
with the mainframes these programs were running on, they represent a
mindset
that has been surpassed by superior ideas.
I disagree. Go back and read the reply where someone was talking about
sorting datasets that spanned multiple tapes, each of which was much
larger than local disk. sort(1) can't begin to think about handling
something like that.
I have a lot of respect for how Unix does things: if the problem fits,
the Unix answer is simpler and more flexible; it's better. If
the problem doesn't fit, the Unix answer is awful.
cmd < data | cmd2 | cmd3
involves a LOT of data copying. A custom answer that did all of that in
one address space is a lot more efficient but also a lot more special
purpose. Unix wins on flexibility and simplicity; special purpose
wins on performance.
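To make the copying concrete, here is a small hypothetical example: count
ERROR lines per date, assuming the date is the first space-separated field
of each log line (the file name and format are invented for illustration).

    # Pipeline form: each "|" pushes every surviving line through a kernel
    # pipe buffer, so the data is copied between address spaces at every stage.
    grep 'ERROR' app.log | cut -d' ' -f1 | sort | uniq -c

    # Fused form: one awk process filters and counts in a single address
    # space, with no pipe copies (output is unsorted, unlike the pipeline).
    awk '/ERROR/ { n[$1]++ } END { for (d in n) print n[d], d }' app.log

The pipeline is easier to compose and rearrange; the fused program touches
the data once, which is the special-purpose performance win described above.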
Another consideration: the smaller System/360 mainframes ran DOS (Disk
Operating System) or TOS (Tape Operating System, for shops that didn't have
disks). These were both single-process operating systems, so there was no way
the Unix method of chaining programs together could have worked on them.
OS MFT (Multiprogramming with a Fixed number of Tasks) and MVT
(Multiprogramming with a Variable number of Tasks) were multiprocess
systems, but they lacked any interprocess communication mechanism (such as
Unix pipes).
True databases in those days were rare, expensive, slow, and of limited
capacity. The usual way to, say, produce a list of customers who owed
money, sorted by how much they owed, would be:
[1] scan the data set for customers who owed money and write that out to
tape(s)
[2] use sort/merge to sort the data on tape(s) in the desired order
[3] run a program to print the sorted data in the desired format
It is important in step [2] to keep the tapes moving. Start/stop
operations waste a ton of time. Most of the complexity of the mainframe
sort/merge programs was in I/O management to keep the devices as busy as
possible. The gold standard for sort/merge in the IBM world was a
third-party program called SyncSort. It cost a fortune but was well worth
it for the big shops.
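For comparison, here is a rough sketch of what steps [1]-[3] might look like
as a Unix pipeline, assuming a flat customers.dat file whose made-up layout
is one "name amount-owed" record per line:

    # [1] keep customers who owe money; [2] sort by amount, largest first;
    # [3] format the report
    awk '$2 > 0' customers.dat | sort -k2,2nr |
        awk '{ printf "%-20s %10.2f\n", $1, $2 }'

On a data set spanning multiple tapes, each larger than the local disk, even
the temporary files behind sort(1) would not fit, which is exactly the scale
problem described in the replies above.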
So the short, bottom line answer is that the Unix way wasn't even possible
on the smaller mainframes and was too inefficient for the large ones.
-Paul W.