On 2020-03-04 16:50:34, Random832 spake thus:
[...]
Sure, but "stdin is a sequence of any type, and
the argument is an expression that operates on that type or the name of a property that
that type has" is universal enough.
The part that has to operate on a specific structure isn't the command, it's
the arguments.
For example, a powershell pipeline to produce a list of files sorted by modified date
is:
gci . | sort lastwritetime | select name
all three *commands* are universal - not all objects have a "lastwritetime" and
"name" property, but sort and select can operate on any property that the
sequence of objects passed into it has.
There are some examples of that type of thing in widely used Unix tools;
my use of 'sort -k1,1n' further down is demonstrating such a use case (the
'sort' command is being told that it is operating on numbers). But beyond
some lowest common denominator types ("number", "string", ...) how
many
commands can really usefully operate on a large number of types? For
example, a program that can operate on IP addresses is probably doing
something different than a program that wants to operate on email
addresses.
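To make the 'sort' point concrete, here is a synthetic two-field input (not from the thread) sorted both ways; only the 'n' flag tells sort the key is a number:

```shell
# Lexical comparison: "1" < "2", so "10 b" comes first.
$ printf '10 b\n2 a\n' | sort -k1,1
10 b
2 a
# Numeric comparison: 2 < 10.
$ printf '10 b\n2 a\n' | sort -k1,1n
2 a
10 b
```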
I could see where named properties of some object can be used more
generally than types, but again there are widely used tools that do do
that (e.g., jq(1)). IMHO, though, they are more cumbersome to use than
most of the commands I need to use minute to minute.
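For instance, something like the PowerShell sort/select pair can be approximated with jq over a stream of records (a sketch; the field names here are made up):

```shell
# Sort objects by one named property, then project another --
# roughly 'sort mtime | select name' over JSON records.
$ echo '[{"name":"b.txt","mtime":2},{"name":"a.txt","mtime":1}]' \
    | jq -r 'sort_by(.mtime)[].name'
a.txt
b.txt
```

It works, but compare the ceremony here to 'ls -t': that's the cumbersomeness I mean.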
(gci is an alias for get-childitem... it also has
aliases ls and dir, but I'm emphasizing that it's not exclusive to directories)
*assuming that ls -t didn't exist*, to do this with unix tools that operate on text
you would need:
ls -l | [somehow convert the date to a sortable format, probably in awk] | sort |
[somehow pick the filename alone out of the output - possibly with cut or sed or awk
again]
(Just nit-picking at this particular example)
You could do it without ls[0]:
$ stat -c '%Y %n' * | sort -k1,1n | xargs -L1 sh -c 'echo "$@"'
That doesn't seem so bad to me, but if it was something I needed regularly
I'd of course put it in an alias[1] or (more likely) a short script file.
and it's very difficult to get tools like awk,
sort, and cut to work on formats that contain more than one field that may contain
embedded spaces (you can just about get away with it for ls output because the date is
always three "words").
[...]
Yes, that's often true. And when I encounter it I typically start out by
seeing if I can inject and remove tokens in the data at key places in the
pipeline. Beyond anything trivial, though, I then quickly start reaching
for tools to put the data into some form that more easily allows for it
(CSV, JSON, ...). But that invariably adds other complications (such as
the need to find or build tools to marshal/unmarshal the data, and to
deal with data-domain-specific notions of null-vs-empty-string).
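One cheap form of that token injection is picking a delimiter that can't appear in the data, e.g. a tab between the stat fields (a sketch using bash's $'...' quoting; assumes file names contain no tabs or newlines):

```shell
# Inject a tab between the sort key and the name, sort numerically
# on the key, then strip the key back off with cut (tab is cut's
# default delimiter).
$ stat -c $'%Y\t%n' * | sort -k1,1n | cut -f2-
```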
For the (more common (for me)) case where there is only one field that
contains embedded spaces, I just try to get 'em at the end of the line
and let the shell deal with it:
$ some-command | while read -r first second rest; do ... ; done
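For example (synthetic input, not the output of any real command):

```shell
# read -r splits off the first two whitespace-delimited fields;
# the space-containing remainder lands intact in $rest.
$ printf '1 4096 my file name.txt\n' \
    | while read -r first second rest; do echo "$rest"; done
my file name.txt
```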
Maybe it would be enough to have the universal
interface be "tables" (i.e. text streams in some format that supports adequate
escaping of embedded row and column delimiters)... or maybe even just table rows, and let
the user deal with memorizing column numbers (or let each originating command support a
fully general way to specify what columns are requested, as ps alone does on modern
systems). Of course, this isn't *really* different from allowing any data structure -
after all, the value for any field could itself be a fully escaped table in text format.
[...]
Well, in some sense with byte streams you have a table of newline-delimited
records (rows) and whitespace-separated byte subfields (columns). And
anything on top of that could (in some context, and with some syntax) be
considered just further escaped tables in text format. I think that's
essentially the same thing that you said, only with the outermost table
syntax removed. But like you said, this isn't really different from
allowing any data structure. Importantly, though, it doesn't impose any
particular data structure, either.
I've worked at a couple of different places that had in-house tools for
working with explicit table semantics in command line suites, and where
they fit the data domain, that was hugely useful. Generally speaking, they
were special purpose enough to warrant their own tools, but still general
purpose enough to be composable (were designed for use in shell pipelines)
and applicable in domains beyond the intentions of their original authors.
Still, the burden of "thinking in tables" would make them too heavyweight
for a lot of common use cases. Sometimes my data structure is "paragraphs
of text":
$ lorem -p 3 | perl -00 -wnle '2 == $. && print' | wc -w
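(If perl isn't handy, awk's paragraph mode does the same second-paragraph selection; the printf below is just a stand-in for any paragraph source like 'lorem':)

```shell
# RS="" puts awk in paragraph mode: each blank-line-separated
# block is one record, so NR==2 selects the second paragraph.
$ printf 'one two\n\nthree four five\n\nsix\n' | awk 'BEGIN{RS=""} NR==2' | wc -w
3
```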
Other times I want a tree (JSON, s-expressions, ...), or even a stream of
trees[2]. I consider it a feature that these more complex data structures
are not assumed or imposed in contexts where they are not needed.
Take care,
-Al
[0] You could get 'ls' to do it, too (without '-t'), but here the use of
TIME_STYLE is a presumably non-portable (but handy!) GNU-ism:
$ TIME_STYLE='+%s' ls -l | tail -n +2 | sort -k6,6n | xargs -L1 sh -c 'shift 5; echo "$@"'
It's different from the '-t' option, though, in that it forces a
predictable date field format in the output of 'ls -l', so it side-steps
the need for downstream date parsing altogether and simply jumps into
sorting (after chopping off the 'total N' header (groans all around)).
[1] E.g.,
$ # read 'bmt' as: "by mtime"
$ alias bmt='stat -c "%Y %n" * | sort -k1,1n | xargs -L1 sh -c '"'echo "'"$@"'"'"
$ bmt
[2] Probably flattened.