On Thu, Nov 9, 2017 at 9:34 AM, Will Senn <will.senn(a)gmail.com> wrote:
Why does the first of these incantations not present
text, but the second
does (word is a file)? Neither errors out.
When dealing with obscure pipeline surprises, I find it best to try
and reason through what's happening, step-by-step.
$ <word | sed 20q
well, here you have (at least) two processes: sed, and another (more
on that in a second). The standard input of `sed` is connected to the
read-end of a pipe; the write-end of that pipe is connected to that
unnamed process. The unnamed process's input is connected to the file
`word` in the current directory.
So...what's that unnamed process? To figure out what's going on, we'll
have to look at the source for the shell itself. In this case, the
answer to the mystery lies in /usr/src/cmd/sh/xec.c, in the (extremely
long) function `execute`. This is the heart of the shell's
interpreter. Basically, the string entered by the user has been parsed
into an in-memory data structure (essentially an abstract syntax
tree), and this function is responsible for interpreting that data
structure and invoking the entered commands. It is called recursively;
you'll note that the bulk of it is a switch statement on the token
type for the current node in the syntax tree.
The magic here is in the case for TFIL, which sets up a pipe between
the left-hand side of tree and the right-hand side.
In this case, the left-hand-side is an empty command; basically,
equivalent to typing return at the shell prompt. Note that that just
happens to be perfectly syntactically valid, so the parser doesn't
generate an error. We may reasonably assume that this results in a
left-hand child of the pipe node in the AST that is set with TFORK and
an empty command vector: indeed, we see this in the `term` function in
cmd.c.
In the context of a pipeline, a new copy of the shell *is* forked for
this, and if we follow the TFORK logic, we can see that, in the
handling of the child, we do things like setting up pipes and I/O
redirection and then either execute a builtin command, or *if the
command is not empty* we execute the command via `execa`. This is not
a builtin command, but since this command is empty, we don't do
anything in the child other than exit.
Thus, the empty command produces no output, despite having its input
redirected from a file and therefore nothing is put into the write-end
of the pipe and sed sees nothing on the read-end (other than EOF).
$ <word sed 20q
This one is easy: you're redirecting the input to `sed 20q` from the
file `word` in the current directory. This is semantically the same
as,
$ sed 20q < word
Except that the shell allows you to play some syntactic shenanigans
with putting the redirection before the command. This is simply a
consequence of how the command parser was written.
- Dan C.