On 06-Dec-24 01:07, Marc Rochkind wrote:
> I found that 2017 paper "Extending Unix Pipelines to DAGs". It's open
> access:
> https://ieeexplore.ieee.org/document/7903579
> The open source code itself is here:
> https://github.com/dspinellis/dgsh
> Maybe an ambitious TUHS contributor can get the code running and give us
> a report.

I wrote the dgsh code with my co-author Marios Fragkoulis, so I still
have it running. Doug McIlroy, who also mentioned dgsh in another
message, is too modest to say that its design owes much to his input. I
asked him for feedback when I was working on it, and over several
iterations he proposed important (and quite demanding as I recall)
improvements to its design.
The system allows the concise and readable expression of several graph
topologies I had in mind when I started working on it, and more [1].
However, it hasn't caught on. I think the main reason is that it is
based on modified versions of several existing tools (bash, cmp, comm,
cut, diff, diff3, grep, join, paste, perm, sort) [2]. The modifications
allow the tools to coordinate among themselves the setup of pipes when
placed in a dgsh graph, according to available inputs and required
outputs. The changes (especially for bash) aren't small, so I didn't
think it was realistic to push them upstream. As a result, the modified
tools are now out of date and difficult to build.
Not sure what can be done to address this problem. It seems that a
widely adopted system, such as modern Unix/Linux, has too much inertia
for it to adopt potentially disruptive innovations.
In retrospect, the way we designed the pipe graph setup could also be
improved. The current design involves an initial phase where IPC
messages circulate around the graph to communicate the I/O
requirements of each tool, for example that comm(1) should expect input
from two processes and output to three processes. The design is brittle
and difficult to troubleshoot, because coordination happens dynamically
behind the scenes. A better design (and one I think Doug was
advocating) would statically analyze the graph's topology and invoke
each tool with appropriate parameters or environment variables.
However, this design would require significantly more extensive
modifications to bash, or the implementation of a new shell. Both
approaches would have required work for which we had neither the time
nor the energy back then, and each had its own downsides regarding
adoption potential.
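
To make the static alternative concrete, here is a rough sketch of what
such an analysis might compute. It is only an illustration, not dgsh
code: the edge list, the function name io_requirements, and the
environment variable names GRAPH_NIN and GRAPH_NOUT are all hypothetical.
Given the graph's edges, it counts the pipes entering and leaving each
node, which is the information a shell could hand to each tool at
invocation time instead of negotiating it at run time.

```python
from collections import defaultdict


def io_requirements(edges):
    """Return, for every node of a pipe graph, a dict of hypothetical
    environment variables giving its number of input and output pipes.

    edges is a list of (producer, consumer) pairs, one per pipe.
    """
    n_in = defaultdict(int)
    n_out = defaultdict(int)
    nodes = []  # keep first-seen order for stable output
    for src, dst in edges:
        for node in (src, dst):
            if node not in nodes:
                nodes.append(node)
        n_out[src] += 1
        n_in[dst] += 1
    # GRAPH_NIN and GRAPH_NOUT are made-up names for this sketch.
    return {
        node: {"GRAPH_NIN": str(n_in[node]), "GRAPH_NOUT": str(n_out[node])}
        for node in nodes
    }


# Example topology: two sorted streams feed comm, whose three output
# columns go to three separate consumers (node names are illustrative).
edges = [
    ("sort_a", "comm"),
    ("sort_b", "comm"),
    ("comm", "only_a"),
    ("comm", "only_b"),
    ("comm", "both"),
]

env = io_requirements(edges)
for node, variables in env.items():
    print(node, variables)
```

In this example the comm node ends up with two inputs and three outputs,
matching the comm(1) case mentioned above. A shell doing this would
still have to create the pipes and pass the right file descriptors,
which the sketch leaves out.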

[1] https://www.spinellis.gr/sw/dgsh/#examples
[2] https://www.spinellis.gr/sw/dgsh/#tools

Diomidis