Today, v7m, SVR1, and bsd2.11 (all PDP11 ports), for example, will stay booted
and operational for long periods under simulation.
With these older UNIX variants, working with awk and even the classic shell tools is often
problematic. Moreover, resource constraints seem to be a persistent annoyance under
simulation.
When dealing with even moderately sized text files, one is often left writing a C
program to ameliorate the limitations of relying exclusively on awk and the other
classic shell tools. It’s not a leap to suggest that users running UNIX on actual metal
instead of simulation faced the same resource challenges.
Holy cow, have things changed. Today, awk and the other classic shell tools are amazing.
Resource limitations are rare or even non-existent, especially in the Cloud. Google
seems to have led the way in taming unstructured data. Even email today is virtually
one huge text stream whose binary element is masked by even more text. Text, text,
text! All of this text data (CSV or whatever) has paved the way and extended the
meaningful life of the classic shell tools and even newer tools that are now
classics, especially when an RDB is involved.
Just don’t hit that null or you might need to ameliorate with C again.
Truly,
Bill Corcoran
On Oct 13, 2019, at 10:35 AM, Richard Tobin
<richard(a)inf.ed.ac.uk> wrote:
I was reminded of this by Larry's comment:
I miss Brian on this list. I've interacted with him over the years, the
one I remember the most was I was trying to do an awk like interface to a
key/value "database".
Recently I've had to deal with a lot of data in CSV
(comma-separated-value) format. Awk is *almost* perfect for this, but
of course doesn't handle the quoting of fields that contain commas.
One can usually work around it by finding a character that doesn't
occur in the data and converting the CSV file to use that as the
separator, but it's not ideal.
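
For illustration only, here is a rough sketch of that workaround in Python
rather than awk. The function name csv_to_delimited and the choice of the
ASCII unit separator (0x1f) as the default are assumptions made for this
example, not anything from the thread. It refuses to convert if a field
already contains the chosen separator or a newline, since either would still
confuse a line-oriented tool such as awk.

```python
import csv
import io


def csv_to_delimited(text, sep="\x1f"):
    """Re-emit CSV text with `sep` between fields, one record per line.

    Hypothetical helper: quoted commas become ordinary data, so the result
    can be split safely on `sep` (for example with awk -F).
    """
    out = []
    # newline="" leaves line endings alone so the csv module can parse them.
    for row in csv.reader(io.StringIO(text, newline="")):
        for field in row:
            # The separator must not occur in the data, and an embedded
            # newline would break one-record-per-line processing.
            if sep in field or "\n" in field or "\r" in field:
                raise ValueError("cannot convert field safely: %r" % field)
        out.append(sep.join(row))
    return "\n".join(out) + ("\n" if out else "")
```

The converted text could then be fed to awk with a matching field separator,
for example -F'|' if "|" was chosen as sep and is known not to occur in the
data.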
Awk's input could easily be modified to handle CSV files, but output
would be a bit more difficult, because you don't specify field
boundaries explicitly on output. One possibility would be a printf()
format specifier that takes a field and quotes it appropriately.
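
As a sketch of what such a quoting step might do (written in Python, not awk,
and not an existing awk feature), the helper below quotes a single field only
when it needs it. The names csv_quote and csv_line are hypothetical, and the
rules assumed here are the common ones: quote a field that contains the
separator, a double quote, or a line break, and double any embedded quotes.

```python
def csv_quote(field, sep=","):
    """Return `field` as text, quoted only if CSV requires it (assumed rules)."""
    s = str(field)
    if any(c in s for c in (sep, '"', "\n", "\r")):
        # Double embedded quotes, then wrap the whole field in quotes.
        return '"' + s.replace('"', '""') + '"'
    return s


def csv_line(fields, sep=","):
    """Join already-separated fields into one CSV record (no line ending)."""
    return sep.join(csv_quote(f, sep) for f in fields)
```

With this, csv_line(["x", "a,b", 3]) gives x,"a,b",3, which is roughly the
effect a quoting format specifier would have when applied to each field on
output.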
-- Richard