It seems my reply to Clem went astray. POSIX Part 2, Shell and
Utilities, is very clear:
2.2.2.181 text file: A file that contains characters organized into
one or more lines.
The lines shall not contain NUL characters ...
2.2.2.95 line: A sequence of zero or more non-<newline> characters
plus a terminating <newline> character.
Oddly--and in my opinion wrongly--the standard excludes empty files.
It would be a shock if an editor or groff refused to process an empty
file and thereby broke Kernighan's law, "Do nothing gracefully".
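
Read literally, the two definitions quoted above amount to a small check. The following is only a sketch in Python under stated assumptions: the name is_text_file and the allow_empty flag are hypothetical, the data is treated as raw bytes rather than decoded characters, and whatever the "..." in the quoted definition elides is not checked.

```python
def is_text_file(data: bytes, allow_empty: bool = False) -> bool:
    """Check bytes against the two quoted definitions (hypothetical helper)."""
    if not data:
        # "One or more lines" excludes the empty file when read literally;
        # allow_empty=True is the "do nothing gracefully" interpretation.
        return allow_empty
    if b"\0" in data:
        # Lines shall not contain NUL characters.
        return False
    # Every line, including the last, needs a terminating <newline>.
    return data.endswith(b"\n")
```

With the default, is_text_file(b"") is False, which is the exclusion complained about above; passing allow_empty=True accepts the empty file.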
Doug
On Thu, Mar 27, 2025 at 3:35 PM Clem Cole <clemc(a)ccc.com> wrote:
Chet - as I said, we tried so hard to keep that kind of crap out. Dennis was right.
FWIW: with UNIX (POSIX), input ends at an EOF, and the ANSI C utilities will honor either
that or a newline, so you can write the code to work fine either way. But that's a choice of
implementation (which subroutines you use, how you think about the data).
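
A minimal sketch of "work fine either way", in Python rather than C and purely as an illustration: the helper name read_records is hypothetical, and the choice to treat a final fragment ending at EOF as a record is an assumption about what the application wants, not something the OS imposes.

```python
import io

def read_records(stream):
    """Yield each record without its newline (hypothetical helper).

    A final fragment that ends at EOF instead of a newline is still
    yielded, so the same code handles both kinds of input.
    """
    for raw in stream:
        yield raw[:-1] if raw.endswith(b"\n") else raw

# The same two records come back with or without a trailing newline.
with_newline = list(read_records(io.BytesIO(b"a\nb\n")))
without_newline = list(read_records(io.BytesIO(b"a\nb")))
```

An empty input simply yields no records.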
I'll accept that that is what the words say in the >>awk<< specification
document, but as one of the original authors of the first UNIX standard and the later
POSIX standard I can say we tried hard to make sure we got it right and follow the idea: A
regular file has no structure and never to allow the standard to impose it. I think the
core standard still says that, and the basic idea is unchanged. The actual structure of
the input file is an application idea, not a UNIX/POSIX defined idea.
The issue here is the term POSIX. Do you mean the kernel (.1), or a
>>specific<< application in .2 (the C compiler itself, awk, ed) which might
put structure onto the file? That's fine. The >>OS<< does not set
the structure; it is done by something else.
I understand having the application do it; I wish it did not. Many applications (even
text editors) can be (and have been) written without needing one specific structure, which is
my point. I also accept that the folks who took over the standard in the name of
"progress" changed (relaxed) much of what we worked so hard to avoid, knowing
there were dragons - particularly WRT textual information. We really did not want to
repeat the errors of the 1960s. i.e., as George Santayana originally wrote, “Those who
cannot remember the past are condemned to repeat it.”
On Thu, Mar 27, 2025 at 3:05 PM Chet Ramey <chet.ramey(a)case.edu> wrote:
>
> On 3/27/25 3:00 PM, Clem Cole wrote:
> > Argh -- I stand corrected. We worked hard at the beginning to keep that
> > crap out -- sigh.
> >
> > But at least it does say: POSIX.1-2024 _does not distinguish between
> > text files and binary files_ (see the ISO C standard)
>
> It also says "The standard utilities that have such restrictions always
> specify "text files" in their STDIN or INPUT FILES sections," so you can't
> avoid it.
>
> awk is one such utility (sh is not). This is an application requirement, so
> awk is required to add a newline at the end of a file that does not have
> one.
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU chet(a)case.edu http://tiswww.cwru.edu/~chet/