I think you're looking at the wrong Posix document. Posix Shell and
Utilities (at least my ancient copy) says
2.2.2.181 Text file: A file that contains one or more lines.
The Lines shall not contain NUL characters ...
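As a rough sketch of what that wording asks for, a check in C could look like the following. It assumes a line means zero or more characters ended by a newline, which the excerpt above does not spell out, and it ignores whatever the elided remainder requires. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the file at `path` is non-empty,
 * contains no NUL bytes, and ends with a newline; 0 if it fails any of
 * those tests; -1 if it cannot be opened or read.
 */
int looks_like_text_file(const char *path)
{
    assert(path != NULL);

    /* The "b" makes no difference on a POSIX system. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int c;
    int last = EOF;   /* stays EOF if the file is empty */
    int ok = 1;
    while ((c = getc(fp)) != EOF) {
        if (c == '\0') {          /* NUL characters are not allowed */
            ok = 0;
            break;
        }
        last = c;
    }

    if (ferror(fp))
        ok = -1;
    else if (ok && last != '\n')  /* empty, or last line unterminated */
        ok = 0;

    fclose(fp);
    return ok;
}
```

Under these assumptions, a nonempty file whose last byte is not a newline fails the check, which is the case being argued about.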
Doug
On Thu, Mar 27, 2025 at 2:45 PM Clem Cole <clemc(a)ccc.com> wrote:
Branden Robin wrote:
>> info groff gives semantics for including nonempty files that don't end
>> with newline. Such files violate the Posix definition of text file.
Not so fast. POSIX does not define a text file (that's an ANSI C-ism - thank you
DOS -- more in a minute).
Even in the original POSIX definition we were very careful to >>never<< do
that abomination. What the current POSIX spec says is:
3.164 File
An object that can be written to, or read from, or both. A file has certain attributes,
including access permissions and type. File types include regular file, character special
file, block special file, FIFO special file, symbolic link, socket, and directory. Other
types of files may be supported by the implementation.
....
3.317 Regular File
A file that is a randomly accessible sequence of bytes, with no further structure imposed
by the system.
In fact, nowhere is the term "text file" defined (or used; search for it) in
the POSIX standard. The problem is that at the same time, ANSI C was being defined in
the early 1980s. The DOS weenies wanted to support the silly two-character line
terminations and the old notion of a structured "text file", which PC/MS-DOS had
inherited from the DEC and IBM OSes of the 1960s and what IBM called "access
methods" in the old days (in UNIX everything is just a stream of bytes, thank you, or
in Multics, everything is a segment). The UNIX folks on the C committee were very much
against adding the support, which Lattice C for MS-DOS had added at the time, but Lattice
happened to be the most popular C compiler for the DOS target and the number of users of
the compiler was growing extremely rapidly. They were also pushing for things like the near
and far keywords. It was quite a fight in the C community. (It was one of the additions
that caused Dennis to quip: "When I read commentary about suggestions for where C
should go, I often think back and give thanks that it wasn't developed under the
advice of a worldwide crowd.")
Anyway, the UNIX folks had to decide which evils the DOS folks were pushing they could
live with, and a compromise was the creation of the new "rb", "wb",
"ab" crud in the fopen(3) call. On UNIX we just use "r", "w",
"a", because POSIX allows UNIX to open what ANSI C called a "text
file" that way, since there is no such thing under POSIX.
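To make that concrete, here is a minimal sketch, assuming a POSIX system where the "b" in an fopen mode has no effect. The helper names and the scratch file name used in the usage note are made up for illustration. It reads the same file once with "r" and once with "rb" and compares the bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `len` bytes to `path` with the given fopen mode; 0 on success. */
static int write_bytes(const char *path, const char *mode,
                       const void *data, size_t len)
{
    assert(path != NULL && mode != NULL);
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    size_t n = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || n != len)
        return -1;
    return 0;
}

/* Read up to `cap` bytes from `path`; returns the count, or -1 on error. */
static long read_bytes(const char *path, const char *mode,
                       void *buf, size_t cap)
{
    assert(path != NULL && mode != NULL);
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, cap, fp);
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : (long)n;
}

/*
 * Returns 1 if "r" and "rb" deliver identical bytes for the first
 * 256 bytes of `path`, 0 if they differ, -1 on error.
 */
int text_and_binary_agree(const char *path)
{
    unsigned char a[256], b[256];
    long na = read_bytes(path, "r", a, sizeof a);
    long nb = read_bytes(path, "rb", b, sizeof b);
    if (na < 0 || nb < 0)
        return -1;
    return na == nb && memcmp(a, b, (size_t)na) == 0;
}
```

For example, after write_bytes("modes_demo.tmp", "w", "one\r\ntwo", 8), calling text_and_binary_agree("modes_demo.tmp") should report agreement on a POSIX system, CR included. An implementation that translates line endings in text mode is free to behave differently.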
But in the ANSI C spec (which, remember, is >>only<< for C source code), it
says:
5.2.1 Character sets
.... The representation of each member of the source and execution basic character sets
shall fit in a byte.
In both the source and execution basic character sets, the value of each character after
0 in the above list of decimal digits shall be one greater than the value of the previous.
In source files, there shall be some way of indicating the end of each line of text; this
International Standard treats such an end-of-line indicator as if it were a single
new-line character. In the basic execution character set, there shall be control
characters representing alert, backspace, carriage return, and new line. If any other
characters are encountered in a source file (except in an identifier, a character
constant, a string literal, a header name, a comment, or a preprocessing token that is
never converted to a token), the behavior is undefined.
The key point in the highlighted text in ANSI C is that it does not require it. A
problem in the ANSI C standard is that it does >>use<< the term "text
file", although ANSI C never really defines what one is (and probably could not
because of the fight - UNIX would have said -- nope -- no structure required). ANSI C does
say in:
7.21.2 Streams
...
Environmental limits
An implementation shall support text files with lines containing at least 254 characters,
including the terminating new-line character. The value of the macro BUFSIZ shall be at
least 256.
The whole compromise was that you did not have to use any particular type of line
termination. The concept of a "text file" is local to the implementation. UNIX folks get
to keep doing things as they did before, so if you used fopen with "r" or "w", it's up
to the program to worry about the format of the file; "access modes" from the
1960s were not needed.
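A minimal sketch of what "up to the program" can mean in practice, assuming only standard C stdio. The function names are made up for illustration. It reads a stream with fgets, accepts either "\n" or "\r\n" as a terminator, and notices when the last line has no terminator at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Strip a trailing "\n" or "\r\n" from `line` in place.
 * Returns 1 if a terminator was present, 0 if not.
 */
int chomp(char *line)
{
    assert(line != NULL);
    size_t n = strlen(line);
    if (n == 0 || line[n - 1] != '\n')
        return 0;
    line[--n] = '\0';
    if (n > 0 && line[n - 1] == '\r')
        line[--n] = '\0';
    return 1;
}

/*
 * Count the lines on `fp`, including a final fragment that lacks a
 * newline. Sets *unterminated to 1 if such a fragment was seen.
 */
long count_lines(FILE *fp, int *unterminated)
{
    assert(fp != NULL && unterminated != NULL);
    char buf[BUFSIZ];
    long lines = 0;
    int at_line_start = 1;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        int had_eol = chomp(buf);
        if (at_line_start)        /* a long line may span several reads */
            lines++;
        at_line_start = had_eol;
    }
    *unterminated = !at_line_start;
    return lines;
}
```

Nothing here depends on the system imposing a record structure; the program decides for itself what counts as a line.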