Branden Robin wrote:
> info groff gives semantics for including nonempty files that don't end
> with newline. Such files violate the Posix definition of text file.
Not so fast. POSIX does not define a text file (that's an ANSI C-ism --
thank you DOS -- more in a minute).
Even in the original POSIX definition we were very careful to >>never<< do
that abomination. What the current POSIX spec says is:
3.164 File
An object that can be written to, or read from, or both. A file has certain
attributes, including access permissions and type. File types include regular
file, character special file, block special file, FIFO special file,
symbolic link, socket, and directory. Other types of files may be supported
by the implementation.
....
3.317 Regular File
A file that is a randomly accessible sequence of bytes, with no further
structure imposed by the system.
In fact, nowhere is the term "text file" defined (or used -- search for it)
in the POSIX standard. The problem is that at the same time, ANSI C was
being defined in the early 1980s. The DOS weenies wanted to support the
silly two-character line terminations and the old notion of a structured
"text file", which PC/MS-DOS had inherited from the DEC and IBM OSes of the
1960s and what IBM called "access methods" in the old days (in UNIX
everything is just a stream of bytes, thank you, or in Multics, everything
is a segment).
The UNIX folks on the C committee were very much against adding the support,
but Lattice C for MS-DOS had already added it, and Lattice happened to be
the most popular C compiler for the DOS target, with the number of users of
the compiler growing extremely rapidly. The DOS folks were also pushing for
things like the near and far keywords. It was quite a fight in the C
community. (It was one of the additions that caused Dennis to quip: *"When I
read commentary about suggestions for where C should go, I often think back
and give thanks that it wasn't developed under the advice of a worldwide
crowd."*)
Anyway, the UNIX folks had to decide which of the evils the DOS folks were
pushing they could live with, and a compromise was the creation of the new
"rb", "wb", "ab" crud in the fopen(3) call. On UNIX we still use "r", "w",
"a" as before, because POSIX allows UNIX to open what ANSI C called a "text
file" that way, since there is no such thing under POSIX.
But in the ANSI C spec (which, remember, is >>only<< for C source code), it
says:
5.2.1 Character sets
.... The representation of each member of the source and execution basic
character sets shall fit in a byte.
In both the source and execution basic character sets, the value of each
character after 0 in the above list of decimal digits shall be one greater
than the value of the previous. In source files, there shall be some way of
indicating the end of each line of text; this International Standard treats
such an end-of-line indicator as if it were a single new-line character. In
the basic execution character set, there shall be control characters
representing alert, backspace, carriage return, and new line. If any other
characters are encountered in a source file (except in an identifier, a
character constant, a string literal, a header name, a comment, or a
preprocessing token that is never converted to a token), the behavior is
undefined.
The key point in the highlighted text in ANSI C is that it does not *require
it*. A problem in the ANSI C standard is that it does >>use<< the term
"text file", although ANSI C never really defines what one is (and probably
could not because of the fight -- UNIX would have said: nope, no
structure required). ANSI C does say in:
7.21.2 Streams
...
Environmental limits
An implementation shall support text files with lines containing at least
254 characters, including the terminating new-line character. The value of
the macro BUFSIZ shall be at least 256.
The whole compromise was that you did not have to use a line termination of
any particular type. The concept of a "text file" is local to the
implementation. UNIX folks get to keep doing things as they did before, so
if you use fopen with "r" or "w", it's up to the program to worry about the
format of the file; the "access methods" from the 1960s were not needed.