On Fri, May 15, 2020 at 8:58 PM Brantley Coile <brantley(a)coraid.com> wrote:
I always kept local, single characters in ints. This avoided the problem
of whether a loaded character is signed or unsigned. The reason for not
specifying is obvious. Today, you can pick the move-byte-into-word
instruction that either sign extends or doesn't. But when C was defined
that wasn't the case. Some machines sign extended when a byte was loaded
into a register and some filled the upper bits with zero. For machines that
filled with zero, a char was unsigned. If you forced the language to do one
or the other, it would be expensive on the opposite kind of machine.
Not only that, but if one used an exactly `char`-width value to hold, er,
character data as returned from `getchar` et al, then one would necessarily
give up the possibility of handling whatever character value was chosen for
the sentinel marking end-of-input stream. `getchar` et al are defined to
return EOF on end of input; if they didn't return a wider type than `char`,
there would be data that could not be read. On probably every machine I am
ever likely to use again in my lifetime, byte value 255 would be -1 as a
signed char, but it is also a perfectly valid value for a byte.
The details of whether char is signed or unsigned aside, use of a wider
type is necessary for correctness and ability to completely represent the
input data.
It's one of the things that made C a good choice on a wide variety of
machines.
I guess I always "saw" the return value of getchar() as being in an
int-sized register, at first namely R0, so kept the character values returned
as ints. The actual EOF indication from a read is a return value of zero
for the number of characters read.
That's certainly true. Had C supported multiple return values or some kind
of option type from the outset, it might have been that `getchar`, `read`,
etc., returned a pair with some useful value (e.g., for `getchar` the value
of the byte read; for `read` a length) and some indication of an
error/EOF/OK value etc. Notably, both Go and Rust support essentially this:
in Go, `io.Read()` returns a `(int, error)` pair, and the error is `io.EOF`
on end-of-input; in Rust, the `read` method of the `Read` trait returns a
`Result<usize, io::Error>`, with `Ok(n)` where `n == 0`
indicating EOF.
But I'm just making noise because I'm sure everyone knows all this.
I think it's worthwhile stating these things explicitly, sometimes.
- Dan C.
> On May 15, 2020, at 4:18 PM, ron(a)ronnatalie.com wrote:
>
> > EOF is defined to be -1.
> > getchar() returns int, but if c is an unsigned char, the value of (c =
> > getchar()) will be 255. This will never compare equal to -1.
>
>
>
> > Ron,
>
> > Hmmm... getchar/getc are defined as returning int in the man page and c
> > is traditionally defined as an int in this code.
>
> > On Fri, May 15, 2020 at 4:02 PM <ron(a)ronnatalie.com> wrote:
> >> Unfortunately, if c is char on a machine with unsigned chars, or it’s
> of type unsigned char, the EOF will never be detected.
> >
> >
> >
> >> while ((c = getchar()) != EOF) if (c == '\n') { /* entire record
> >> is now there */