Clem Cole wrote in
<CAC20D2O33LZvF2-gn2J8yB_84jk8jfPAdgsvyT476qv54o9U1w(a)mail.gmail.com>:
|On Wed, Aug 14, 2024 at 2:18 PM Steffen Nurpmeso <steffen(a)sdaoden.eu> \
|wrote:
|> Depending on what you mean by "at that point" i think here you
|> misremember. (To the contrary 733 and also 822 allow practically \
|> anything
|>
|Page 3 of SMTP -- RFC 821 [which 822 sits on top of] says:
|
|"Commands and Replies are composed of characters from the ASCII character
|set. When a transport service provides an 8-bit byte (octet) transmission
|channel, each 7-bit character is transmitted right justified in an octet
|with the *high order bit cleared to zero*"
Sure, that is the 8-bit thing that i introduced into the
discussion. You all were talking about binary, which means
"lots of non-printable characters".
|BTW, this was specified in 821 because it had been a factor in earlier
|experience of the ARPANET [733 and using FTP as a mail transport, which was
|how much of this all started], and binary was explicitly not allowed. 822
|forced 7-bit ASCII because of the earlier issues of things like CDC's
|display code (what a nightmare), much less EBCDIC. The key was that those
|of us in the ARPANET community could not allow "anything," - but we did
|have to detail what was there.
OK, mail had to be transferred somehow, so without binary
transport there was no binary storage either. (RFC 822 allows any
ASCII character, except that CRLF ends a line.)
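To make the 8-bit versus binary distinction concrete, here is a minimal C sketch that sorts a buffer into the three classes RFC 2045 later named 7bit, 8bit and binary data. It assumes the buffer is already in wire form (CRLF line endings), and the names classify() and enum data_class are hypothetical. The rules are simplified to: no NUL octets, CR and LF only as a CRLF pair, lines of at most 998 octets, and for 7bit additionally no octet with the high-order bit set (the RFC 821 requirement quoted above).

```c
#include <stddef.h>

/* Rough classes after RFC 2045's "7bit", "8bit" and "binary" data. */
enum data_class { DATA_7BIT, DATA_8BIT, DATA_BINARY };

/* Classify a buffer that is assumed to be in wire form (CRLF line ends). */
enum data_class classify(const unsigned char *buf, size_t len)
{
    enum data_class cls = DATA_7BIT;
    size_t linelen = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\r') {
            /* CR may only appear as part of a CRLF pair. */
            if (i + 1 >= len || buf[i + 1] != '\n')
                return DATA_BINARY;
            i++;                    /* consume the LF of the pair */
            linelen = 0;
            continue;
        }
        if (c == '\n' || c == '\0')
            return DATA_BINARY;     /* bare LF or NUL octet */
        if (++linelen > 998)
            return DATA_BINARY;     /* line too long to be "text" */
        if (c & 0x80)
            cls = DATA_8BIT;        /* high-order bit set: not 821-clean */
    }
    return cls;
}
```

With this split, "8-bit" text differs from 7-bit text only in the high-order bit, while "binary" is anything that breaks the line structure.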
|Some of us lived in this world and wrote programs that dealt with those
|constraints at the time. I am relaying to you what it was and how it
|happened. As I said, we have what we have because that was what we were
|living with -- two distinct worlds, the ARPANET community - which was
|setting the standards for interchange, and UNIX (USENET), which was de
|facto and growing because it cost little to join it.
|
|> As a remark, the MBOX format as standardized by POSIX decades ago
|>
|Ouch -- I was part of POSIX and /usr/group before that. And had >>nothing<<
|to do with it. The POSIX definition was at least 15 years after Bruce
|wrote MH and Kurt did delivermail, and if I count, I >>suspect<< it is
|closer to 20-25. mbox was created I believe by Ken (Doug, do you
|remember?), but I'm not sure who actually wrote the original "mail" program
|for Fifth edition (maybe Fourth) - which as I said was both an MUI and an
|MTA. But that development was over 15 yrs before we started the
|/usr/group standard, much less the POSIX ones.
That is all very interesting. Up to now i had presumed that
s2/mail.c was mostly written by Ken Thompson; was it? And yes,
Research V5 already did some sort of From_, and ensured a blank
line when doing concat() onto the target user's "MBOX" file.
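The "ensure a blank line" rule can be sketched as follows. This is not the V5 code; newlines_needed() is a hypothetical helper, assuming the mailbox contents are available as a byte buffer, that reports how many newlines a writer would have to emit before appending the next "From " line.

```c
#include <stddef.h>

/* Hypothetical helper: how many '\n' must be written so that the
 * existing mailbox text ends in an empty line before a new "From "
 * line is appended? */
int newlines_needed(const char *box, size_t len)
{
    if (len == 0)
        return 0;               /* empty mailbox: nothing to separate */
    if (box[len - 1] != '\n')
        return 2;               /* terminate the last line, then a blank one */
    if (len == 1 || box[len - 2] == '\n')
        return 0;               /* already ends in an empty line */
    return 1;                   /* last line is terminated, blank line missing */
}
```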
|You are probably correct that until it was formally specified in the POSIX
|definition, the format was defined originally in Ken's code, then Kurt's
|and finally in Eric's. IIRC, Bruce actually had a man page in MH that
|described his format that used the ^A characters, but I would have to
|rummage through old sources to be 100% certain. Certainly, later
|distributions of it did describe it, and MMDF may have also - but I'm not
|sure.
The difference, and what i meant, is that POSIX says "one or more
header lines". Compare this to Kurt Shoens's ishead() from 1978,
which only tests a single line by itself. (Still, "From " plus a
date plus a user name of at most 17 bytes is a better test than five
^A characters, i would think, except that consecutive ^A are rare.)
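For comparison, a single-line test of the kind described above might look like the following sketch. It is hypothetical and does not reproduce Kurt Shoens's ishead(); the name looks_like_from_line() is made up, it only assumes a ctime(3)-style date after the sender, and, crucially, it never looks at the lines that follow.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical single-line From_ test: only the candidate line itself
 * is inspected, never its neighbours. */
int looks_like_from_line(const char *line)
{
    static const char *const days[] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
    static const char *const months[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
    const char *p;
    int i;

    if (strncmp(line, "From ", 5) != 0)
        return 0;
    p = line + 5;

    /* Sender: one or more non-blank characters, then blanks. */
    if (*p == '\0' || *p == ' ')
        return 0;
    while (*p != '\0' && *p != ' ')
        p++;
    while (*p == ' ')
        p++;

    /* ctime(3)-style date: "Www Mmm dd hh:mm..." */
    for (i = 0; i < 7; i++)
        if (strncmp(p, days[i], 3) == 0 && p[3] == ' ')
            break;
    if (i == 7)
        return 0;
    p += 4;
    for (i = 0; i < 12; i++)
        if (strncmp(p, months[i], 3) == 0 && p[3] == ' ')
            break;
    if (i == 12)
        return 0;
    p += 4;
    while (*p == ' ')
        p++;                    /* day of month may be blank-padded */
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p++ != ' ')
        return 0;
    /* Time of day: at least "hh:mm". */
    return isdigit((unsigned char)p[0]) && isdigit((unsigned char)p[1]) &&
        p[2] == ':' &&
        isdigit((unsigned char)p[3]) && isdigit((unsigned char)p[4]);
}
```

A body line such as "From Bob Mon Jan  1 00:00:00 1990 said hi" passes this test just as well as a real separator, which is the weakness of looking at one line only.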
So i would think the problem stems from the fact that in early
[mM]ails there was no separating empty line between the "From "
line and the message text *unless* there were headers to write:
  fprintf(fout, "From %s %s", myname, date);
  puthead(hp, fout);
  while ((c = getc(fo)) != EOF)
      putc(c, fout);
  ->
  puthead(hp, fo)
  struct header *hp;
  FILE *fo;
  {
      if (hp->h_to != NOSTR)
          fprintf(fo, "To: %s\n", hp->h_to);
      if (hp->h_subj != NOSTR)
          fprintf(fo, "Subj: %s\n", hp->h_subj);
      if (hp->h_cc != NOSTR)
          fprintf(fo, "Cc: %s\n", hp->h_cc);
      if (hp->h_to != NOSTR || hp->h_subj != NOSTR || hp->h_cc != NOSTR)
          putc('\n', fo);
      return(0);
  }
So there *could* be no separating empty line after the
(so-called) From_ line. This changed with POSIX, and it took more
than forty years until Werner Fink pointed out that the "one or
more header lines" rule can be taken into account when doing From_
line detection in an MBOX file.
Or 31 years, if it was POSIX 1988 that brought that rule.
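Taking the "one or more header lines" rule into account means a candidate From_ line is only accepted if the next line is a header field. The sketch below is hypothetical (is_separator() and is_header_line() are made-up names). Besides the "From " prefix it additionally assumes, in line with the blank-line convention mentioned above, that a separator is either the first line or follows an empty line, and it treats a header line simply as an RFC 822 field name (printable ASCII except ':') followed by a colon.

```c
#include <stddef.h>
#include <string.h>

/* Does the line start like an RFC 822 header field, i.e. a field name
 * of printable ASCII other than ':' immediately followed by ':'? */
static int is_header_line(const char *line)
{
    const char *p = line;

    while (*p > 32 && *p < 127 && *p != ':')
        p++;
    return p != line && *p == ':';
}

/* Treat lines[i] as a message separator only if it starts with "From ",
 * is the first line or follows an empty line, and is followed by at
 * least one header line. */
int is_separator(const char *lines[], size_t n, size_t i)
{
    if (i >= n || strncmp(lines[i], "From ", 5) != 0)
        return 0;
    if (i > 0 && lines[i - 1][0] != '\0')
        return 0;
    return i + 1 < n && is_header_line(lines[i + 1]);
}
```

With the lookahead, an unquoted body line like "From here on it is body text." is rejected as long as the line after it is not itself shaped like a header.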
For someone born later:
- it is hard to understand why no one cared about some sort of
content encoding for either SMTP or text messages.
I mean, if one reads the early RFCs, say those before 1975, it was
all unbelievably "direct": trial and error, discovery, etc.
But take, for example, RFC 698 from July 1975, "TELNET EXTENDED
ASCII OPTION", which says:
Several sites[.] for example MIT-AI, use keyboards which use
almost all 128 characters as printable characters, and use one
or more additional bits as "control' bits as command modifiers
[.] several characters cannot be entered as text because they
are used for control purposes, such as the greek letter "beta'
which on a TELNET connection is CONTROL-C and is used for
stopping ones job.
I.e., "control by external player" was not only the usual thing,
it was also recognized as causing problems.
It is fascinating to read Postel's RFC 767 from 1980, "A
Structured Format for Transmission of Multi-Media Documents".
By contrast, it took until the early 90s before receivers were no
longer expected to simply "know" (RFC 767 assumes users know how
to interpret data formats down to the odd bit); instead, data was
converted to harmless text "garbage" that needs dedicated and
declared support on the receiver side before it is interpreted.
And i hope the IETF's new thing, SML, will not bring that back.
- Why the 8-bit problem at all? RFC 354 already says:
The transfer byte size must be 8 bits.
It is really amazing that, compared to data formats like that of
RFC 767, the encoding of local storage, that is MBOX, was not
considered worth an RFC. (Until MIME came along, which *i* use to
circumvent any possible problem; it must be said that especially
in the OSS world people *explicitly* do not do this, so messages
with >From quoting, i.e. *without* MIME, can still be seen.)
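The >From quoting itself is easy to sketch. The C fragment below follows the so-called mboxrd convention, in which any line matching ^>*From  gains one more '>' on writing and a reader strips exactly one; the older mboxo variant quotes only lines starting with "From ", so its quoting cannot be undone unambiguously. The function names are hypothetical, and line termination is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* mboxrd-style quoting: a line matching ^>*From  gains one more '>'.
 * Writes the (possibly quoted) line into dst and returns what
 * snprintf(3) returns. */
int quote_from_line(const char *line, char *dst, size_t dstsize)
{
    const char *p = line;

    while (*p == '>')
        p++;
    if (strncmp(p, "From ", 5) == 0)
        return snprintf(dst, dstsize, ">%s", line);
    return snprintf(dst, dstsize, "%s", line);
}

/* Reverse operation for a reader: drop exactly one '>' from ^>+From . */
const char *unquote_from_line(const char *line)
{
    const char *p = line;

    while (*p == '>')
        p++;
    if (p != line && strncmp(p, "From ", 5) == 0)
        return line + 1;
    return line;
}
```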
|BTW: by about 4.2BSD time (maybe a little earlier), particularly because of
|sendmail - the MH system (which had left Rand and was then being supported
|by someone else ??UC Irvine maybe??) had been hacked to handle the mbox
|format
These days everyone runs away from MBOX. Even the Dovecot IMAP(++)
server no longer uses it by default, i think, as of not too long
ago. I think they all went the Maildir way, which also stores
one message per file. I love MBOX (with MIME-encoded messages
etc.); however, for the MUA i maintain, i still have to walk the
road they have all walked already, and that means external index
files etc. Netscape's mailer had that thirty years ago...
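As a rough idea of what an external index starts from, the hypothetical sketch below (index_mbox() is a made-up name) records the byte offset of each message in an in-memory MBOX. It uses the simplified rule that a message starts at offset 0 or at a "From " line directly after an empty line, so body lines are assumed to be >From-quoted already.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal index: store the byte offset of every message
 * start in off[] (at most maxoff entries) and return how many message
 * starts were found. */
size_t index_mbox(const char *buf, size_t len, size_t *off, size_t maxoff)
{
    size_t n = 0;

    for (size_t i = 0; i + 5 <= len; i++) {
        /* A separator begins the buffer or directly follows "\n\n". */
        int at_start = (i == 0) ||
            (i >= 2 && buf[i - 1] == '\n' && buf[i - 2] == '\n');

        if (at_start && memcmp(buf + i, "From ", 5) == 0) {
            if (n < maxoff)
                off[n] = i;
            n++;
        }
    }
    return n;
}
```

A real index would presumably also keep per-message sizes and status flags, plus some way of noticing that the MBOX changed underneath it.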
--steffen