On Wed, Mar 8, 2023 at 8:22 PM John Cowan <cowan@ccil.org> wrote:
> On Wed, Mar 8, 2023 at 2:53 PM Dan Cross <crossd@gmail.com> wrote:
>> But the 3090 was really more like a distributed system than the
>> Athlon box was, with all sorts of offload capabilities. For that
>> matter, a thousand users probably _could_ telnet into the Athlon
>> system. With telnet in line mode, it'd probably even be decently
>> responsive.
> I find that difficult to believe. It seems too high by an order of
> magnitude.
I'm not going to claim it would be zippy, but I do think it would work
acceptably.
Suppose that 1000 users telnet'ed into the x86 machine, but remained
essentially idle; what resources would that consume? We'd have 1000
open TCP connections, a thousand shell processes, a thousand
telnetd's, etc. All of that would consume some amount of RAM (though
there'd be a lot of sharing of text and read-only data), some VM
space and hence RAM for paging structures, some accounting data in
the kernel, 1000 allocated pseudo-ttys, entries in the process
table, and so on. But most of those shells would spend most of their
time blocked waiting on input, so they wouldn't consume CPU
continuously; similarly, with the TCP connections mostly idle, the
kernel wouldn't be spending much processor time on the login
sessions. There'd be some bookkeeping data on disk, but that would be
small. System overhead would amount to maybe a few megabytes, I'd
imagine.
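(If one wanted to put a rough number on the RAM part of that on a
modern Linux box, a quick script along these lines would do it. To be
clear, this is just a sketch of mine, not something from the original
discussion: the process names are guesses at whatever the telnetd and
shell happen to be called, and smaps_rollup is a Linux-ism.)

    #!/usr/bin/env python3
    # Rough sketch: add up the private (unshared) memory of every shell
    # and telnet/ssh daemon process, which approximates the per-session
    # RAM that can't be shared. Process names here are guesses; adjust
    # for the system at hand.
    import os

    INTERESTING = {"bash", "sh", "ksh", "in.telnetd", "telnetd", "sshd"}

    def private_kb(pid):
        """Sum Private_Clean + Private_Dirty from /proc/<pid>/smaps_rollup (kB)."""
        total = 0
        try:
            with open(f"/proc/{pid}/smaps_rollup") as f:
                for line in f:
                    if line.startswith(("Private_Clean:", "Private_Dirty:")):
                        total += int(line.split()[1])
        except OSError:
            pass  # process exited or we lack permission; skip it
        return total

    count = 0
    total_kb = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue
        if name in INTERESTING:
            count += 1
            total_kb += private_kb(pid)

    print(f"{count} session processes, ~{total_kb / 1024:.1f} MiB private memory")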
If all of those users ran telnet in line mode, the system wouldn't be
getting pounded with interrupts all the time, even when they were
executing commands (the per-character overhead would be absorbed by
the client).
I don't think I have a machine of quite the Athlon vintage, but I _do_
have a machine with a Ryzen processor that's a couple of years old
down in my basement. As an experiment, I wrote a little "expect"
script to log in to that machine a thousand times, doing so
recursively: that is, the script starts off ssh'ing into the machine,
then from within that session logs in again, and so on, a thousand
times, before finally going interactive. I used encryption,
public-key authentication, and compression, and bounced through a
"jump host" for each session, ensuring that each login actually went
over the network. The
effect here is that typing into the final shell sort of simulates 1000
users typing simultaneously, complete with all the glorious interrupt
and scheduler overhead that implies.
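In case anyone wants to try the same thing: the gist was something
like the sketch below, redone here with pexpect rather than expect
proper, and with made-up host and jump-host names standing in for
mine (so treat it as a reconstruction of the idea, not the actual
script).

    #!/usr/bin/env python3
    # Nest a thousand ssh logins, each started from inside the previous
    # session, then hand the innermost shell to the user. Assumes
    # key-based auth and already-known host keys, so nothing prompts.
    import pexpect

    DEPTH = 1000
    SSH = "ssh -o Compression=yes -J jumphost ryzen"  # placeholder names
    PROMPT = r"\$ "                                   # assumes a "$ " prompt

    child = pexpect.spawn(SSH, timeout=None, encoding="utf-8")
    for _ in range(DEPTH - 1):
        child.expect(PROMPT)    # wait for the shell from the last login
        child.sendline(SSH)     # ...and log in again from inside it
    child.expect(PROMPT)
    child.interact()            # go interactive in the innermost session

Key-based authentication matters here; a password prompt at every hop
would make the expect side of this much fussier.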
Response time in that connection is not bad; certainly on par with the
3090 I used for a while in the early 90s. If I log in from another
window, it doesn't even register that there are a thousand "users"
logged in, even if I'm running something chatty in the "thousand
users" window.
By contrast, the mainframe required a tremendous amount of offload
support to shield the CPU from all of that bursty user activity. That
support made user actions look like block transfers, thus amortizing
much of the overhead of interactivity. With the same load, the
mainframe is storing some state data in memory regarding which users
are logged in or connected or dialed or whatever, but the situation
isn't that much different from mostly-idle telnet connections in line
mode, except that it's even more favorable to the mainframe, since
much of the interaction is per-screen of data rather than per-line.
The difference in interactivity and offload is why I think the
comparison is poor. If the mainframe handled user sessions the same
way the x86 machine handled telnet logins, I imagine it would be
swamped way worse than the AMD machine (or whatever it was that person
was writing about 10 or 15 years ago). Perhaps a better comparison
would be to a web server that was accepting HTTP requests from 1000
different clients. I'm quite sure that x86 machines of the Athlon era
could cope with that load.
> Another thing that doesn't get mentioned much is that classic
> mainframes had SRAM, so their memory bandwidth was enormous.
I suspect this makes less of a difference than one might hope when
comparing against a modern machine.
The specific comparison in this case was against an IBM 3090-600J. It
appears to use SRAM for cache ("high speed buffer" in IBM-speak), but
seems to use DRAM for central and expanded storage. In this reference
I found on bitsavers, they make a big deal about their "one million
bit memory chip", but that's DRAM
(
http://www.bitsavers.org/pdf/ibm/3090/G580-1005-0_The_IBM_3090_Processor_Fa…;
see "IBM Advances the Technology" on page 10).
Moreover, that machine supported up to 6 CPUs running at a clock rate
of 69 MHz. That same reference says they could bring cycle times down
to 17.2 ns using ECL chips; DDR2 can match that. My Mac Studio blows it
out of the water.
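Back of the envelope, with the DDR2 timings from memory (so treat the
numbers as approximate): 17.2 ns per cycle works out to about 58 MHz,
while DDR2-800 at CL5 has a CAS latency of 5 x 2.5 ns = 12.5 ns,
comfortably under that 17.2 ns figure.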
For systems older than the 3090, I'm not sure that the SRAM difference
matters much at all: those machines had tiny memories compared to even
modern cell phones, and their CPUs and buses were pitifully slow. Even
if they had more RAM bandwidth than machines now (which I do not think
is really true), they couldn't use it. Indeed, I suspect their total
memory sizes were smaller than L3 cache (which is SRAM) on modern
machines.
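As a rough sanity check (sizes from memory, so ballpark only): 24-bit
S/370 addressing tops out at 16 MB, while current desktop Ryzens
carry 32-64 MB of L3 and the big server parts carry several times
that.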
- Dan C.