When I left Bell Labs in 1986, I joined Ardent Computer in California.
We built a multiprocessor Unix system with up to four processors based
on ECL technology (faster than the computer chips of the time). The CPU
had a standard set of registers, plus four vector registers that would
hold 1000 floating-point numbers and vector instructions to do
arithmetic on them.
So that meant that we had compiler work to do. Luckily, Randy Allen had
just graduated and signed on to do the parallelism. I took on the job
of doing the parallelizing C and FORTRAN compilers, and I also did the
lower-level stuff: the assembler and loader, the design of the a.out
format, and even a bug tracking system.
Randy's compiler was excellent, but there were other problems. The Sun
workstations had some quirks: from time to time they would page in a
page of all zeros due to a timing problem. Unhappily, the zero was the
halt operation! We addressed that by adding code to the kernel to
verify that no code page was all zeros before executing it. AT&T and Sun
and MIPS and all the hardware makers have problems like this with early
chips. One thing I had told the team from the beginning was that we
were going to have to patch hardware problems in the early versions.
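The all-zeros check described above can be sketched roughly as follows.
This is only an illustration in Python, not the kernel code we wrote:
the page size, the function names, and the retry-on-failure policy are
all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; the post does not give one


def is_all_zero(page: bytes) -> bool:
    """True if every byte of the page is zero (a page full of halts)."""
    return not any(page)


def fetch_code_page(read_page, max_tries: int = 3) -> bytes:
    """Call read_page() until it returns a page that is not all zeros.

    Re-reading is an assumed recovery policy, not something the post
    describes; read_page is a hypothetical callable returning bytes.
    """
    for _ in range(max_tries):
        page = read_page()
        if not is_all_zero(page):
            return page
    raise RuntimeError("code page still all zeros after retries")
```

The point of the sketch is only that the check is cheap compared with
the cost of executing a page of halts.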
The most serious early hardware bug in our machine was that when the
MIPS chip had a page fault, the CPU started executing the new page
before it was all present. It only missed the first two or three
instructions. We settled on a strategy to generate the a.out file so
that the first four instructions in each code page were all NOPs. This
solved the MIPS problem.
Now we faced the problem of how to take a standard a.out format and
redo it so that the first four instructions in each code page are NOPs.
We built an "editor" for a.out files that would read the file in,
respond to a series of requests, relocate the instructions correctly,
and then branch to the line of code that it had been about to execute.
One good thing about this was that when the chip got fixed we would not
have to change any code -- it would just work.
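To make the relocation step concrete, here is a minimal sketch in
Python of padding every code page with four leading NOPs and fixing up
branch targets. It is a toy model, not the actual editor: instructions
are (op, arg) tuples, branch arguments are instruction indices, and the
names pad_pages, BRANCH_OPS, and page_words are invented for the
example. A real a.out editor would also have to fix data references,
symbol tables, and so on.

```python
NOP = ("nop", None)
PAD = 4                      # leading NOP slots per code page, as in the text
BRANCH_OPS = {"br", "brz"}   # hypothetical opcodes whose arg is a code index


def pad_pages(code, page_words):
    """Re-lay out code so every page of page_words slots starts with PAD NOPs.

    Returns (new_code, new_index), where new_index[i] is the new position
    of original instruction i.
    """
    if page_words <= PAD:
        raise ValueError("page must be larger than the NOP pad")
    usable = page_words - PAD  # real instructions that fit in one page
    # Original instruction i lands in page i // usable, after that page's pad.
    new_index = [
        (i // usable) * page_words + PAD + (i % usable)
        for i in range(len(code))
    ]
    new_code = []
    for i, (op, arg) in enumerate(code):
        if i % usable == 0:
            new_code.extend([NOP] * PAD)  # start of a new page
        if op in BRANCH_OPS:
            arg = new_index[arg]          # relocate the branch target
        new_code.append((op, arg))
    return new_code, new_index
```

Calling pad_pages(code, 6) on a six-instruction program, for instance,
spreads it over three pages of six slots, each beginning with four NOPs.
Falling through from one page to the next simply executes the NOPs,
which is harmless.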
And then we got creative. We could use the "editor" to find the basic
blocks in the code, introduce counting instructions at the head of each
block, and produce a profiler without recompiling. We probably found about
20 things we could do with this mechanism, including optimization after
loading, timing the code without having to recompile everything,
collecting parallelism statistics, etc.
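The basic-block profiling idea can be sketched the same way. Again this
is a toy model in Python under stated assumptions, not the real tool:
instructions are (op, arg) tuples, branch arguments are valid
instruction indices, and find_leaders, add_block_counters, BRANCH_OPS,
and the "count" pseudo-instruction are all made up for illustration.

```python
BRANCH_OPS = {"br", "brz"}   # hypothetical opcodes whose arg is a code index


def find_leaders(code):
    """Return the sorted indices that start a basic block.

    A leader is instruction 0, any branch target, or the instruction
    that follows a branch.
    """
    leaders = {0} if code else set()
    for i, (op, arg) in enumerate(code):
        if op in BRANCH_OPS:
            leaders.add(arg)
            if i + 1 < len(code):
                leaders.add(i + 1)
    return sorted(leaders)


def add_block_counters(code):
    """Insert ("count", n) at the head of block n and retarget branches.

    Returns (new_code, leaders). Branches are pointed at the counter
    that heads the target block, so every entry to a block is counted.
    """
    leaders = find_leaders(code)
    block_of = {leader: n for n, leader in enumerate(leaders)}
    out = []
    for i, (op, arg) in enumerate(code):
        if i in block_of:
            out.append(("count", block_of[i]))
        if op in BRANCH_OPS:
            # n counters are inserted ahead of block n, so its head
            # moves from index arg to arg + n.
            arg = arg + block_of[arg]
        out.append((op, arg))
    return out, leaders
```

Because the rewrite works on the already-built instruction stream, the
same pass could insert timing or other bookkeeping instructions instead
of counters.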
---
On 2022-11-28 05:24, Paul Ruizendaal wrote:
The discussion about the 3B2 triggered another question in my head:
what were the earliest multi-processor versions of Unix and how did
they relate?
My current understanding is that the earliest one is a dual-CPU VAX
system with a modified 4BSD done at Purdue. This would have been late
1981, early 1982. I think one CPU was acting as master and had
exclusive kernel access, the other CPU would only run user mode code.
Then I understand that Keith Kelleman spent a lot of effort to make
Unix run on the 3B2 in an SMP setup, essentially going through the
source and finding all critical sections and surrounding those with
spinlocks. This would be around 1983, and became part of SVr3. I
suppose that the “spl()” calls only protected critical sections that
were shared between the main thread and interrupt sequences, so that a
manual review was necessary to consider each kernel data structure for
parallel access issues in the case of two CPUs.
Any other notable work in this area prior to 1985?
How was the SMP implementation in SVr3 judged back in its day?
Paul