Hi again,
I've recently brought the "prestruct-c" compiler back to "life"
(https://github.com/TheBrokenPipe/C-Compiler-Dec72) and thought it might be worth
documenting here. One thing I have to say first - it's barely working and probably
never worked to begin with.
There were some efforts in the distant past to revive this compiler; however, the compiled
compiler never worked. The reasons are as follows:
- The compiled executable is too big (it exceeds 32K, so addresses past that point have
the high bit set and look negative when treated as signed 16-bit integers). This triggers
a bug in the liba I/O routines.
- The compiler assumes an origin of 0 and writes temp data at the NULL pointer. Without an
MMU, this kills the interrupt vectors and possibly the kernel on the 11/20.
- The compiler is missing ALL code/tables written in assembly language. This is pretty
fatal, and internal changes rendered files from the last1120c compiler incompatible.
- Calling convention changes make the s2/last1120c libc library incompatible.
I'm a big fan of the C programming language, and the reason I was so insistent on
getting this compiler to work is that it has a funny struct syntax not seen in any other C
compiler. Structs are defined like:
struct name (
    type field;
    ...
);
... with round brackets (parentheses) instead of curly braces.
Another notable thing introduced in this compiler is that certain things are no longer
lvalues. In the past (B and last1120c), functions, labels, and arrays were lvalues,
meaning they could be assigned to. For instance, this code:
func1() { return (1); }
func2() { return (2); }
main() {
    printf("func1() = %d\n", func1());
    printf("func2() = %d\n", func2());
    printf("func1 = func2\n");
    func1 = func2;
    printf("func1() = %d\n", func1());
}
produces the output:
func1() = 1
func2() = 2
func1 = func2
func1() = 2
This code:
main() {
    second = first;
    goto second;
first:
    printf("first\n");
second:
    printf("second\n");
}
produces the output:
first
second
And this code:
main(argc, argv) {
    int arr1[10];
    int arr2[10];
    arr1[0] = 5;
    arr2[0] = 8;
    printf("arr1[0] = %d\n", arr1[0]);
    printf("arr2[0] = %d\n", arr2[0]);
    printf("arr1 = arr2\n");
    arr1 = arr2;
    printf("arr1[0] = %d\n", arr1[0]);
}
produces the output:
arr1[0] = 5
arr2[0] = 8
arr1 = arr2
arr1[0] = 8
Now, the rules of the game have changed with the prestruct-c compiler, and these are no
longer lvalues. I don't know why this change was made, but if I had to guess, speed
was probably the biggest driving factor, with security also playing a role. Anyhow,
they're no longer lvalues, so there's now one less level of indirection
involving functions and labels. This means the codegen tables from last1120c have to be
modified to suit this compiler change.
However, even with it generating the correct code, there is still one fatal problem - the
libc. The libc from s2/last1120c was designed for the older compiler and therefore has one
extra layer of indirection for functions. Luckily, the source code of the libc is
available on the last1120c tape, and it wasn't too much work to remove the
indirection manually.
Okay, what else? Well, this compiler also seems to be the first to introduce the
"modern" pointer syntax. Before this compiler, pointers were declared using the
same syntax as arrays/vectors, like "char name[];". This compiler introduced the
"modern" syntax of "char *name;". No big deal, right? Well, the
compiler itself was written using the old syntax, meaning it cannot compile itself. I
think this indicates that this compiler (or the new syntax) was so unstable that the
production compiler still used the old syntax.
With everything carefully put back into place, I managed to get this to work:
struct foo (
    char x;
    int y;
    char *z;
);
main(argc, argv)
char **argv;
{
    struct foo bruh;
    bruh.x = 'C';
    bruh.y = 123;
    bruh.z = "test";
    printf("x = '%c', y = %d, z = \"%s\"\n", bruh.x, bruh.y,
        bruh.z);
}
However, if I rename the variable "bruh" to something like "bar", it
throws the error "Unimplemented pointer conversion". I have no clue why.
I've also never gotten struct pointers to work - it always complains about
"Illegal structure ref" when I try to use "->". It also seems to
accept "." on structure pointers (and does not actually dereference the pointer
when accessing the members), so something is probably very wrong with referencing and
dereferencing.
Anyway, there are plenty of other issues with the compiler. For example, code involving
pointers and switch statements may not compile correctly. I'm not sure if the issues are caused
by my poor reconstruction of the assembly tables, or if the compiler itself never worked
properly in the first place (or both). Either way, I've managed to get it to spit out
a correct hello world, as well as the struct test above, so I think I've fulfilled my
goal of seeing this compiler "work".
The code, build instructions and pre-built binaries are here:
https://github.com/TheBrokenPipe/C-Compiler-Dec72
Ideally, it would run in a PDP-11/45 environment with 0 as the origin and generate code for
the PDP-11/45. However, I made it target the 11/20 since I couldn't get the 11/45
toolchain to work, and I haven't implemented 11/45 instructions in my simulator yet.
If anyone wants to pick up the baton and get it working for the 11/45 or fix my bugs, be
my guest!
Sincerely,
Yufeng