COFF February 2022

coff@tuhs.org

11 participants
8 discussions

Tail-recursion was Re: [TUHS] Lorinda Cherry

by Will Senn

My all time favorite presentation on tail-recursion: https://www.youtube.com/watch?v=-PX0BV9hGZY On 2/22/22 4:39 AM, Ralph Corderoy wrote: > Hi Otto, > >> MacOS uses the GNU implementation which has a long standing issue with >> deep recursion. It even cannot handle the tail recursive calls used >> here and will run out of its stack. > When learning dc and seeing it relied on tail calls, the first thing > I did was check it did tail-call elimination, and it did. That was > GNU dc. > > Trying just now, I see no growth in memory usage despite heavy CPU load > shown by TIME increasing. > > $ dc > !ps u `pidof dc` > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > ralph 11489 0.0 0.0 2332 1484 pts/1 S+ 10:33 0:00 dc > [lmx]smlmx > ^C > Interrupt! > !ps u `pidof dc` > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > ralph 11489 75.5 0.0 2332 1488 pts/1 S+ 10:33 0:46 dc > > The memory used remained at that level during the macro execution too, > watched from outside. > > Do you have more detail on what GNU dc can't handle? dc without > tail-call elimination is a bit crippled. >

3 years, 5 months

[TUHS] Lorinda Cherry (fwd)

by Dave Horsfall

Sad news... -- Dave ---------- Forwarded message ---------- Date: Tue, 15 Feb 2022 16:17:04 -0500 From: John P. Linderman <jpl.jpl(a)gmail.com> To: The Eunuchs Hysterical Society <tuhs(a)tuhs.org> Subject: [TUHS] Lorinda Cherry I got his from a friend today (15 February): =========== I'm sorry to report that Lorinda passed away a few days ago. I got a call from her sister today. Apparently the dog walker hadn't seen her for a few days and called the police. The police entered the house and found her there. Her sister says they are assuming either a heart attack or a stroke.

3 years, 5 months

Re: [COFF] [TUHS] more about Brian... [really Rust]

by Dan Cross

[TUHS to Bcc, +COFF <coff(a)minnie.tuhs.org> ] This isn't exactly COFF material, but I don't know what list is more appropriate. On Thu, Feb 3, 2022 at 9:41 PM Jon Steinhart <jon(a)fourwinds.com> wrote: > Adam Thornton writes: > > Do the august personages on this list have opinions about Rust? > > People who generally have tastes consonant with mine tell me I'd like > Rust. > > Well, I'm not an august personage and am not a Rust programmer. I did > spend a while trying to learn rust a while ago and wasn't impressed. > > Now, I'm heavily biased in that I think that it doesn't add value to keep > inventing new languages to do the same old things, and I didn't see > anything > in Rust that I couldn't do in a myriad of other languages. > I'm a Rust programmer, mostly using it for bare-metal kernel programming (though in my current gig, I find myself mostly in Rust userspace...ironically, it's back to C for the kernel). That said, I'm not a fan-boy for the language: it's not perfect. I've written basically four kernels in Rust now, to varying degrees of complexity from, "turn the computer on, spit hello-world out of the UART, and halt" to most of a v6 clone (which I really need to get around to finishing) to two rather more complex ones. I've done one ersatz kernel in C, and worked on a bunch more in C over the years. Between the two languages, I'd pick Rust over C for similar projects. Why? Because it really doesn't just do the same old things: it adds new stuff. Honest! Further, the sad reality (and the tie-in with TUHS/COFF) is that modern C has strayed far from its roots as a vehicle for systems programming, in particular, for implementing operating system kernels ( https://arxiv.org/pdf/2201.07845.pdf) C _implementations_ target the abstract machine defined in the C standard, not hardware, and they use "undefined behavior" as an excuse to make aggressive optimizations that change the semantics of one's program in such a way that some of the tricks you really do have to do when implementing an OS are just not easily done. For example, consider this code: uint16_t mul(uint16_t a, uint16_t b) { return a * b; } Does that code ever exhibit undefined behavior? The answer is that "it depends, but on most platforms, yes." Why? Because most often uint16_t is a typedef for `unsigned short int`, and because `short int` is of lesser "rank" than `int` and usually not as wide, the "usual arithmetic conversions" will apply before the multiplication. This means that the unsigned shorts will be converted to (signed) int. But on many platforms `int` will be a 32-bit integer (even 64-bit platforms!). However, the range of an unsigned 16-bit integer is such that the product of two uint16_t's can include values whose product is larger than whatever is representable in a signed 32-bit int, leading to overflow, and signed integer overflow is undefined overflow is undefined behavior. But does that _matter_ in practice? Potentially: since signed int overflow is UB, the compiler can decide it would never happen. And so if the compiler decides, for whatever reason, that (say) a saturating multiplication is the best way to implement that multiplication, then that simple single-expression function will yield results that (I'm pretty sure...) the programmer did not anticipate for some subset of inputs. How do you fix this? uint16_t mul(uint16_t a, uint16_t b) { unsigned int aa = a, bb = b; return aa * bb; } That may sound very hypothetical, but similar things have shown up in the wild: https://people.csail.mit.edu/nickolai/papers/wang-undef-2012-08-21.pdf In practice, this one is unlikely. But it's not impossible: the compiler would be right, the programmer would be wrong. One thing I've realized about C is that successive generations of compilers have tightened the noose on UB so that code that has worked for *years* all of a sudden breaks one day. There be dragons in our code. After being bit one too many times by such issues in C I decided to investigate alternatives. The choices at the time were either Rust or Go: for the latter, one gets a nice, relatively simple language, but a big complex runtime. For the former, you get a big, ugly language, but a minimal runtime akin to C: to get it going, you really don't have to do much more than set up a stack and join to a function. While people have built systems running Go at the kernel level ( https://pdos.csail.mit.edu/papers/biscuit.pdf) that seemed like a pretty heavy lift. On the other hand, if Rust could deliver on a quarter of the promises it made, I'd be ahead of the game. That was sometime in the latter half of 2018 and since then I've generally been pleasantly surprised at how much it really does deliver. For the above example, integer overflow is defined to trap. If you want wrapping (or saturating!) semantics, you request those explicitly: fn mul(a: u16, b: u16) -> u16 { a.wrapping_mul(b) } This is perfectly well-defined, and guaranteed to work pretty much forever. But, my real issue came from some of the tutorials that I perused. Rust is > being sold as "safer". As near as I can tell from the tutorials, the model > is that nothing works unless you enable it. Want to be able to write a > variable? Turn that on. So it seemed like the general style was to write > code and then turn various things on until it ran. > That's one way to look at it, but I don't think that's the intent: the model is rather, "immutable by default." Rust forces you to think about mutability, ownership, and the semantics of taking references, because the compiler enforces invariants on all of those things in a way that pretty much no other language does. It is opinionated, and not shy about sharing those opinions. To me, this implies a mindset that programming errors are more important > than thinking errors, and that one should hack on things until they work > instead of thinking about what one is doing. I know that that's the > modern definition of programming, but will never be for me. It's funny, I've had the exact opposite experience. I have found that it actually forces you to invest a _lot_ more in-up front thought about what you're doing. Writing code first, and then sprinkling in `mut` and `unsafe` until it compiles is a symptom of writing what we called "crust" on my last project at Google: that is, "C in Rust syntax." When I convinced our team to switch from C(++) to Rust, but none of us were really particularly adept at the language, and all hit similar walls of frustration; at one point, an engineer quipped, "this language has a near-vertical learning curve." And it's true that we took a multi-week productivity hit, but once we reached a certain level of familiarity, something equally curious happened: our debugging load went way, _way_ down and we started moving much faster. It turned out it was harder to get a Rust program to build at first, particularly with the bad habits we'd built up over decades of whatever languages we came from, but once it did those programs very often ran correctly the first time. You had to think _really hard_ about what data structures to use, their ownership semantics, their visibility, locking, etc. A lot of us had to absorb an emotional gut punch when the compiler showed us things that we _knew_ were correct were, in fact, not correct. But once code compiled, it tended not to have the kinds of errors that were insta-panics or triple faults (or worse, silent corruption you only noticed a million instructions later): no dangling pointers, no use-after-free bugs, no data races, no integer overflow, no out-of-bounds array references, etc. Simply put, the language _forced_ a level of discipline on us that even veteran C programmers didn't have. It also let us program at a moderately higher level of abstraction; off-by-one errors were gone because we had things like iterators. ADTs and a "Maybe" monad (the `Result<T,E>` type) greatly improved our error handling. `match` statements have to be exhaustive so you can't add a variant to an enum and forget to update code to account in just that one place (the compiler squawks at you). It's a small point, but the `?` operator removed a lot of tedious boilerplate from our code, making things clearer without sacrificing robust failure handling. Tuples for multiple return values instead of using pointers for output arguments (that have to be manually checked for validity!) are really useful. Pattern matching and destructuring in a fast systems language? Good to go. In contrast, I ran into a "bug" of sorts with KVM due to code I wrote that manifested itself as an "x86 emulation error" when it was anything but: I was turning on paging very early in boot, and I had manually set up an identity mapping for the low 4GiB of address space for the jump from 32-bit to 64-bit mode. I used gigabyte pages since it was easy, and I figured it would be supported, but I foolishly didn't check the CPU features when running this under virtualization for testing and got that weird KVM error. What was going on? It turned out KVM in this case didn't support gig pages, but the hardware did; the software worked just fine until the first time the kernel went to do IO. Then, when the hypervisor went to fetch the instruction bytes to emulate the IO instruction, it saw the gig-sized pages and errored. Since the incompatibility was manifest deep in the bowels of the instruction emulation code, that was the error that returned, even though it had nothing to do with instruction emulation. It would have been nice to plumb through some kind of meaningful error message, but in C that's annoying at best. In Rust, it's trivial. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ 70% of CVEs out of Microsoft over the last 15 years have been memory safety issues, and while we may poo-poo MSFT, they've got some really great engineers and let's be honest: Unix and Linux aren't that much better in this department. Our best and brightest C programmers continue to turn out highly buggy programs despite 50 years of experience. But it's not perfect. The allocator interface was a pain (it's defined to panic on allocation failure; I'm cool with a NULL return), though work is ongoing in this area. There's no ergonomic way to initialize an object 'in-place' (https://mcyoung.xyz/2021/04/26/move-ctors/) and there's no great way to say, essentially, "this points at RAM; even though I haven't initialized it, just trust me don't poison it" ( https://users.rust-lang.org/t/is-it-possible-to-read-uninitialized-memory-w… -- we really need a `freeze` operation). However, right now? I think it sits at a local maxima for systems languages targeting bare-metal. - Dan C.

3 years, 6 months

Re: [COFF] [TUHS] more about Brian...

by Dan Cross

Oh dear. This is getting a little heated. TUHS to Bcc:, replies to COFF. On Sun, Feb 6, 2022 at 8:15 AM Ed Carp <erc(a)pobox.com> wrote: > Since you made this personal and called me out specifically, I will > respond: > > "In what way is automatic memory management harder, more unsafe, and > less robust than hand-written memory management using malloc and > free?" > > Because there's no difference in the two. Someone had to write the > "automatic memory management", right? > I cannot agree with this, there is a big difference. With GC, you are funneling all of the fiddly bits of dealing with memory management through a runtime that is written by a very small pool of people who are intimately familiar with the language, the runtime, the compilation environment, and so on. That group of subject matter experts produce a system that is tested by every application (much like the _implementation_ of malloc/free itself, which is not usually reproduced by every programmer who _uses_ malloc/free). It's like in "pure" functional languages such as Haskell, where everything is immutable: that doesn't mean that registers don't change values, or that memory cells don't get updated, or that IO doesn't happen, or the clock doesn't tick. Rather, it means that the programmer makes a tradeoff where they cede control over those things to the compiler and a runtime written by a constrained set of contributors, in exchange for guarantees those things make about the behavior of the program. With manual malloc/free, one smears responsibility for getting it right across every program that does dynamic memory management. Some get it right; many do not. In many ways, the difference between automatic and manual memory management is like the difference between programming in assembler and programming in a high-level language. People have written reliable, robust assembler for decades (look at the airline industry), but few people would choose to do so today; why? Because it's tedious and life is too short as it is. Further, the probability of error is greater than in a high-level language; why tempt fate? [snip] > "This discussion should probably go to COFF, or perhaps I should just > leave the list. I am starting to feel uncomfortable here. Too much > swagger." > > I read through the thread. Just because people don't agree with each > other doesn't equate to "swagger". I've seen little evidence of > anything other than reasoned analysis and rational, respectful > discussion. Was there any sort of personal attacks that I missed? > It is very difficult, in a forum like this, to divine intent. I know for a fact that I've written things to this list that were interpreted very differently than I meant them. That said, there has definitely been an air that those who do not master manual memory management are just being lazy and that "new" programmers are unskilled. Asserting that this language or that is "ours" due to its authors while that is "theirs" or belongs solely to some corporate sponsor is a bit much. The reality is that languages and operating systems and hardware evolve over time, and a lot of the practices we took for granted 10 years ago deserve reexamination in the light of new context. There's nothing _wrong_ with that, even if it may be uncomfortable (I know it is for me). The fact of the matter is, code written with malloc/free, if written > carefully, will run for *years*. There are Linux boxes that have been > running for literally years without being rebooted, and mainframes and > miniframes that get booted only when a piece of hardware fails. > That there exist C programs that have run for many years without faults is indisputable. Empirically, people _can_ write reliable C programs, but it is often harder than it seems to do so, particularly since the language standard gives so much latitude for implementations to change semantics in surprising ways over time. Just in the past couple of weeks a flaw was revealed in some Linux daemon that allowed privilege escalation to root...due to improper memory management. That flaw had been in production for _12 years_. Sadly, this is not an isolated incident. That said, does manual memory management have a place in modern computing? Of course it does, as you rightly point out. So does assembly language. Rust came up in the context of this thread as a GC'd language, and it may be worth mentioning that Rust uses manual memory management; the language just introduces some facilities that make this safer. For instance, the concept of ownership is elevated to first-class status in Rust, and there are rules about taking references to things; when something's owner goes out of scope, it is "dropped", but the compiler statically enforces that there are no outstanding references to that thing. Regardless, when dealing with some resource it is often the programmer's responsibility to make sure that a suitable drop implementation exists. FWIW, I used to sit down the hall from a large subgroup of the Go developers; we usually ate lunch together. I know that many of them shared my opinion that Rust and Go are very complimentary. No one tool is right for all tasks. - Dan C.

3 years, 6 months

SICP [Was: Re: [TUHS] ratfor vibe] (moved to COFF)

by Nemo Nusquam

Replying on COFF as firmly in COFF territory. On 2022-02-01 16:50, Win Treese wrote: >> On Feb 1, 2022, at 1:19 PM, Noel Chiappa <jnc(a)mercury.lcs.mit.edu> wrote: >> >> From: Clem Cole >>> So by the late 70s/early 80s, [except for MIT where LISP/Scheme reigned] >> Not quite. The picture is complicated, because outside the EECS department, >> they all did their own thing - e.g. in the mid-70's I took a programming >> intro couse in the Civil Engineering department which used Fortran. But in >> EECS, in the mid-70's, their intro programming course used assembler >> (PDP-11), Algol, and LISP - very roughly, a third of the time in each. Later >> on, I think it used CLU (hey, that was MIT-grown :-). I think Scheme was used >> later. In both of these cases, I have no idea if it was _only_ CLU/Scheme, or >> if they did part of it in other languages. > I took 6.001 (with Scheme) in the spring of 1983, which was using a course > handout version of what became Structure and Interpretation of Computer > Programs by Sussman and Abelson. My impression was that it had been > around for a year before that, but not much more, and it was part of > revamping the EECS core curriculum at the time. I recall that one of the SICP authors wrote an interesting summary of 6.001 (with Scheme) but I cannot find it. Incidentally, SICP with Javascript will be released next year: https://mitpress.mit.edu/books/structure-and-interpretation-computer-progra… N. > In at least the early 80s, CLU was used in 6.170, Software Engineering > Laboratory, in which a big project was writing a compiler. > > And Fortran was still being taught for the other engineering departments. > In 1982(ish), those departments had the Joint Computing Facility for a lot > of their computing, of which the star then was a new VAX 11/782. > > - Win >

3 years, 6 months

Re: [COFF] Compilation "vs" byte-code interpretation, was Re: Looking back to 1981 - what pascal was popular on what unix?

by jnc＠mercury.lcs.mit.edu

> From: Peter Jeremy > all (AFAIK) PDP-11's were microcoded Not the -11/20; it pre-dated the fast, cheap ROMs needed to go the micocode route, so it used a state machine. All the others were, though (well, I don't know about the Mentec ones). Noel

3 years, 6 months

Re: [COFF] Compilation "vs" byte-code interpretation, was Re: Looking back to 1981 - what pascal was popular on what unix?

by Peter Jeremy

=> COFF On 2022-Jan-30 10:07:15 -0800, Dan Stromberg <drsalists(a)gmail.com> wrote: >On Sun, Jan 30, 2022 at 8:58 AM David Barto <david(a)kdbarto.org> wrote: > >> Yes, the UCSD P-code interpreter was ported to 4.1 BSD on the VAX and it >> ran natively there. I used it on sdcsvax in my senior year (1980). > >This reminds me of a question I've had percolating in the back of my mind. > >Was USCD Pascal "compiled" or "interpreted" or both? > >And is Java? They both have a byte code interpreter. A bit late to the party but my 2¢: I think it's fairly clear that both UCSD Pascal and Java are compiled - to binary machine code for a p-code machine or JVM respectively. That's no different to compiling (eg) C to PDP-11 or amd64 binary machine code. As for how the machine code is executed: * p-code was typically interpreted but (as mentioned elsewhere) there were a number of hardware implementions. * Java bytecode is often executed using a mixture of interpretation and (JIT) compilation to the host's machine code. Again there are a number of hardware implementations. And looking the other way, all (AFAIK) PDP-11's were microcoded, therefore you could equally well say that PDP-11 machine code is being interpreted by the microcode on a "real" PDP-11. And, nowadays, PDP-11 machine code is probably more commonly interpreted using something like simh than being run on a hardware PDP-11. Typical amd64 implementations are murkier - with machine code being further converted ("compiled"?) into a variable number of micro-ops that have their own caches and are then executed on the actual CPU. (And, going back in time, the Transmeta Crusoe explicity did JIT conversion from iA32 machine code to its own proprietary machine code). -- Peter Jeremy

3 years, 6 months

Re: [COFF] [TUHS] Compilation "vs" byte-code interpretation, was Re: Looking back to 1981 - what pascal was popular on what unix?

by Adam Thornton

On Mon, Jan 31, 2022 at 10:17 AM Paul Winalski <paul.winalski(a)gmail.com> wrote: > On 1/30/22, Steve Nickolas <usotsuki(a)buric.co> wrote: > > And I think I've heard the Infocom compilers' bytecode called "Z-code" (I > > use this term too). > That is correct. The Infocom games ran on an interpreter for an > abstract machine called the Z-machine. Z-code is the Z-machine's > instruction set. There is a freeware implementation out there called > Frotz. > > There's a reasonably functional Frotz implementation for TOPS-20, as it happens. The ZIP interpreter was easier to port to 2.11BSD on the PDP-11. https://github.com/athornton/tops20-frotz Adam

3 years, 6 months

2025

2024

2023

2022

2021

2020

2019

2018

COFF February 2022