On Wed, Feb 3, 2021 at 8:34 PM Larry McVoy <lm(a)mcvoy.com> wrote:
> I have to admit that I haven't looked at ARM assembler, the M1 is making
> me rethink that. Anyone have an opinion on where ARM lies in the pleasant
> to unpleasant scale?
Redirecting to "COFF" as this is drifting away from Unix.
I have a soft spot for ARM, but I wonder if I should. At first blush, it's
a pleasant RISC-ish design: loads and stores for dealing with memory,
arithmetic and logic instructions work on registers and/or immediate
operands, etc. As others have mentioned, there's an inline barrel shifter
in the ALU that a lot of instructions can take advantage of in their second
operand; you can rotate, shift, etc, an immediate or register operand while
executing an instruction: here's code for setting up page table entries for
an identity mapping for the low part of the physical address space (the
root page table pointer is at phys 0x40000000):
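        MOV     r1, #0x0000         @ r1 = 0x40000000, the root page table
        MOVT    r1, #0x4000
        MOV     r0, #0              @ r0 = section index
                                    @ (r3 is assumed to already hold the descriptor attribute bits)
.Lpti:  MOV     r2, r0, LSL #20     @ r2 = section base address (index * 1MiB)
        ORR     r2, r2, r3          @ merge in the attribute bits
        STR     r2, [r1], #4        @ store the descriptor, post-increment r1
        ADD     r0, r0, #1
        CMP     r0, #2048           @ 2048 sections = the low 2GiB
        BNE     .Lpti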
(Note the `LSL #20` in the `MOV` instruction.)
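For readers who don't speak ARM assembler, here is a rough C sketch of what that loop computes. The function name and the `attrs` parameter are made up for illustration; `attrs` stands in for whatever attribute bits `r3` holds, and the real descriptor bit layout is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill the first n first-level entries so that
 * section i maps to physical address i * 1MiB (an identity mapping).
 * attrs is an assumed stand-in for the contents of r3. */
void fill_identity_sections(uint32_t *table, size_t n, uint32_t attrs)
{
    for (size_t i = 0; i < n; i++) {
        /* Same as MOV r2, r0, LSL #20 followed by ORR r2, r2, r3. */
        table[i] = ((uint32_t)i << 20) | attrs;
    }
}
```

Calling it with `n = 2048` would cover the same low 2GiB as the assembly.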
32-bit ARM also has some niceness for conditionally executing instructions
based on the condition codes currently set in the CPSR, so you might see
something like:
1:      CMP     r0, #0
        ADDNE   r1, r1, #1
        SUBNE   r0, r0, #1
        BNE     1b
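In C terms that loop is roughly the following. This is only a model: the pointer parameters stand in for the registers `r0` and `r1`, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the conditional-execution loop above. */
void drain_into(uint32_t *r0, uint32_t *r1)
{
    while (*r0 != 0) {   /* CMP r0, #0 ... BNE 1b */
        *r1 += 1;        /* ADDNE r1, r1, #1 */
        *r0 -= 1;        /* SUBNE r0, r0, #1 */
    }
}
```

The point of the ARM version is that the loop body needs no branch around it: the ADDNE and SUBNE simply do nothing once the CMP reports equality.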
The architecture tends to map nicely to C and similar languages (e.g.
Rust). There is a rich set of instructions for various kinds of arithmetic;
for instance, they support saturating instructions for DSP-style code. You
can push multiple registers onto the stack at once, which is a little odd
for a RISC ISA, but works ok in practice.
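As an illustration of the saturating arithmetic, here is a portable C model of what a signed saturating add (QADD on ARM) computes. The function name is made up, and the sticky saturation flag that the hardware also sets is not modelled.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit signed saturating add: the result is
 * clamped to the int32_t range instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */
    if (wide > INT32_MAX)
        return INT32_MAX;
    if (wide < INT32_MIN)
        return INT32_MIN;
    return (int32_t)wide;
}
```

On ARM the whole function is one instruction, which is why it matters for DSP-style inner loops.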
The supervisor instruction set is pretty nice. IO is memory-mapped, etc.
There's a co-processor interface for working with MMUs and things like it.
Memory mapping is a little weird, in that the first-level page table isn't
the same as the second-level tables: the first-level page table maps the 32-bit
address space into 1MiB "sections", each of which is described by a 32-bit
section descriptor; thus, to map the entire 4GiB space, you need 4096 of
those in 16KiB of physically contiguous RAM. At the second level, 4KiB
tables map pages into the 1MiB section at different granularities; I think
the smallest is 1KiB (thus, you need 1024 32-bit entries). To map a 4KiB
virtual page to a 4KiB PFN, you repeat the relevant entry 4 times in the
second-level table. It ends up being kind of annoying. I did a little toy
kernel for ARM32 and ended up deciding to use 16KiB pages (basically, I map
4x4KiB contiguous pages) so I could allocate a single sized structure for
the page tables themselves.
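The arithmetic behind that description can be sketched in C. This follows the layout as described in this message (1MiB sections at the first level, 1KiB granules at the second); all the names are hypothetical and the descriptor bit layouts are left out entirely.

```c
#include <stdint.h>

enum {
    L1_ENTRIES = 4096,          /* 4GiB / 1MiB sections */
    L2_ENTRIES = 1024,          /* 1MiB / 1KiB granules */
    ENTRIES_PER_4K_PAGE = 4     /* 4KiB / 1KiB */
};

/* Which first-level entry (section) covers this virtual address. */
uint32_t l1_index(uint32_t va) { return va >> 20; }

/* Which second-level entry covers this address within its section. */
uint32_t l2_index(uint32_t va) { return (va >> 10) & (L2_ENTRIES - 1); }

/* Map one 4KiB page by repeating its entry in four consecutive slots. */
void map_4k_page(uint32_t *l2, uint32_t va, uint32_t entry)
{
    uint32_t first = l2_index(va & ~0xFFFu);  /* round down to 4KiB */
    for (uint32_t i = 0; i < ENTRIES_PER_4K_PAGE; i++)
        l2[first + i] = entry;
}
```

With 4-byte entries, the first-level table comes to 4096 * 4 = 16KiB and a second-level table to 1024 * 4 = 4KiB, which is where the mismatch in table sizes comes from.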
Starting with the ARMv8 architecture, it's been split into 32-bit aarch32
(basically the above) and 64-bit aarch64; the latter has expanded the
number and width of general purpose registers; one of them is a zero register in
some contexts (and I think a stack pointer in others? I forget the
details). I haven't played around with it too much, but looked at it when
it came out and thought "this is reasonable, with some concessions for
backwards compatibility." They cleaned up the paging weirdness mentioned
above. The multiple push instruction has been retired and replaced with a
"push a pair of registers" instruction; I viewed that as a
compromise between code size and instruction set orthogonality.
So... overall quite pleasant, and far better than x86_64, but with some
oddities.
- Dan C.