Paul,
Noel did a great job of explaining the C calling conventions. I'm going to see if I can add a little color, mostly to help explain why.
FWIW: Page 115 of the 1979 PDP-11 Processor Handbook has a nice picture and explanation, which I have attached (but I'm not sure if it will pass through the mailing list filters):
A simple way to think about it is that in C, with an HW-based stack pointer (r6 on the PDP-11; not all systems had an HW-based sp), the >>caller<< maintains the save area. The sp always points to a place that can be written. Thus there is no need to pop the last item off the stack (i.e. the first one pushed, which in this case means the one that overwrote the top of the stack), since that slot will be overwritten anyway the next time the sp is used.
In your example [I added the LXX and LYY for later explanation]:
L2:mov $L4,(sp)
jsr pc,*$_printf
mov $1,_a
mov $2,_b
mov $3,_c
LXX:mov _c,(sp)
mov _b,-(sp)
mov _a,-(sp)
mov $L5,-(sp)
LYY:jsr pc,*$_printf
add $6,sp
mov $L6,(sp)
jsr pc,*$_printf
L3:jmp cret
So, to map it to the DEC example: the first mov $L4,(sp) at label L2 puts 'mmmmmm' on the stack, then the immediately following jsr pushes the return address [jsr uses an implied pre-decrement before it writes to the stack].
After printf returns (the RTS instruction in printf), the sp points to the same 'top' as it did before the call, so there is no need to mess with the sp any further; the caller has done nothing other than modify the top of the stack [mov $L4,(sp)]. But starting at LXX we see 4 mov instructions in a row, 3 of which modify the sp. This means the caller has 'pushed' 6 bytes onto the top of the stack [downward on a PDP-11, via the last three mov's with their pre-decrement pushes], so the caller needs to clean up those 6 bytes, which it does with the add that follows the jsr. Note that the next printf's argument is placed on the top of the stack, but since no other args are pushed (as with the first call), there is no need to clean up the sp when we return.
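If it helps to see the bookkeeping, here is a minimal sketch in Python that just tracks the sp through the sequence above. It is a toy model, not anything from the compiler: the class name ToyStack, the starting address 0o1000 and the placeholder return address are all made up for illustration, and it assumes the function entry code has already left one writable scratch word at the top of the stack.

```python
class ToyStack:
    """Toy model of a downward-growing PDP-11 stack (hypothetical helper)."""

    def __init__(self, top=0o1000):
        self.sp = top      # assumed starting sp; a scratch word already lives here
        self.mem = {}      # byte address -> word stored there

    def mov_top(self, value):
        """mov value,(sp): overwrite the top word, sp unchanged."""
        self.mem[self.sp] = value

    def mov_push(self, value):
        """mov value,-(sp): pre-decrement sp by one word, then store."""
        self.sp -= 2
        self.mem[self.sp] = value

    def jsr_rts(self):
        """jsr pc,f followed by the callee's rts pc."""
        self.sp -= 2                       # jsr: implied pre-decrement
        self.mem[self.sp] = "return-pc"    # placeholder return address
        self.sp += 2                       # rts: pops the return address

    def add_sp(self, nbytes):
        """add $n,sp: caller discards the words it pushed."""
        self.sp += nbytes


def run_example():
    """Replay the listing and record sp relative to its starting value."""
    s = ToyStack()
    start = s.sp
    trace = []

    s.mov_top("L4")                # L2: single argument goes in the scratch word
    s.jsr_rts()
    trace.append(s.sp - start)     # after the first printf

    s.mov_top("c")                 # LXX: first one overwrites the scratch word
    s.mov_push("b")
    s.mov_push("a")
    s.mov_push("L5")
    trace.append(s.sp - start)     # at LYY, three words have been pushed

    s.jsr_rts()
    s.add_sp(6)                    # caller cleans up its 6 bytes
    trace.append(s.sp - start)

    s.mov_top("L6")                # third call: scratch word again, no cleanup
    s.jsr_rts()
    trace.append(s.sp - start)
    return trace


if __name__ == "__main__":
    print(run_example())
```

The only place the sp moves away from its starting value is between LXX and the add, which is exactly the 6 bytes the caller pushed and then removed.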
Clem
BTW: If you look at processors like the IBM S/360, which lacks an HW stack, and at other languages (say Fortran), the 'push down save area' is maintained by the callee. There is a different convention on where to find parameters for those machines and languages. It's interesting that if you talk to the designers of same (which I have in a number of cases), the 8- and 16-bit microprocessors (e.g. 8080/M6800/M6502, 8086/M68000) were most heavily influenced by the S/360 and the PDP-11. It's also interesting which features they took from each. Having programmed on both systems (including in assembler on both), I find the PDP-11 stack scheme much more natural personally, but I did spend a number of years in an IBM shop hacking TSS/360 in assembler before I ever saw UNIX.