that's 28+13 = 41 memory cycles.
...
purely in overhead (counting putting the args on the stack as overhead).
Oh, I missed an instruction for de-stacking the arguments, which was typically
something like 'add #N, sp', so another two instruction word fetches, or 43
cycles.
Ironically, if N=4, the compiler used to emit a 'cmp (sp)+, (sp)+', which is
more efficient space-wise (one word instead of two), but less time-wise
(3 cycles instead of 2).
Noel