On Wed, Sep 18, 2024 at 9:54 PM Bakul Shah <bakul(a)iitbombay.org> wrote:
On Sep 18, 2024, at 5:53 PM, Dan Cross <crossd(a)gmail.com> wrote:
On Wed, Sep 18, 2024 at 8:05 PM Bakul Shah via TUHS <tuhs(a)tuhs.org> wrote:
Can you not avoid resetting the machine? This can be treated almost as
sleep in the old kernel, wakeup in the new one! You do have to reset
devices individually (which may not always work if it requires
assistance from some undocumented firmware).
Perhaps this is what you mean when you mention assistance from
firmware, Bakul, but it may be useful to consider that _many_ devices
are touched by e.g. a BIOS or UEFI or whatever well before the OS is
even loaded.
Right but presumably the old kernel leaves them in a good enough state.
I suspect this is one of the thornier parts of the whole problem. In
some sense, kexec is similar to live migration of a VM: it's certainly
possible to do, but in particular devices have to be quiesced and in a
state where they are ready to migrate; outstanding IOs may cause
problems with synchronization between the source and destination.
Similarly, if the outgoing kernel in a kexec cannot adequately ensure
that device state is going to be (at a minimum) discoverable in the
incoming kernel, you're going to have a bad time. If there's an
outstanding DMA request? Well, good luck, but you're likely going to
have a bad day....
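To make the quiescing point concrete, here is a minimal sketch of what the outgoing kernel would have to guarantee before jumping to the new image. It is a toy model only: `ToyDevice`, `can_stop_dma`, and `prepare_handoff` are names made up for illustration and do not correspond to any real kexec or driver interface.

```python
from dataclasses import dataclass


@dataclass
class ToyDevice:
    """Hypothetical device model for illustration; not a real driver API."""
    name: str
    irq_enabled: bool = True
    dma_active: bool = True
    outstanding_ios: int = 0
    can_stop_dma: bool = True  # assumption: some devices cannot be stopped cleanly

    def quiesce(self) -> None:
        # Stop new work first, then deal with whatever is still in flight.
        self.irq_enabled = False
        if self.can_stop_dma:
            self.dma_active = False
            self.outstanding_ios = 0  # stand-in for waiting on completions

    def is_quiescent(self) -> bool:
        return (not self.irq_enabled
                and not self.dma_active
                and self.outstanding_ios == 0)


def prepare_handoff(devices):
    """Quiesce every device and return a state summary for the incoming kernel.

    Raises RuntimeError if any device is still busy, because handing over
    with DMA in flight could let the device scribble on the new kernel's memory.
    """
    for dev in devices:
        dev.quiesce()
    busy = [d.name for d in devices if not d.is_quiescent()]
    if busy:
        raise RuntimeError(f"devices not quiescent: {busy}")
    # The summary plays the role of "discoverable state" for the new kernel.
    return {d.name: {"irq": d.irq_enabled, "dma": d.dma_active} for d in devices}
```

Calling `prepare_handoff([ToyDevice("nic", outstanding_ios=3)])` returns a summary with interrupts and DMA both off, while a device constructed with `can_stop_dma=False` makes it raise, which is the "good luck" case above.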
If one steps back and considers the utility of a BIOS/UEFI (and I
often lump these into the same category), there are three principal
reasons for it: 1) back in the bad old days, we could offload common
IO functions into code stored on a ROM, freeing up precious RAM for
programs. 2) firmware provides a layer of indirection between the
system and the host software, allowing both to vary while continuing
to work with newer versions of the other. And finally 3) firmware
facilitates bootstrapping the system by providing the host some way to
access devices and locate and load an OS image, er, before the OS
image is loaded. SOMETHING has to get enough code loaded from
somewhere to start the system; oftentimes that's firmware.
The new OS image is already in memory but may need to be copied to
the right place. The devices were already working (but may need to
have their interrupts disabled and any DMA stopped etc.).
Yes, sorry, I was trying to explain why firmware is in the loop for
those who may not be familiar.
Anyway, the last two suggest that device state can be arbitrarily
munged before the OS takes over, and an actual reset at the device
level might wipe out some state the OS depends on. Consider, for
example, programming PCI BARs; on a "modern" x86-64 system with UEFI,
this is done by firmware in the PEI layer, and the OS may expect that
to already be set up by the time it is probing buses. An actual
honest-to-goodness reset will probably wipe the BARs, requiring the
host OS to program them (ironically, many OSes are already equipped to
do so, as they have to handle these cases for e.g. PCI hotplug events,
though many don't do it in the "ordinary" discovery and initialization
phase of boot).
All that is done on powerup.
That's true, but it's non-trivial, and done by opaque firmware that
one has no control over; in particular, it's hard to get the firmware
to cooperate in the kexec protocol.
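As a rough illustration of the BAR point above, the sketch below models a single 32-bit memory BAR, the usual "write all ones, read back" sizing probe, and the reprogramming an OS would have to do itself if a reset had wiped what firmware set up. `ToyBar`, `probe_size`, and `ensure_programmed` are hypothetical names, the base address is arbitrary, and nothing here touches real PCI configuration space.

```python
MASK32 = 0xFFFFFFFF
MEM_ADDR_MASK = 0xFFFFFFF0  # low 4 bits of a memory BAR are type/prefetch flags


class ToyBar:
    """Toy 32-bit memory BAR; a model for illustration, not real hardware access."""

    def __init__(self, size: int):
        if size < 16 or size & (size - 1):
            raise ValueError("size must be a power of two >= 16")
        self.size = size
        self.value = 0  # address bits read back as zero after a reset

    def write(self, v: int) -> None:
        # Address bits below log2(size) are hardwired to zero, so only
        # the bits at or above that position are actually writable.
        self.value = v & ~(self.size - 1) & MASK32

    def read(self) -> int:
        return self.value

    def reset(self) -> None:
        self.value = 0


def probe_size(bar: ToyBar) -> int:
    """Size the BAR: write all ones, read back, invert the address mask, add one."""
    saved = bar.read()
    bar.write(MASK32)
    mask = bar.read() & MEM_ADDR_MASK
    bar.write(saved)  # restore whatever was programmed before the probe
    return (~mask & MASK32) + 1


def ensure_programmed(bar: ToyBar, base: int) -> bool:
    """Program the BAR if it is empty. Returns True if the OS had to do it."""
    if bar.read() & MEM_ADDR_MASK:
        return False  # firmware (or the previous kernel) left it set up
    size = probe_size(bar)
    if base % size:
        raise ValueError("base must be aligned to the BAR size")
    bar.write(base)
    return True
```

With a 4 KiB `ToyBar` that "firmware" programmed to some base, `ensure_programmed` is a no-op; after `reset()` it has to size and program the BAR itself, which is the extra work a true device-level reset pushes onto the incoming kernel.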
I suppose the point is that a reset is great because it really does
wipe out state, but it may also be a bummer because, well, it really
does wipe out state. :-)
:-) I was speculating that kernel to kernel warmboot should be doable.
Oh sorry; I think I misunderstood that and thought you were asking,
"why can't you reset the machine?" Apologies there; my bad. I
absolutely agree that it is doable, and that we have several existence
proofs showing just that.
- Dan C.