Message ID | 4968818.GXAFRqVoOG@rjwysocki.net (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | [v1] kexec_core: Add and update comments regarding the KEXEC_JUMP flow | expand |
On Mon, 2024-12-16 at 14:39 +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The KEXEC_JUMP flow is analogous to hibernation flows occurring > before > and after creating an image and before and after jumping from the > restore kernel to the image one, which is why it uses the same device > callbacks as those hibernation flows. > > Add comments explaining that to the code in question and update an > existing comment in it which appears a bit out of context. > > No functional changes. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Thanks. I'll round that up into my kexec-debug tree, which Ingo has been taking into tip/x86/boot. Once I'm done fighting with objtool(qv).
David Woodhouse <dwmw2@infradead.org> writes: > On Mon, 2024-12-16 at 14:39 +0100, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> The KEXEC_JUMP flow is analogous to hibernation flows occurring >> before >> and after creating an image and before and after jumping from the >> restore kernel to the image one, which is why it uses the same device >> callbacks as those hibernation flows. >> >> Add comments explaining that to the code in question and update an >> existing comment in it which appears a bit out of context. >> >> No functional changes. >> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Thanks. I'll round that up into my kexec-debug tree, which Ingo has > been taking into tip/x86/boot. Once I'm done fighting with > objtool(qv). I have no objection to getting kexec jump more in sync with the rest of the power management code. I do have a question though. Does anyone actually use kexec jump? It is fine if folks do, but I haven't actually heard of anyone using it. If folks aren't using it, it might make sense to just use the fact that it is broken as a nudge to remove that option. Eric
On 16 December 2024 16:00:00 GMT, "Eric W. Biederman" <ebiederm@xmission.com> wrote: >David Woodhouse <dwmw2@infradead.org> writes: > >> On Mon, 2024-12-16 at 14:39 +0100, Rafael J. Wysocki wrote: >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> >>> The KEXEC_JUMP flow is analogous to hibernation flows occurring >>> before >>> and after creating an image and before and after jumping from the >>> restore kernel to the image one, which is why it uses the same device >>> callbacks as those hibernation flows. >>> >>> Add comments explaining that to the code in question and update an >>> existing comment in it which appears a bit out of context. >>> >>> No functional changes. >>> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> Thanks. I'll round that up into my kexec-debug tree, which Ingo has >> been taking into tip/x86/boot. Once I'm done fighting with >> objtool(qv). > >I have no objection to getting kexec jump more in sync with the >rest of the power management code. > >I do have a question though. Does anyone actually use kexec jump? > >It is fine if folks do, but I haven't actually heard of anyone using >it. If folks aren't using it, it might make sense to just use the fact >that it is broken as a nudge to remove that option. > >Eric > > > It isn't broken. I know of it being used a few million times a week. There are corner cases which have never worked right, like the callee putting a different return address for its next invocation, on the stack *above* the address it 'ret's to. Which since the first kjump patch has been the first word of the page *after* the swap page (and is now fixed in my tree). But fundamentally it *does* work. I only started messing with it because I was working on relocate_kernel() and needed to write a test case for it; the fact that I know of it being used in production is actually just a coincidence.
David Woodhouse <dwmw2@infradead.org> writes: > It isn't broken. I know of it being used a few million times a week. > > There are corner cases which have never worked right, like the callee > putting a different return address for its next invocation, on the > stack *above* the address it 'ret's to. Which since the first kjump > patch has been the first word of the page *after* the swap page (and > is now fixed in my tree). But fundamentally it *does* work. > > I only started messing with it because I was working on > relocate_kernel() and needed to write a test case for it; the fact > that I know of it being used in production is actually just a > coincidence. Cool. I had the sense that the original developer never got around to using it, so I figured I should check. Mind if I ask what you know of it being used for? I had imagined it might be a way to call firmware code preventing the need to code of a specific interface for each type of firmware. Eric
--- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1001,6 +1001,12 @@ #ifdef CONFIG_KEXEC_JUMP if (kexec_image->preserve_context) { + /* + * This flow is analogous to hibernation flows that occur before + * creating an image and before jumping from the restore kernel + * to the image one, so it uses the same device device callbacks + * as those two flows. + */ pm_prepare_console(); error = freeze_processes(); if (error) { @@ -1011,12 +1017,10 @@ error = dpm_suspend_start(PMSG_FREEZE); if (error) goto Resume_console; - /* At this point, dpm_suspend_start() has been called, - * but *not* dpm_suspend_end(). We *must* call - * dpm_suspend_end() now. Otherwise, drivers for - * some devices (e.g. interrupt controllers) become - * desynchronized with the actual state of the - * hardware at resume time, and evil weirdness ensues. + /* + * dpm_suspend_end() must be called after dpm_suspend_start() + * to complete the transition, like in the hibernation flows + * mentioned above. */ error = dpm_suspend_end(PMSG_FREEZE); if (error) @@ -1052,6 +1056,13 @@ #ifdef CONFIG_KEXEC_JUMP if (kexec_image->preserve_context) { + /* + * This flow is analogous to hibernation flows that occur after + * creating an image and after the image hernel has got control + * back, and in case the devices have been reset or otherwise + * manipulated in the meantime, it uses the device callbacks + * used by the latter. + */ syscore_resume(); Enable_irqs: local_irq_enable();