Message ID | 20190318131155.29450-4-jgross@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xen: simplify suspend/resume handling | expand |
On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote: > --- a/xen/include/xen/cpu.h > +++ b/xen/include/xen/cpu.h > @@ -32,23 +32,25 @@ void register_cpu_notifier(struct notifier_block > *nb); > * (a) A CPU is going down; or (b) CPU_UP_CANCELED > */ > /* CPU_UP_PREPARE: Preparing to bring CPU online. */ > -#define CPU_UP_PREPARE (0x0001 | NOTIFY_FORWARD) > +#define CPU_UP_PREPARE (0x0001 | NOTIFY_FORWARD) > In the comment block before these definitions, there's this: * Possible event sequences for a given CPU: * CPU_UP_PREPARE -> CPU_UP_CANCELLED -- failed CPU up * CPU_UP_PREPARE -> CPU_STARTING -> CPU_ONLINE -- successful CPU up * CPU_DOWN_PREPARE -> CPU_DOWN_FAILED -- failed CPU down * CPU_DOWN_PREPARE -> CPU_DYING -> CPU_DEAD -- successful CPU down Shouldn't we add a line for this new hook? Something, IIUIC, like: CPU_UP_PREPARE -> CPU_UP_CANCELLED -> CPU_RESUME_FAILED --CPU not resuming With this, FWIW, Reviewed-by: Dario Faggioli <dfaggioli@suse.com> One more (minor) thing... > /* CPU_REMOVE: CPU was removed. */ > -#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) > +#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) > +/* CPU_RESUME_FAILED: CPU failed to come up in resume, all other CPUs up. */ > +#define CPU_RESUME_FAILED (0x000a | NOTIFY_REVERSE) > ... technically, when we're dealing with CPU_RESUME_FAILED on one CPU, we don't know if _all_ others really went up, so I think I'd remove what follows the ','. Regards, Dario
On 25/03/2019 13:21, Dario Faggioli wrote: > On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote: >> --- a/xen/include/xen/cpu.h >> +++ b/xen/include/xen/cpu.h >> @@ -32,23 +32,25 @@ void register_cpu_notifier(struct notifier_block >> *nb); >> * (a) A CPU is going down; or (b) CPU_UP_CANCELED >> */ >> /* CPU_UP_PREPARE: Preparing to bring CPU online. */ >> -#define CPU_UP_PREPARE (0x0001 | NOTIFY_FORWARD) >> +#define CPU_UP_PREPARE (0x0001 | NOTIFY_FORWARD) >> > In the comment block before these definitions, there's this: > > * Possible event sequences for a given CPU: > * CPU_UP_PREPARE -> CPU_UP_CANCELLED -- failed CPU up > * CPU_UP_PREPARE -> CPU_STARTING -> CPU_ONLINE -- successful CPU up > * CPU_DOWN_PREPARE -> CPU_DOWN_FAILED -- failed CPU down > * CPU_DOWN_PREPARE -> CPU_DYING -> CPU_DEAD -- successful CPU down > > Shouldn't we add a line for this new hook? Something, IIUIC, like: > > CPU_UP_PREPARE -> CPU_UP_CANCELLED -> CPU_RESUME_FAILED --CPU not resuming Fine with me. > > With this, FWIW, > > Reviewed-by: Dario Faggioli <dfaggioli@suse.com> > > One more (minor) thing... > >> /* CPU_REMOVE: CPU was removed. */ >> -#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) >> +#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) >> +/* CPU_RESUME_FAILED: CPU failed to come up in resume, all other CPUs up. */ >> +#define CPU_RESUME_FAILED (0x000a | NOTIFY_REVERSE) >> > ... technically, when we're dealing with CPU_RESUME_FAILED on one CPU, > we don't know if _all_ others really went up, so I think I'd remove > what follows the ','. The point is that for the CPU_RESUME_FAILED case we can be sure that no cpu will come up due to resume just a little bit later. So we can test for e.g. a cpupool suddenly having no more cpus available. This is in contrast to CPU_UP_CANCELLED being signalled just after the one cpu failing to come up, but before the next cpu is triggered to come up. Juergen
On 3/18/19 1:11 PM, Juergen Gross wrote: > Add a new cpu notifier action CPU_RESUME_FAILED which is called for all > cpus which failed to come up at resume. The calls will be done after > all other cpus are already up. > > Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
On Mon, 2019-03-25 at 13:29 +0100, Juergen Gross wrote: > On 25/03/2019 13:21, Dario Faggioli wrote: > > > Reviewed-by: Dario Faggioli <dfaggioli@suse.com> > > > > One more (minor) thing... > > > > > /* CPU_REMOVE: CPU was removed. */ > > > -#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) > > > +#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) > > > +/* CPU_RESUME_FAILED: CPU failed to come up in resume, all other > > > CPUs up. */ > > > +#define CPU_RESUME_FAILED (0x000a | NOTIFY_REVERSE) > > > > > ... technically, when we're dealing with CPU_RESUME_FAILED on one > > CPU, > > we don't know if _all_ others really went up, so I think I'd remove > > what follows the ','. > > The point is that for the CPU_RESUME_FAILED case we can be sure that > no > cpu will come up due to resume just a little bit later. > Ah, I see what you mean... that's the fact that this notifier is invoked from another loop, and although we don't know which CPU did manage to come up and which don't, we do know that all the ones that could come up, are up already. > So we can test > for e.g. a cpupool suddenly having no more cpus available. This is in > contrast to CPU_UP_CANCELLED being signalled just after the one cpu > failing to come up, but before the next cpu is triggered to come up. > Right. Well, it still looks to me that "all other CPUs up" is not entirely accurate. But I can't propose a better alternative (unless we write something very long, which is probably not worth it). Perhaps you can explain this a little in another comment, like in enable_nonboot_cpus(), before the for_each_cpu() loop itself. But I don't feel too strong about that, and the RoB stands, whatever you decide to do. Regards, Dario
>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote: > --- a/xen/common/cpu.c > +++ b/xen/common/cpu.c > @@ -214,7 +214,12 @@ void enable_nonboot_cpus(void) > printk("Error bringing CPU%d up: %d\n", cpu, error); > BUG_ON(error == -EBUSY); > } > + else > + __cpumask_clear_cpu(cpu, &frozen_cpus); > } > > + for_each_cpu ( cpu, &frozen_cpus ) > + BUG_ON(cpu_notifier_call_chain(cpu, CPU_RESUME_FAILED, NULL)); > + > cpumask_clear(&frozen_cpus); > } Is there a particular reason you add a second loop here? Jan
On 27/03/2019 17:29, Jan Beulich wrote: >>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote: >> --- a/xen/common/cpu.c >> +++ b/xen/common/cpu.c >> @@ -214,7 +214,12 @@ void enable_nonboot_cpus(void) >> printk("Error bringing CPU%d up: %d\n", cpu, error); >> BUG_ON(error == -EBUSY); >> } >> + else >> + __cpumask_clear_cpu(cpu, &frozen_cpus); >> } >> >> + for_each_cpu ( cpu, &frozen_cpus ) >> + BUG_ON(cpu_notifier_call_chain(cpu, CPU_RESUME_FAILED, NULL)); >> + >> cpumask_clear(&frozen_cpus); >> } > > Is there a particular reason you add a second loop here? Yes, I want to know which cpus did come up again. Juergen
diff --git a/xen/common/cpu.c b/xen/common/cpu.c index c436c0de7f..f3cf9463b4 100644 --- a/xen/common/cpu.c +++ b/xen/common/cpu.c @@ -214,7 +214,12 @@ void enable_nonboot_cpus(void) printk("Error bringing CPU%d up: %d\n", cpu, error); BUG_ON(error == -EBUSY); } + else + __cpumask_clear_cpu(cpu, &frozen_cpus); } + for_each_cpu ( cpu, &frozen_cpus ) + BUG_ON(cpu_notifier_call_chain(cpu, CPU_RESUME_FAILED, NULL)); + cpumask_clear(&frozen_cpus); } diff --git a/xen/include/xen/cpu.h b/xen/include/xen/cpu.h index 2fe3ec05d8..2fc0cb1bb5 100644 --- a/xen/include/xen/cpu.h +++ b/xen/include/xen/cpu.h @@ -32,23 +32,25 @@ void register_cpu_notifier(struct notifier_block *nb); * (a) A CPU is going down; or (b) CPU_UP_CANCELED */ /* CPU_UP_PREPARE: Preparing to bring CPU online. */ -#define CPU_UP_PREPARE (0x0001 | NOTIFY_FORWARD) +#define CPU_UP_PREPARE (0x0001 | NOTIFY_FORWARD) /* CPU_UP_CANCELED: CPU is no longer being brought online. */ -#define CPU_UP_CANCELED (0x0002 | NOTIFY_REVERSE) +#define CPU_UP_CANCELED (0x0002 | NOTIFY_REVERSE) /* CPU_STARTING: CPU nearly online. Runs on new CPU, irqs still disabled. */ -#define CPU_STARTING (0x0003 | NOTIFY_FORWARD) +#define CPU_STARTING (0x0003 | NOTIFY_FORWARD) /* CPU_ONLINE: CPU is up. */ -#define CPU_ONLINE (0x0004 | NOTIFY_FORWARD) +#define CPU_ONLINE (0x0004 | NOTIFY_FORWARD) /* CPU_DOWN_PREPARE: CPU is going down. */ -#define CPU_DOWN_PREPARE (0x0005 | NOTIFY_REVERSE) +#define CPU_DOWN_PREPARE (0x0005 | NOTIFY_REVERSE) /* CPU_DOWN_FAILED: CPU is no longer going down. */ -#define CPU_DOWN_FAILED (0x0006 | NOTIFY_FORWARD) +#define CPU_DOWN_FAILED (0x0006 | NOTIFY_FORWARD) /* CPU_DYING: CPU is nearly dead (in stop_machine context). */ -#define CPU_DYING (0x0007 | NOTIFY_REVERSE) +#define CPU_DYING (0x0007 | NOTIFY_REVERSE) /* CPU_DEAD: CPU is dead. */ -#define CPU_DEAD (0x0008 | NOTIFY_REVERSE) +#define CPU_DEAD (0x0008 | NOTIFY_REVERSE) /* CPU_REMOVE: CPU was removed. */ -#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) +#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE) +/* CPU_RESUME_FAILED: CPU failed to come up in resume, all other CPUs up. */ +#define CPU_RESUME_FAILED (0x000a | NOTIFY_REVERSE) /* Perform CPU hotplug. May return -EAGAIN. */ int cpu_down(unsigned int cpu);
Add a new cpu notifier action CPU_RESUME_FAILED which is called for all cpus which failed to come up at resume. The calls will be done after all other cpus are already up. Signed-off-by: Juergen Gross <jgross@suse.com> --- xen/common/cpu.c | 5 +++++ xen/include/xen/cpu.h | 20 +++++++++++--------- 2 files changed, 16 insertions(+), 9 deletions(-)