[v3,0/6] xen: simplify suspend/resume handling

Message ID: 20190402053457.24912-1-jgross@suse.com
Series: xen: simplify suspend/resume handling

Message

Jürgen Groß April 2, 2019, 5:34 a.m. UTC
Especially in the scheduler area (schedule.c, cpupool.c), suspend and
resume involve rather complex handling.

This can be simplified a lot by not performing a complete cpu down and
up cycle for the non-boot cpus. Instead, their purely software-related
state is kept across suspend and freed only if a cpu fails to come up
again during resume.
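The suspend/resume approach described above can be modelled in a few lines of C. This is a minimal sketch under stated assumptions, not Xen code: `struct percpu_state`, `percpu[]`, `cpu_online[]`, `cpu_up_initial()`, `suspend_cpus()` and `resume_cpus()` are hypothetical names, and the `hw_ok[]` array merely stands in for whether a cpu's hardware came back after resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model only; these are not Xen's real data structures. */
#define NR_CPUS 4

struct percpu_state {
    int sched_priv; /* stands in for scheduler/percpu software state */
};

static struct percpu_state *percpu[NR_CPUS];
static bool cpu_online[NR_CPUS];

/* First bring-up of a cpu: allocate its software state. */
static bool cpu_up_initial(unsigned int cpu)
{
    percpu[cpu] = calloc(1, sizeof(*percpu[cpu]));
    if (!percpu[cpu])
        return false;
    cpu_online[cpu] = true;
    return true;
}

/* Suspend: take non-boot cpus offline but keep their software state. */
static void suspend_cpus(void)
{
    assert(cpu_online[0]); /* boot cpu stays up and runs the suspend path */
    for (unsigned int cpu = 1; cpu < NR_CPUS; cpu++)
        cpu_online[cpu] = false; /* percpu[cpu] is deliberately kept */
}

/*
 * Resume: restart each non-boot cpu. A cpu that comes back reuses its
 * old state; only a cpu that fails to come back has its state freed.
 * hw_ok[] models whether each cpu's hardware came back.
 * Returns the number of cpus that failed to resume.
 */
static unsigned int resume_cpus(const bool hw_ok[NR_CPUS])
{
    unsigned int failed = 0;

    for (unsigned int cpu = 1; cpu < NR_CPUS; cpu++) {
        if (!percpu[cpu])
            continue; /* cpu was never brought up */
        if (hw_ok[cpu]) {
            cpu_online[cpu] = true; /* no re-allocation needed */
        } else {
            free(percpu[cpu]); /* free only on the failure path */
            percpu[cpu] = NULL;
            failed++;
        }
    }
    return failed;
}
```

The point of the model is that a successful resume reuses the very same `percpu[cpu]` pointer, so the resume path performs no allocations (and cannot fail on one); only the failure branch frees anything.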

In summary, this series not only reduces complexity but also improves
failure tolerance: with a dedicated hook for cpus failing to come up
during resume, it is now possible to survive, e.g., a cpupool being
left without any cpu after resume, by moving its domains to cpupool0.
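The cpupool recovery case can be sketched the same way. This is a hypothetical model, assuming a pool is nothing more than a cpu count plus an array of domain ids; `struct demo_pool` and `rescue_empty_pool()` are illustrative names, not Xen's cpupool code, and the function stands in for logic that would run from the resume-failure hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cpupool and its domains. */
#define MAX_DOMS 8

struct demo_pool {
    int id;
    unsigned int n_cpus; /* cpus still assigned to this pool */
    int doms[MAX_DOMS];  /* ids of the domains in this pool */
    size_t n_doms;
};

/*
 * Run for a pool after resume: if it was left without any cpu, move
 * every one of its domains into pool0 so they can still be scheduled.
 * Returns the number of domains moved.
 */
static size_t rescue_empty_pool(struct demo_pool *pool, struct demo_pool *pool0)
{
    size_t moved = 0;

    if (pool == pool0 || pool->n_cpus > 0)
        return 0; /* pool0 itself, or a pool that still has cpus */

    while (pool->n_doms > 0) {
        assert(pool0->n_doms < MAX_DOMS);
        /* Take the last domain of the emptied pool, append it to pool0. */
        pool0->doms[pool0->n_doms++] = pool->doms[--pool->n_doms];
        moved++;
    }
    return moved;
}
```

Moving the domains rather than failing the resume keeps the system running; the domains simply lose their dedicated pool until an administrator reconfigures it.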

Juergen Gross (6):
  xen/sched: call cpu_disable_scheduler() via cpu notifier
  xen: add helper for calling notifier_call_chain() to common/cpu.c
  xen: add new cpu notifier action CPU_RESUME_FAILED
  xen: don't free percpu areas during suspend
  xen/cpupool: simplify suspend/resume handling
  xen/sched: don't disable scheduler on cpus during suspend
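Several patches in the series revolve around the cpu notifier chain: routing calls through one helper around notifier_call_chain() and adding a CPU_RESUME_FAILED action. The sketch below shows that general pattern in a self-contained form. Every identifier carries a `demo_`/`DEMO_` prefix because it is hypothetical; the real helper and notifier types in xen/common/cpu.c and xen/include/xen/cpu.h may differ in names and signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified notifier chain; not Xen's actual API. */
enum demo_cpu_action {
    DEMO_CPU_UP_PREPARE,
    DEMO_CPU_DEAD,
    DEMO_CPU_RESUME_FAILED, /* new action: cpu did not come back on resume */
};

#define DEMO_NOTIFY_DONE 0
#define DEMO_NOTIFY_BAD  1

struct demo_notifier {
    int (*call)(struct demo_notifier *nb, enum demo_cpu_action action,
                unsigned int cpu);
    struct demo_notifier *next;
};

static struct demo_notifier *demo_cpu_chain;

static void demo_register(struct demo_notifier *nb)
{
    assert(nb && nb->call);
    nb->next = demo_cpu_chain;
    demo_cpu_chain = nb;
}

/*
 * Single helper that walks the chain, so cpu up, cpu down and resume
 * failure all share one code path instead of open-coding the walk.
 * Stops at the first callback that returns DEMO_NOTIFY_BAD.
 */
static int demo_cpu_notifier_call(enum demo_cpu_action action, unsigned int cpu)
{
    for (struct demo_notifier *nb = demo_cpu_chain; nb; nb = nb->next) {
        int rc = nb->call(nb, action, cpu);

        if (rc == DEMO_NOTIFY_BAD)
            return rc;
    }
    return DEMO_NOTIFY_DONE;
}

/* Example subscriber: counts cpus whose state it would have to free. */
static unsigned int demo_freed_cpus;

static int demo_sched_cb(struct demo_notifier *nb, enum demo_cpu_action action,
                         unsigned int cpu)
{
    (void)nb;
    (void)cpu;
    if (action == DEMO_CPU_RESUME_FAILED)
        demo_freed_cpus++; /* free per-cpu data only on resume failure */
    return DEMO_NOTIFY_DONE;
}
```

With a dedicated action, subscribers such as the scheduler or cpupool code can clean up exactly the cpus that failed, instead of tearing down and rebuilding state for every non-boot cpu.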

 xen/arch/arm/smpboot.c     |   2 -
 xen/arch/x86/percpu.c      |   3 +-
 xen/arch/x86/smpboot.c     |   3 -
 xen/common/cpu.c           |  61 +++++++-------
 xen/common/cpupool.c       | 131 ++++++++++++-----------------
 xen/common/schedule.c      | 203 +++++++++++++++++++--------------------------
 xen/include/xen/cpu.h      |  29 ++++---
 xen/include/xen/sched-if.h |   1 -
 8 files changed, 190 insertions(+), 243 deletions(-)

Comments

Andrew Cooper April 2, 2019, 3:47 p.m. UTC | #1
On 02/04/2019 06:34, Juergen Gross wrote:
> [...]
>
> Juergen Gross (6):
>   xen/sched: call cpu_disable_scheduler() via cpu notifier
>   xen: add helper for calling notifier_call_chain() to common/cpu.c
>   xen: add new cpu notifier action CPU_RESUME_FAILED
>   xen: don't free percpu areas during suspend
>   xen/cpupool: simplify suspend/resume handling
>   xen/sched: don't disable scheduler on cpus during suspend

So I came to try and commit this series.  However,

[root@fusebot ~]# xen-hptool cpu-offline 1
Prepare to offline CPU 1
CPU 1 offline failed (error 11: Resource temporarily unavailable)
[root@fusebot ~]# xen-hptool cpu-offline 2
Prepare to offline CPU 2
CPU 2 offline failed (error 11: Resource temporarily unavailable)
[root@fusebot ~]#

Something here has regressed all ability to hotplug.  It's not
immediately obvious what.

~Andrew