diff mbox series

xen/percpu: don't initialize percpu on resume

Message ID 20250327052241.202167-1-xakep.amatop@gmail.com (mailing list archive)
State Superseded
Headers show
Series xen/percpu: don't initialize percpu on resume | expand

Commit Message

Mykola Kvach March 27, 2025, 5:22 a.m. UTC
Invocation of the CPU_UP_PREPARE notification
on ARM64 during resume causes a crash:

(XEN) [  315.807606] Error bringing CPU1 up: -16
(XEN) [  315.811926] Xen BUG at common/cpu.c:258
[...]
(XEN) [  316.142765] Xen call trace:
(XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
(XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
(XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
(XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
(XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
(XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
(XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
(XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
(XEN) [  316.202383]    [<0000000000000008>] 0000000000000008

Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
INVALID_PERCPU_AREA depends solely on the system state.

If the system is suspended, this area is not freed, and during resume, an error
occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
set and park_offline_cpus remains 0:

    if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
        return park_offline_cpus ? 0 : -EBUSY;

It appears that the same crash can occur on x86 if park_offline_cpus is set
to 0 during Xen suspend.

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Discussion related to these changes can be found here:
https://marc.info/?l=xen-devel&m=174116563705295&w=2
---
 xen/common/percpu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Jan Beulich March 27, 2025, 8:25 a.m. UTC | #1
On 27.03.2025 06:22, Mykola Kvach wrote:
> --- a/xen/common/percpu.c
> +++ b/xen/common/percpu.c
> @@ -30,7 +30,12 @@ static int init_percpu_area(unsigned int cpu)
>      char *p;
>  
>      if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
> -        return park_offline_cpus ? 0 : -EBUSY;
> +    {
> +        if ( system_state == SYS_STATE_resume )
> +            return 0;
> +        else
> +            return park_offline_cpus ? 0 : -EBUSY;
> +    }

Hmm, why not the much simpler

        return park_offline_cpus || system_state == SYS_STATE_resume
               ? 0
               : -EBUSY;

Even if not for whatever reason, I'd really like to ask to omit such an
unnecessary "else".

Preferably with the adjustment (which I'd be happy to make while
committing) and with the (iirc) previously suggested Fixes: tag:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Jan Beulich March 27, 2025, 8:26 a.m. UTC | #2
On 27.03.2025 06:22, Mykola Kvach wrote:
> Invocation of the CPU_UP_PREPARE notification
> on ARM64 during resume causes a crash:
> 
> (XEN) [  315.807606] Error bringing CPU1 up: -16
> (XEN) [  315.811926] Xen BUG at common/cpu.c:258
> [...]
> (XEN) [  316.142765] Xen call trace:
> (XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
> (XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
> (XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
> (XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
> (XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
> (XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
> (XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
> (XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
> (XEN) [  316.202383]    [<0000000000000008>] 0000000000000008
> 
> Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
> only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
> On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
> INVALID_PERCPU_AREA depends solely on the system state.
> 
> If the system is suspended, this area is not freed, and during resume, an error
> occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
> set and park_offline_cpus remains 0:
> 
>     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
>         return park_offline_cpus ? 0 : -EBUSY;
> 
> It appears that the same crash can occur on x86 if park_offline_cpus is set
> to 0 during Xen suspend.

Oh, also - it's not "appears"; iirc Marek meanwhile confirmed the misbehavior on
x86 AMD hardware.

Jan
Mykola Kvach March 27, 2025, 11:42 a.m. UTC | #3
Hi,

On Thu, Mar 27, 2025 at 10:25 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 27.03.2025 06:22, Mykola Kvach wrote:
> > --- a/xen/common/percpu.c
> > +++ b/xen/common/percpu.c
> > @@ -30,7 +30,12 @@ static int init_percpu_area(unsigned int cpu)
> >      char *p;
> >
> >      if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
> > -        return park_offline_cpus ? 0 : -EBUSY;
> > +    {
> > +        if ( system_state == SYS_STATE_resume )
> > +            return 0;
> > +        else
> > +            return park_offline_cpus ? 0 : -EBUSY;
> > +    }
>
> Hmm, why not the much simpler
>
>         return park_offline_cpus || system_state == SYS_STATE_resume
>                ? 0
>                : -EBUSY;
>
> Even if not for whatever reason, I'd really like to ask to omit such an
> unnecessary "else".

Done

>
> Preferably with the adjustment (which I'd be happy to make while
> committing) and with the (iirc) previously suggested Fixes: tag:

Done

> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Added

>
> Jan

Best regards,
Mykola
Mykola Kvach March 27, 2025, 11:44 a.m. UTC | #4
On Thu, Mar 27, 2025 at 10:27 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 27.03.2025 06:22, Mykola Kvach wrote:
> > Invocation of the CPU_UP_PREPARE notification
> > on ARM64 during resume causes a crash:
> >
> > (XEN) [  315.807606] Error bringing CPU1 up: -16
> > (XEN) [  315.811926] Xen BUG at common/cpu.c:258
> > [...]
> > (XEN) [  316.142765] Xen call trace:
> > (XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
> > (XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
> > (XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
> > (XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
> > (XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
> > (XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
> > (XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
> > (XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
> > (XEN) [  316.202383]    [<0000000000000008>] 0000000000000008
> >
> > Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
> > only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
> > On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
> > INVALID_PERCPU_AREA depends solely on the system state.
> >
> > If the system is suspended, this area is not freed, and during resume, an error
> > occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
> > set and park_offline_cpus remains 0:
> >
> >     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
> >         return park_offline_cpus ? 0 : -EBUSY;
> >
> > It appears that the same crash can occur on x86 if park_offline_cpus is set
> > to 0 during Xen suspend.
>
> Oh, also - it's not "appears"; iirc Marek meanwhile confirmed the misbehavior on
> x86 AMD hardware.

I've updated this line in the commit message.

>
> Jan

~Mykola
diff mbox series

Patch

diff --git a/xen/common/percpu.c b/xen/common/percpu.c
index e4e8b7bcab..607557436d 100644
--- a/xen/common/percpu.c
+++ b/xen/common/percpu.c
@@ -30,7 +30,12 @@  static int init_percpu_area(unsigned int cpu)
     char *p;
 
     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
-        return park_offline_cpus ? 0 : -EBUSY;
+    {
+        if ( system_state == SYS_STATE_resume )
+            return 0;
+        else
+            return park_offline_cpus ? 0 : -EBUSY;
+    }
 
     if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
         return -ENOMEM;