[v2] xen/percpu: don't initialize percpu on resume

Message ID 7c3fece57a28462ae0c071c1cf72eb32541f1b91.1743074241.git.mykola_kvach@epam.com
State Superseded
Series [v2] xen/percpu: don't initialize percpu on resume

Commit Message

Mykola Kvach March 27, 2025, 11:32 a.m. UTC
From: Mykola Kvach <mykola_kvach@epam.com>

Invocation of the CPU_UP_PREPARE notification
on ARM64 during resume causes a crash:

(XEN) [  315.807606] Error bringing CPU1 up: -16
(XEN) [  315.811926] Xen BUG at common/cpu.c:258
[...]
(XEN) [  316.142765] Xen call trace:
(XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
(XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
(XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
(XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
(XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
(XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
(XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
(XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
(XEN) [  316.202383]    [<0000000000000008>] 0000000000000008

Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
INVALID_PERCPU_AREA depends solely on the system state.

If the system is suspended, this area is not freed, and during resume, an error
occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
set and park_offline_cpus remains 0:

    if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
        return park_offline_cpus ? 0 : -EBUSY;

The same crash can occur on x86 if park_offline_cpus is set
to 0 during Xen resume.

Fixes: f75780d26b2f ("xen: move per-cpu area management into common code")
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes introduced in V2:
 - cosmetic fixes after review
 - changes in a commit message: add "Fixes"
---
 xen/common/percpu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
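
For readers following the description above, the failure can be reproduced in a small standalone model. This is only a sketch, not the Xen code: per_cpu_offset, model_reset(), cpu_down_model(), init_percpu_area_old() and suspend_resume_cycle() are hypothetical stand-ins, the sentinel value and the fake "allocation" are assumptions, and only the two conditions quoted in the commit message are modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the Xen symbols named in the commit message. */
#define NR_CPUS 4
#define INVALID_PERCPU_AREA ((size_t)-1)  /* assumption: any sentinel works here */

enum system_state { SYS_STATE_active, SYS_STATE_suspend, SYS_STATE_resume };

static enum system_state system_state = SYS_STATE_active;
static bool park_offline_cpus;            /* always false on ARM64 per the commit message */
static size_t per_cpu_offset[NR_CPUS];    /* models __per_cpu_offset[] */

/* Reset the model: no CPU has a per-CPU area yet. */
static void model_reset(void)
{
    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        per_cpu_offset[cpu] = INVALID_PERCPU_AREA;
    system_state = SYS_STATE_active;
    park_offline_cpus = false;
}

/* Pre-patch check: a populated slot is only tolerated when CPUs are parked. */
static int init_percpu_area_old(unsigned int cpu)
{
    if ( per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
        return park_offline_cpus ? 0 : -EBUSY;

    per_cpu_offset[cpu] = 0x1000 * (cpu + 1);  /* pretend allocation */
    return 0;
}

/* CPU teardown: the slot is invalidated only outside suspend and when not parking. */
static void cpu_down_model(unsigned int cpu)
{
    if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
        per_cpu_offset[cpu] = INVALID_PERCPU_AREA;
}

/* Take one CPU through suspend and resume; returns the result of the init step. */
static int suspend_resume_cycle(unsigned int cpu)
{
    system_state = SYS_STATE_suspend;
    cpu_down_model(cpu);               /* slot is kept during suspend */
    system_state = SYS_STATE_resume;
    return init_percpu_area_old(cpu);  /* slot still populated on resume */
}
```

With park_offline_cpus left at 0, suspend_resume_cycle() on a CPU whose area was set up earlier returns -EBUSY (EBUSY is 16 in Xen's errno numbering and on Linux hosts, which matches the -16 in the log above). An ordinary offline/online cycle outside suspend returns 0, because the slot is invalidated first.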

Comments

Mykola Kvach March 27, 2025, 11:37 a.m. UTC | #1
On Thu, Mar 27, 2025 at 1:32 PM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>

Hmm, looks like this line...

> From: Mykola Kvach <mykola_kvach@epam.com>

...shouldn't be here

>
> Invocation of the CPU_UP_PREPARE notification
> on ARM64 during resume causes a crash:
>
> (XEN) [  315.807606] Error bringing CPU1 up: -16
> (XEN) [  315.811926] Xen BUG at common/cpu.c:258
> [...]
> (XEN) [  316.142765] Xen call trace:
> (XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
> (XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
> (XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
> (XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
> (XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
> (XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
> (XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
> (XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
> (XEN) [  316.202383]    [<0000000000000008>] 0000000000000008
>
> Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
> only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
> On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
> INVALID_PERCPU_AREA depends solely on the system state.
>
> If the system is suspended, this area is not freed, and during resume, an error
> occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
> set and park_offline_cpus remains 0:
>
>     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
>         return park_offline_cpus ? 0 : -EBUSY;
>
> The same crash can occur on x86 if park_offline_cpus is set
> to 0 during Xen resume.
>
> Fixes: f75780d26b2f ("xen: move per-cpu area management into common code")
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> Changes introduced in V2:
>  - cosmetic fixes after review
>  - changes in a commit message: add "Fixes"
> ---
>  xen/common/percpu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/percpu.c b/xen/common/percpu.c
> index e4e8b7bcab..b20f0460fc 100644
> --- a/xen/common/percpu.c
> +++ b/xen/common/percpu.c
> @@ -30,7 +30,9 @@ static int init_percpu_area(unsigned int cpu)
>      char *p;
>
>      if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
> -        return park_offline_cpus ? 0 : -EBUSY;
> +        return park_offline_cpus || system_state == SYS_STATE_resume
> +            ? 0
> +            : -EBUSY;
>
>      if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
>          return -ENOMEM;
> --
> 2.43.0
>
Jan Beulich March 27, 2025, 12:17 p.m. UTC | #2
On 27.03.2025 12:37, Mykola Kvach wrote:
> On Thu, Mar 27, 2025 at 1:32 PM Mykola Kvach <xakep.amatop@gmail.com> wrote:
>>
> 
> Hmm, looks like this line...
> 
>> From: Mykola Kvach <mykola_kvach@epam.com>
> 
> ...shouldn't be here

And instead it should be "From: Mykyta Poturai <mykyta_poturai@epam.com>", to
match ...

>> Invocation of the CPU_UP_PREPARE notification
>> on ARM64 during resume causes a crash:
>>
>> (XEN) [  315.807606] Error bringing CPU1 up: -16
>> (XEN) [  315.811926] Xen BUG at common/cpu.c:258
>> [...]
>> (XEN) [  316.142765] Xen call trace:
>> (XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
>> (XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
>> (XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
>> (XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
>> (XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
>> (XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
>> (XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
>> (XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
>> (XEN) [  316.202383]    [<0000000000000008>] 0000000000000008
>>
>> Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
>> only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
>> On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
>> INVALID_PERCPU_AREA depends solely on the system state.
>>
>> If the system is suspended, this area is not freed, and during resume, an error
>> occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
>> set and park_offline_cpus remains 0:
>>
>>     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
>>         return park_offline_cpus ? 0 : -EBUSY;
>>
>> The same crash can occur on x86 if park_offline_cpus is set
>> to 0 during Xen resume.
>>
>> Fixes: f75780d26b2f ("xen: move per-cpu area management into common code")
>> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>

... the first S-o-b?

>> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Changes introduced in V2:
>>  - cosmetic fixes after review
>>  - changes in a commit message: add "Fixes"
>> ---
>>  xen/common/percpu.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/common/percpu.c b/xen/common/percpu.c
>> index e4e8b7bcab..b20f0460fc 100644
>> --- a/xen/common/percpu.c
>> +++ b/xen/common/percpu.c
>> @@ -30,7 +30,9 @@ static int init_percpu_area(unsigned int cpu)
>>      char *p;
>>
>>      if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
>> -        return park_offline_cpus ? 0 : -EBUSY;
>> +        return park_offline_cpus || system_state == SYS_STATE_resume
>> +            ? 0
>> +            : -EBUSY;

Nit: That's not quite the indentation I suggested to use, I think?

Jan

Patch

diff --git a/xen/common/percpu.c b/xen/common/percpu.c
index e4e8b7bcab..b20f0460fc 100644
--- a/xen/common/percpu.c
+++ b/xen/common/percpu.c
@@ -30,7 +30,9 @@ static int init_percpu_area(unsigned int cpu)
     char *p;
 
     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
-        return park_offline_cpus ? 0 : -EBUSY;
+        return park_offline_cpus || system_state == SYS_STATE_resume
+            ? 0
+            : -EBUSY;
 
     if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
         return -ENOMEM;
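
The changed return expression relies on C operator precedence: == binds tighter than ||, which in turn binds tighter than ?:, so the condition is evaluated as a whole before the ternary selects a value. A minimal sketch, assuming a cut-down system_state enum and two hypothetical helper functions (neither exists in Xen), isolates the branch taken when a per-CPU area is already populated:

```c
#include <errno.h>
#include <stdbool.h>

/* Cut-down stand-in for Xen's system_state values (assumption for this sketch). */
enum system_state { SYS_STATE_active, SYS_STATE_suspend, SYS_STATE_resume };

/*
 * Hypothetical helper: result of the patched early return when the slot is
 * already populated, written without extra parentheses as in the patch.
 */
static int populated_area_result(bool park_offline_cpus,
                                 enum system_state system_state)
{
    return park_offline_cpus || system_state == SYS_STATE_resume
        ? 0
        : -EBUSY;
}

/* Same logic with the implicit grouping spelled out. */
static int populated_area_result_explicit(bool park_offline_cpus,
                                          enum system_state system_state)
{
    return (park_offline_cpus || (system_state == SYS_STATE_resume))
        ? 0
        : -EBUSY;
}
```

Both helpers agree for every combination of inputs: an already-populated area is accepted when CPUs are parked or when the system is resuming, and rejected with -EBUSY otherwise.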