Message ID | BN0P110MB21487F77F8E578780A3FE44490DFA@BN0P110MB2148.NAMP110.PROD.OUTLOOK.COM (mailing list archive)
---|---
State | New
Series | Xen panic when shutting down ARINC653 cpupool
On 17.03.25 06:07, Choi, Anderson wrote:
> I'd like to report a Xen panic when shutting down an ARINC653 domain
> with the following setup. Note that this is only observed when
> CONFIG_DEBUG is enabled.
>
> [Test environment]
> Yocto release : 5.05
> Xen release : 4.19 (hash = 026c9fa29716b0ff0f8b7c687908e71ba29cf239)
> Target machine : QEMU ARM64
> Number of physical CPUs : 4
>
> [Xen config]
> CONFIG_DEBUG = y
>
> [CPU pool configuration file]
> cpupool_arinc0.cfg
> - name = "Pool-arinc0"
> - sched = "arinc653"
> - cpus = ["2"]
>
> [Domain configuration file]
> dom1.cfg
> - vcpus = 1
> - pool = "Pool-arinc0"
>
> $ xl cpupool-cpu-remove Pool-0 2
> $ xl cpupool-create -f cpupool_arinc0.cfg
> $ xl create dom1.cfg
> $ a653_sched -P Pool-arinc0 dom1:100
>
> ** Wait for DOM1 to complete boot. **
>
> $ xl shutdown dom1
>
> [Xen log]
> root@boeing-linux-ref:~# xl shutdown dom1
> Shutting down domain 1
> root@boeing-linux-ref:~# (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
> (XEN) ----[ Xen-4.19.1-pre  arm64  debug=y  Tainted: I ]----
> (XEN) CPU:   2
> (XEN) PC:    00000a000022d2b0 xfree+0x130/0x1a4
> (XEN) LR:    00000a000022d2a4
> (XEN) SP:    00008000fff77b50
> (XEN) CPSR:  00000000200002c9 MODE:64-bit EL2h (Hypervisor, handler)
> ...
> (XEN) Xen call trace:
> (XEN)    [<00000a000022d2b0>] xfree+0x130/0x1a4 (PC)
> (XEN)    [<00000a000022d2a4>] xfree+0x124/0x1a4 (LR)
> (XEN)    [<00000a00002321f0>] arinc653.c#a653sched_free_udata+0x50/0xc4
> (XEN)    [<00000a0000241bc0>] core.c#sched_move_domain_cleanup+0x5c/0x80
> (XEN)    [<00000a0000245328>] sched_move_domain+0x69c/0x70c
> (XEN)    [<00000a000022f840>] cpupool.c#cpupool_move_domain_locked+0x38/0x70
> (XEN)    [<00000a0000230f20>] cpupool_move_domain+0x34/0x54
> (XEN)    [<00000a0000206c40>] domain_kill+0xc0/0x15c
> (XEN)    [<00000a000022e0d4>] do_domctl+0x904/0x12ec
> (XEN)    [<00000a0000277a1c>] traps.c#do_trap_hypercall+0x1f4/0x288
> (XEN)    [<00000a0000279018>] do_trap_guest_sync+0x448/0x63c
> (XEN)    [<00000a0000262c80>] entry.o#guest_sync_slowpath+0xa8/0xd8
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 2:
> (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
> (XEN) ****************************************
>
> In commit 19049f8d ("sched: fix locking in a653sched_free_vdata()"),
> locking was introduced to prevent a race against the list manipulation,
> but it leads to an assertion failure when the ARINC 653 domain is shut
> down.
>
> I think this can be fixed by calling xfree() after
> spin_unlock_irqrestore() as shown below.
>
>  xen/common/sched/arinc653.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
> index 7bf288264c..1615f1bc46 100644
> --- a/xen/common/sched/arinc653.c
> +++ b/xen/common/sched/arinc653.c
> @@ -463,10 +463,11 @@ a653sched_free_udata(const struct scheduler *ops, void *priv)
>      if ( !is_idle_unit(av->unit) )
>          list_del(&av->list);
>
> -    xfree(av);
>      update_schedule_units(ops);
>
>      spin_unlock_irqrestore(&sched_priv->lock, flags);
> +
> +    xfree(av);
>  }
>
> Can I hear your opinion on this?

Yes, this seems the right way to fix the issue.

Could you please send a proper patch (please have a look at [1] in case
you are unsure what a proper patch should look like)?


Juergen

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/process/sending-patches.pandoc
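For reference, the ordering the proposed diff establishes can be shown as a minimal, self-contained userspace sketch. This is not the Xen code: a pthread mutex stands in for spin_lock_irqsave()/spin_unlock_irqrestore(), free() stands in for xfree(), and the list handling is open-coded. The only point is the one enforced by the assertion at common/xmalloc_tlsf.c:714: unlink the element while holding the lock, but hand the memory back to the allocator only after the lock (and, in Xen, the saved interrupt state) has been released.

/*
 * Userspace sketch of the locking pattern in the proposed fix,
 * NOT the actual Xen code.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct unit {
    struct unit *prev, *next;   /* stand-in for the list_head in arinc653_unit_t */
    int id;
};

static struct unit head = { &head, &head, -1 };   /* circular list head */
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

static void unit_add(struct unit *u)
{
    pthread_mutex_lock(&sched_lock);
    u->next = head.next;
    u->prev = &head;
    head.next->prev = u;
    head.next = u;
    pthread_mutex_unlock(&sched_lock);
}

/* Mirrors the ordering of the fixed a653sched_free_udata(). */
static void unit_free(struct unit *u)
{
    if ( u == NULL )
        return;

    pthread_mutex_lock(&sched_lock);       /* spin_lock_irqsave()       */

    u->prev->next = u->next;               /* list_del(&av->list)       */
    u->next->prev = u->prev;

    /* ... update remaining scheduler state while still locked ...      */

    pthread_mutex_unlock(&sched_lock);     /* spin_unlock_irqrestore()  */

    free(u);                               /* xfree() only after unlock */
}

int main(void)
{
    struct unit *u = calloc(1, sizeof(*u));

    if ( u == NULL )
        return 1;

    u->id = 1;
    unit_add(u);
    unit_free(u);
    puts("unit unlinked under the lock, freed after it was dropped");
    return 0;
}

Built with "cc -pthread sketch.c", this just exercises the add/remove path once; unit_free() is the part that follows the same order of operations as the fixed a653sched_free_udata().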
Jürgen,

> On 17.03.25 06:07, Choi, Anderson wrote:
>> I'd like to report a Xen panic when shutting down an ARINC653 domain
>> with the following setup. Note that this is only observed when
>> CONFIG_DEBUG is enabled.
>>
>> [...]
>>
>> I think this can be fixed by calling xfree() after
>> spin_unlock_irqrestore() as shown below.
>>
>> [...]
>>
>> Can I hear your opinion on this?
>
> Yes, this seems the right way to fix the issue.
>
> Could you please send a proper patch (please have a look at [1] in case
> you are unsure what a proper patch should look like)?
>
> Juergen
>
> [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/process/sending-patches.pandoc

Thanks for your opinion. Let me read through the link and submit the
patch.

Regards,
Anderson
On 17/03/2025 1:21 pm, Choi, Anderson wrote:
> Jürgen,
>
>> On 17.03.25 06:07, Choi, Anderson wrote:
>>> I'd like to report a Xen panic when shutting down an ARINC653 domain
>>> with the following setup. Note that this is only observed when
>>> CONFIG_DEBUG is enabled.
>>>
>>> [...]
>>>
>>> Can I hear your opinion on this?
>> Yes, this seems the right way to fix the issue.
>>
>> Could you please send a proper patch (please have a look at [1] in case
>> you are unsure what a proper patch should look like)?
>>
>> Juergen
>>
>> [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/process/sending-patches.pandoc
> Thanks for your opinion. Let me read through the link and submit the
> patch.

Other good references are:

https://lore.kernel.org/xen-devel/20250313093157.30450-1-jgross@suse.com/
https://lore.kernel.org/xen-devel/d8c08c22-ee70-4c06-8fcd-ad44fc0dc58f@suse.com/

One you hopefully recognise, and the other is another bugfix to ARINC
noticed by the Coverity run over the weekend.

~Andrew
On 17.03.25 14:29, Andrew Cooper wrote:
> On 17/03/2025 1:21 pm, Choi, Anderson wrote:
>> Jürgen,
>>
>>> On 17.03.25 06:07, Choi, Anderson wrote:
>>>> I'd like to report a Xen panic when shutting down an ARINC653 domain
>>>> with the following setup. Note that this is only observed when
>>>> CONFIG_DEBUG is enabled.
>>>>
>>>> [...]
>>
>> Thanks for your opinion. Let me read through the link and submit the
>> patch.
>
> Other good references are:
>
> https://lore.kernel.org/xen-devel/20250313093157.30450-1-jgross@suse.com/
> https://lore.kernel.org/xen-devel/d8c08c22-ee70-4c06-8fcd-ad44fc0dc58f@suse.com/
>
> One you hopefully recognise, and the other is another bugfix to ARINC
> noticed by the Coverity run over the weekend.

Please note that the Coverity report is not about a real bug, but just a
latent one. As long as the arinc653 scheduler supports only a single
physical CPU, there is no real need for the lock when accessing
sched_priv->next_switch_time (the lock is meant to protect the list of
units/vcpus, not all the other fields of sched_priv).


Juergen
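The split Juergen describes can be summarised in a small illustrative struct; apart from next_switch_time, which is named in the discussion above, the types and field names here are assumed stand-ins rather than the real a653sched_private layout. The lock added by commit 19049f8d guards the unit list, which cleanup paths such as the sched_move_domain() chain in the trace above can reach, while next_switch_time is per-pCPU scheduling state touched only on the single pCPU an arinc653 cpupool runs on today, so the unlocked access Coverity flags cannot currently race.

#include <stdint.h>

/* Mock lock/list types so this sketch is self-contained (not Xen's). */
typedef struct { volatile int locked; } mock_spinlock_t;
typedef struct mock_list_head { struct mock_list_head *next, *prev; } mock_list_head_t;

typedef struct a653sched_private_sketch {
    /*
     * Guards the unit list below, which cross-CPU cleanup paths (such as
     * the sched_move_domain() chain in the trace above) can reach.  This
     * is what the locking added by commit 19049f8d is about.
     */
    mock_spinlock_t lock;
    mock_list_head_t unit_list;

    /*
     * Per-pCPU scheduling state: with only one pCPU per arinc653 cpupool
     * it is never accessed concurrently, so the unlocked access flagged
     * by Coverity is latent rather than an observable race.
     */
    uint64_t next_switch_time;
} a653sched_private_sketch_t;

int main(void)
{
    a653sched_private_sketch_t p = { .next_switch_time = 0 };

    return (int)p.next_switch_time;
}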