diff mbox series

[v4] page_alloc: assert IRQs are enabled in heap alloc/free

Message ID 20220425132801.1076759-1-dvrabel@cantab.net (mailing list archive)
State New, archived
Headers show
Series [v4] page_alloc: assert IRQs are enabled in heap alloc/free | expand

Commit Message

David Vrabel April 25, 2022, 1:28 p.m. UTC
From: David Vrabel <dvrabel@amazon.co.uk>

Heap pages can only be safely allocated and freed with interuupts
enabled as they may require a TLB flush which will send IPIs (on x86).

Normally spinlock debugging would catch calls from the incorrect
context, but not from stop_machine_run() action functions as these are
called with spin lock debugging disabled.

Enhance the assertions in alloc_xenheap_pages() and
alloc_domheap_pages() to check interrupts are enabled. For consistency
the same asserts are used when freeing heap pages.

As an exception, when only 1 PCPU is online, allocations are permitted
with interrupts disabled as any TLB flushes would be local only. This
is necessary during early boot.

Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
---
Changes in v4:
- Tweak comment.

Changes in v3:
- Use num_online_cpus() in assert.

Changes in v2:
- Set SYS_STATE_smp_boot on arm.
---
 xen/common/page_alloc.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

Comments

Jan Beulich April 25, 2022, 1:30 p.m. UTC | #1
On 25.04.2022 15:28, David Vrabel wrote:
> From: David Vrabel <dvrabel@amazon.co.uk>
> 
> Heap pages can only be safely allocated and freed with interuupts
> enabled as they may require a TLB flush which will send IPIs (on x86).
> 
> Normally spinlock debugging would catch calls from the incorrect
> context, but not from stop_machine_run() action functions as these are
> called with spin lock debugging disabled.
> 
> Enhance the assertions in alloc_xenheap_pages() and
> alloc_domheap_pages() to check interrupts are enabled. For consistency
> the same asserts are used when freeing heap pages.
> 
> As an exception, when only 1 PCPU is online, allocations are permitted
> with interrupts disabled as any TLB flushes would be local only. This
> is necessary during early boot.
> 
> Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall April 25, 2022, 1:34 p.m. UTC | #2
Hi David,

On 25/04/2022 14:28, David Vrabel wrote:
> From: David Vrabel <dvrabel@amazon.co.uk>
> 
> Heap pages can only be safely allocated and freed with interuupts
> enabled as they may require a TLB flush which will send IPIs (on x86).
> 
> Normally spinlock debugging would catch calls from the incorrect
> context, but not from stop_machine_run() action functions as these are
> called with spin lock debugging disabled.
> 
> Enhance the assertions in alloc_xenheap_pages() and
> alloc_domheap_pages() to check interrupts are enabled. For consistency
> the same asserts are used when freeing heap pages.
> 
> As an exception, when only 1 PCPU is online, allocations are permitted
> with interrupts disabled as any TLB flushes would be local only. This
> is necessary during early boot.
> 
> Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
> ---
> Changes in v4:
> - Tweak comment.
> 
> Changes in v3:
> - Use num_online_cpus() in assert.
> 
> Changes in v2:
> - Set SYS_STATE_smp_boot on arm.
> ---
>   xen/common/page_alloc.c | 23 +++++++++++++++--------
>   1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 319029140f..739ca6e74b 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -162,6 +162,13 @@
>   static char __initdata opt_badpage[100] = "";
>   string_param("badpage", opt_badpage);
>   
> +/*
> + * Heap allocations may need TLB flushes which require IRQs to be

The comment needs to be updated to reflect the fact that at least Arm 
doesn't use IPI to flush TLBs.

The update can possibly be done on commit.

> + * enabled (except when only 1 PCPU is online).
> + */

Cheers,
Jan Beulich April 25, 2022, 1:37 p.m. UTC | #3
On 25.04.2022 15:34, Julien Grall wrote:
> On 25/04/2022 14:28, David Vrabel wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -162,6 +162,13 @@
>>   static char __initdata opt_badpage[100] = "";
>>   string_param("badpage", opt_badpage);
>>   
>> +/*
>> + * Heap allocations may need TLB flushes which require IRQs to be
> 
> The comment needs to be updated to reflect the fact that at least Arm 
> doesn't use IPI to flush TLBs.

I thought the use of "may" was satisfying your earlier request?

Jan
Julien Grall April 25, 2022, 1:43 p.m. UTC | #4
Hi Jan,

On 25/04/2022 14:37, Jan Beulich wrote:
> On 25.04.2022 15:34, Julien Grall wrote:
>> On 25/04/2022 14:28, David Vrabel wrote:
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -162,6 +162,13 @@
>>>    static char __initdata opt_badpage[100] = "";
>>>    string_param("badpage", opt_badpage);
>>>    
>>> +/*
>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>
>> The comment needs to be updated to reflect the fact that at least Arm
>> doesn't use IPI to flush TLBs.
> 
> I thought the use of "may" was satisfying your earlier request?

Maybe I read wrongly this comment... To me, anything after 'which' is 
optional (it is a non-defining clause) and describe how the TLB flushes 
works. So the 'may' here is referring to the possibility to use flush TLB.

Cheers,
Jan Beulich April 25, 2022, 1:44 p.m. UTC | #5
On 25.04.2022 15:43, Julien Grall wrote:
> On 25/04/2022 14:37, Jan Beulich wrote:
>> On 25.04.2022 15:34, Julien Grall wrote:
>>> On 25/04/2022 14:28, David Vrabel wrote:
>>>> --- a/xen/common/page_alloc.c
>>>> +++ b/xen/common/page_alloc.c
>>>> @@ -162,6 +162,13 @@
>>>>    static char __initdata opt_badpage[100] = "";
>>>>    string_param("badpage", opt_badpage);
>>>>    
>>>> +/*
>>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>>
>>> The comment needs to be updated to reflect the fact that at least Arm
>>> doesn't use IPI to flush TLBs.
>>
>> I thought the use of "may" was satisfying your earlier request?
> 
> Maybe I read wrongly this comment... To me, anything after 'which' is 
> optional (it is a non-defining clause) and describe how the TLB flushes 
> works. So the 'may' here is referring to the possibility to use flush TLB.

Oh, so you'd like to have a 2nd "may" inserted later in the sentence.

Jan
Julien Grall April 25, 2022, 1:44 p.m. UTC | #6
Hi,

On 25/04/2022 14:28, David Vrabel wrote:
> From: David Vrabel <dvrabel@amazon.co.uk>
> 
> Heap pages can only be safely allocated and freed with interuupts

The typo I pointed out in v3 has not been addressed. For reminder, this is:

s/interupts/interrupts/

Cheers,
Julien Grall April 25, 2022, 1:46 p.m. UTC | #7
Hi,

On 25/04/2022 14:44, Jan Beulich wrote:
> On 25.04.2022 15:43, Julien Grall wrote:
>> On 25/04/2022 14:37, Jan Beulich wrote:
>>> On 25.04.2022 15:34, Julien Grall wrote:
>>>> On 25/04/2022 14:28, David Vrabel wrote:
>>>>> --- a/xen/common/page_alloc.c
>>>>> +++ b/xen/common/page_alloc.c
>>>>> @@ -162,6 +162,13 @@
>>>>>     static char __initdata opt_badpage[100] = "";
>>>>>     string_param("badpage", opt_badpage);
>>>>>     
>>>>> +/*
>>>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>>>
>>>> The comment needs to be updated to reflect the fact that at least Arm
>>>> doesn't use IPI to flush TLBs.
>>>
>>> I thought the use of "may" was satisfying your earlier request?
>>
>> Maybe I read wrongly this comment... To me, anything after 'which' is
>> optional (it is a non-defining clause) and describe how the TLB flushes
>> works. So the 'may' here is referring to the possibility to use flush TLB.
> 
> Oh, so you'd like to have a 2nd "may" inserted later in the sentence.

Yes. The first 'may' was already there and I suggested to add a second 
'may' in v3. But it didn't seem to have been added in both the commit 
message and here.

Cheers,
David Vrabel April 25, 2022, 2:13 p.m. UTC | #8
On 25/04/2022 14:43, Julien Grall wrote:
> Hi Jan,
> 
> On 25/04/2022 14:37, Jan Beulich wrote:
>> On 25.04.2022 15:34, Julien Grall wrote:
>>> On 25/04/2022 14:28, David Vrabel wrote:
>>>> --- a/xen/common/page_alloc.c
>>>> +++ b/xen/common/page_alloc.c
>>>> @@ -162,6 +162,13 @@
>>>>    static char __initdata opt_badpage[100] = "";
>>>>    string_param("badpage", opt_badpage);
>>>> +/*
>>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>>
>>> The comment needs to be updated to reflect the fact that at least Arm
>>> doesn't use IPI to flush TLBs.
>>
>> I thought the use of "may" was satisfying your earlier request?
> 
> Maybe I read wrongly this comment... To me, anything after 'which' is 
> optional (it is a non-defining clause) and describe how the TLB flushes 
> works. So the 'may' here is referring to the possibility to use flush TLB.

Oh dear, you're using formal grammar with a native English speaker who's 
never seen a grammar rule in any of his schooling.

I think this should be:

"Heap allocations may need TLB flushes that require IRQs..."

i.e., "that" instead of "which"

David
Julien Grall April 25, 2022, 6:32 p.m. UTC | #9
Hi,

On 25/04/2022 15:13, David Vrabel wrote:
> 
> 
> On 25/04/2022 14:43, Julien Grall wrote:
>> Hi Jan,
>>
>> On 25/04/2022 14:37, Jan Beulich wrote:
>>> On 25.04.2022 15:34, Julien Grall wrote:
>>>> On 25/04/2022 14:28, David Vrabel wrote:
>>>>> --- a/xen/common/page_alloc.c
>>>>> +++ b/xen/common/page_alloc.c
>>>>> @@ -162,6 +162,13 @@
>>>>>    static char __initdata opt_badpage[100] = "";
>>>>>    string_param("badpage", opt_badpage);
>>>>> +/*
>>>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>>>
>>>> The comment needs to be updated to reflect the fact that at least Arm
>>>> doesn't use IPI to flush TLBs.
>>>
>>> I thought the use of "may" was satisfying your earlier request?
>>
>> Maybe I read wrongly this comment... To me, anything after 'which' is 
>> optional (it is a non-defining clause) and describe how the TLB 
>> flushes works. So the 'may' here is referring to the possibility to 
>> use flush TLB.
> 
> Oh dear, you're using formal grammar with a native English speaker who's 
> never seen a grammar rule in any of his schooling.
> 
> I think this should be:
> 
> "Heap allocations may need TLB flushes that require IRQs..."
> 
> i.e., "that" instead of "which"

I am fine with that.

Cheers,
Jan Beulich April 26, 2022, 8:36 a.m. UTC | #10
On 25.04.2022 20:32, Julien Grall wrote:
> On 25/04/2022 15:13, David Vrabel wrote:
>> On 25/04/2022 14:43, Julien Grall wrote:
>>> On 25/04/2022 14:37, Jan Beulich wrote:
>>>> On 25.04.2022 15:34, Julien Grall wrote:
>>>>> On 25/04/2022 14:28, David Vrabel wrote:
>>>>>> --- a/xen/common/page_alloc.c
>>>>>> +++ b/xen/common/page_alloc.c
>>>>>> @@ -162,6 +162,13 @@
>>>>>>    static char __initdata opt_badpage[100] = "";
>>>>>>    string_param("badpage", opt_badpage);
>>>>>> +/*
>>>>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>>>>
>>>>> The comment needs to be updated to reflect the fact that at least Arm
>>>>> doesn't use IPI to flush TLBs.
>>>>
>>>> I thought the use of "may" was satisfying your earlier request?
>>>
>>> Maybe I read wrongly this comment... To me, anything after 'which' is 
>>> optional (it is a non-defining clause) and describe how the TLB 
>>> flushes works. So the 'may' here is referring to the possibility to 
>>> use flush TLB.
>>
>> Oh dear, you're using formal grammar with a native English speaker who's 
>> never seen a grammar rule in any of his schooling.
>>
>> I think this should be:
>>
>> "Heap allocations may need TLB flushes that require IRQs..."
>>
>> i.e., "that" instead of "which"
> 
> I am fine with that.

But that's still not necessarily correct. I've gone with adding the 2nd
"may".

Jan
Jan Beulich April 26, 2022, 2:01 p.m. UTC | #11
On 25.04.2022 15:28, David Vrabel wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -162,6 +162,13 @@
>  static char __initdata opt_badpage[100] = "";
>  string_param("badpage", opt_badpage);
>  
> +/*
> + * Heap allocations may need TLB flushes which require IRQs to be
> + * enabled (except when only 1 PCPU is online).
> + */
> +#define ASSERT_ALLOC_CONTEXT() \
> +    ASSERT(!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1))

At least one of these tightened assertions triggers on Arm, as per the
most recent smoke flight. I'm going to revert this for the time being.

Jan
Julien Grall April 26, 2022, 2:14 p.m. UTC | #12
Hi,

On 26/04/2022 15:01, Jan Beulich wrote:
> On 25.04.2022 15:28, David Vrabel wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -162,6 +162,13 @@
>>   static char __initdata opt_badpage[100] = "";
>>   string_param("badpage", opt_badpage);
>>   
>> +/*
>> + * Heap allocations may need TLB flushes which require IRQs to be
>> + * enabled (except when only 1 PCPU is online).
>> + */
>> +#define ASSERT_ALLOC_CONTEXT() \
>> +    ASSERT(!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1))
> 
> At least one of these tightened assertions triggers on Arm, as per the
> most recent smoke flight. I'm going to revert this for the time being.

 From the serial console [1]:

(XEN) Xen call trace:
(XEN)    [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC)
(XEN)    [<00000000>] 00000000 (LR)
(XEN)    [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4
(XEN)    [<002740d4>] map_pages_to_xen+0x10/0x20
(XEN)    [<00236864>] __vmap+0x400/0x4a4
(XEN)    [<0026aee8>] 
arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec
(XEN)    [<0022fe40>] stop_machine_run+0x23c/0x300
(XEN)    [<002c40c4>] apply_alternatives_all+0x34/0x5c
(XEN)    [<002ce3e8>] start_xen+0xcb8/0x1024
(XEN)    [<00200068>] arch/arm/arm32/head.o#primary_switched+0xc/0x1c

So we need to move out the vmap() from the 
__apply_alternatives_multi_stop() to apply_alternatives_all().

The patch below (only compile tested so far) should do the job. I will 
do further testing and confirm there are no other issue on Arm.

diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c
index 237c4e564209..8004fc8a7d1a 100644
--- a/xen/arch/arm/alternative.c
+++ b/xen/arch/arm/alternative.c
@@ -170,7 +170,7 @@ static int __apply_alternatives(const struct 
alt_region *region,
   * We might be patching the stop_machine state machine, so implement a
   * really simple polling protocol here.
   */
-static int __apply_alternatives_multi_stop(void *unused)
+static int __apply_alternatives_multi_stop(void *xenmap)
  {
      static int patched = 0;

@@ -185,22 +185,9 @@ static int __apply_alternatives_multi_stop(void 
*unused)
      {
          int ret;
          struct alt_region region;
-        mfn_t xen_mfn = virt_to_mfn(_start);
-        paddr_t xen_size = _end - _start;
-        unsigned int xen_order = get_order_from_bytes(xen_size);
-        void *xenmap;

          BUG_ON(patched);

-        /*
-         * The text and inittext section are read-only. So re-map Xen to
-         * be able to patch the code.
-         */
-        xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
-                        VMAP_DEFAULT);
-        /* Re-mapping Xen is not expected to fail during boot. */
-        BUG_ON(!xenmap);
-
          region.begin = __alt_instructions;
          region.end = __alt_instructions_end;

@@ -208,8 +195,6 @@ static int __apply_alternatives_multi_stop(void *unused)
          /* The patching is not expected to fail during boot. */
          BUG_ON(ret != 0);

-        vunmap(xenmap);
-
          /* Barriers provided by the cache flushing */
          write_atomic(&patched, 1);
      }
@@ -224,14 +209,29 @@ static int __apply_alternatives_multi_stop(void 
*unused)
  void __init apply_alternatives_all(void)
  {
      int ret;
+    mfn_t xen_mfn = virt_to_mfn(_start);
+    paddr_t xen_size = _end - _start;
+    unsigned int xen_order = get_order_from_bytes(xen_size);
+    void *xenmap;

      ASSERT(system_state != SYS_STATE_active);

+    /*
+     * The text and inittext section are read-only. So re-map Xen to
+     * be able to patch the code.
+     */
+    xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
+                    VMAP_DEFAULT);
+    /* Re-mapping Xen is not expected to fail during boot. */
+    BUG_ON(!xenmap);
+
  	/* better not try code patching on a live SMP system */
      ret = stop_machine_run(__apply_alternatives_multi_stop, NULL, 
NR_CPUS);

      /* stop_machine_run should never fail at this stage of the boot */
      BUG_ON(ret);
+
+    vunmap(xenmap);
  }

  int apply_alternatives(const struct alt_instr *start, const struct 
alt_instr *end)

Cheers,

[1] 
http://logs.test-lab.xenproject.org/osstest/logs/169729/test-armhf-armhf-xl/info.html

> 
> Jan
>
David Vrabel April 26, 2022, 2:33 p.m. UTC | #13
On 26/04/2022 15:14, Julien Grall wrote:
> Hi,
> 
> On 26/04/2022 15:01, Jan Beulich wrote:
>> On 25.04.2022 15:28, David Vrabel wrote:
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -162,6 +162,13 @@
>>>   static char __initdata opt_badpage[100] = "";
>>>   string_param("badpage", opt_badpage);
>>> +/*
>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>> + * enabled (except when only 1 PCPU is online).
>>> + */
>>> +#define ASSERT_ALLOC_CONTEXT() \
>>> +    ASSERT(!in_irq() && (local_irq_is_enabled() || num_online_cpus() 
>>> <= 1))
>>
>> At least one of these tightened assertions triggers on Arm, as per the
>> most recent smoke flight. I'm going to revert this for the time being.
> 
>  From the serial console [1]:
> 
> (XEN) Xen call trace:
> (XEN)    [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC)
> (XEN)    [<00000000>] 00000000 (LR)
> (XEN)    [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4
> (XEN)    [<002740d4>] map_pages_to_xen+0x10/0x20
> (XEN)    [<00236864>] __vmap+0x400/0x4a4
> (XEN)    [<0026aee8>] 
> arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec
> (XEN)    [<0022fe40>] stop_machine_run+0x23c/0x300

An allocation inside a stop_machine_run() action function. That is what 
the asserts were designed to catch.

I did try the run the GitLab CI pipelines but it is setup to use runners 
that are only available to the Xen Project group, so forking the repo 
doesn't work.

Can my (personal) GitLab be added as a Developer to the Xen Project 
group? I think this is the intended way for people to run the CI 
pipelines on their own branches.

David
Andrew Cooper April 26, 2022, 2:35 p.m. UTC | #14
On 26/04/2022 15:33, David Vrabel wrote:
>
>
> On 26/04/2022 15:14, Julien Grall wrote:
>> Hi,
>>
>> On 26/04/2022 15:01, Jan Beulich wrote:
>>> On 25.04.2022 15:28, David Vrabel wrote:
>>>> --- a/xen/common/page_alloc.c
>>>> +++ b/xen/common/page_alloc.c
>>>> @@ -162,6 +162,13 @@
>>>>   static char __initdata opt_badpage[100] = "";
>>>>   string_param("badpage", opt_badpage);
>>>> +/*
>>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>>> + * enabled (except when only 1 PCPU is online).
>>>> + */
>>>> +#define ASSERT_ALLOC_CONTEXT() \
>>>> +    ASSERT(!in_irq() && (local_irq_is_enabled() ||
>>>> num_online_cpus() <= 1))
>>>
>>> At least one of these tightened assertions triggers on Arm, as per the
>>> most recent smoke flight. I'm going to revert this for the time being.
>>
>>  From the serial console [1]:
>>
>> (XEN) Xen call trace:
>> (XEN)    [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC)
>> (XEN)    [<00000000>] 00000000 (LR)
>> (XEN)    [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4
>> (XEN)    [<002740d4>] map_pages_to_xen+0x10/0x20
>> (XEN)    [<00236864>] __vmap+0x400/0x4a4
>> (XEN)    [<0026aee8>]
>> arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec
>> (XEN)    [<0022fe40>] stop_machine_run+0x23c/0x300
>
> An allocation inside a stop_machine_run() action function. That is
> what the asserts were designed to catch.
>
> I did try the run the GitLab CI pipelines but it is setup to use
> runners that are only available to the Xen Project group, so forking
> the repo doesn't work.
>
> Can my (personal) GitLab be added as a Developer to the Xen Project
> group? I think this is the intended way for people to run the CI
> pipelines on their own branches.

It is.  Username?

~Andrew
Julien Grall April 26, 2022, 8:08 p.m. UTC | #15
On 26/04/2022 15:14, Julien Grall wrote:
> On 26/04/2022 15:01, Jan Beulich wrote:
>> On 25.04.2022 15:28, David Vrabel wrote:
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -162,6 +162,13 @@
>>>   static char __initdata opt_badpage[100] = "";
>>>   string_param("badpage", opt_badpage);
>>> +/*
>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>> + * enabled (except when only 1 PCPU is online).
>>> + */
>>> +#define ASSERT_ALLOC_CONTEXT() \
>>> +    ASSERT(!in_irq() && (local_irq_is_enabled() || num_online_cpus() 
>>> <= 1))
>>
>> At least one of these tightened assertions triggers on Arm, as per the
>> most recent smoke flight. I'm going to revert this for the time being.
> 
>  From the serial console [1]:
> 
> (XEN) Xen call trace:
> (XEN)    [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC)
> (XEN)    [<00000000>] 00000000 (LR)
> (XEN)    [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4
> (XEN)    [<002740d4>] map_pages_to_xen+0x10/0x20
> (XEN)    [<00236864>] __vmap+0x400/0x4a4
> (XEN)    [<0026aee8>] 
> arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec
> (XEN)    [<0022fe40>] stop_machine_run+0x23c/0x300
> (XEN)    [<002c40c4>] apply_alternatives_all+0x34/0x5c
> (XEN)    [<002ce3e8>] start_xen+0xcb8/0x1024
> (XEN)    [<00200068>] arch/arm/arm32/head.o#primary_switched+0xc/0x1c

I have sent a formal patch:

https://lore.kernel.org/xen-devel/20220426200629.58921-1-julien@xen.org/
Cheers,
Stefano Stabellini April 26, 2022, 11:37 p.m. UTC | #16
On Tue, 26 Apr 2022, Andrew Cooper wrote:
> > Can my (personal) GitLab be added as a Developer to the Xen Project
> > group? I think this is the intended way for people to run the CI
> > pipelines on their own branches.
> 
> It is.  Username?

David, let us know if you have any issues with gitlab. Once added, you
should be able to trigger gitlab-ci runs, which include 3 ARM runtime
tests dom0 and dom0less. You should be able to see the failures with
your original patch and the failure being fixed with Julien's patch.
diff mbox series

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 319029140f..739ca6e74b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -162,6 +162,13 @@ 
 static char __initdata opt_badpage[100] = "";
 string_param("badpage", opt_badpage);
 
+/*
+ * Heap allocations may need TLB flushes which require IRQs to be
+ * enabled (except when only 1 PCPU is online).
+ */
+#define ASSERT_ALLOC_CONTEXT() \
+    ASSERT(!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1))
+
 /*
  * no-bootscrub -> Free pages are not zeroed during boot.
  */
@@ -2160,7 +2167,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
     struct page_info *pg;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
                           order, memflags | MEMF_no_scrub, NULL);
@@ -2173,7 +2180,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( v == NULL )
         return;
@@ -2202,7 +2209,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     struct page_info *pg;
     unsigned int i;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( xenheap_bits && (memflags >> _MEMF_bits) > xenheap_bits )
         memflags &= ~MEMF_bits(~0U);
@@ -2224,7 +2231,7 @@  void free_xenheap_pages(void *v, unsigned int order)
     struct page_info *pg;
     unsigned int i;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( v == NULL )
         return;
@@ -2249,7 +2256,7 @@  void init_domheap_pages(paddr_t ps, paddr_t pe)
 {
     mfn_t smfn, emfn;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     smfn = maddr_to_mfn(round_pgup(ps));
     emfn = maddr_to_mfn(round_pgdown(pe));
@@ -2369,7 +2376,7 @@  struct page_info *alloc_domheap_pages(
     unsigned int bits = memflags >> _MEMF_bits, zone_hi = NR_ZONES - 1;
     unsigned int dma_zone;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     bits = domain_clamp_alloc_bitsize(memflags & MEMF_no_owner ? NULL : d,
                                       bits ? : (BITS_PER_LONG+PAGE_SHIFT));
@@ -2419,7 +2426,7 @@  void free_domheap_pages(struct page_info *pg, unsigned int order)
     unsigned int i;
     bool drop_dom_ref;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( unlikely(is_xen_heap_page(pg)) )
     {
@@ -2738,7 +2745,7 @@  int __init acquire_domstatic_pages(struct domain *d, mfn_t smfn,
 {
     struct page_info *pg;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     pg = acquire_staticmem_pages(smfn, nr_mfns, memflags);
     if ( !pg )