diff mbox series

[v2] page_alloc: assert IRQs are enabled in heap alloc/free

Message ID 20220421104305.878204-1-dvrabel@cantab.net (mailing list archive)
State Superseded
Headers show
Series [v2] page_alloc: assert IRQs are enabled in heap alloc/free | expand

Commit Message

David Vrabel April 21, 2022, 10:43 a.m. UTC
From: David Vrabel <dvrabel@amazon.co.uk>

Heap pages can only be safely allocated and freed with interuupts
enabled as they may require a TLB flush which will send IPIs.

Normally spinlock debugging would catch calls from the incorrect
context, but not from stop_machine_run() action functions as these are
called with spin lock debugging disabled.

Enhance the assertions in alloc_xenheap_pages() and
alloc_domheap_pages() to check interrupts are enabled. For consistency
the same asserts are used when freeing heap pages.

As an exception, during early boot when only 1 PCPU is online,
allocations are permitted with interrupts disabled. This required
setting the SYS_STATE_smp_boot system state on arm, to match x86.

Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
---
Changes in v2:
- Set SYS_STATE_smp_boot on arm.
---
 xen/arch/arm/setup.c    |  2 ++
 xen/common/page_alloc.c | 24 ++++++++++++++++--------
 2 files changed, 18 insertions(+), 8 deletions(-)

Comments

Jan Beulich April 21, 2022, 11:38 a.m. UTC | #1
On 21.04.2022 12:43, David Vrabel wrote:
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -984,6 +984,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>  
>      console_init_postirq();
>  
> +    system_state = SYS_STATE_smp_boot
> +
>      do_presmp_initcalls();
>  
>      for_each_present_cpu ( i )

I'm afraid it's not this simple: There are two
"ASSERT(system_state != SYS_STATE_boot)" in Arm-specific code. While
both could in principle be left as is, I think both want modifying to
">= SYS_STATE_active", such that they would also trigger when in this
newly set state (in case registration of the notifiers was altered).

It also wants at least mentioning that setting this state is okay with
all uses of system_state in common code (where it's not impossible
that x86-isms still exist, having gone unnoticed so far), just to
indicate that all of these were actually inspected (there's just one
where it looks to be unobvious when simply looking at grep output, the
one in keyhandler.c). As a result this may want to be a separate,
prereq patch. At which point it will want considering whether to put
the setting of the state _in_ do_presmp_initcalls() instead of ahead
of its invocation.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -162,6 +162,14 @@
>  static char __initdata opt_badpage[100] = "";
>  string_param("badpage", opt_badpage);
>  
> +/*
> + * Heap allocations may need TLB flushes which require IRQs to be
> + * enabled (except during early boot when only 1 PCPU is online).
> + */
> +#define ASSERT_ALLOC_CONTEXT()                                          \
> +    ASSERT(!in_irq() && (local_irq_is_enabled()                         \
> +                         || system_state < SYS_STATE_smp_boot))

Upon further consideration: In principle IRQs would be okay to be off
whenever we're in UP mode (and hence flush IPIs don't need sending).
Provided of course spin debug is off as well and no other IRQs-on
checks get in the way (like that in flush_area_mask()). This might be
more robust overall than depending on system_state, but I'm not going
to exclude there may also be arguments against doing so.

In any event, looking back at my v1 comment, it would have been nice
if the spinlock related aspect was at least also mentioned here, even
if - as you did say in reply - the uses of the new macro aren't fully
redundant with check_lock().

Also, nit: The || belongs on the earlier line as per our coding style.

Jan
David Vrabel April 21, 2022, 12:23 p.m. UTC | #2
On 21/04/2022 12:38, Jan Beulich wrote:
> On 21.04.2022 12:43, David Vrabel wrote:
>> --- a/xen/arch/arm/setup.c
>> +++ b/xen/arch/arm/setup.c
>> @@ -984,6 +984,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>>   
>>       console_init_postirq();
>>   
>> +    system_state = SYS_STATE_smp_boot
>> +
>>       do_presmp_initcalls();
>>   
>>       for_each_present_cpu ( i )
> 
> I'm afraid it's not this simple: There are two
> "ASSERT(system_state != SYS_STATE_boot)" in Arm-specific code. While
> both could in principle be left as is, I think both want modifying to
> ">= SYS_STATE_active", such that they would also trigger when in this
> newly set state (in case registration of the notifiers was altered).

These asserts are already too-relaxed given that there's an early_boot 
state.

> It also wants at least mentioning that setting this state is okay with
> all uses of system_state in common code (where it's not impossible
> that x86-isms still exist, having gone unnoticed so far), just to
> indicate that all of these were actually inspected (there's just one
> where it looks to be unobvious when simply looking at grep output, the
> one in keyhandler.c). As a result this may want to be a separate,
> prereq patch. At which point it will want considering whether to put
> the setting of the state _in_ do_presmp_initcalls() instead of ahead
> of its invocation.

Not sure I understand this comment. The transition to the smp_boot state 
on arm makes the state machine on x86 and arm look _more_ alike, thus 
common code should be happier.

>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -162,6 +162,14 @@
>>   static char __initdata opt_badpage[100] = "";
>>   string_param("badpage", opt_badpage);
>>   
>> +/*
>> + * Heap allocations may need TLB flushes which require IRQs to be
>> + * enabled (except during early boot when only 1 PCPU is online).
>> + */
>> +#define ASSERT_ALLOC_CONTEXT()                                          \
>> +    ASSERT(!in_irq() && (local_irq_is_enabled()                         \
>> +                         || system_state < SYS_STATE_smp_boot))
> 
> Upon further consideration: In principle IRQs would be okay to be off
> whenever we're in UP mode (and hence flush IPIs don't need sending).
> Provided of course spin debug is off as well and no other IRQs-on
> checks get in the way (like that in flush_area_mask()). This might be
> more robust overall than depending on system_state, but I'm not going
> to exclude there may also be arguments against doing so.

Not sure I understand what you're suggesting here. Do you mean something 
like this?

#define ASSERT_ALLOC_CONTEXT()                                         \
     ASSERT(!in_irq() && (local_irq_is_enabled()                        \
                          || nr_online_cpus == 1))

> In any event, looking back at my v1 comment, it would have been nice
> if the spinlock related aspect was at least also mentioned here, even
> if - as you did say in reply - the uses of the new macro aren't fully
> redundant with check_lock().
> 
> Also, nit: The || belongs on the earlier line as per our coding style.

CODING_STYLE says: "Long lines should be split at sensible places and 
the trailing portions indented."

If you're going to have rules (that have, IMO[1], worse readability) 
please document them.

David

[1] Compare

a = b
     + dksaldksa_daskldsa_dsakdlsad
     + hds
     + dsadjka_jdaksjdk_daskajd;

and

a = b +
     dksaldksa_daskldsa_dsakdlsad +
     hds +
     dsadjka_jdaksjdk_daskajd;

Which one is more clearly readable as a sum?
Jan Beulich April 21, 2022, 12:51 p.m. UTC | #3
On 21.04.2022 14:23, David Vrabel wrote:
> On 21/04/2022 12:38, Jan Beulich wrote:
>> On 21.04.2022 12:43, David Vrabel wrote:
>>> --- a/xen/arch/arm/setup.c
>>> +++ b/xen/arch/arm/setup.c
>>> @@ -984,6 +984,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>>>   
>>>       console_init_postirq();
>>>   
>>> +    system_state = SYS_STATE_smp_boot
>>> +
>>>       do_presmp_initcalls();
>>>   
>>>       for_each_present_cpu ( i )
>>
>> I'm afraid it's not this simple: There are two
>> "ASSERT(system_state != SYS_STATE_boot)" in Arm-specific code. While
>> both could in principle be left as is, I think both want modifying to
>> ">= SYS_STATE_active", such that they would also trigger when in this
>> newly set state (in case registration of the notifiers was altered).
> 
> These asserts are already too-relaxed given that there's an early_boot 
> state.

Indeed they are. But that's not an excuse to make them ignore yet one
more state.

>> It also wants at least mentioning that setting this state is okay with
>> all uses of system_state in common code (where it's not impossible
>> that x86-isms still exist, having gone unnoticed so far), just to
>> indicate that all of these were actually inspected (there's just one
>> where it looks to be unobvious when simply looking at grep output, the
>> one in keyhandler.c). As a result this may want to be a separate,
>> prereq patch. At which point it will want considering whether to put
>> the setting of the state _in_ do_presmp_initcalls() instead of ahead
>> of its invocation.
> 
> Not sure I understand this comment. The transition to the smp_boot state 
> on arm makes the state machine on x86 and arm look _more_ alike, thus 
> common code should be happier.

I agree that it _should_, but experience tells me that such expectations
do not hold.

>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -162,6 +162,14 @@
>>>   static char __initdata opt_badpage[100] = "";
>>>   string_param("badpage", opt_badpage);
>>>   
>>> +/*
>>> + * Heap allocations may need TLB flushes which require IRQs to be
>>> + * enabled (except during early boot when only 1 PCPU is online).
>>> + */
>>> +#define ASSERT_ALLOC_CONTEXT()                                          \
>>> +    ASSERT(!in_irq() && (local_irq_is_enabled()                         \
>>> +                         || system_state < SYS_STATE_smp_boot))
>>
>> Upon further consideration: In principle IRQs would be okay to be off
>> whenever we're in UP mode (and hence flush IPIs don't need sending).
>> Provided of course spin debug is off as well and no other IRQs-on
>> checks get in the way (like that in flush_area_mask()). This might be
>> more robust overall than depending on system_state, but I'm not going
>> to exclude there may also be arguments against doing so.
> 
> Not sure I understand what you're suggesting here. Do you mean something 
> like this?
> 
> #define ASSERT_ALLOC_CONTEXT()                                         \
>      ASSERT(!in_irq() && (local_irq_is_enabled()                        \
>                           || nr_online_cpus == 1))

Yes, using num_online_cpus(). I'd like this to be at least considered.

>> In any event, looking back at my v1 comment, it would have been nice
>> if the spinlock related aspect was at least also mentioned here, even
>> if - as you did say in reply - the uses of the new macro aren't fully
>> redundant with check_lock().
>>
>> Also, nit: The || belongs on the earlier line as per our coding style.
> 
> CODING_STYLE says: "Long lines should be split at sensible places and 
> the trailing portions indented."
> 
> If you're going to have rules (that have, IMO[1], worse readability) 
> please document them.

Personally I, too, prefer operators at the start. But that's not how
Xen has been written, and this aspect didn't change in all the years,
no matter that ./CODING_STYLE doesn't explicitly say so (and it
doesn't state quite a few more rules that we try to abide to).

Jan

> [1] Compare
> 
> a = b
>      + dksaldksa_daskldsa_dsakdlsad
>      + hds
>      + dsadjka_jdaksjdk_daskajd;
> 
> and
> 
> a = b +
>      dksaldksa_daskldsa_dsakdlsad +
>      hds +
>      dsadjka_jdaksjdk_daskajd;
> 
> Which one is more clearly readable as a sum?
>
diff mbox series

Patch

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index d5d0792ed4..44d45f1449 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -984,6 +984,8 @@  void __init start_xen(unsigned long boot_phys_offset,
 
     console_init_postirq();
 
+    system_state = SYS_STATE_smp_boot
+
     do_presmp_initcalls();
 
     for_each_present_cpu ( i )
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 319029140f..e1ce38df13 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -162,6 +162,14 @@ 
 static char __initdata opt_badpage[100] = "";
 string_param("badpage", opt_badpage);
 
+/*
+ * Heap allocations may need TLB flushes which require IRQs to be
+ * enabled (except during early boot when only 1 PCPU is online).
+ */
+#define ASSERT_ALLOC_CONTEXT()                                          \
+    ASSERT(!in_irq() && (local_irq_is_enabled()                         \
+                         || system_state < SYS_STATE_smp_boot))
+
 /*
  * no-bootscrub -> Free pages are not zeroed during boot.
  */
@@ -2160,7 +2168,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
     struct page_info *pg;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
                           order, memflags | MEMF_no_scrub, NULL);
@@ -2173,7 +2181,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( v == NULL )
         return;
@@ -2202,7 +2210,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     struct page_info *pg;
     unsigned int i;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( xenheap_bits && (memflags >> _MEMF_bits) > xenheap_bits )
         memflags &= ~MEMF_bits(~0U);
@@ -2224,7 +2232,7 @@  void free_xenheap_pages(void *v, unsigned int order)
     struct page_info *pg;
     unsigned int i;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( v == NULL )
         return;
@@ -2249,7 +2257,7 @@  void init_domheap_pages(paddr_t ps, paddr_t pe)
 {
     mfn_t smfn, emfn;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     smfn = maddr_to_mfn(round_pgup(ps));
     emfn = maddr_to_mfn(round_pgdown(pe));
@@ -2369,7 +2377,7 @@  struct page_info *alloc_domheap_pages(
     unsigned int bits = memflags >> _MEMF_bits, zone_hi = NR_ZONES - 1;
     unsigned int dma_zone;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     bits = domain_clamp_alloc_bitsize(memflags & MEMF_no_owner ? NULL : d,
                                       bits ? : (BITS_PER_LONG+PAGE_SHIFT));
@@ -2419,7 +2427,7 @@  void free_domheap_pages(struct page_info *pg, unsigned int order)
     unsigned int i;
     bool drop_dom_ref;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     if ( unlikely(is_xen_heap_page(pg)) )
     {
@@ -2738,7 +2746,7 @@  int __init acquire_domstatic_pages(struct domain *d, mfn_t smfn,
 {
     struct page_info *pg;
 
-    ASSERT(!in_irq());
+    ASSERT_ALLOC_CONTEXT();
 
     pg = acquire_staticmem_pages(smfn, nr_mfns, memflags);
     if ( !pg )