Message ID | 0396a0fa-e318-493a-9858-30f63304cc99@suse.com (mailing list archive)
---|---
State | New
Series | [RFC,DO-NOT-APPLY] x86/IRQ: don't pass offline CPU(s) to on_selected_cpus()
On Tue, Feb 11, 2025 at 10:38:41AM +0100, Jan Beulich wrote:
> on_selected_cpus() asserts that it's only passed online CPUs. Between
> __cpu_disable() removing a CPU from the online map and fixup_eoi()
> cleaning up for the CPU being offlined there's a window, though, where
> the CPU in question can still appear in an action's cpu_eoi_map. Remove
> offline CPUs, as they're (supposed to be) taken care of by fixup_eoi().
>
> Move the entire tail of desc_guest_eoi() into a new helper, thus
> streamlining processinf also for other call sites when the sole
                        ^ processing
> remaining CPU is the local one. Move and split the assertion, to be a
> functionally equivalent replacement for the earlier BUG_ON() in
> __pirq_guest_unbind(). Note that we can't use scratch_cpumask there, in
> particular in the context of a timer handler and with data held there
> needing to stay intact across process_pending_softirqs().

I would probably add the crash backtrace to the commit message.

>
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> While this builds okay, for now I didn't even consider testing it: Both
> aspects below (plus the ugly locking arrangement) speak against dealing
> with the issue this way, imo. Cc-ing REST rather than just x86 for this
> reason.

Indeed, the locking is especially problematic here, both for being ugly
and because I assume EOI is a hot path.

> RFC: Don't we need {get,put}_cpu_maps() around quite a few more uses of
>      cpu_online_map (e.g. _clear_irq_vector() when called from
>      destroy_irq(), or about every caller of smp_call_function())?

Indeed, that's my understanding also. Quite a lot of users of
cpu_online_map likely require wrapping in {get,put}_cpu_maps() calls;
otherwise they very likely incur TOCTOU races.

>      Roger
>      suggests using stop-machine context for CPU {on,off}lining, but I
>      seem to vaguely recall that there was a reason not to do so at
>      least for the offlining part.

I would have to explore this more thoroughly; it does seem feasible at
first sight, but the devil (and the bugs) is in the details. I don't
think we want to go to the lengths of what you do below for quite a lot
of users of cpu_online_map.

>
> RFC: I doubt process_pending_softirqs() is okay to use from a timer
>      handler. (Callers of, in particular, {desc,pirq}_guest_eoi() would
>      also need checking for process_pending_softirqs() to be okay to use
>      in their contexts.)

Yeah, I would be wary of doing softirq processing from this context, the
more so because (even if not the common case) I would be afraid of the
unbounded latency that process_pending_softirqs() could introduce.
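To make the {get,put}_cpu_maps() point above concrete, here is a minimal
sketch of the wrapping pattern under discussion. It simply lifts the
relevant lines from the invoke_set_eoi_ready() hunk quoted below;
notify_selected_cpus() itself is an invented example consumer, not
existing Xen code.

/*
 * Illustration only: snapshot the target mask while the CPU maps are
 * held, so no CPU can disappear from cpu_online_map between the
 * cpumask_and() and the on_selected_cpus() call.
 */
static void notify_selected_cpus(const cpumask_t *targets,
                                 void (*fn)(void *), void *arg)
{
    cpumask_t mask;

    /* get_cpu_maps() is a trylock; CPU hotplug holds the maps meanwhile. */
    while ( !get_cpu_maps() )
        process_pending_softirqs();

    cpumask_and(&mask, targets, &cpu_online_map);
    if ( !cpumask_empty(&mask) )
        on_selected_cpus(&mask, fn, arg, 0);

    put_cpu_maps();
}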
> 
> --- unstable.orig/xen/arch/x86/irq.c	2021-04-08 11:10:47.000000000 +0200
> +++ unstable/xen/arch/x86/irq.c	2025-02-11 09:54:38.532567287 +0100
> @@ -6,6 +6,7 @@
>   */
>  
>  #include <xen/init.h>
> +#include <xen/cpu.h>
>  #include <xen/delay.h>
>  #include <xen/errno.h>
>  #include <xen/event.h>
> @@ -1183,7 +1184,7 @@ static inline void clear_pirq_eoi(struct
>      }
>  }
>  
> -static void cf_check set_eoi_ready(void *data);
> +static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait);
>  
>  static void cf_check irq_guest_eoi_timer_fn(void *data)
>  {
> @@ -1224,18 +1225,13 @@ static void cf_check irq_guest_eoi_timer
>  
>      switch ( action->ack_type )
>      {
> -        cpumask_t *cpu_eoi_map;
> -
>      case ACKTYPE_UNMASK:
>          if ( desc->handler->end )
>              desc->handler->end(desc, 0);
>          break;
>  
>      case ACKTYPE_EOI:
> -        cpu_eoi_map = this_cpu(scratch_cpumask);
> -        cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
> -        spin_unlock_irq(&desc->lock);
> -        on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
> +        invoke_set_eoi_ready(desc, false);
>          return;
>      }
>  
> @@ -1468,6 +1464,49 @@ static void cf_check set_eoi_ready(void
>      flush_ready_eoi();
>  }
>  
> +/*
> + * Locking is somewhat special here: In all cases the function is invoked with
> + * desc's lock held. When @wait is true, the function tries to avoid unlocking.
> + * It always returns whether the lock is still held.
> + */
> +static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait)
> +{
> +    const irq_guest_action_t *action = guest_action(desc);
> +    cpumask_t cpu_eoi_map;
> +
> +    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
> +
> +    if ( __cpumask_test_and_clear_cpu(smp_processor_id(), &cpu_eoi_map) )
> +    {
> +        ASSERT(action->ack_type == ACKTYPE_EOI);
> +        __set_eoi_ready(desc);
> +        spin_unlock(&desc->lock);
> +        flush_ready_eoi();
> +        local_irq_enable();
> +    }
> +    else if ( wait && cpumask_empty(&cpu_eoi_map) )
> +        return true;
> +    else
> +    {
> +        ASSERT(action->ack_type == ACKTYPE_EOI);
> +        spin_unlock_irq(&desc->lock);
> +    }
> +
> +    if ( cpumask_empty(&cpu_eoi_map) )
> +        return false;
> +
> +    while ( !get_cpu_maps() )
> +        process_pending_softirqs();
> +
> +    cpumask_and(&cpu_eoi_map, &cpu_eoi_map, &cpu_online_map);
> +    if ( !cpumask_empty(&cpu_eoi_map) )
> +        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, wait);
> +
> +    put_cpu_maps();
> +
> +    return false;
> +}
> +
>  void pirq_guest_eoi(struct pirq *pirq)

Possibly for pirq_guest_eoi(), when called from the PHYSDEVOP_eoi
hypercall, we should propagate whether the EOI has been performed, and
otherwise use a hypercall continuation to retry?

As said above however, this approach is so ugly, and we would require
similar adjustments in other places that also use cpu_online_map, that I
would only consider going this route as a last resort.

Thanks, Roger.
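Purely to illustrate that continuation idea, and not as a concrete
proposal: physdev_eoi_or_continue() and pirq_guest_eoi_try() below are
invented names; only hypercall_create_continuation() and the
PHYSDEVOP_eoi path they would hang off exist today, and the exact
plumbing in do_physdev_op() would of course look different.

/*
 * Hypothetical helper for the PHYSDEVOP_eoi path, with the pirq already
 * looked up.  Returns 0 when the EOI was performed, or a continuation
 * when it had to be deferred (e.g. because the CPU maps could not be
 * taken without waiting).
 */
static long physdev_eoi_or_continue(struct pirq *pirq, unsigned int cmd,
                                    XEN_GUEST_HANDLE_PARAM(void) arg)
{
    /* Invented variant of pirq_guest_eoi(): false => EOI not done yet. */
    if ( pirq_guest_eoi_try(pirq) )
        return 0;

    /* Replay the hypercall; the guest retries transparently. */
    return hypercall_create_continuation(__HYPERVISOR_physdev_op, "ih",
                                         cmd, arg);
}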
--- unstable.orig/xen/arch/x86/irq.c	2021-04-08 11:10:47.000000000 +0200
+++ unstable/xen/arch/x86/irq.c	2025-02-11 09:54:38.532567287 +0100
@@ -6,6 +6,7 @@
  */
 
 #include <xen/init.h>
+#include <xen/cpu.h>
 #include <xen/delay.h>
 #include <xen/errno.h>
 #include <xen/event.h>
@@ -1183,7 +1184,7 @@ static inline void clear_pirq_eoi(struct
     }
 }
 
-static void cf_check set_eoi_ready(void *data);
+static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait);
 
 static void cf_check irq_guest_eoi_timer_fn(void *data)
 {
@@ -1224,18 +1225,13 @@ static void cf_check irq_guest_eoi_timer
 
     switch ( action->ack_type )
     {
-        cpumask_t *cpu_eoi_map;
-
     case ACKTYPE_UNMASK:
         if ( desc->handler->end )
             desc->handler->end(desc, 0);
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
-        cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
-        spin_unlock_irq(&desc->lock);
-        on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        invoke_set_eoi_ready(desc, false);
         return;
     }
 
@@ -1468,6 +1464,49 @@ static void cf_check set_eoi_ready(void
     flush_ready_eoi();
 }
 
+/*
+ * Locking is somewhat special here: In all cases the function is invoked with
+ * desc's lock held. When @wait is true, the function tries to avoid unlocking.
+ * It always returns whether the lock is still held.
+ */
+static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait)
+{
+    const irq_guest_action_t *action = guest_action(desc);
+    cpumask_t cpu_eoi_map;
+
+    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
+
+    if ( __cpumask_test_and_clear_cpu(smp_processor_id(), &cpu_eoi_map) )
+    {
+        ASSERT(action->ack_type == ACKTYPE_EOI);
+        __set_eoi_ready(desc);
+        spin_unlock(&desc->lock);
+        flush_ready_eoi();
+        local_irq_enable();
+    }
+    else if ( wait && cpumask_empty(&cpu_eoi_map) )
+        return true;
+    else
+    {
+        ASSERT(action->ack_type == ACKTYPE_EOI);
+        spin_unlock_irq(&desc->lock);
+    }
+
+    if ( cpumask_empty(&cpu_eoi_map) )
+        return false;
+
+    while ( !get_cpu_maps() )
+        process_pending_softirqs();
+
+    cpumask_and(&cpu_eoi_map, &cpu_eoi_map, &cpu_online_map);
+    if ( !cpumask_empty(&cpu_eoi_map) )
+        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, wait);
+
+    put_cpu_maps();
+
+    return false;
+}
+
 void pirq_guest_eoi(struct pirq *pirq)
 {
     struct irq_desc *desc;
@@ -1481,7 +1520,6 @@ void pirq_guest_eoi(struct pirq *pirq)
 void desc_guest_eoi(struct irq_desc *desc, struct pirq *pirq)
 {
     irq_guest_action_t *action = guest_action(desc);
-    cpumask_t cpu_eoi_map;
 
     if ( unlikely(!action) ||
          unlikely(!test_and_clear_bool(pirq->masked)) ||
@@ -1502,24 +1540,7 @@ void desc_guest_eoi(struct irq_desc *des
         return;
     }
 
-    ASSERT(action->ack_type == ACKTYPE_EOI);
-
-    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
-
-    if ( __cpumask_test_and_clear_cpu(smp_processor_id(), &cpu_eoi_map) )
-    {
-        __set_eoi_ready(desc);
-        spin_unlock(&desc->lock);
-        flush_ready_eoi();
-        local_irq_enable();
-    }
-    else
-    {
-        spin_unlock_irq(&desc->lock);
-    }
-
-    if ( !cpumask_empty(&cpu_eoi_map) )
-        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 0);
+    invoke_set_eoi_ready(desc, false);
 }
 
 int pirq_guest_unmask(struct domain *d)
@@ -1734,7 +1755,6 @@ static irq_guest_action_t *__pirq_guest_
     struct domain *d, struct pirq *pirq, struct irq_desc *desc)
 {
     irq_guest_action_t *action = guest_action(desc);
-    cpumask_t cpu_eoi_map;
     int i;
 
     if ( unlikely(action == NULL) )
@@ -1765,12 +1785,7 @@ static irq_guest_action_t *__pirq_guest_
         if ( test_and_clear_bool(pirq->masked) &&
              (--action->in_flight == 0) &&
              (action->nr_guests != 0) )
-        {
-            cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
-            spin_unlock_irq(&desc->lock);
-            on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 0);
-            spin_lock_irq(&desc->lock);
-        }
+            invoke_set_eoi_ready(desc, false);
         break;
     }
 
@@ -1796,14 +1811,8 @@ static irq_guest_action_t *__pirq_guest_
      * would need to flush all ready EOIs before returning as otherwise the
      * desc->handler could change and we would call the wrong 'end' hook.
      */
-    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
-    if ( !cpumask_empty(&cpu_eoi_map) )
-    {
-        BUG_ON(action->ack_type != ACKTYPE_EOI);
-        spin_unlock_irq(&desc->lock);
-        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 1);
+    if ( !invoke_set_eoi_ready(desc, true) )
         spin_lock_irq(&desc->lock);
-    }
 
     BUG_ON(!cpumask_empty(action->cpu_eoi_map));
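For clarity, the lock-ownership contract of the new helper, as exercised
by the __pirq_guest_unbind() hunk above, boils down to the following
caller-side pattern (illustration only, not part of the patch):

/* desc->lock held and IRQs off on entry. */
if ( !invoke_set_eoi_ready(desc, true) )
    /* The helper dropped the lock (and re-enabled IRQs); retake it. */
    spin_lock_irq(&desc->lock);
/* desc->lock is held again here either way. */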