Message ID | 20240610142043.11924-5-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/irq: fixes for CPU hot{,un}plug | expand |
On 10.06.2024 16:20, Roger Pau Monne wrote: > If external interrupts are using logical mode it's possible to have an overlap > between the current ->arch.cpu_mask and the provided mask (or TARGET_CPUS). If > that's the case avoid assigning a new vector and just move the interrupt to a > member of ->arch.cpu_mask that overlaps with the provided mask and is online. What I'm kind of missing here is an explanation of why what _assign_irq_vector() does to avoid unnecessary migration (very first conditional there) isn't sufficient. > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -837,19 +837,38 @@ void cf_check irq_complete_move(struct irq_desc *desc) > > unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask) > { > - int ret; > - unsigned long flags; > cpumask_t dest_mask; > > if ( mask && !cpumask_intersects(mask, &cpu_online_map) ) > return BAD_APICID; > > - spin_lock_irqsave(&vector_lock, flags); > - ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS); > - spin_unlock_irqrestore(&vector_lock, flags); > + /* > + * mask input set can contain CPUs that are not online. To decide whether > + * the interrupt needs to be migrated restrict the input mask to the CPUs > + * that are online. > + */ > + if ( mask ) > + cpumask_and(&dest_mask, mask, &cpu_online_map); > + else > + cpumask_copy(&dest_mask, TARGET_CPUS); Why once &cpu_online_map and once TARGET_CPUS? > - if ( ret < 0 ) > - return BAD_APICID; > + /* > + * Only move the interrupt if there are no CPUs left in ->arch.cpu_mask > + * that can handle it, otherwise just shuffle it around ->arch.cpu_mask > + * to an available destination. > + */ "an available destination" (singular) gives the impression that it's only ever going to be a single CPU. Yet cpu_mask_to_apicid_flat() and cpu_mask_to_apicid_x2apic_cluster() can produce sets of multiple CPUs. Therefore maybe "an available destination / the (sub)set of available destinations"? Or as that's getting longish "(an) available destination(s)"? > + if ( !cpumask_intersects(desc->arch.cpu_mask, &dest_mask) ) > + { > + int ret; > + unsigned long flags; > + > + spin_lock_irqsave(&vector_lock, flags); > + ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS); Why not pass dest_mask here, as you now calculate that up front? The function will skip offline CPUs anyway. > @@ -862,6 +881,7 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask) > cpumask_copy(&dest_mask, desc->arch.cpu_mask); > } > cpumask_and(&dest_mask, &dest_mask, &cpu_online_map); > + ASSERT(!cpumask_empty(&dest_mask)); > > return cpu_mask_to_apicid(&dest_mask); I wonder whether the assertion wouldn't better live in cpu_mask_to_apicid() itself (the macro, not the backing functions). Jan
On Tue, Jun 11, 2024 at 12:20:33PM +0200, Jan Beulich wrote: > On 10.06.2024 16:20, Roger Pau Monne wrote: > > If external interrupts are using logical mode it's possible to have an overlap > > between the current ->arch.cpu_mask and the provided mask (or TARGET_CPUS). If > > that's the case avoid assigning a new vector and just move the interrupt to a > > member of ->arch.cpu_mask that overlaps with the provided mask and is online. > > What I'm kind of missing here is an explanation of why what _assign_irq_vector() > does to avoid unnecessary migration (very first conditional there) isn't > sufficient. Somehow I looked at that and think it wasn't enough, but now I cannot figure out why, so it might be just fine, and this patch is not needed. Let me test again and get back to you, for the time being ignore this patch. Thanks, Roger.
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 263e502bc0f6..306e7756112f 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -837,19 +837,38 @@ void cf_check irq_complete_move(struct irq_desc *desc) unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask) { - int ret; - unsigned long flags; cpumask_t dest_mask; if ( mask && !cpumask_intersects(mask, &cpu_online_map) ) return BAD_APICID; - spin_lock_irqsave(&vector_lock, flags); - ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS); - spin_unlock_irqrestore(&vector_lock, flags); + /* + * mask input set can contain CPUs that are not online. To decide whether + * the interrupt needs to be migrated restrict the input mask to the CPUs + * that are online. + */ + if ( mask ) + cpumask_and(&dest_mask, mask, &cpu_online_map); + else + cpumask_copy(&dest_mask, TARGET_CPUS); - if ( ret < 0 ) - return BAD_APICID; + /* + * Only move the interrupt if there are no CPUs left in ->arch.cpu_mask + * that can handle it, otherwise just shuffle it around ->arch.cpu_mask + * to an available destination. + */ + if ( !cpumask_intersects(desc->arch.cpu_mask, &dest_mask) ) + { + int ret; + unsigned long flags; + + spin_lock_irqsave(&vector_lock, flags); + ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS); + spin_unlock_irqrestore(&vector_lock, flags); + + if ( ret < 0 ) + return BAD_APICID; + } if ( mask ) { @@ -862,6 +881,7 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask) cpumask_copy(&dest_mask, desc->arch.cpu_mask); } cpumask_and(&dest_mask, &dest_mask, &cpu_online_map); + ASSERT(!cpumask_empty(&dest_mask)); return cpu_mask_to_apicid(&dest_mask); }
If external interrupts are using logical mode it's possible to have an overlap between the current ->arch.cpu_mask and the provided mask (or TARGET_CPUS). If that's the case avoid assigning a new vector and just move the interrupt to a member of ->arch.cpu_mask that overlaps with the provided mask and is online. While there also add an extra assert to ensure the mask containing the possible destinations is not empty before calling cpu_mask_to_apicid(), as at that point having an empty mask is not expected. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/x86/irq.c | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-)