Message ID | 5CC6DEA80200007800229E9D@prv1-mh.provo.novell.com (mailing list archive) |
---|---
State | Superseded |
Series | x86: IRQ management adjustments |
On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
> The cleanup IPI may get sent immediately before a CPU gets removed
> from the online map. In such a case the IPI would get handled on the
> CPU being offlined no earlier than in the interrupts-disabled window
> after fixup_irqs()' main loop. This is too late, however, because a
> possible affinity change may incur the need for a vector assignment,
> which will fail when the IRQ's move cleanup count is still non-zero.
>
> To fix this
> - record the set of CPUs the cleanup IPI actually gets sent to
>   alongside setting their count,
> - adjust the count in fixup_irqs(), accounting for all CPUs that the
>   cleanup IPI was sent to, but that are no longer online,
> - bail early from the cleanup IPI handler when the CPU is no longer
>   online, to prevent double accounting.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Just as a note, this whole interrupt migration business seems
extremely complex, and I wonder whether Xen really needs it, or what
exactly its performance gain is compared to simpler solutions. I
understand these are just fixes, but IMO they make the logic even
more complex.

Maybe it would be simpler to have the interrupts hard-bound to pCPUs
and instead have a soft-affinity on the guest vCPUs that are assigned
as the destination?

> ---
> TBD: The proper recording of the IPI destinations actually makes the
>      move_cleanup_count field redundant. Do we want to drop it, at
>      the price of a few more CPU-mask operations?

AFAICT this is not a hot path, so I would remove the
move_cleanup_count field and just weigh the cpu bitmap when needed.

Thanks, Roger.
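To make the suggestion concrete, here is a minimal sketch of what
"weigh the cpu bitmap when needed" could look like; the helper name is
hypothetical and this is not the actual Xen implementation:

```c
/*
 * Hypothetical helper (illustration only): derive the number of
 * outstanding cleanup IPIs from the recorded destination mask on
 * demand, instead of caching it in desc->arch.move_cleanup_count.
 */
static unsigned int pending_cleanup_weight(const struct irq_desc *desc)
{
    cpumask_t pending;

    /* Only CPUs still online can still handle the cleanup IPI. */
    cpumask_and(&pending, desc->arch.old_cpu_mask, &cpu_online_map);

    return cpumask_weight(&pending);
}
```

As the follow-up messages show, such on-demand weighing is not a
drop-in replacement, because the explicit count also acts as an
"armed" flag for the cleanup handler.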
>>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
>> The cleanup IPI may get sent immediately before a CPU gets removed
>> from the online map. In such a case the IPI would get handled on the
>> CPU being offlined no earlier than in the interrupts-disabled window
>> after fixup_irqs()' main loop. This is too late, however, because a
>> possible affinity change may incur the need for a vector assignment,
>> which will fail when the IRQ's move cleanup count is still non-zero.
>>
>> To fix this
>> - record the set of CPUs the cleanup IPI actually gets sent to
>>   alongside setting their count,
>> - adjust the count in fixup_irqs(), accounting for all CPUs that the
>>   cleanup IPI was sent to, but that are no longer online,
>> - bail early from the cleanup IPI handler when the CPU is no longer
>>   online, to prevent double accounting.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> Just as a note, this whole interrupt migration business seems
> extremely complex, and I wonder whether Xen really needs it, or what
> exactly its performance gain is compared to simpler solutions.

What simpler solutions are you thinking of? Having IRQ affinities
track those of their assigned vCPUs was added largely to avoid
high-rate interrupts always arriving on a CPU other than the one where
the actual handling will take place. Arguably this may go too far for
low-rate interrupts, but adding a heuristic for that would only
complicate the handling further.

> I understand these are just fixes, but IMO they make the logic even
> more complex.
>
> Maybe it would be simpler to have the interrupts hard-bound to pCPUs
> and instead have a soft-affinity on the guest vCPUs that are assigned
> as the destination?

How would the soft affinity be calculated for a vCPU that has multiple
IRQs (with at most partially overlapping affinities) to service?

>> ---
>> TBD: The proper recording of the IPI destinations actually makes the
>>      move_cleanup_count field redundant. Do we want to drop it, at
>>      the price of a few more CPU-mask operations?
>
> AFAICT this is not a hot path, so I would remove the
> move_cleanup_count field and just weigh the cpu bitmap when needed.

Added for v2 (pending successful testing).

Jan
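A toy illustration of the question just raised, using made-up masks
(standalone example, not Xen code): a vCPU servicing two IRQs whose
affinities only partially overlap has no single obvious soft-affinity
that could be derived for it.

```c
/* Hypothetical example masks for two IRQs serviced by one vCPU. */
static void soft_affinity_ambiguity(void)
{
    cpumask_t irq1_aff, irq2_aff, common;

    cpumask_clear(&irq1_aff);
    cpumask_set_cpu(0, &irq1_aff);
    cpumask_set_cpu(1, &irq1_aff);      /* IRQ 1 may target CPUs {0,1} */

    cpumask_clear(&irq2_aff);
    cpumask_set_cpu(1, &irq2_aff);
    cpumask_set_cpu(2, &irq2_aff);      /* IRQ 2 may target CPUs {1,2} */

    /* Intersection is {1} here - and it can easily be empty. */
    cpumask_and(&common, &irq1_aff, &irq2_aff);
}
```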
>>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
> On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
>> ---
>> TBD: The proper recording of the IPI destinations actually makes the
>>      move_cleanup_count field redundant. Do we want to drop it, at
>>      the price of a few more CPU-mask operations?
>
> AFAICT this is not a hot path, so I would remove the
> move_cleanup_count field and just weigh the cpu bitmap when needed.

FTR: It's not fully redundant - the patch removing it, which I had
added, was actually the reason for seeing the ASSERT() trigger that
you asked to be added in patch 1. The reason is that the field serves
as a flag telling irq_move_cleanup_interrupt() whether to act on an
IRQ in the first place. Without it, the function would prematurely
clean up the vector, confusing subsequent code that tries to do the
cleanup when it is actually due.

Jan
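A condensed sketch of the gate role just described; the function name
and structure are simplified stand-ins, since the real handler in
xen/arch/x86/irq.c additionally walks all vectors and looks up the IRQ
bound to each:

```c
/*
 * Simplified sketch, not the actual Xen code: a zero
 * move_cleanup_count means no cleanup IPI is in flight for this IRQ,
 * so the old vector must be left alone even if bits remain set in
 * desc->arch.old_cpu_mask.
 */
static void maybe_cleanup(struct irq_desc *desc, unsigned int cpu)
{
    spin_lock(&desc->lock);

    if ( desc->arch.move_cleanup_count )
    {
        cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);

        /* Release the old vector only once all destinations reported in. */
        if ( !--desc->arch.move_cleanup_count )
            release_old_vec(desc);
    }

    spin_unlock(&desc->lock);
}
```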
On Tue, May 07, 2019 at 01:28:36AM -0600, Jan Beulich wrote:
> >>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
> > On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
> >> ---
> >> TBD: The proper recording of the IPI destinations actually makes
> >>      the move_cleanup_count field redundant. Do we want to drop
> >>      it, at the price of a few more CPU-mask operations?
> >
> > AFAICT this is not a hot path, so I would remove the
> > move_cleanup_count field and just weigh the cpu bitmap when needed.
>
> FTR: It's not fully redundant - the patch removing it, which I had
> added, was actually the reason for seeing the ASSERT() trigger that
> you asked to be added in patch 1. The reason is that the field serves
> as a flag telling irq_move_cleanup_interrupt() whether to act on an
> IRQ in the first place. Without it, the function would prematurely
> clean up the vector, confusing subsequent code that tries to do the
> cleanup when it is actually due.

So weighing desc->arch.old_cpu_mask is not equivalent to
move_cleanup_count?

Thanks, Roger.
>>> On 07.05.19 at 10:12, <roger.pau@citrix.com> wrote:
> On Tue, May 07, 2019 at 01:28:36AM -0600, Jan Beulich wrote:
>> >>> On 03.05.19 at 17:21, <roger.pau@citrix.com> wrote:
>> > On Mon, Apr 29, 2019 at 05:23:20AM -0600, Jan Beulich wrote:
>> >> ---
>> >> TBD: The proper recording of the IPI destinations actually makes
>> >>      the move_cleanup_count field redundant. Do we want to drop
>> >>      it, at the price of a few more CPU-mask operations?
>> >
>> > AFAICT this is not a hot path, so I would remove the
>> > move_cleanup_count field and just weigh the cpu bitmap when needed.
>>
>> FTR: It's not fully redundant - the patch removing it, which I had
>> added, was actually the reason for seeing the ASSERT() trigger that
>> you asked to be added in patch 1. The reason is that the field serves
>> as a flag telling irq_move_cleanup_interrupt() whether to act on an
>> IRQ in the first place. Without it, the function would prematurely
>> clean up the vector, confusing subsequent code that tries to do the
>> cleanup when it is actually due.
>
> So weighing desc->arch.old_cpu_mask is not equivalent to
> move_cleanup_count?

Not exactly, no: While the field gets set from the cpumask_weight()
result, it matters _when_ that happens. Prior to that point, what bits
are set in the mask is of no interest to irq_move_cleanup_interrupt().

Jan
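A sketch of the ordering being described; the two function names are
hypothetical stand-ins for the real call sites (vector assignment and
send_cleanup_vector() in the patch below):

```c
/* Phase 1 (hypothetical stand-in): a move begins. */
static void move_begins(struct irq_desc *desc)
{
    /* Record where the IRQ used to be delivered. */
    cpumask_copy(desc->arch.old_cpu_mask, desc->arch.cpu_mask);
    /*
     * old_cpu_mask may be non-empty from here on, yet its weight means
     * nothing to irq_move_cleanup_interrupt() while move_cleanup_count
     * is still zero.
     */
}

/* Phase 2 (hypothetical stand-in): cleanup becomes due. */
static void cleanup_is_due(struct irq_desc *desc)
{
    cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
                &cpu_online_map);

    /* Only this latch arms the cleanup handler for the IRQ. */
    desc->arch.move_cleanup_count = cpumask_weight(desc->arch.old_cpu_mask);

    if ( desc->arch.move_cleanup_count )
        send_IPI_mask(desc->arch.old_cpu_mask, IRQ_MOVE_CLEANUP_VECTOR);
}
```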
```diff
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -658,6 +658,9 @@ void irq_move_cleanup_interrupt(struct c
     ack_APIC_irq();
 
     me = smp_processor_id();
+    if ( !cpu_online(me) )
+        return;
+
     for ( vector = FIRST_DYNAMIC_VECTOR;
           vector <= LAST_HIPRIORITY_VECTOR; vector++)
     {
@@ -717,11 +720,14 @@ unlock:
 
 static void send_cleanup_vector(struct irq_desc *desc)
 {
-    cpumask_t cleanup_mask;
+    cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
+                &cpu_online_map);
+    desc->arch.move_cleanup_count = cpumask_weight(desc->arch.old_cpu_mask);
 
-    cpumask_and(&cleanup_mask, desc->arch.old_cpu_mask, &cpu_online_map);
-    desc->arch.move_cleanup_count = cpumask_weight(&cleanup_mask);
-    send_IPI_mask(&cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
+    if ( desc->arch.move_cleanup_count )
+        send_IPI_mask(desc->arch.old_cpu_mask, IRQ_MOVE_CLEANUP_VECTOR);
+    else
+        release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
 }
@@ -2394,6 +2400,16 @@ void fixup_irqs(const cpumask_t *mask, b
              vector <= LAST_HIPRIORITY_VECTOR )
             cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
 
+        if ( desc->arch.move_cleanup_count )
+        {
+            /* The cleanup IPI may have got sent while we were still online. */
+            cpumask_andnot(&affinity, desc->arch.old_cpu_mask,
+                           &cpu_online_map);
+            desc->arch.move_cleanup_count -= cpumask_weight(&affinity);
+            if ( !desc->arch.move_cleanup_count )
+                release_old_vec(desc);
+        }
+
         cpumask_copy(&affinity, desc->affinity);
         if ( !desc->action || cpumask_subset(&affinity, mask) )
         {
```
The cleanup IPI may get sent immediately before a CPU gets removed
from the online map. In such a case the IPI would get handled on the
CPU being offlined no earlier than in the interrupts-disabled window
after fixup_irqs()' main loop. This is too late, however, because a
possible affinity change may incur the need for a vector assignment,
which will fail when the IRQ's move cleanup count is still non-zero.

To fix this
- record the set of CPUs the cleanup IPI actually gets sent to
  alongside setting their count,
- adjust the count in fixup_irqs(), accounting for all CPUs that the
  cleanup IPI was sent to, but that are no longer online,
- bail early from the cleanup IPI handler when the CPU is no longer
  online, to prevent double accounting.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: The proper recording of the IPI destinations actually makes the
     move_cleanup_count field redundant. Do we want to drop it, at
     the price of a few more CPU-mask operations?