diff mbox series

[RFC,1/6] x86/ioapic: set disable hook for masking edge interrupts

Message ID 20220421132114.35118-2-roger.pau@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/ioapic: fix edge triggered interrupt migration | expand

Commit Message

Roger Pau Monné April 21, 2022, 1:21 p.m. UTC
Allow disabling (masking) IO-APIC pins set to edge trigger mode.  This
is required in order to safely migrate such interrupts between CPUs,
as the write to update the IO-APIC RTE (or the IRTE) is not done
atomically, so there's a window where there's a mismatch between the
destination CPU and the vector:

(XEN) CPU1: No irq handler for vector b5 (IRQ -11, LAPIC)
(XEN) IRQ10 a=0002[0002,0008] v=bd[b5] t=IO-APIC-edge s=00000030

The main risk with masking edge triggered interrupts is losing them,
but getting them injected while in the process of updating the RTE is
equally harmful as the interrupts gets lost anyway.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/io_apic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jan Beulich June 3, 2022, 1:19 p.m. UTC | #1
On 21.04.2022 15:21, Roger Pau Monne wrote:
> Allow disabling (masking) IO-APIC pins set to edge trigger mode.  This
> is required in order to safely migrate such interrupts between CPUs,
> as the write to update the IO-APIC RTE (or the IRTE) is not done
> atomically,

For IRTE on VT-d we use cmpxchg16b if available (i.e. virtually always).

> so there's a window where there's a mismatch between the
> destination CPU and the vector:
> 
> (XEN) CPU1: No irq handler for vector b5 (IRQ -11, LAPIC)
> (XEN) IRQ10 a=0002[0002,0008] v=bd[b5] t=IO-APIC-edge s=00000030

I think this needs some further explanation, as we generally move
edge IRQs only when an un-acked interrupt is in flight (and hence
no further one can arrive).

Jan

> The main risk with masking edge triggered interrupts is losing them,
> but getting them injected while in the process of updating the RTE is
> equally harmful as the interrupts gets lost anyway.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/io_apic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index c086f40f63..2e5964640b 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -1782,7 +1782,7 @@ static hw_irq_controller ioapic_edge_type = {
>      .startup 	= startup_edge_ioapic_irq,
>      .shutdown 	= irq_shutdown_none,
>      .enable 	= unmask_IO_APIC_irq,
> -    .disable 	= irq_disable_none,
> +    .disable 	= mask_IO_APIC_irq,
>      .ack 		= ack_edge_ioapic_irq,
>      .set_affinity 	= set_ioapic_affinity_irq,
>  };
Roger Pau Monné June 3, 2022, 2:53 p.m. UTC | #2
On Fri, Jun 03, 2022 at 03:19:34PM +0200, Jan Beulich wrote:
> On 21.04.2022 15:21, Roger Pau Monne wrote:
> > Allow disabling (masking) IO-APIC pins set to edge trigger mode.  This
> > is required in order to safely migrate such interrupts between CPUs,
> > as the write to update the IO-APIC RTE (or the IRTE) is not done
> > atomically,
> 
> For IRTE on VT-d we use cmpxchg16b if available (i.e. virtually always).
> 
> > so there's a window where there's a mismatch between the
> > destination CPU and the vector:
> > 
> > (XEN) CPU1: No irq handler for vector b5 (IRQ -11, LAPIC)
> > (XEN) IRQ10 a=0002[0002,0008] v=bd[b5] t=IO-APIC-edge s=00000030
> 
> I think this needs some further explanation, as we generally move
> edge IRQs only when an un-acked interrupt is in flight (and hence
> no further one can arrive).

A further one can arrive as soon as you modify either the vector or
the destination fields of the IO-APIC RTE, as then the non-EOI'ed
lapic vector is no longer there (because you have moved to a different
destination or vector).

This is the issue with updating the IO-APIC RTE using two separate
writes: even when using interrupt remapping the IRTE cannot be
atomically updated and there's a window where the interrupt is not
masked, but the destination and vector fields are not in sync, because
they reside in different parts of the RTE (destination is in the high
32bits, vector in the low 32bits part of the RTE).

Thanks, Roger.
Jan Beulich June 7, 2022, 8:02 a.m. UTC | #3
On 03.06.2022 16:53, Roger Pau Monné wrote:
> On Fri, Jun 03, 2022 at 03:19:34PM +0200, Jan Beulich wrote:
>> On 21.04.2022 15:21, Roger Pau Monne wrote:
>>> Allow disabling (masking) IO-APIC pins set to edge trigger mode.  This
>>> is required in order to safely migrate such interrupts between CPUs,
>>> as the write to update the IO-APIC RTE (or the IRTE) is not done
>>> atomically,
>>
>> For IRTE on VT-d we use cmpxchg16b if available (i.e. virtually always).
>>
>>> so there's a window where there's a mismatch between the
>>> destination CPU and the vector:
>>>
>>> (XEN) CPU1: No irq handler for vector b5 (IRQ -11, LAPIC)
>>> (XEN) IRQ10 a=0002[0002,0008] v=bd[b5] t=IO-APIC-edge s=00000030
>>
>> I think this needs some further explanation, as we generally move
>> edge IRQs only when an un-acked interrupt is in flight (and hence
>> no further one can arrive).
> 
> A further one can arrive as soon as you modify either the vector or
> the destination fields of the IO-APIC RTE, as then the non-EOI'ed
> lapic vector is no longer there (because you have moved to a different
> destination or vector).

Right - this is what I'm asking you to spell out in the description.

Jan

> This is the issue with updating the IO-APIC RTE using two separate
> writes: even when using interrupt remapping the IRTE cannot be
> atomically updated and there's a window where the interrupt is not
> masked, but the destination and vector fields are not in sync, because
> they reside in different parts of the RTE (destination is in the high
> 32bits, vector in the low 32bits part of the RTE).
> 
> Thanks, Roger.
>
diff mbox series

Patch

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index c086f40f63..2e5964640b 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1782,7 +1782,7 @@  static hw_irq_controller ioapic_edge_type = {
     .startup 	= startup_edge_ioapic_irq,
     .shutdown 	= irq_shutdown_none,
     .enable 	= unmask_IO_APIC_irq,
-    .disable 	= irq_disable_none,
+    .disable 	= mask_IO_APIC_irq,
     .ack 		= ack_edge_ioapic_irq,
     .set_affinity 	= set_ioapic_affinity_irq,
 };