diff mbox

[4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered

Message ID 20171105135233.34572-5-nikita.leshchenko@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nikita Leshenko Nov. 5, 2017, 1:52 p.m. UTC
Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
IOAPICs without an EOI register. They simulate the EOI message manually
by changing the trigger mode to edge and then back to level, with the
entry being masked during this.

QEMU implements this feature in commit ed1263c363c9
("ioapic: clear remote irr bit for edge-triggered interrupts")

As a side effect, this commit removes an incorrect behavior where Remote
IRR was cleared when the redirection table entry was rewritten. This is not
consistent with the manual and also opens an opportunity for a strange
behavior when a redirection table entry is modified from an interrupt
handler that handles the same entry: The modification will clear the
Remote IRR bit even though the interrupt handler is still running.

Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/ioapic.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Wanpeng Li Nov. 6, 2017, 3:30 a.m. UTC | #1
2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
> IOAPICs without an EOI register. They simulate the EOI message manually
> by changing the trigger mode to edge and then back to level, with the
> entry being masked during this.
>
> QEMU implements this feature in commit ed1263c363c9
> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>
> As a side effect, this commit removes an incorrect behavior where Remote
> IRR was cleared when the redirection table entry was rewritten. This is not
> consistent with the manual and also opens an opportunity for a strange
> behavior when a redirection table entry is modified from an interrupt
> handler that handles the same entry: The modification will clear the
> Remote IRR bit even though the interrupt handler is still running.
>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  arch/x86/kvm/ioapic.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 6df150eaaa78..163d340ee5f8 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>                 } else {
>                         e->bits &= ~0xffffffffULL;
>                         e->bits |= (u32) val;
> -                       e->fields.remote_irr = 0;
>                 }
> +
> +               /*
> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
> +                * be cleared by IOAPIC hardware when the entry is configured
> +                * as edge-triggered. This behavior is used to simulate an
> +                * explicit EOI on IOAPICs that don't have the EOI register.
> +                */
> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
> +                       e->fields.remote_irr = 0;
> +
>                 mask_after = e->fields.mask;
>                 if (mask_before != mask_after)
>                         kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
> --
> 2.13.3
>
Steve Rutherford Nov. 8, 2017, 12:16 a.m. UTC | #2
This is cleaner. Thanks for doing this. (Note that this is racy
without the read only remote irr change, so I have a slight preference
that swap the order of these patches.)
Reviewed-by: Steve Rutherford <srutherford@google.com>

On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>> IOAPICs without an EOI register. They simulate the EOI message manually
>> by changing the trigger mode to edge and then back to level, with the
>> entry being masked during this.
>>
>> QEMU implements this feature in commit ed1263c363c9
>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>
>> As a side effect, this commit removes an incorrect behavior where Remote
>> IRR was cleared when the redirection table entry was rewritten. This is not
>> consistent with the manual and also opens an opportunity for a strange
>> behavior when a redirection table entry is modified from an interrupt
>> handler that handles the same entry: The modification will clear the
>> Remote IRR bit even though the interrupt handler is still running.
>>
>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>
>> ---
>>  arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 6df150eaaa78..163d340ee5f8 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>                 } else {
>>                         e->bits &= ~0xffffffffULL;
>>                         e->bits |= (u32) val;
>> -                       e->fields.remote_irr = 0;
>>                 }
>> +
>> +               /*
>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>> +                * be cleared by IOAPIC hardware when the entry is configured
>> +                * as edge-triggered. This behavior is used to simulate an
>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>> +                */
>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>> +                       e->fields.remote_irr = 0;
>> +
>>                 mask_after = e->fields.mask;
>>                 if (mask_before != mask_after)
>>                         kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>> --
>> 2.13.3
>>
Nikita Leshenko Nov. 8, 2017, 9:52 a.m. UTC | #3
> On 8 Nov 2017, at 2:16, Steve Rutherford <srutherford@google.com> wrote:
> 
> This is cleaner. Thanks for doing this. (Note that this is racy
> without the read only remote irr change, so I have a slight preference
> that swap the order of these patches.)
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> 
Thanks for your review.

Can you explain why this is racy?

I think that swapping the order of patches 4 (clearing remote irr) and
5 (making remote irr read only) will break Xen (and other guests that
do explicit EOI) between the patches. Even though the current code is
not semantically correct regarding remote irr behavior, it still makes
explicit EOI work. If we swap the patches and apply patch 5 first,
explicit EOI will break until we apply patch 4.

Nikita
> On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>>> IOAPICs without an EOI register. They simulate the EOI message manually
>>> by changing the trigger mode to edge and then back to level, with the
>>> entry being masked during this.
>>> 
>>> QEMU implements this feature in commit ed1263c363c9
>>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>> 
>>> As a side effect, this commit removes an incorrect behavior where Remote
>>> IRR was cleared when the redirection table entry was rewritten. This is not
>>> consistent with the manual and also opens an opportunity for a strange
>>> behavior when a redirection table entry is modified from an interrupt
>>> handler that handles the same entry: The modification will clear the
>>> Remote IRR bit even though the interrupt handler is still running.
>>> 
>>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> 
>> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> 
>>> ---
>>> arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>>> index 6df150eaaa78..163d340ee5f8 100644
>>> --- a/arch/x86/kvm/ioapic.c
>>> +++ b/arch/x86/kvm/ioapic.c
>>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>>                } else {
>>>                        e->bits &= ~0xffffffffULL;
>>>                        e->bits |= (u32) val;
>>> -                       e->fields.remote_irr = 0;
>>>                }
>>> +
>>> +               /*
>>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>>> +                * be cleared by IOAPIC hardware when the entry is configured
>>> +                * as edge-triggered. This behavior is used to simulate an
>>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>>> +                */
>>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>>> +                       e->fields.remote_irr = 0;
>>> +
>>>                mask_after = e->fields.mask;
>>>                if (mask_before != mask_after)
>>>                        kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>>> --
>>> 2.13.3
>>>
Steve Rutherford Nov. 8, 2017, 9:24 p.m. UTC | #4
If an eoi on one cpu for irq x races with a guest reconfiguring the
redir entry that points to irq x, the reconfiguration might writeback
the high remote irr value that it read out in the first place. This
would leave the remote irr stuck high since there is no pending eoi
waiting to clear that.

On Wed, Nov 8, 2017 at 1:52 AM, Nikita Leshchenko
<nikita.leshchenko@oracle.com> wrote:
>
>> On 8 Nov 2017, at 2:16, Steve Rutherford <srutherford@google.com> wrote:
>>
>> This is cleaner. Thanks for doing this. (Note that this is racy
>> without the read only remote irr change, so I have a slight preference
>> that swap the order of these patches.)
>> Reviewed-by: Steve Rutherford <srutherford@google.com>
>>
> Thanks for your review.
>
> Can you explain why this is racy?
>
> I think that swapping the order of patches 4 (clearing remote irr) and
> 5 (making remote irr read only) will break Xen (and other guests that
> do explicit EOI) between the patches. Even though the current code is
> not semantically correct regarding remote irr behavior, it still makes
> explicit EOI work. If we swap the patches and apply patch 5 first,
> explicit EOI will break until we apply patch 4.
>
> Nikita
>> On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>>>> IOAPICs without an EOI register. They simulate the EOI message manually
>>>> by changing the trigger mode to edge and then back to level, with the
>>>> entry being masked during this.
>>>>
>>>> QEMU implements this feature in commit ed1263c363c9
>>>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>>>
>>>> As a side effect, this commit removes an incorrect behavior where Remote
>>>> IRR was cleared when the redirection table entry was rewritten. This is not
>>>> consistent with the manual and also opens an opportunity for a strange
>>>> behavior when a redirection table entry is modified from an interrupt
>>>> handler that handles the same entry: The modification will clear the
>>>> Remote IRR bit even though the interrupt handler is still running.
>>>>
>>>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>
>>> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>>> ---
>>>> arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>>>> index 6df150eaaa78..163d340ee5f8 100644
>>>> --- a/arch/x86/kvm/ioapic.c
>>>> +++ b/arch/x86/kvm/ioapic.c
>>>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>>>                } else {
>>>>                        e->bits &= ~0xffffffffULL;
>>>>                        e->bits |= (u32) val;
>>>> -                       e->fields.remote_irr = 0;
>>>>                }
>>>> +
>>>> +               /*
>>>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>>>> +                * be cleared by IOAPIC hardware when the entry is configured
>>>> +                * as edge-triggered. This behavior is used to simulate an
>>>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>>>> +                */
>>>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>>>> +                       e->fields.remote_irr = 0;
>>>> +
>>>>                mask_after = e->fields.mask;
>>>>                if (mask_before != mask_after)
>>>>                        kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>>>> --
>>>> 2.13.3
>>>>
>
Steve Rutherford Nov. 8, 2017, 9:25 p.m. UTC | #5
Since there are two issues competing here, there's no reason to reorder these.

On Wed, Nov 8, 2017 at 1:24 PM, Steve Rutherford <srutherford@google.com> wrote:
> If an eoi on one cpu for irq x races with a guest reconfiguring the
> redir entry that points to irq x, the reconfiguration might writeback
> the high remote irr value that it read out in the first place. This
> would leave the remote irr stuck high since there is no pending eoi
> waiting to clear that.
>
> On Wed, Nov 8, 2017 at 1:52 AM, Nikita Leshchenko
> <nikita.leshchenko@oracle.com> wrote:
>>
>>> On 8 Nov 2017, at 2:16, Steve Rutherford <srutherford@google.com> wrote:
>>>
>>> This is cleaner. Thanks for doing this. (Note that this is racy
>>> without the read only remote irr change, so I have a slight preference
>>> that swap the order of these patches.)
>>> Reviewed-by: Steve Rutherford <srutherford@google.com>
>>>
>> Thanks for your review.
>>
>> Can you explain why this is racy?
>>
>> I think that swapping the order of patches 4 (clearing remote irr) and
>> 5 (making remote irr read only) will break Xen (and other guests that
>> do explicit EOI) between the patches. Even though the current code is
>> not semantically correct regarding remote irr behavior, it still makes
>> explicit EOI work. If we swap the patches and apply patch 5 first,
>> explicit EOI will break until we apply patch 4.
>>
>> Nikita
>>> On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>>>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>>>>> IOAPICs without an EOI register. They simulate the EOI message manually
>>>>> by changing the trigger mode to edge and then back to level, with the
>>>>> entry being masked during this.
>>>>>
>>>>> QEMU implements this feature in commit ed1263c363c9
>>>>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>>>>
>>>>> As a side effect, this commit removes an incorrect behavior where Remote
>>>>> IRR was cleared when the redirection table entry was rewritten. This is not
>>>>> consistent with the manual and also opens an opportunity for a strange
>>>>> behavior when a redirection table entry is modified from an interrupt
>>>>> handler that handles the same entry: The modification will clear the
>>>>> Remote IRR bit even though the interrupt handler is still running.
>>>>>
>>>>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>
>>>> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>>>
>>>>> ---
>>>>> arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>>>>> index 6df150eaaa78..163d340ee5f8 100644
>>>>> --- a/arch/x86/kvm/ioapic.c
>>>>> +++ b/arch/x86/kvm/ioapic.c
>>>>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>>>>                } else {
>>>>>                        e->bits &= ~0xffffffffULL;
>>>>>                        e->bits |= (u32) val;
>>>>> -                       e->fields.remote_irr = 0;
>>>>>                }
>>>>> +
>>>>> +               /*
>>>>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>>>>> +                * be cleared by IOAPIC hardware when the entry is configured
>>>>> +                * as edge-triggered. This behavior is used to simulate an
>>>>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>>>>> +                */
>>>>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>>>>> +                       e->fields.remote_irr = 0;
>>>>> +
>>>>>                mask_after = e->fields.mask;
>>>>>                if (mask_before != mask_after)
>>>>>                        kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>>>>> --
>>>>> 2.13.3
>>>>>
>>
diff mbox

Patch

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 6df150eaaa78..163d340ee5f8 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -304,8 +304,17 @@  static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 		} else {
 			e->bits &= ~0xffffffffULL;
 			e->bits |= (u32) val;
-			e->fields.remote_irr = 0;
 		}
+
+		/*
+		 * Some OSes (Linux, Xen) assume that Remote IRR bit will
+		 * be cleared by IOAPIC hardware when the entry is configured
+		 * as edge-triggered. This behavior is used to simulate an
+		 * explicit EOI on IOAPICs that don't have the EOI register.
+		 */
+		if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
+			e->fields.remote_irr = 0;
+
 		mask_after = e->fields.mask;
 		if (mask_before != mask_after)
 			kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);