diff mbox

[RFC,v2] kvm: x86: ignore ioapic polarity

Message ID 20140228040617.GA22103@crash.ini.cmu.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Gabriel L. Somlo Feb. 28, 2014, 4:06 a.m. UTC
Both QEMU and KVM have already accumulated a significant number of
optimizations based on the hard-coded assumption that ioapic polarity
will always use the ActiveHigh convention, where the logical and
physical states of level-triggered irq lines always match (i.e.,
active(asserted) == high == 1, inactive == low == 0). QEMU guests
are expected to follow directions given via ACPI and configure the
ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
QEMU will still use the ActiveHigh signaling convention when
interfacing with KVM.

This patch modifies KVM to completely ignore ioapic polarity as set by
the guest OS, enabling misbehaving guests to work alongside those which
comply with the ActiveHigh polarity specified by QEMU's ACPI tables.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
---

On Fri, Feb 28, 2014 at 12:31:27AM +0100, Paolo Bonzini wrote:
>>>This is a much better description.  Can you turn it into a patch to
>>>Documentation/virtual/kvm/api.txt and a more complete commit
>>>message?

OK, let me know what you all think of this version.

> If you change ACPI tables to ActiveLow, and leave the polarity
> inversion in hw/intc/ioapic.c, then it will matter.

I only changed ACPI to ActiveLow to cause Linux to misbehave in the
same way OS X does, it was a temporary/test hack. ACPI should stay
ActiveHigh to reflect the rest of QEMU/KVM, and misbehaving guest OSs
should ignore it at their own peril :)

> >(Hmmm, maybe this one of the reasons I never got OS X to boot without
> >-enable-kvm... I should look at the QEMU hw/intc/ioapic*.c, and see
> >if *it* cares about guest-configured polarity, and maybe get it to
> >*stop* caring :)

> Exactly my point. :)

So this patch gets the accelerated path to work regardless of how
well-behaved a guest OS may or may not be w.r.t. ACPI and polarity.
I'll try to find out whether it's possible to be *extra* nice to these
types of guests by also allowing them to ignore ACPI polarity when not
using acceleration :)

Thanks again,
  Gabriel

 Documentation/virtual/kvm/api.txt | 18 ++++++++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 virt/kvm/ioapic.c                 |  1 -
 3 files changed, 19 insertions(+), 1 deletion(-)

Comments

Michael S. Tsirkin March 2, 2014, 2:55 p.m. UTC | #1
On Thu, Feb 27, 2014 at 11:06:17PM -0500, Gabriel L. Somlo wrote:
> Both QEMU and KVM have already accumulated a significant number of
> optimizations based on the hard-coded assumption that ioapic polarity
> will always use the ActiveHigh convention, where the logical and
> physical states of level-triggered irq lines always match (i.e.,
> active(asserted) == high == 1, inactive == low == 0). QEMU guests
> are expected to follow directions given via ACPI and configure the
> ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
> guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
> QEMU will still use the ActiveHigh signaling convention when
> interfacing with KVM.
> 
> This patch modifies KVM to completely ignore ioapic polarity as set by
> the guest OS, enabling misbehaving guests to work alongside those which
> comply with the ActiveHigh polarity specified by QEMU's ACPI tables.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
> ---
> 
> On Fri, Feb 28, 2014 at 12:31:27AM +0100, Paolo Bonzini wrote:
> >>>This is a much better description.  Can you turn it into a patch to
> >>>Documentation/virtual/kvm/api.txt and a more complete commit
> >>>message?
> 
> OK, let me know what you all think of this version.

Looks good to me.

Acked-by: Michael S. Tsirkin <mst@redhat.com>



> > If you change ACPI tables to ActiveLow, and leave the polarity
> > inversion in hw/intc/ioapic.c, then it will matter.
> 
> I only changed ACPI to ActiveLow to cause Linux to misbehave in the
> same way OS X does, it was a temporary/test hack. ACPI should stay
> ActiveHigh to reflect the rest of QEMU/KVM, and misbehaving guest OSs
> should ignore it at their own peril :)

I actually think now that you've fixed intc.c as well, we should change
it as the next step.



> > >(Hmmm, maybe this one of the reasons I never got OS X to boot without
> > >-enable-kvm... I should look at the QEMU hw/intc/ioapic*.c, and see
> > >if *it* cares about guest-configured polarity, and maybe get it to
> > >*stop* caring :)
> 
> > Exactly my point. :)
> 
> So this patch gets the accelerated path to work regardless of how
> well-behaved a guest OS may or may not be w.r.t. ACPI and polarity.
> I'll try to find out whether it's possible to be *extra* nice to these
> types of guests by also allowing them to ignore ACPI polarity when not
> using acceleration :)
> 
> Thanks again,
>   Gabriel
> 
>  Documentation/virtual/kvm/api.txt | 18 ++++++++++++++++++
>  include/uapi/linux/kvm.h          |  1 +
>  virt/kvm/ioapic.c                 |  1 -
>  3 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 6cd63a9..0492a94 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1365,6 +1365,24 @@ struct kvm_irq_routing_msi {
>  	__u32 pad;
>  };
>  
> +NOTE: For each level-triggered interrupt managed by a virtual ioapic,
> +the guest OS may set a polarity value (bit 13 of each corresponding I/O
> +redirection table register). The polarity bit defines the relationship
> +between an irq line's logical state (active/asserted = 1, inactive = 0)
> +and its physical state (high = 1, low = 0). When the polarity bit is 0
> +(ActiveHigh), logical and physical states are matched (active == high == 1,
> +inactive == low == 0). When polarity is set to 1 (ActiveLow), logical and
> +physical state are opposites (logical == !physical). Typically, guests are
> +expected to use the same polarity across all level-triggered interrupts,
> +as directed by ACPI. Historically, both KVM and QEMU have accumulated a
> +significant number of optimizations based on the hard-coded assumption
> +that polarity will always be set to ActiveHigh. When interfacing with KVM,
> +QEMU will use the ActiveHigh convention for all level-triggered irqs,
> +regardless of how the guest OS has configured the polarity bits in the
> +ioapic registers. As such, KVM must also ignore these bits, and always act
> +as if the logical and physical states of an irq line exactly match each
> +other (i.e., follow the ActiveHigh convention regardless of polarity).
> +
>  
>  4.53 KVM_ASSIGN_SET_MSIX_NR
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 932d7f2..5bfc0e6 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -675,6 +675,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_SPAPR_MULTITCE 94
>  #define KVM_CAP_EXT_EMUL_CPUID 95
>  #define KVM_CAP_HYPERV_TIME 96
> +#define KVM_CAP_X86_IOAPIC_POLARITY_IGNORED 97
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index ce9ed99..1539d37 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -328,7 +328,6 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
>  	irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
>  					 irq_source_id, level);
>  	entry = ioapic->redirtbl[irq];
> -	irq_level ^= entry.fields.polarity;
>  	if (!irq_level) {
>  		ioapic->irr &= ~mask;
>  		ret = 1;
> -- 
> 1.8.1.4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini March 13, 2014, 10:53 a.m. UTC | #2
Il 28/02/2014 05:06, Gabriel L. Somlo ha scritto:
> +NOTE: For each level-triggered interrupt managed by a virtual ioapic,
> +the guest OS may set a polarity value (bit 13 of each corresponding I/O
> +redirection table register). The polarity bit defines the relationship
> +between an irq line's logical state (active/asserted = 1, inactive = 0)
> +and its physical state (high = 1, low = 0). When the polarity bit is 0
> +(ActiveHigh), logical and physical states are matched (active == high == 1,
> +inactive == low == 0). When polarity is set to 1 (ActiveLow), logical and
> +physical state are opposites (logical == !physical). Typically, guests are
> +expected to use the same polarity across all level-triggered interrupts,
> +as directed by ACPI. Historically, both KVM and QEMU have accumulated a
> +significant number of optimizations based on the hard-coded assumption
> +that polarity will always be set to ActiveHigh. When interfacing with KVM,
> +QEMU will use the ActiveHigh convention for all level-triggered irqs,
> +regardless of how the guest OS has configured the polarity bits in the
> +ioapic registers. As such, KVM must also ignore these bits, and always act
> +as if the logical and physical states of an irq line exactly match each
> +other (i.e., follow the ActiveHigh convention regardless of polarity).
> +

Instead of this, I'm adding the following to the KVM_IRQ_LINE ioctl:

+On real hardware, interrupt pins can be active-low or active-high.  This
+does not matter for the level field of struct kvm_irq_level: 1 always
+means active (asserted), 0 means inactive (deasserted).
+
+x86 allows the operating system to program the interrupt polarity
+(active-low/active-high) for level-triggered interrupts, and KVM used
+to consider the polarity.  However, due to bitrot in the handling of
+active-low interrupts, the above convention is now valid on x86 too.
+This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED.  Userspace
+should not present interrupts to the guest as active-low unless this
+capability is present (or unless it is not using the in-kernel irqchip,
+of course).

and applying the patch to kvm/queue.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gabriel L. Somlo March 13, 2014, 1:43 p.m. UTC | #3
On Thu, Mar 13, 2014 at 11:53:04AM +0100, Paolo Bonzini wrote:
> Instead of this, I'm adding the following to the KVM_IRQ_LINE ioctl:
> 
> +On real hardware, interrupt pins can be active-low or active-high.  This
> +does not matter for the level field of struct kvm_irq_level: 1 always
> +means active (asserted), 0 means inactive (deasserted).
> +
> +x86 allows the operating system to program the interrupt polarity
> +(active-low/active-high) for level-triggered interrupts, and KVM used
> +to consider the polarity.  However, due to bitrot in the handling of
> +active-low interrupts, the above convention is now valid on x86 too.
> +This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED.  Userspace
> +should not present interrupts to the guest as active-low unless this
> +capability is present (or unless it is not using the in-kernel irqchip,
> +of course).
> 
> and applying the patch to kvm/queue.

Sounds great to me, thanks !

--Gabriel
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6cd63a9..0492a94 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1365,6 +1365,24 @@  struct kvm_irq_routing_msi {
 	__u32 pad;
 };
 
+NOTE: For each level-triggered interrupt managed by a virtual ioapic,
+the guest OS may set a polarity value (bit 13 of each corresponding I/O
+redirection table register). The polarity bit defines the relationship
+between an irq line's logical state (active/asserted = 1, inactive = 0)
+and its physical state (high = 1, low = 0). When the polarity bit is 0
+(ActiveHigh), logical and physical states are matched (active == high == 1,
+inactive == low == 0). When polarity is set to 1 (ActiveLow), logical and
+physical state are opposites (logical == !physical). Typically, guests are
+expected to use the same polarity across all level-triggered interrupts,
+as directed by ACPI. Historically, both KVM and QEMU have accumulated a
+significant number of optimizations based on the hard-coded assumption
+that polarity will always be set to ActiveHigh. When interfacing with KVM,
+QEMU will use the ActiveHigh convention for all level-triggered irqs,
+regardless of how the guest OS has configured the polarity bits in the
+ioapic registers. As such, KVM must also ignore these bits, and always act
+as if the logical and physical states of an irq line exactly match each
+other (i.e., follow the ActiveHigh convention regardless of polarity).
+
 
 4.53 KVM_ASSIGN_SET_MSIX_NR
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 932d7f2..5bfc0e6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -675,6 +675,7 @@  struct kvm_ppc_smmu_info {
 #define KVM_CAP_SPAPR_MULTITCE 94
 #define KVM_CAP_EXT_EMUL_CPUID 95
 #define KVM_CAP_HYPERV_TIME 96
+#define KVM_CAP_X86_IOAPIC_POLARITY_IGNORED 97
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ce9ed99..1539d37 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -328,7 +328,6 @@  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
 	irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
 					 irq_source_id, level);
 	entry = ioapic->redirtbl[irq];
-	irq_level ^= entry.fields.polarity;
 	if (!irq_level) {
 		ioapic->irr &= ~mask;
 		ret = 1;