diff mbox series

[v4] KVM: x86: ioapic: Optimize EOI handling to reduce unnecessary VM exits

Message ID 20250303052227.523411-1-zijie.wei@linux.alibaba.com (mailing list archive)
State New
Headers show
Series [v4] KVM: x86: ioapic: Optimize EOI handling to reduce unnecessary VM exits | expand

Commit Message

weizijie March 3, 2025, 5:22 a.m. UTC
Configuring IOAPIC routed interrupts triggers KVM to rescan all vCPU's
ioapic_handled_vectors which is used to control which vectors need to
trigger EOI-induced VMEXITs. If any interrupt is already in service on
some vCPU using some vector when the IOAPIC is being rescanned, the
vector is set to vCPU's ioapic_handled_vectors. If the vector is then
reused by other interrupts, each of them will cause a VMEXIT even it is
unnecessary. W/o further IOAPIC rescan, the vector remains set, and this
issue persists, impacting guest's interrupt performance.

Both

  commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")

and

  commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and
IOAPIC reconfigure race")

mentioned this issue, but it was considered as "rare" thus was not
addressed. However in real environment this issue can actually happen
in a well-behaved guest.

Simple Fix Proposal:
A straightforward solution is to record highest in-service IRQ that
is pending at the time of the last scan. Then, upon the next guest
exit, do a full KVM_REQ_SCAN_IOAPIC. This ensures that a re-scan of
the ioapic occurs only when the recorded vector is EOI'd, and
subsequently, the extra bits in the eoi_exit_bitmap are cleared,
avoiding unnecessary VM exits.

Co-developed-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
Signed-off-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
Signed-off-by: weizijie <zijie.wei@linux.alibaba.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/ioapic.c           | 10 ++++++++--
 arch/x86/kvm/irq_comm.c         |  9 +++++++--
 arch/x86/kvm/lapic.c            | 13 +++++++++++++
 4 files changed, 29 insertions(+), 4 deletions(-)

Comments

Sean Christopherson March 3, 2025, 5:55 p.m. UTC | #1
Several minor comments.  No need to post v5, I'll do so today as a small series
with preparatory patches to refactor and deduplicate the userspace vs. in-kernel
logic.

On Mon, Mar 03, 2025, weizijie wrote:
> Configuring IOAPIC routed interrupts triggers KVM to rescan all vCPU's
> ioapic_handled_vectors which is used to control which vectors need to
> trigger EOI-induced VMEXITs. If any interrupt is already in service on
> some vCPU using some vector when the IOAPIC is being rescanned, the
> vector is set to vCPU's ioapic_handled_vectors. If the vector is then
> reused by other interrupts, each of them will cause a VMEXIT even it is
> unnecessary. W/o further IOAPIC rescan, the vector remains set, and this
> issue persists, impacting guest's interrupt performance.
> 
> Both
> 
>   commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
> 
> and
> 
>   commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and
> IOAPIC reconfigure race")
> 
> mentioned this issue, but it was considered as "rare" thus was not
> addressed. However in real environment this issue can actually happen
> in a well-behaved guest.
> 
> Simple Fix Proposal:
> A straightforward solution is to record highest in-service IRQ that
> is pending at the time of the last scan. Then, upon the next guest
> exit, do a full KVM_REQ_SCAN_IOAPIC. This ensures that a re-scan of
> the ioapic occurs only when the recorded vector is EOI'd, and
> subsequently, the extra bits in the eoi_exit_bitmap are cleared,
> avoiding unnecessary VM exits.

Write changelogs as "commands", i.e. state what changes are actually being made,
as opposed to passively describing a proposed/possible change.

> Co-developed-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
> Signed-off-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
> Signed-off-by: weizijie <zijie.wei@linux.alibaba.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/ioapic.c           | 10 ++++++++--
>  arch/x86/kvm/irq_comm.c         |  9 +++++++--
>  arch/x86/kvm/lapic.c            | 13 +++++++++++++
>  4 files changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0b7af5902ff7..8c50e7b4a96f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1062,6 +1062,7 @@ struct kvm_vcpu_arch {
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	hpa_t hv_root_tdp;
>  #endif
> +	u8 last_pending_vector;

To be consistent with how KVM handles IRQs, this should probably be an "int" with
-1 as the "no pending EOI" value.

I also think we should go with a verbose name to try and precisely capture what
this field tracks, e.g. highest_stale_pending_ioapic_eoi.  It's abusrdly long,
but with massaging and refactoring the line lengths are a non-issue, and I want
to avoid confusion with pending_ioapic_eoi and highest_isr_cache (and others).

>  };
>  
>  struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 995eb5054360..40252a800897 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -297,10 +297,16 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
>  			u16 dm = kvm_lapic_irq_dest_mode(!!e->fields.dest_mode);
>  
>  			if (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
> -						e->fields.dest_id, dm) ||
> -			    kvm_apic_pending_eoi(vcpu, e->fields.vector))
> +						e->fields.dest_id, dm))
>  				__set_bit(e->fields.vector,
>  					  ioapic_handled_vectors);
> +			else if (kvm_apic_pending_eoi(vcpu, e->fields.vector)) {
> +				__set_bit(e->fields.vector,
> +					  ioapic_handled_vectors);
> +				vcpu->arch.last_pending_vector = e->fields.vector >
> +					vcpu->arch.last_pending_vector ? e->fields.vector :
> +					vcpu->arch.last_pending_vector;

Eh, no need to use a ternary operator, last_pending_vector only needs to be written
if it's changing.

>  		}
>  	}
>  	spin_unlock(&ioapic->lock);
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 8136695f7b96..1d23c52576e1 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -426,9 +426,14 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  
>  			if (irq.trig_mode &&
>  			    (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
> -						 irq.dest_id, irq.dest_mode) ||
> -			     kvm_apic_pending_eoi(vcpu, irq.vector)))
> +						 irq.dest_id, irq.dest_mode)))
>  				__set_bit(irq.vector, ioapic_handled_vectors);
> +			else if (kvm_apic_pending_eoi(vcpu, irq.vector)) {
> +				__set_bit(irq.vector, ioapic_handled_vectors);
> +				vcpu->arch.last_pending_vector = irq.vector >
> +					vcpu->arch.last_pending_vector ? irq.vector :
> +					vcpu->arch.last_pending_vector;

This is wrong.  Well, maybe not "wrong" per se, but unnecessary.  The trig_mode
check guards both the "new" and "old" routing cases, i.e. KVM's behavior is to
intercept EOIs for in-flight IRQs if and only if the IRQ is level-triggered.

This code really needs to be de-duplicated between the userspace and in-kernel
I/O APICs.  It probably should have been de-duplicated by fceb3a36c29a ("KVM: x86:
ioapic: Fix level-triggered EOI and userspace I/OAPIC reconfigure race"), but it's
a much more pressing issue now.

> +			}
>  		}
>  	}
>  	srcu_read_unlock(&kvm->irq_srcu, idx);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a009c94c26c2..7c540a0eb340 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1466,6 +1466,19 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  	if (!kvm_ioapic_handles_vector(apic, vector))
>  		return;
>  
> +	/*
> +	 * When there are instances where ioapic_handled_vectors is
> +	 * set due to pending interrupts, clean up the record and do
> +	 * a full KVM_REQ_SCAN_IOAPIC.
> +	 * This ensures the vector is cleared in the vCPU's ioapic_handled_vectors,
> +	 * if the vector is reused by non-IOAPIC interrupts, avoiding unnecessary
> +	 * EOI-induced VMEXITs for that vector.
> +	 */
> +	if (apic->vcpu->arch.last_pending_vector == vector) {
> +		apic->vcpu->arch.last_pending_vector = 0;

I think it makes sense to reset the field when KVM_REQ_SCAN_IOAPIC, mainly so
that it's more obviously correct, i.e. so that it's easier to see that the field
is reset immediately prior to scanning, along with the bitmap itself.

> +		kvm_make_request(KVM_REQ_SCAN_IOAPIC, apic->vcpu);
> +	}
> +
>  	/* Request a KVM exit to inform the userspace IOAPIC. */
>  	if (irqchip_split(apic->vcpu->kvm)) {
>  		apic->vcpu->arch.pending_ioapic_eoi = vector;
> -- 
> 2.43.5
>
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b7af5902ff7..8c50e7b4a96f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1062,6 +1062,7 @@  struct kvm_vcpu_arch {
 #if IS_ENABLED(CONFIG_HYPERV)
 	hpa_t hv_root_tdp;
 #endif
+	u8 last_pending_vector;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 995eb5054360..40252a800897 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -297,10 +297,16 @@  void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
 			u16 dm = kvm_lapic_irq_dest_mode(!!e->fields.dest_mode);
 
 			if (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
-						e->fields.dest_id, dm) ||
-			    kvm_apic_pending_eoi(vcpu, e->fields.vector))
+						e->fields.dest_id, dm))
 				__set_bit(e->fields.vector,
 					  ioapic_handled_vectors);
+			else if (kvm_apic_pending_eoi(vcpu, e->fields.vector)) {
+				__set_bit(e->fields.vector,
+					  ioapic_handled_vectors);
+				vcpu->arch.last_pending_vector = e->fields.vector >
+					vcpu->arch.last_pending_vector ? e->fields.vector :
+					vcpu->arch.last_pending_vector;
+			}
 		}
 	}
 	spin_unlock(&ioapic->lock);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 8136695f7b96..1d23c52576e1 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -426,9 +426,14 @@  void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 
 			if (irq.trig_mode &&
 			    (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
-						 irq.dest_id, irq.dest_mode) ||
-			     kvm_apic_pending_eoi(vcpu, irq.vector)))
+						 irq.dest_id, irq.dest_mode)))
 				__set_bit(irq.vector, ioapic_handled_vectors);
+			else if (kvm_apic_pending_eoi(vcpu, irq.vector)) {
+				__set_bit(irq.vector, ioapic_handled_vectors);
+				vcpu->arch.last_pending_vector = irq.vector >
+					vcpu->arch.last_pending_vector ? irq.vector :
+					vcpu->arch.last_pending_vector;
+			}
 		}
 	}
 	srcu_read_unlock(&kvm->irq_srcu, idx);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a009c94c26c2..7c540a0eb340 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1466,6 +1466,19 @@  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 	if (!kvm_ioapic_handles_vector(apic, vector))
 		return;
 
+	/*
+	 * When there are instances where ioapic_handled_vectors is
+	 * set due to pending interrupts, clean up the record and do
+	 * a full KVM_REQ_SCAN_IOAPIC.
+	 * This ensures the vector is cleared in the vCPU's ioapic_handled_vectors,
+	 * if the vector is reused by non-IOAPIC interrupts, avoiding unnecessary
+	 * EOI-induced VMEXITs for that vector.
+	 */
+	if (apic->vcpu->arch.last_pending_vector == vector) {
+		apic->vcpu->arch.last_pending_vector = 0;
+		kvm_make_request(KVM_REQ_SCAN_IOAPIC, apic->vcpu);
+	}
+
 	/* Request a KVM exit to inform the userspace IOAPIC. */
 	if (irqchip_split(apic->vcpu->kvm)) {
 		apic->vcpu->arch.pending_ioapic_eoi = vector;