Message ID: 1354518064-3066-4-git-send-email-yang.z.zhang@intel.com (mailing list archive)
State: New, archived
On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > manually, which is fully taken care of by the hardware. This needs > some special awareness into existing interrupr injection path: > > - for pending interrupt, instead of direct injection, we may need > update architecture specific indicators before resuming to guest. > > - A pending interrupt, which is masked by ISR, should be also > considered in above update action, since hardware will decide > when to inject it at right time. Current has_interrupt and > get_interrupt only returns a valid vector from injection p.o.v. > Most of my previous comments still apply. > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > --- > arch/x86/include/asm/kvm_host.h | 4 + > arch/x86/include/asm/vmx.h | 11 +++ > arch/x86/kvm/irq.c | 53 ++++++++++----- > arch/x86/kvm/lapic.c | 56 +++++++++++++--- > arch/x86/kvm/lapic.h | 6 ++ > arch/x86/kvm/svm.c | 19 +++++ > arch/x86/kvm/vmx.c | 140 ++++++++++++++++++++++++++++++++++++++- > arch/x86/kvm/x86.c | 34 ++++++++-- > virt/kvm/ioapic.c | 1 + > 9 files changed, 291 insertions(+), 33 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index dc87b65..e5352c8 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -697,6 +697,10 @@ struct kvm_x86_ops { > void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > void (*enable_irq_window)(struct kvm_vcpu *vcpu); > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > + void (*update_irq)(struct kvm_vcpu *vcpu); > + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set); > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > int (*get_tdp_level)(void); > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index 21101b6..1003341 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -62,6 +62,7 @@ > #define EXIT_REASON_MCE_DURING_VMENTRY 41 > #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 > #define EXIT_REASON_APIC_ACCESS 44 > +#define EXIT_REASON_EOI_INDUCED 45 > #define EXIT_REASON_EPT_VIOLATION 48 > #define EXIT_REASON_EPT_MISCONFIG 49 > #define EXIT_REASON_WBINVD 54 > @@ -143,6 +144,7 @@ > #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 > #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 > +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 > #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 > #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 > > @@ -180,6 +182,7 @@ enum vmcs_field { > GUEST_GS_SELECTOR = 0x0000080a, > GUEST_LDTR_SELECTOR = 0x0000080c, > GUEST_TR_SELECTOR = 0x0000080e, > + GUEST_INTR_STATUS = 0x00000810, > HOST_ES_SELECTOR = 0x00000c00, > HOST_CS_SELECTOR = 0x00000c02, > HOST_SS_SELECTOR = 0x00000c04, > @@ -207,6 +210,14 @@ enum vmcs_field { > APIC_ACCESS_ADDR_HIGH = 0x00002015, > EPT_POINTER = 0x0000201a, > EPT_POINTER_HIGH = 0x0000201b, > + EOI_EXIT_BITMAP0 = 0x0000201c, > + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > + EOI_EXIT_BITMAP1 = 0x0000201e, > + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, > + EOI_EXIT_BITMAP2 = 0x00002020, > + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > + EOI_EXIT_BITMAP3 = 0x00002022, > + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > 
GUEST_PHYSICAL_ADDRESS = 0x00002400, > GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > VMCS_LINK_POINTER = 0x00002800, > diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > index 7e06ba1..f782788 100644 > --- a/arch/x86/kvm/irq.c > +++ b/arch/x86/kvm/irq.c > @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > */ > int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > { > - struct kvm_pic *s; > - > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.pending; > > - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ > - if (kvm_apic_accept_pic_intr(v)) { > - s = pic_irqchip(v->kvm); /* PIC */ > - return s->output; > - } else > - return 0; > - } > + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ > + return kvm_cpu_has_extint(v); /* non-APIC */ > return 1; > } > EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > > /* > + * check if there is pending interrupt from > + * non-APIC source without intack. > + */ > +int kvm_cpu_has_extint(struct kvm_vcpu *v) > +{ > + struct kvm_pic *s; > + > + if (kvm_apic_accept_pic_intr(v)) { > + s = pic_irqchip(v->kvm); /* PIC */ > + return s->output; > + } else > + return 0; > +} > + > +/* > * Read pending interrupt vector and intack. > */ > int kvm_cpu_get_interrupt(struct kvm_vcpu *v) > { > - struct kvm_pic *s; > int vector; > > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.nr; > > vector = kvm_get_apic_interrupt(v); /* APIC */ > - if (vector == -1) { > - if (kvm_apic_accept_pic_intr(v)) { > - s = pic_irqchip(v->kvm); > - s->output = 0; /* PIC */ > - vector = kvm_pic_read_irq(v->kvm); > - } > - } > + if (vector == -1) > + return kvm_cpu_get_extint(v); /* non-APIC */ > return vector; > } > EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > > +/* > + * Read pending interrupt(from non-APIC source) > + * vector and intack. > + */ > +int kvm_cpu_get_extint(struct kvm_vcpu *v) > +{ > + struct kvm_pic *s; > + int vector = -1; > + > + if (kvm_apic_accept_pic_intr(v)) { > + s = pic_irqchip(v->kvm); > + s->output = 0; /* PIC */ > + vector = kvm_pic_read_irq(v->kvm); > + } > + return vector; > +} > + > void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > { > kvm_inject_apic_timer_irqs(vcpu); > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 7c96012..400d3ba 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -643,6 +643,14 @@ out: > return ret; > } > > +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set) > +{ > + if (kvm_x86_ops->set_eoi_exitmap) > + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > + trig_mode, always_set); > +} > + > /* > * Add a pending IRQ into lapic. > * Return 1 if successfully added and 0 if discarded. > @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, > if (unlikely(!apic_enabled(apic))) > break; > > + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); As I said in the last review rebuild the bitmap when ioapic or irq notifier configuration changes, user request bit to notify vcpus to reload the bitmap. 
> if (trig_mode) { > apic_debug("level trig mode for vector %d", vector); > apic_set_vector(vector, apic->regs + APIC_TMR); > @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) > return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > } > > +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > +{ > + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > + int trigger_mode; > + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > + trigger_mode = IOAPIC_LEVEL_TRIG; > + else > + trigger_mode = IOAPIC_EDGE_TRIG; > + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > + } > +} > + > static int apic_set_eoi(struct kvm_lapic *apic) > { > int vector = apic_find_highest_isr(apic); > @@ -756,19 +778,24 @@ static int apic_set_eoi(struct kvm_lapic *apic) > apic_clear_isr(vector, apic); > apic_update_ppr(apic); > > - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > - int trigger_mode; > - if (apic_test_vector(vector, apic->regs + APIC_TMR)) > - trigger_mode = IOAPIC_LEVEL_TRIG; > - else > - trigger_mode = IOAPIC_EDGE_TRIG; > - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > - } > + kvm_ioapic_send_eoi(apic, vector); > kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > return vector; > } > > +/* > + * this interface assumes a trap-like exit, which has already finished > + * desired side effect including vISR and vPPR update. > + */ > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + trace_kvm_eoi() > + kvm_ioapic_send_eoi(apic, vector); > + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > + > static void apic_send_ipi(struct kvm_lapic *apic) > { > u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR); > @@ -1533,6 +1560,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) > return highest_irr; > } > > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + if (!apic || !apic_enabled(apic)) Use kvm_vcpu_has_lapic() instead of checking arch.apic directly. 
> + return -1; > + > + return apic_find_highest_irr(apic); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); > + > int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) > { > u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index c42f111..749661a 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); > int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); > int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); > int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); > +int kvm_cpu_has_extint(struct kvm_vcpu *v); > +int kvm_cpu_get_extint(struct kvm_vcpu *v); > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); > void kvm_lapic_reset(struct kvm_vcpu *vcpu); > u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); > void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); > @@ -50,6 +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); > int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); > int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); > +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > + int need_eoi, int global); > int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); > > bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, > @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); > void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > > int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > > void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index dcb7952..8f0903b 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > set_cr_intercept(svm, INTERCEPT_CR8_WRITE); > } > > +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > +{ > + return 0; > +} > + > +static void svm_update_irq(struct kvm_vcpu *vcpu) > +{ > + return ; > +} > + > +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set) > +{ > + return ; > +} > + > static int svm_nmi_allowed(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops svm_x86_ops = { > .enable_nmi_window = enable_nmi_window, > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, > + .update_irq = svm_update_irq; > + .set_eoi_exitmap = svm_set_eoi_exitmap; > > .set_tss_addr = svm_set_tss_addr, > .get_tdp_level = get_npt_level, > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 6a5f651..909ce90 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO); > static bool __read_mostly enable_apicv_reg; > module_param(enable_apicv_reg, bool, S_IRUGO); > > +static bool __read_mostly enable_apicv_vid; > +module_param(enable_apicv_vid, bool, S_IRUGO); > + > /* > * If nested=1, nested virtualization is supported, i.e., guests may use > * VMX and be a hypervisor for its own guests. 
If nested=0, guests may not > @@ -432,6 +435,9 @@ struct vcpu_vmx { > > bool rdtscp_enabled; > > + u8 eoi_exitmap_changed; > + u32 eoi_exit_bitmap[8]; > + > /* Support for a guest hypervisor (nested VMX) */ > struct nested_vmx nested; > }; > @@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void) > SECONDARY_EXEC_APIC_REGISTER_VIRT; > } > > +static inline bool cpu_has_vmx_virtual_intr_delivery(void) > +{ > + return vmcs_config.cpu_based_2nd_exec_ctrl & > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > +} > + > static inline bool cpu_has_vmx_flexpriority(void) > { > return cpu_has_vmx_tpr_shadow() && > @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > SECONDARY_EXEC_PAUSE_LOOP_EXITING | > SECONDARY_EXEC_RDTSCP | > SECONDARY_EXEC_ENABLE_INVPCID | > - SECONDARY_EXEC_APIC_REGISTER_VIRT; > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > if (adjust_vmx_controls(min2, opt2, > MSR_IA32_VMX_PROCBASED_CTLS2, > &_cpu_based_2nd_exec_control) < 0) > @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > > if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) > _cpu_based_2nd_exec_control &= ~( > - SECONDARY_EXEC_APIC_REGISTER_VIRT); > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > > if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { > /* CR3 accesses and invlpg don't need to cause VM Exits when EPT > @@ -2724,6 +2738,14 @@ static __init int hardware_setup(void) > if (!cpu_has_vmx_apic_register_virt()) > enable_apicv_reg = 0; > > + if (!cpu_has_vmx_virtual_intr_delivery()) > + enable_apicv_vid = 0; > + > + if (!enable_apicv_vid) { > + kvm_x86_ops->update_irq = NULL; Why setting it to NULL? Either drop this since vmx_update_irq() checks enable_apicv_vid or better set it to function that does nothing and drop enable_apicv_vid check in vmx_update_irq(). Since kvm_x86_ops->update_irq will never be NULL you can drop the check before calling it. > + kvm_x86_ops->update_cr8_intercept = NULL; Why? It should be other way around: if apicv is enabled set update_cr8_intercept callback to NULL. 
> + } > + > if (nested) > nested_vmx_setup_ctls_msrs(); > > @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) > exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; > if (!enable_apicv_reg) > exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; > + if (!enable_apicv_vid) > + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > return exec_control; > } > > @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > vmx_secondary_exec_control(vmx)); > } > > + if (enable_apicv_vid) { > + vmcs_write64(EOI_EXIT_BITMAP0, 0); > + vmcs_write64(EOI_EXIT_BITMAP1, 0); > + vmcs_write64(EOI_EXIT_BITMAP2, 0); > + vmcs_write64(EOI_EXIT_BITMAP3, 0); > + > + vmcs_write16(GUEST_INTR_STATUS, 0); > + } > + > if (ple_gap) { > vmcs_write32(PLE_GAP, ple_gap); > vmcs_write32(PLE_WINDOW, ple_window); > @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu) > return emulate_instruction(vcpu, 0) == EMULATE_DONE; > } > > +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) > +{ > + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > + int vector = exit_qualification & 0xff; > + > + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ > + kvm_apic_set_eoi_accelerated(vcpu, vector); > + return 1; > +} > + > static int handle_apic_write(struct kvm_vcpu *vcpu) > { > unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { > [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, > [EXIT_REASON_APIC_ACCESS] = handle_apic_access, > [EXIT_REASON_APIC_WRITE] = handle_apic_write, > + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, > [EXIT_REASON_WBINVD] = handle_wbinvd, > [EXIT_REASON_XSETBV] = handle_xsetbv, > [EXIT_REASON_TASK_SWITCH] = handle_task_switch, > @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) > > static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > { > + /* no need for tpr_threshold update if APIC virtual > + * interrupt delivery is enabled */ > + if (!enable_apicv_vid) > + return ; > + Since you (will) set ->update_cr8_intercept callback to NULL if vid is enabled this function will never be called with !enable_apicv_vid, so this check can be dropped. 
> if (irr == -1 || tpr < irr) { > vmcs_write32(TPR_THRESHOLD, 0); > return; > @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > vmcs_write32(TPR_THRESHOLD, irr); > } > > +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > +{ > + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid; > +} > + > +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index) > +{ > + int tmr; > + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic, > + APIC_TMR + 0x10 * index); > + vmcs_write32(EOI_EXIT_BITMAP0 + index, > + vmx->eoi_exit_bitmap[index] | tmr); > +} > + > +static void vmx_update_rvi(int vector) > +{ > + u16 status; > + u8 old; > + > + status = vmcs_read16(GUEST_INTR_STATUS); > + old = (u8)status & 0xff; > + if ((u8)vector != old) { > + status &= ~0xff; > + status |= (u8)vector; > + vmcs_write16(GUEST_INTR_STATUS, status); > + } > +} > + > +static void vmx_update_irq(struct kvm_vcpu *vcpu) > +{ > + int vector; > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + if (!enable_apicv_vid) > + return ; > + > + vector = kvm_apic_get_highest_irr(vcpu); > + if (vector == -1) > + return; > + > + vmx_update_rvi(vector); > + > + if (vmx->eoi_exitmap_changed) { > + int index; > + for_each_set_bit(index, > + (unsigned long *)(&vmx->eoi_exitmap_changed), 8) > + vmx_update_eoi_exitmap(vmx, index); > + vmx->eoi_exitmap_changed = 0; > + } > +} > + > +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu, > + int vector, int trig_mode, > + int always_set) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + int index, offset, changed; > + struct kvm_lapic *apic; > + > + if (!enable_apicv_vid) > + return ; > + > + if (WARN_ONCE((vector < 0) || (vector > 255), > + "KVM VMX: vector (%d) out of range\n", vector)) > + return; > + > + apic = vcpu->arch.apic; > + index = vector >> 5; > + offset = vector & 31; > + > + if (always_set) > + changed = !test_and_set_bit(offset, > + (unsigned long *)&vmx->eoi_exit_bitmap); > + else if (trig_mode) > + changed = !test_bit(offset, > + apic->regs + APIC_TMR + index * 0x10); > + else > + changed = test_bit(offset, > + apic->regs + APIC_TMR + index * 0x10); > + > + if (changed) > + vmx->eoi_exitmap_changed |= 1 << index; > +} > + > static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) > { > u32 exit_intr_info; > @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops vmx_x86_ops = { > .enable_nmi_window = enable_nmi_window, > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > + .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery, > + .update_irq = vmx_update_irq, > + .set_eoi_exitmap = vmx_set_eoi_exitmap, > > .set_tss_addr = vmx_set_tss_addr, > .get_tdp_level = get_ept_level, > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index b0b8abe..02fe194 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -164,6 +164,14 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt); > > static int kvm_vcpu_reset(struct kvm_vcpu *vcpu); > > +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) > +{ > + if (kvm_x86_ops->has_virtual_interrupt_delivery) This callback is never NULL. 
> + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu); > + > + return 0; > +} > + > static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu) > { > int i; > @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu *vcpu) > vcpu->arch.nmi_injected = true; > kvm_x86_ops->set_nmi(vcpu); > } > - } else if (kvm_cpu_has_interrupt(vcpu)) { > - if (kvm_x86_ops->interrupt_allowed(vcpu)) { > - kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), > - false); > + } else if (kvm_cpu_has_interrupt(vcpu) && > + kvm_x86_ops->interrupt_allowed(vcpu)) { > + int vector = -1; > + > + if (kvm_apic_vid_enabled(vcpu)) > + vector = kvm_cpu_get_extint(vcpu); > + else > + vector = kvm_cpu_get_interrupt(vcpu); > + > + if (vector != -1) { > + kvm_queue_interrupt(vcpu, vector, false); > kvm_x86_ops->set_irq(vcpu); > } If vid is enabled, kvm_cpu_has_interrupt() should return true only if there is an extint interrupt. Similarly, kvm_cpu_get_interrupt() will only return extint if vid is enabled. This basically moves the kvm_apic_vid_enabled() logic deeper into the kvm_cpu_(has|get)_interrupt() functions instead of changing the interrupt injection logic here and in vcpu_enter_guest() below. We still need a kvm_cpu_has_interrupt() variant that always checks both extint and apic for use in kvm_arch_vcpu_runnable(), though. > + > } > } > @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > } > > if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { > + /* update archtecture specific hints for APIC > + * virtual interrupt delivery */ > + if (kvm_x86_ops->update_irq) > + kvm_x86_ops->update_irq(vcpu); > + > > I do not see why this has to be here instead of inside if > (kvm_lapic_enabled(vcpu)){} near update_cr8_intercept() a couple of > lines below. If you move it there you can drop the apic enable check in > kvm_apic_get_highest_irr(). > inject_pending_event(vcpu); > > /* enable NMI/IRQ window open exits if needed */ > if (vcpu->arch.nmi_pending) > kvm_x86_ops->enable_nmi_window(vcpu); > - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > + else if (kvm_apic_vid_enabled(vcpu)) { > + if (kvm_cpu_has_extint(vcpu)) > + kvm_x86_ops->enable_irq_window(vcpu); > + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > kvm_x86_ops->enable_irq_window(vcpu); > > if (kvm_lapic_enabled(vcpu)) { > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index 166c450..898aa62 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) > /* need to read apic_id from apic regiest since > * it can be rewritten */ > irqe.dest_id = ioapic->kvm->bsp_vcpu_id; > + kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1); > } > #endif > return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe); > -- > 1.7.1 -- Gleb.
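
A minimal sketch of the approach Gleb asks for above: rebuild the EOI exit bitmap when the ioapic or irq notifier configuration changes, then use a request bit so every vcpu reloads the bitmap before its next guest entry. KVM_REQ_EOIBITMAP and the load_eoi_exitmap callback are hypothetical names used for illustration here, not code from the posted series:

	/*
	 * Illustrative only: kick all vcpus whenever the ioapic or an irq
	 * notifier is reconfigured, so each one rebuilds and reloads its
	 * EOI exit bitmap on the way back into the guest.
	 * KVM_REQ_EOIBITMAP is a hypothetical request bit.
	 */
	void kvm_ioapic_make_eoibitmap_request(struct kvm *kvm)
	{
		int i;
		struct kvm_vcpu *vcpu;

		kvm_for_each_vcpu(i, vcpu, kvm)
			kvm_make_request(KVM_REQ_EOIBITMAP, vcpu);
	}

	/* ...and in vcpu_enter_guest(), before entering the guest: */
	if (kvm_check_request(KVM_REQ_EOIBITMAP, vcpu))
		kvm_x86_ops->load_eoi_exitmap(vcpu);	/* hypothetical hook */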
Gleb Natapov wrote on 2012-12-03: > On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts >> manually, which is fully taken care of by the hardware. This needs >> some special awareness into existing interrupr injection path: >> >> - for pending interrupt, instead of direct injection, we may need >> update architecture specific indicators before resuming to guest. >> - A pending interrupt, which is masked by ISR, should be also >> considered in above update action, since hardware will decide >> when to inject it at right time. Current has_interrupt and >> get_interrupt only returns a valid vector from injection p.o.v. > Most of my previous comments still apply. > >> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> >> Signed-off-by: Kevin Tian <kevin.tian@intel.com> >> --- >> arch/x86/include/asm/kvm_host.h | 4 + arch/x86/include/asm/vmx.h >> | 11 +++ arch/x86/kvm/irq.c | 53 ++++++++++----- >> arch/x86/kvm/lapic.c | 56 +++++++++++++--- >> arch/x86/kvm/lapic.h | 6 ++ arch/x86/kvm/svm.c >> | 19 +++++ arch/x86/kvm/vmx.c | 140 >> ++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c >> | 34 ++++++++-- virt/kvm/ioapic.c | 1 + 9 files >> changed, 291 insertions(+), 33 deletions(-) >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h index dc87b65..e5352c8 100644 --- >> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h >> @@ -697,6 +697,10 @@ struct kvm_x86_ops { >> void (*enable_nmi_window)(struct kvm_vcpu *vcpu); >> void (*enable_irq_window)(struct kvm_vcpu *vcpu); >> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); >> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); >> + void (*update_irq)(struct kvm_vcpu *vcpu); >> + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, >> + int trig_mode, int always_set); >> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); >> int (*get_tdp_level)(void); >> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); >> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h >> index 21101b6..1003341 100644 >> --- a/arch/x86/include/asm/vmx.h >> +++ b/arch/x86/include/asm/vmx.h >> @@ -62,6 +62,7 @@ >> #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define >> EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS >> 44 +#define EXIT_REASON_EOI_INDUCED 45 #define >> EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG >> 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 +144,7 @@ >> #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define >> SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 #define >> SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 +#define >> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define >> SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define >> SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 >> @@ -180,6 +182,7 @@ enum vmcs_field { >> GUEST_GS_SELECTOR = 0x0000080a, GUEST_LDTR_SELECTOR >> = 0x0000080c, GUEST_TR_SELECTOR = 0x0000080e, >> + GUEST_INTR_STATUS = 0x00000810, HOST_ES_SELECTOR >> = 0x00000c00, HOST_CS_SELECTOR = 0x00000c02, >> HOST_SS_SELECTOR = 0x00000c04, @@ -207,6 +210,14 @@ >> enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x00002015, EPT_POINTER >> = 0x0000201a, EPT_POINTER_HIGH = >> 0x0000201b, >> + EOI_EXIT_BITMAP0 = 0x0000201c, >> + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, >> + EOI_EXIT_BITMAP1 = 0x0000201e, >> + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, >> + EOI_EXIT_BITMAP2 = 0x00002020, >> + EOI_EXIT_BITMAP2_HIGH = 0x00002021, >> + 
EOI_EXIT_BITMAP3 = 0x00002022, >> + EOI_EXIT_BITMAP3_HIGH = 0x00002023, >> GUEST_PHYSICAL_ADDRESS = 0x00002400, >> GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, >> VMCS_LINK_POINTER = 0x00002800, >> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c >> index 7e06ba1..f782788 100644 >> --- a/arch/x86/kvm/irq.c >> +++ b/arch/x86/kvm/irq.c >> @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); >> */ >> int kvm_cpu_has_interrupt(struct kvm_vcpu *v) >> { >> - struct kvm_pic *s; >> - >> if (!irqchip_in_kernel(v->kvm)) >> return v->arch.interrupt.pending; >> - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ >> - if (kvm_apic_accept_pic_intr(v)) { >> - s = pic_irqchip(v->kvm); /* PIC */ >> - return s->output; >> - } else >> - return 0; >> - } >> + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ >> + return kvm_cpu_has_extint(v); /* non-APIC */ >> return 1; >> } >> EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); >> >> /* >> + * check if there is pending interrupt from >> + * non-APIC source without intack. >> + */ >> +int kvm_cpu_has_extint(struct kvm_vcpu *v) >> +{ >> + struct kvm_pic *s; >> + >> + if (kvm_apic_accept_pic_intr(v)) { >> + s = pic_irqchip(v->kvm); /* PIC */ >> + return s->output; >> + } else >> + return 0; >> +} >> + >> +/* >> * Read pending interrupt vector and intack. >> */ >> int kvm_cpu_get_interrupt(struct kvm_vcpu *v) { - struct kvm_pic *s; >> int vector; >> >> if (!irqchip_in_kernel(v->kvm)) >> return v->arch.interrupt.nr; >> >> vector = kvm_get_apic_interrupt(v); /* APIC */ >> - if (vector == -1) { >> - if (kvm_apic_accept_pic_intr(v)) { >> - s = pic_irqchip(v->kvm); >> - s->output = 0; /* PIC */ >> - vector = kvm_pic_read_irq(v->kvm); >> - } >> - } >> + if (vector == -1) >> + return kvm_cpu_get_extint(v); /* non-APIC */ >> return vector; >> } >> EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); >> +/* >> + * Read pending interrupt(from non-APIC source) >> + * vector and intack. >> + */ >> +int kvm_cpu_get_extint(struct kvm_vcpu *v) >> +{ >> + struct kvm_pic *s; >> + int vector = -1; >> + >> + if (kvm_apic_accept_pic_intr(v)) { >> + s = pic_irqchip(v->kvm); >> + s->output = 0; /* PIC */ >> + vector = kvm_pic_read_irq(v->kvm); >> + } >> + return vector; >> +} >> + >> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) >> { >> kvm_inject_apic_timer_irqs(vcpu); >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index 7c96012..400d3ba 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -643,6 +643,14 @@ out: >> return ret; >> } >> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >> + int trig_mode, int always_set) >> +{ >> + if (kvm_x86_ops->set_eoi_exitmap) >> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >> + trig_mode, always_set); >> +} >> + >> /* >> * Add a pending IRQ into lapic. >> * Return 1 if successfully added and 0 if discarded. >> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int > delivery_mode, >> if (unlikely(!apic_enabled(apic))) >> break; >> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); > As I said in the last review rebuild the bitmap when ioapic or irq > notifier configuration changes, user request bit to notify vcpus to > reload the bitmap. It is too complicated. When program ioapic entry, we cannot get the target vcpu easily. We need to read destination format register and logical destination register to find out target vcpu if using logical mode. Also, we must trap every modification to the two registers to update eoi bitmap. 
For irq notifier, only PIT is special which is edge trigger but need an EOI notifier. So, just treat it specially. And TMR can cover others. > >> if (trig_mode) { >> apic_debug("level trig mode for vector %d", vector); >> apic_set_vector(vector, apic->regs + APIC_TMR); >> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, > struct kvm_vcpu *vcpu2) >> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; >> } >> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) >> +{ >> + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && >> + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { >> + int trigger_mode; >> + if (apic_test_vector(vector, apic->regs + APIC_TMR)) >> + trigger_mode = IOAPIC_LEVEL_TRIG; >> + else >> + trigger_mode = IOAPIC_EDGE_TRIG; >> + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >> + } >> +} >> + >> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = >> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int >> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); >> apic_update_ppr(apic); >> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && >> - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { >> - int trigger_mode; >> - if (apic_test_vector(vector, apic->regs + APIC_TMR)) >> - trigger_mode = IOAPIC_LEVEL_TRIG; >> - else >> - trigger_mode = IOAPIC_EDGE_TRIG; >> - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >> - } >> + kvm_ioapic_send_eoi(apic, vector); >> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >> return vector; >> } >> +/* >> + * this interface assumes a trap-like exit, which has already finished >> + * desired side effect including vISR and vPPR update. >> + */ >> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) >> +{ >> + struct kvm_lapic *apic = vcpu->arch.apic; >> + > trace_kvm_eoi() Ok. >> + kvm_ioapic_send_eoi(apic, vector); >> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >> +} >> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); >> + >> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = >> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int >> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } >> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) >> +{ >> + struct kvm_lapic *apic = vcpu->arch.apic; >> + >> + if (!apic || !apic_enabled(apic)) > Use kvm_vcpu_has_lapic() instead of checking arch.apic directly. Ok. 
> >> + return -1; >> + >> + return apic_find_highest_irr(apic); >> +} >> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); >> + >> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) >> { >> u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); >> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h >> index c42f111..749661a 100644 >> --- a/arch/x86/kvm/lapic.h >> +++ b/arch/x86/kvm/lapic.h >> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); >> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); >> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); >> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); >> +int kvm_cpu_has_extint(struct kvm_vcpu *v); >> +int kvm_cpu_get_extint(struct kvm_vcpu *v); >> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); >> void kvm_lapic_reset(struct kvm_vcpu *vcpu); u64 >> kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); void >> kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); @@ -50,6 >> +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); int >> kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int >> kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int >> kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); >> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >> + int need_eoi, int global); >> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); >> >> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, >> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu > *vcpu); >> void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); >> >> int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); >> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); >> >> void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); >> void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> index dcb7952..8f0903b 100644 >> --- a/arch/x86/kvm/svm.c >> +++ b/arch/x86/kvm/svm.c >> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu > *vcpu, int tpr, int irr) >> set_cr_intercept(svm, INTERCEPT_CR8_WRITE); >> } >> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >> +{ >> + return 0; >> +} >> + >> +static void svm_update_irq(struct kvm_vcpu *vcpu) >> +{ >> + return ; >> +} >> + >> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >> + int trig_mode, int always_set) >> +{ >> + return ; >> +} >> + >> static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm >> *svm = to_svm(vcpu); @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops >> svm_x86_ops = { .enable_nmi_window = enable_nmi_window, >> .enable_irq_window = enable_irq_window, .update_cr8_intercept = >> update_cr8_intercept, >> + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, >> + .update_irq = svm_update_irq; >> + .set_eoi_exitmap = svm_set_eoi_exitmap; >> >> .set_tss_addr = svm_set_tss_addr, >> .get_tdp_level = get_npt_level, >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 6a5f651..909ce90 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO); >> static bool __read_mostly enable_apicv_reg; >> module_param(enable_apicv_reg, bool, S_IRUGO); >> +static bool __read_mostly enable_apicv_vid; >> +module_param(enable_apicv_vid, bool, S_IRUGO); >> + >> /* >> * If nested=1, nested virtualization is supported, i.e., guests may use >> * VMX 
and be a hypervisor for its own guests. If nested=0, guests may not >> @@ -432,6 +435,9 @@ struct vcpu_vmx { >> >> bool rdtscp_enabled; >> + u8 eoi_exitmap_changed; >> + u32 eoi_exit_bitmap[8]; >> + >> /* Support for a guest hypervisor (nested VMX) */ >> struct nested_vmx nested; >> }; >> @@ -770,6 +776,12 @@ static inline bool > cpu_has_vmx_apic_register_virt(void) >> SECONDARY_EXEC_APIC_REGISTER_VIRT; >> } >> +static inline bool cpu_has_vmx_virtual_intr_delivery(void) >> +{ >> + return vmcs_config.cpu_based_2nd_exec_ctrl & >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >> +} >> + >> static inline bool cpu_has_vmx_flexpriority(void) >> { >> return cpu_has_vmx_tpr_shadow() && >> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct > vmcs_config *vmcs_conf) >> SECONDARY_EXEC_PAUSE_LOOP_EXITING | >> SECONDARY_EXEC_RDTSCP | >> SECONDARY_EXEC_ENABLE_INVPCID | >> - SECONDARY_EXEC_APIC_REGISTER_VIRT; >> + SECONDARY_EXEC_APIC_REGISTER_VIRT | >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >> if (adjust_vmx_controls(min2, opt2, >> MSR_IA32_VMX_PROCBASED_CTLS2, >> &_cpu_based_2nd_exec_control) < 0) >> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct >> vmcs_config *vmcs_conf) >> >> if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) >> _cpu_based_2nd_exec_control &= ~( >> - SECONDARY_EXEC_APIC_REGISTER_VIRT); >> + SECONDARY_EXEC_APIC_REGISTER_VIRT | >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); >> >> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { /* >> CR3 accesses and invlpg don't need to cause VM Exits when EPT @@ >> -2724,6 +2738,14 @@ static __init int hardware_setup(void) if >> (!cpu_has_vmx_apic_register_virt()) enable_apicv_reg = 0; >> + if (!cpu_has_vmx_virtual_intr_delivery()) >> + enable_apicv_vid = 0; >> + >> + if (!enable_apicv_vid) { >> + kvm_x86_ops->update_irq = NULL; > Why setting it to NULL? Either drop this since vmx_update_irq() checks > enable_apicv_vid or better set it to function that does nothing and drop > enable_apicv_vid check in vmx_update_irq(). Since kvm_x86_ops->update_irq > will never be NULL you can drop the check before calling it. Sure. >> + kvm_x86_ops->update_cr8_intercept = NULL; > Why? It should be other way around: if apicv is enabled set > update_cr8_intercept callback to NULL. Yes, this is wrong. 
>> + } >> + >> if (nested) >> nested_vmx_setup_ctls_msrs(); >> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct > vcpu_vmx *vmx) >> exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; >> if (!enable_apicv_reg) >> exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; >> + if (!enable_apicv_vid) >> + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >> return exec_control; >> } >> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) >> vmx_secondary_exec_control(vmx)); >> } >> + if (enable_apicv_vid) { >> + vmcs_write64(EOI_EXIT_BITMAP0, 0); >> + vmcs_write64(EOI_EXIT_BITMAP1, 0); >> + vmcs_write64(EOI_EXIT_BITMAP2, 0); >> + vmcs_write64(EOI_EXIT_BITMAP3, 0); >> + >> + vmcs_write16(GUEST_INTR_STATUS, 0); >> + } >> + >> if (ple_gap) { >> vmcs_write32(PLE_GAP, ple_gap); >> vmcs_write32(PLE_WINDOW, ple_window); >> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu > *vcpu) >> return emulate_instruction(vcpu, 0) == EMULATE_DONE; >> } >> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) >> +{ >> + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); >> + int vector = exit_qualification & 0xff; >> + >> + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ >> + kvm_apic_set_eoi_accelerated(vcpu, vector); >> + return 1; >> +} >> + >> static int handle_apic_write(struct kvm_vcpu *vcpu) >> { >> unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); >> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct > kvm_vcpu *vcpu) = { >> [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, >> [EXIT_REASON_APIC_ACCESS] = handle_apic_access, >> [EXIT_REASON_APIC_WRITE] = handle_apic_write, >> + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, >> [EXIT_REASON_WBINVD] = handle_wbinvd, >> [EXIT_REASON_XSETBV] = handle_xsetbv, >> [EXIT_REASON_TASK_SWITCH] = handle_task_switch, >> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) >> >> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) >> { >> + /* no need for tpr_threshold update if APIC virtual >> + * interrupt delivery is enabled */ >> + if (!enable_apicv_vid) >> + return ; >> + > Since you (will) set ->update_cr8_intercept callback to NULL if vid > is enabled this function will never be called with !enable_apicv_vid, > so this check can be dropped. Ok. 
>> if (irr == -1 || tpr < irr) { >> vmcs_write32(TPR_THRESHOLD, 0); >> return; >> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu > *vcpu, int tpr, int irr) >> vmcs_write32(TPR_THRESHOLD, irr); >> } >> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >> +{ >> + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid; >> +} >> + >> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index) >> +{ >> + int tmr; >> + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic, >> + APIC_TMR + 0x10 * index); >> + vmcs_write32(EOI_EXIT_BITMAP0 + index, >> + vmx->eoi_exit_bitmap[index] | tmr); >> +} >> + >> +static void vmx_update_rvi(int vector) >> +{ >> + u16 status; >> + u8 old; >> + >> + status = vmcs_read16(GUEST_INTR_STATUS); >> + old = (u8)status & 0xff; >> + if ((u8)vector != old) { >> + status &= ~0xff; >> + status |= (u8)vector; >> + vmcs_write16(GUEST_INTR_STATUS, status); >> + } >> +} >> + >> +static void vmx_update_irq(struct kvm_vcpu *vcpu) >> +{ >> + int vector; >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + >> + if (!enable_apicv_vid) >> + return ; >> + >> + vector = kvm_apic_get_highest_irr(vcpu); >> + if (vector == -1) >> + return; >> + >> + vmx_update_rvi(vector); >> + >> + if (vmx->eoi_exitmap_changed) { >> + int index; >> + for_each_set_bit(index, >> + (unsigned long *)(&vmx->eoi_exitmap_changed), 8) >> + vmx_update_eoi_exitmap(vmx, index); >> + vmx->eoi_exitmap_changed = 0; >> + } >> +} >> + >> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu, >> + int vector, int trig_mode, >> + int always_set) >> +{ >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + int index, offset, changed; >> + struct kvm_lapic *apic; >> + >> + if (!enable_apicv_vid) >> + return ; >> + >> + if (WARN_ONCE((vector < 0) || (vector > 255), >> + "KVM VMX: vector (%d) out of range\n", vector)) >> + return; >> + >> + apic = vcpu->arch.apic; >> + index = vector >> 5; >> + offset = vector & 31; >> + >> + if (always_set) >> + changed = !test_and_set_bit(offset, >> + (unsigned long *)&vmx->eoi_exit_bitmap); >> + else if (trig_mode) >> + changed = !test_bit(offset, >> + apic->regs + APIC_TMR + index * 0x10); >> + else >> + changed = test_bit(offset, >> + apic->regs + APIC_TMR + index * 0x10); >> + >> + if (changed) >> + vmx->eoi_exitmap_changed |= 1 << index; >> +} >> + >> static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) { u32 >> exit_intr_info; @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops >> vmx_x86_ops = { .enable_nmi_window = enable_nmi_window, >> .enable_irq_window = enable_irq_window, .update_cr8_intercept = >> update_cr8_intercept, >> + .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery, >> + .update_irq = vmx_update_irq, >> + .set_eoi_exitmap = vmx_set_eoi_exitmap, >> >> .set_tss_addr = vmx_set_tss_addr, >> .get_tdp_level = get_ept_level, >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index >> b0b8abe..02fe194 100644 --- a/arch/x86/kvm/x86.c +++ >> b/arch/x86/kvm/x86.c @@ -164,6 +164,14 @@ static int >> emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt); >> >> static int kvm_vcpu_reset(struct kvm_vcpu *vcpu); >> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) >> +{ >> + if (kvm_x86_ops->has_virtual_interrupt_delivery) > This callback is never NULL. Ok. 
>> + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu); >> + >> + return 0; >> +} >> + >> static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu) >> { >> int i; >> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu > *vcpu) >> vcpu->arch.nmi_injected = true; >> kvm_x86_ops->set_nmi(vcpu); >> } >> - } else if (kvm_cpu_has_interrupt(vcpu)) { >> - if (kvm_x86_ops->interrupt_allowed(vcpu)) { >> - kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), >> - false); >> + } else if (kvm_cpu_has_interrupt(vcpu) && >> + kvm_x86_ops->interrupt_allowed(vcpu)) { >> + int vector = -1; >> + >> + if (kvm_apic_vid_enabled(vcpu)) >> + vector = kvm_cpu_get_extint(vcpu); >> + else >> + vector = kvm_cpu_get_interrupt(vcpu); >> + >> + if (vector != -1) { >> + kvm_queue_interrupt(vcpu, vector, false); >> kvm_x86_ops->set_irq(vcpu); >> } > If vid is enabled, kvm_cpu_has_interrupt() should return true only if there > is an extint interrupt. Similarly, kvm_cpu_get_interrupt() will only return > extint if vid is enabled. This basically moves the kvm_apic_vid_enabled() > logic deeper into the kvm_cpu_(has|get)_interrupt() functions instead > of changing the interrupt injection logic here and in vcpu_enter_guest() > below. We still need a kvm_cpu_has_interrupt() variant that always checks > both extint and apic for use in kvm_arch_vcpu_runnable(), though. As you mentioned, we still need to check both extint and apic interrupts in some cases. So how to do this? Introduce another argument to indicate whether to check both? >> + >> } >> } >> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu > *vcpu) >> } >> >> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { >> + /* update archtecture specific hints for APIC >> + * virtual interrupt delivery */ >> + if (kvm_x86_ops->update_irq) >> + kvm_x86_ops->update_irq(vcpu); >> + > > I do not see why this has to be here instead of inside if > (kvm_lapic_enabled(vcpu)){} near update_cr8_intercept() a couple of > lines below. If you move it there you can drop the apic enable check in > kvm_apic_get_highest_irr(). Yes, it seems ok to move it. >> inject_pending_event(vcpu); >> >> /* enable NMI/IRQ window open exits if needed */ >> if (vcpu->arch.nmi_pending) >> kvm_x86_ops->enable_nmi_window(vcpu); >> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) >> + else if (kvm_apic_vid_enabled(vcpu)) { >> + if (kvm_cpu_has_extint(vcpu)) >> + kvm_x86_ops->enable_irq_window(vcpu); >> + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) >> kvm_x86_ops->enable_irq_window(vcpu); >> >> if (kvm_lapic_enabled(vcpu)) { >> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >> index 166c450..898aa62 100644 >> --- a/virt/kvm/ioapic.c >> +++ b/virt/kvm/ioapic.c >> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int > irq) >> /* need to read apic_id from apic regiest since * it can be >> rewritten */ irqe.dest_id = ioapic->kvm->bsp_vcpu_id; >> + kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1); } >> #endif return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe); >> -- >> 1.7.1 > > -- > Gleb. Best regards, Yang
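
One possible shape of the answer to Yang's question, sketched under the thread's own assumptions (kvm_apic_vid_enabled() as defined in the patch; kvm_cpu_has_injectable_intr() is a hypothetical name): keep kvm_cpu_has_interrupt() checking both sources for kvm_arch_vcpu_runnable(), and add a separate helper for the injection path that ignores the APIC when vid is enabled, rather than adding a flag argument:

	/* For kvm_arch_vcpu_runnable(): any pending source should wake the vcpu. */
	int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
	{
		if (!irqchip_in_kernel(v->kvm))
			return v->arch.interrupt.pending;

		if (kvm_cpu_has_extint(v))
			return 1;

		return kvm_apic_has_interrupt(v) != -1;
	}

	/*
	 * For the injection path: with vid enabled the hardware delivers
	 * APIC interrupts itself, so only extint needs software injection.
	 */
	int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
	{
		if (!irqchip_in_kernel(v->kvm))
			return v->arch.interrupt.pending;

		if (kvm_cpu_has_extint(v))
			return 1;

		if (kvm_apic_vid_enabled(v))
			return 0;

		return kvm_apic_has_interrupt(v) != -1;
	}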
On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2012-12-03: > > On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > >> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > >> manually, which is fully taken care of by the hardware. This needs > >> some special awareness into existing interrupr injection path: > >> > >> - for pending interrupt, instead of direct injection, we may need > >> update architecture specific indicators before resuming to guest. > >> - A pending interrupt, which is masked by ISR, should be also > >> considered in above update action, since hardware will decide > >> when to inject it at right time. Current has_interrupt and > >> get_interrupt only returns a valid vector from injection p.o.v. > > Most of my previous comments still apply. > > > >> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > >> Signed-off-by: Kevin Tian <kevin.tian@intel.com> > >> --- > >> arch/x86/include/asm/kvm_host.h | 4 + arch/x86/include/asm/vmx.h > >> | 11 +++ arch/x86/kvm/irq.c | 53 ++++++++++----- > >> arch/x86/kvm/lapic.c | 56 +++++++++++++--- > >> arch/x86/kvm/lapic.h | 6 ++ arch/x86/kvm/svm.c > >> | 19 +++++ arch/x86/kvm/vmx.c | 140 > >> ++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c > >> | 34 ++++++++-- virt/kvm/ioapic.c | 1 + 9 files > >> changed, 291 insertions(+), 33 deletions(-) > >> diff --git a/arch/x86/include/asm/kvm_host.h > >> b/arch/x86/include/asm/kvm_host.h index dc87b65..e5352c8 100644 --- > >> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h > >> @@ -697,6 +697,10 @@ struct kvm_x86_ops { > >> void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > >> void (*enable_irq_window)(struct kvm_vcpu *vcpu); > >> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > >> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > >> + void (*update_irq)(struct kvm_vcpu *vcpu); > >> + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, > >> + int trig_mode, int always_set); > >> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > >> int (*get_tdp_level)(void); > >> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > >> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > >> index 21101b6..1003341 100644 > >> --- a/arch/x86/include/asm/vmx.h > >> +++ b/arch/x86/include/asm/vmx.h > >> @@ -62,6 +62,7 @@ > >> #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define > >> EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS > >> 44 +#define EXIT_REASON_EOI_INDUCED 45 #define > >> EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG > >> 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 +144,7 @@ > >> #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define > >> SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 #define > >> SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 +#define > >> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define > >> SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define > >> SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 > >> @@ -180,6 +182,7 @@ enum vmcs_field { > >> GUEST_GS_SELECTOR = 0x0000080a, GUEST_LDTR_SELECTOR > >> = 0x0000080c, GUEST_TR_SELECTOR = 0x0000080e, > >> + GUEST_INTR_STATUS = 0x00000810, HOST_ES_SELECTOR > >> = 0x00000c00, HOST_CS_SELECTOR = 0x00000c02, > >> HOST_SS_SELECTOR = 0x00000c04, @@ -207,6 +210,14 @@ > >> enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x00002015, EPT_POINTER > >> = 0x0000201a, EPT_POINTER_HIGH = > >> 0x0000201b, > >> + EOI_EXIT_BITMAP0 = 0x0000201c, > >> + 
EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > >> + EOI_EXIT_BITMAP1 = 0x0000201e, > >> + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, > >> + EOI_EXIT_BITMAP2 = 0x00002020, > >> + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > >> + EOI_EXIT_BITMAP3 = 0x00002022, > >> + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > >> GUEST_PHYSICAL_ADDRESS = 0x00002400, > >> GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > >> VMCS_LINK_POINTER = 0x00002800, > >> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > >> index 7e06ba1..f782788 100644 > >> --- a/arch/x86/kvm/irq.c > >> +++ b/arch/x86/kvm/irq.c > >> @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > >> */ > >> int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > >> { > >> - struct kvm_pic *s; > >> - > >> if (!irqchip_in_kernel(v->kvm)) > >> return v->arch.interrupt.pending; > >> - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ > >> - if (kvm_apic_accept_pic_intr(v)) { > >> - s = pic_irqchip(v->kvm); /* PIC */ > >> - return s->output; > >> - } else > >> - return 0; > >> - } > >> + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ > >> + return kvm_cpu_has_extint(v); /* non-APIC */ > >> return 1; > >> } > >> EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > >> > >> /* > >> + * check if there is pending interrupt from > >> + * non-APIC source without intack. > >> + */ > >> +int kvm_cpu_has_extint(struct kvm_vcpu *v) > >> +{ > >> + struct kvm_pic *s; > >> + > >> + if (kvm_apic_accept_pic_intr(v)) { > >> + s = pic_irqchip(v->kvm); /* PIC */ > >> + return s->output; > >> + } else > >> + return 0; > >> +} > >> + > >> +/* > >> * Read pending interrupt vector and intack. > >> */ > >> int kvm_cpu_get_interrupt(struct kvm_vcpu *v) { - struct kvm_pic *s; > >> int vector; > >> > >> if (!irqchip_in_kernel(v->kvm)) > >> return v->arch.interrupt.nr; > >> > >> vector = kvm_get_apic_interrupt(v); /* APIC */ > >> - if (vector == -1) { > >> - if (kvm_apic_accept_pic_intr(v)) { > >> - s = pic_irqchip(v->kvm); > >> - s->output = 0; /* PIC */ > >> - vector = kvm_pic_read_irq(v->kvm); > >> - } > >> - } > >> + if (vector == -1) > >> + return kvm_cpu_get_extint(v); /* non-APIC */ > >> return vector; > >> } > >> EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > >> +/* > >> + * Read pending interrupt(from non-APIC source) > >> + * vector and intack. > >> + */ > >> +int kvm_cpu_get_extint(struct kvm_vcpu *v) > >> +{ > >> + struct kvm_pic *s; > >> + int vector = -1; > >> + > >> + if (kvm_apic_accept_pic_intr(v)) { > >> + s = pic_irqchip(v->kvm); > >> + s->output = 0; /* PIC */ > >> + vector = kvm_pic_read_irq(v->kvm); > >> + } > >> + return vector; > >> +} > >> + > >> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > >> { > >> kvm_inject_apic_timer_irqs(vcpu); > >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >> index 7c96012..400d3ba 100644 > >> --- a/arch/x86/kvm/lapic.c > >> +++ b/arch/x86/kvm/lapic.c > >> @@ -643,6 +643,14 @@ out: > >> return ret; > >> } > >> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >> + int trig_mode, int always_set) > >> +{ > >> + if (kvm_x86_ops->set_eoi_exitmap) > >> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > >> + trig_mode, always_set); > >> +} > >> + > >> /* > >> * Add a pending IRQ into lapic. > >> * Return 1 if successfully added and 0 if discarded. 
> >> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int > > delivery_mode, > >> if (unlikely(!apic_enabled(apic))) > >> break; > >> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); > > As I said in the last review rebuild the bitmap when ioapic or irq > > notifier configuration changes, user request bit to notify vcpus to > > reload the bitmap. > It is too complicated. When program ioapic entry, we cannot get the target vcpu easily. We need to read destination format register and logical destination register to find out target vcpu if using logical mode. Also, we must trap every modification to the two registers to update eoi bitmap. No need to check target vcpu. Enable exit on all vcpus for the vector programmed into ioapic. Which two registers? All accesses to ioapic are trapped and reconfiguration is rare. > For irq notifier, only PIT is special which is edge trigger but need an EOI notifier. So, just treat it specially. And TMR can cover others. > We shouldn't assume that. If another notifier will be added it will be easy to forget to update apicv code to exclude another vector too. > > > >> if (trig_mode) { > >> apic_debug("level trig mode for vector %d", vector); > >> apic_set_vector(vector, apic->regs + APIC_TMR); > >> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, > > struct kvm_vcpu *vcpu2) > >> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > >> } > >> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > >> +{ > >> + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > >> + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > >> + int trigger_mode; > >> + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > >> + trigger_mode = IOAPIC_LEVEL_TRIG; > >> + else > >> + trigger_mode = IOAPIC_EDGE_TRIG; > >> + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > >> + } > >> +} > >> + > >> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = > >> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int > >> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); > >> apic_update_ppr(apic); > >> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > >> - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > >> - int trigger_mode; > >> - if (apic_test_vector(vector, apic->regs + APIC_TMR)) > >> - trigger_mode = IOAPIC_LEVEL_TRIG; > >> - else > >> - trigger_mode = IOAPIC_EDGE_TRIG; > >> - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > >> - } > >> + kvm_ioapic_send_eoi(apic, vector); > >> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > >> return vector; > >> } > >> +/* > >> + * this interface assumes a trap-like exit, which has already finished > >> + * desired side effect including vISR and vPPR update. > >> + */ > >> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > >> +{ > >> + struct kvm_lapic *apic = vcpu->arch.apic; > >> + > > trace_kvm_eoi() > Ok. 
> > >> + kvm_ioapic_send_eoi(apic, vector); > >> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > >> +} > >> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > >> + > >> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = > >> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int > >> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } > >> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) > >> +{ > >> + struct kvm_lapic *apic = vcpu->arch.apic; > >> + > >> + if (!apic || !apic_enabled(apic)) > > Use kvm_vcpu_has_lapic() instead of checking arch.apic directly. > Ok. > > > > >> + return -1; > >> + > >> + return apic_find_highest_irr(apic); > >> +} > >> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); > >> + > >> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) > >> { > >> u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); > >> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > >> index c42f111..749661a 100644 > >> --- a/arch/x86/kvm/lapic.h > >> +++ b/arch/x86/kvm/lapic.h > >> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); > >> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); > >> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); > >> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); > >> +int kvm_cpu_has_extint(struct kvm_vcpu *v); > >> +int kvm_cpu_get_extint(struct kvm_vcpu *v); > >> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); > >> void kvm_lapic_reset(struct kvm_vcpu *vcpu); u64 > >> kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); void > >> kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); @@ -50,6 > >> +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); int > >> kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int > >> kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int > >> kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); > >> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >> + int need_eoi, int global); > >> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); > >> > >> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, > >> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu > > *vcpu); > >> void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > >> > >> int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > >> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > >> > >> void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); > >> void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); > >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > >> index dcb7952..8f0903b 100644 > >> --- a/arch/x86/kvm/svm.c > >> +++ b/arch/x86/kvm/svm.c > >> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu > > *vcpu, int tpr, int irr) > >> set_cr_intercept(svm, INTERCEPT_CR8_WRITE); > >> } > >> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > >> +{ > >> + return 0; > >> +} > >> + > >> +static void svm_update_irq(struct kvm_vcpu *vcpu) > >> +{ > >> + return ; > >> +} > >> + > >> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >> + int trig_mode, int always_set) > >> +{ > >> + return ; > >> +} > >> + > >> static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm > >> *svm = to_svm(vcpu); @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops > >> svm_x86_ops = { .enable_nmi_window = enable_nmi_window, > >> .enable_irq_window = enable_irq_window, 
> >> .update_cr8_intercept = update_cr8_intercept,
> >> + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
> >> + .update_irq = svm_update_irq;
> >> + .set_eoi_exitmap = svm_set_eoi_exitmap;
> >>
> >> .set_tss_addr = svm_set_tss_addr,
> >> .get_tdp_level = get_npt_level,
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 6a5f651..909ce90 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO);
> >> static bool __read_mostly enable_apicv_reg;
> >> module_param(enable_apicv_reg, bool, S_IRUGO);
> >> +static bool __read_mostly enable_apicv_vid;
> >> +module_param(enable_apicv_vid, bool, S_IRUGO);
> >> +
> >> /*
> >> * If nested=1, nested virtualization is supported, i.e., guests may use
> >> * VMX and be a hypervisor for its own guests. If nested=0, guests may not
> >> @@ -432,6 +435,9 @@ struct vcpu_vmx {
> >>
> >> bool rdtscp_enabled;
> >> + u8 eoi_exitmap_changed;
> >> + u32 eoi_exit_bitmap[8];
> >> +
> >> /* Support for a guest hypervisor (nested VMX) */
> >> struct nested_vmx nested;
> >> };
> >> @@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
> >> SECONDARY_EXEC_APIC_REGISTER_VIRT;
> >> }
> >> +static inline bool cpu_has_vmx_virtual_intr_delivery(void)
> >> +{
> >> + return vmcs_config.cpu_based_2nd_exec_ctrl &
> >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> >> +}
> >> +
> >> static inline bool cpu_has_vmx_flexpriority(void)
> >> {
> >> return cpu_has_vmx_tpr_shadow() &&
> >> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> >> SECONDARY_EXEC_PAUSE_LOOP_EXITING |
> >> SECONDARY_EXEC_RDTSCP |
> >> SECONDARY_EXEC_ENABLE_INVPCID |
> >> - SECONDARY_EXEC_APIC_REGISTER_VIRT;
> >> + SECONDARY_EXEC_APIC_REGISTER_VIRT |
> >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> >> if (adjust_vmx_controls(min2, opt2,
> >> MSR_IA32_VMX_PROCBASED_CTLS2,
> >> &_cpu_based_2nd_exec_control) < 0)
> >> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> >>
> >> if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
> >> _cpu_based_2nd_exec_control &= ~(
> >> - SECONDARY_EXEC_APIC_REGISTER_VIRT);
> >> + SECONDARY_EXEC_APIC_REGISTER_VIRT |
> >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> >>
> >> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
> >> /* CR3 accesses and invlpg don't need to cause VM Exits when EPT
> >> @@ -2724,6 +2738,14 @@ static __init int hardware_setup(void)
> >> if (!cpu_has_vmx_apic_register_virt())
> >> enable_apicv_reg = 0;
> >> + if (!cpu_has_vmx_virtual_intr_delivery())
> >> + enable_apicv_vid = 0;
> >> +
> >> + if (!enable_apicv_vid) {
> >> + kvm_x86_ops->update_irq = NULL;
> > Why set it to NULL? Either drop this, since vmx_update_irq() checks
> > enable_apicv_vid, or better, set it to a function that does nothing and drop
> > the enable_apicv_vid check in vmx_update_irq(). Since kvm_x86_ops->update_irq
> > will never be NULL you can drop the check before calling it.
> Sure.
> >> + kvm_x86_ops->update_cr8_intercept = NULL;
> > Why? It should be the other way around: if apicv is enabled, set the
> > update_cr8_intercept callback to NULL.
> Yes, this is wrong.
Please test the patches with vid disabled and Windows guests. This bug should have prevented it from working.
> > >> + } > >> + > >> if (nested) > >> nested_vmx_setup_ctls_msrs(); > >> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct > > vcpu_vmx *vmx) > >> exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; > >> if (!enable_apicv_reg) > >> exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; > >> + if (!enable_apicv_vid) > >> + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > >> return exec_control; > >> } > >> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > >> vmx_secondary_exec_control(vmx)); > >> } > >> + if (enable_apicv_vid) { > >> + vmcs_write64(EOI_EXIT_BITMAP0, 0); > >> + vmcs_write64(EOI_EXIT_BITMAP1, 0); > >> + vmcs_write64(EOI_EXIT_BITMAP2, 0); > >> + vmcs_write64(EOI_EXIT_BITMAP3, 0); > >> + > >> + vmcs_write16(GUEST_INTR_STATUS, 0); > >> + } > >> + > >> if (ple_gap) { > >> vmcs_write32(PLE_GAP, ple_gap); > >> vmcs_write32(PLE_WINDOW, ple_window); > >> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu > > *vcpu) > >> return emulate_instruction(vcpu, 0) == EMULATE_DONE; > >> } > >> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) > >> +{ > >> + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > >> + int vector = exit_qualification & 0xff; > >> + > >> + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ > >> + kvm_apic_set_eoi_accelerated(vcpu, vector); > >> + return 1; > >> +} > >> + > >> static int handle_apic_write(struct kvm_vcpu *vcpu) > >> { > >> unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > >> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct > > kvm_vcpu *vcpu) = { > >> [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, > >> [EXIT_REASON_APIC_ACCESS] = handle_apic_access, > >> [EXIT_REASON_APIC_WRITE] = handle_apic_write, > >> + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, > >> [EXIT_REASON_WBINVD] = handle_wbinvd, > >> [EXIT_REASON_XSETBV] = handle_xsetbv, > >> [EXIT_REASON_TASK_SWITCH] = handle_task_switch, > >> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) > >> > >> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > >> { > >> + /* no need for tpr_threshold update if APIC virtual > >> + * interrupt delivery is enabled */ > >> + if (!enable_apicv_vid) > >> + return ; > >> + > > Since you (will) set ->update_cr8_intercept callback to NULL if vid > > is enabled this function will never be called with !enable_apicv_vid, > > so this check can be dropped. > Ok. 
> > >> if (irr == -1 || tpr < irr) { > >> vmcs_write32(TPR_THRESHOLD, 0); > >> return; > >> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu > > *vcpu, int tpr, int irr) > >> vmcs_write32(TPR_THRESHOLD, irr); > >> } > >> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > >> +{ > >> + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid; > >> +} > >> + > >> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index) > >> +{ > >> + int tmr; > >> + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic, > >> + APIC_TMR + 0x10 * index); > >> + vmcs_write32(EOI_EXIT_BITMAP0 + index, > >> + vmx->eoi_exit_bitmap[index] | tmr); > >> +} > >> + > >> +static void vmx_update_rvi(int vector) > >> +{ > >> + u16 status; > >> + u8 old; > >> + > >> + status = vmcs_read16(GUEST_INTR_STATUS); > >> + old = (u8)status & 0xff; > >> + if ((u8)vector != old) { > >> + status &= ~0xff; > >> + status |= (u8)vector; > >> + vmcs_write16(GUEST_INTR_STATUS, status); > >> + } > >> +} > >> + > >> +static void vmx_update_irq(struct kvm_vcpu *vcpu) > >> +{ > >> + int vector; > >> + struct vcpu_vmx *vmx = to_vmx(vcpu); > >> + > >> + if (!enable_apicv_vid) > >> + return ; > >> + > >> + vector = kvm_apic_get_highest_irr(vcpu); > >> + if (vector == -1) > >> + return; > >> + > >> + vmx_update_rvi(vector); > >> + > >> + if (vmx->eoi_exitmap_changed) { > >> + int index; > >> + for_each_set_bit(index, > >> + (unsigned long *)(&vmx->eoi_exitmap_changed), 8) > >> + vmx_update_eoi_exitmap(vmx, index); > >> + vmx->eoi_exitmap_changed = 0; > >> + } > >> +} > >> + > >> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu, > >> + int vector, int trig_mode, > >> + int always_set) > >> +{ > >> + struct vcpu_vmx *vmx = to_vmx(vcpu); > >> + int index, offset, changed; > >> + struct kvm_lapic *apic; > >> + > >> + if (!enable_apicv_vid) > >> + return ; > >> + > >> + if (WARN_ONCE((vector < 0) || (vector > 255), > >> + "KVM VMX: vector (%d) out of range\n", vector)) > >> + return; > >> + > >> + apic = vcpu->arch.apic; > >> + index = vector >> 5; > >> + offset = vector & 31; > >> + > >> + if (always_set) > >> + changed = !test_and_set_bit(offset, > >> + (unsigned long *)&vmx->eoi_exit_bitmap); > >> + else if (trig_mode) > >> + changed = !test_bit(offset, > >> + apic->regs + APIC_TMR + index * 0x10); > >> + else > >> + changed = test_bit(offset, > >> + apic->regs + APIC_TMR + index * 0x10); > >> + > >> + if (changed) > >> + vmx->eoi_exitmap_changed |= 1 << index; > >> +} > >> + > >> static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) { u32 > >> exit_intr_info; @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops > >> vmx_x86_ops = { .enable_nmi_window = enable_nmi_window, > >> .enable_irq_window = enable_irq_window, .update_cr8_intercept = > >> update_cr8_intercept, > >> + .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery, > >> + .update_irq = vmx_update_irq, > >> + .set_eoi_exitmap = vmx_set_eoi_exitmap, > >> > >> .set_tss_addr = vmx_set_tss_addr, > >> .get_tdp_level = get_ept_level, > >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index > >> b0b8abe..02fe194 100644 --- a/arch/x86/kvm/x86.c +++ > >> b/arch/x86/kvm/x86.c @@ -164,6 +164,14 @@ static int > >> emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt); > >> > >> static int kvm_vcpu_reset(struct kvm_vcpu *vcpu); > >> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) > >> +{ > >> + if (kvm_x86_ops->has_virtual_interrupt_delivery) > > This callback is never NULL. > Ok. 
> >> + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
> >> {
> >> int i;
> >> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
> >> vcpu->arch.nmi_injected = true;
> >> kvm_x86_ops->set_nmi(vcpu);
> >> }
> >> - } else if (kvm_cpu_has_interrupt(vcpu)) {
> >> - if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> >> - kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> >> - false);
> >> + } else if (kvm_cpu_has_interrupt(vcpu) &&
> >> + kvm_x86_ops->interrupt_allowed(vcpu)) {
> >> + int vector = -1;
> >> +
> >> + if (kvm_apic_vid_enabled(vcpu))
> >> + vector = kvm_cpu_get_extint(vcpu);
> >> + else
> >> + vector = kvm_cpu_get_interrupt(vcpu);
> >> +
> >> + if (vector != -1) {
> >> + kvm_queue_interrupt(vcpu, vector, false);
> >> kvm_x86_ops->set_irq(vcpu);
> >> }
> > If vid is enabled, kvm_cpu_has_interrupt() should return true only if there
> > is an extint interrupt. Similarly, kvm_cpu_get_interrupt() will only return
> > extint if vid is enabled. This basically moves the kvm_apic_vid_enabled()
> > logic deeper into the kvm_cpu_(has|get)_interrupt() functions instead
> > of changing the interrupt injection logic here and in vcpu_enter_guest()
> > below. We still need a kvm_cpu_has_interrupt() variant that always checks
> > both extint and apic for use in kvm_arch_vcpu_runnable() though.
> As you mentioned, we still need to check both extint and apic interrupts in some cases. So how should we do this? Introduce another argument to indicate whether to check both?
Yes, we need to check both in kvm_arch_vcpu_runnable(). Another argument is a good option. We can have two functions: kvm_cpu_has_injectable_interrupt() for use in the irq injection path and kvm_cpu_has_interrupt() for use in kvm_arch_vcpu_runnable(). They will call a common one with an additional argument to avoid code duplication (see the sketch after this message).
> >> +
> >> }
> >> }
> >> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >> }
> >>
> >> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >> + /* update archtecture specific hints for APIC
> >> + * virtual interrupt delivery */
> >> + if (kvm_x86_ops->update_irq)
> >> + kvm_x86_ops->update_irq(vcpu);
> >> +
> > I do not see why this has to be here instead of inside the
> > if (kvm_lapic_enabled(vcpu)) {} block near update_cr8_intercept() a couple of
> > lines below. If you move it there you can drop the apic enable check in
> > kvm_apic_get_highest_irr().
> Yes, it seems ok to move it.
> >> inject_pending_event(vcpu);
> >>
> >> /* enable NMI/IRQ window open exits if needed */
> >> if (vcpu->arch.nmi_pending)
> >> kvm_x86_ops->enable_nmi_window(vcpu);
> >> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >> + else if (kvm_apic_vid_enabled(vcpu)) {
> >> + if (kvm_cpu_has_extint(vcpu))
> >> + kvm_x86_ops->enable_irq_window(vcpu);
> >> + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >> kvm_x86_ops->enable_irq_window(vcpu);
> >>
> >> if (kvm_lapic_enabled(vcpu)) {
> >> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> >> index 166c450..898aa62 100644
> >> --- a/virt/kvm/ioapic.c
> >> +++ b/virt/kvm/ioapic.c
> >> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
> >> /* need to read apic_id from apic regiest since
> >> * it can be rewritten */
> >> irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
> >> + kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1);
> >> }
> >> #endif
> >> return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
> >> --
> >> 1.7.1
> >
> > --
> > Gleb.
>
> Best regards,
> Yang
--
Gleb.
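For concreteness, a minimal sketch of the kvm_cpu_has_injectable_interrupt() / kvm_cpu_has_interrupt() split proposed above; the worker name and exact signatures are assumptions, not the final patch:

/*
 * Sketch only: both entry points share one worker and differ in
 * whether a pending APIC interrupt counts. With virtual interrupt
 * delivery the hardware injects APIC interrupts itself, so only
 * extint remains injectable by KVM.
 */
static int __kvm_cpu_has_interrupt(struct kvm_vcpu *v, bool apic_counts)
{
    if (!irqchip_in_kernel(v->kvm))
        return v->arch.interrupt.pending;

    if (kvm_cpu_has_extint(v))      /* PIC output always counts */
        return 1;

    if (!apic_counts)               /* vid: hardware handles APIC IRQs */
        return 0;

    return kvm_apic_has_interrupt(v) != -1;
}

/* injection path: with vid enabled only extint needs software injection */
int kvm_cpu_has_injectable_interrupt(struct kvm_vcpu *v)
{
    return __kvm_cpu_has_interrupt(v, !kvm_apic_vid_enabled(v));
}

/* kvm_arch_vcpu_runnable(): a pending interrupt from either source counts */
int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
{
    return __kvm_cpu_has_interrupt(v, true);
}

kvm_cpu_get_interrupt() would grow the same distinction, so the injection path only pulls an extint vector while vid is active.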
On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
> + int vector = -1;
nitpicking: no need to initialize vector here
> +
> + if (kvm_apic_vid_enabled(vcpu))
> + vector = kvm_cpu_get_extint(vcpu);
> + else
> + vector = kvm_cpu_get_interrupt(vcpu);
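The nit amounts to dropping the initializer, since both branches assign vector before it is read; roughly:

    int vector;

    if (kvm_apic_vid_enabled(vcpu))
        vector = kvm_cpu_get_extint(vcpu);  /* vid: only extint is injectable */
    else
        vector = kvm_cpu_get_interrupt(vcpu);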
Gleb Natapov wrote on 2012-12-04:
> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2012-12-03:
>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
>>>> manually, which is fully taken care of by the hardware. This needs
>>>> some special awareness into existing interrupr injection path:
>>>>
>>>> - for pending interrupt, instead of direct injection, we may need
>>>> update architecture specific indicators before resuming to guest.
>>>> - A pending interrupt, which is masked by ISR, should be also
>>>> considered in above update action, since hardware will decide when
>>>> to inject it at right time. Current has_interrupt and get_interrupt
>>>> only returns a valid vector from injection p.o.v.
>>> Most of my previous comments still apply.
>>>
>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>> + int trig_mode, int always_set)
>>>> +{
>>>> + if (kvm_x86_ops->set_eoi_exitmap)
>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector,
>>>> + trig_mode, always_set);
>>>> +}
>>>> +
>>>> /*
>>>> * Add a pending IRQ into lapic.
>>>> * Return 1 if successfully added and 0 if discarded.
>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>>> if (unlikely(!apic_enabled(apic)))
>>>> break;
>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0);
>>> As I said in the last review, rebuild the bitmap when the ioapic or irq
>>> notifier configuration changes, and use a user request bit to notify vcpus to
>>> reload the bitmap.
>> It is too complicated. When programming an ioapic entry, we cannot get the target vcpu easily. We would need to read the destination format register and the logical destination register to find the target vcpu when logical mode is used. Also, we would have to trap every modification to those two registers to keep the eoi bitmap up to date.
> No need to check the target vcpu. Enable the exit on all vcpus for the vector
> programmed into the ioapic. Which two registers? All accesses to the ioapic are
> trapped and reconfiguration is rare.
This is wrong. As we know, modern OSes use per-VCPU vectors, and we cannot ensure that all vcpus use the same trigger mode for the same vector. What's worse, if the same vector on another vcpu is used to handle high-frequency interrupts (like a 10G NIC), forcing EOI exits there will hurt performance.
In logical mode, the destination VCPU depends on each CPU's destination format register and logical destination register, so we would also have to trap those two registers. And if lowest-priority delivery mode is used, the PPR needs to be trapped too. Since the PPR changes on every interrupt injection, the cost would be higher than with the current approach.
>> For the irq notifier, only the PIT is special: it is edge-triggered but needs an
>> EOI notifier. So just treat it specially. The TMR covers the others.
> We shouldn't assume that. If another notifier is added it will be easy to forget
> to update the apicv code to exclude another vector too.
At this point the guest is not running (devices are still being initialized), so we cannot know the vector. As you mentioned, the best point would be when the guest programs the ioapic entry, but it is also impossible to get the vector there (see above). I can add comments to the function to remind callers to update the eoi bitmap when an interrupt is edge-triggered but they still want an EOI vmexit.
>>> >>>> if (trig_mode) { >>>> apic_debug("level trig mode for vector %d", vector); >>>> apic_set_vector(vector, apic->regs + APIC_TMR); >>>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu > *vcpu1, >>> struct kvm_vcpu *vcpu2) >>>> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; >>>> } >>>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) >>>> +{ + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & >>>> APIC_SPIV_DIRECTED_EOI) && + >>>> kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { + int >>>> trigger_mode; + if (apic_test_vector(vector, apic->regs + APIC_TMR)) >>>> + trigger_mode = IOAPIC_LEVEL_TRIG; + else + trigger_mode = >>>> IOAPIC_EDGE_TRIG; + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, >>>> trigger_mode); + } +} + >>>> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = >>>> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int >>>> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); >>>> apic_update_ppr(apic); >>>> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) >>>> && - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { - int >>>> trigger_mode; - if (apic_test_vector(vector, apic->regs + APIC_TMR)) >>>> - trigger_mode = IOAPIC_LEVEL_TRIG; - else - trigger_mode = >>>> IOAPIC_EDGE_TRIG; - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, >>>> trigger_mode); - } + kvm_ioapic_send_eoi(apic, vector); >>>> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >>>> return vector; >>>> } >>>> +/* >>>> + * this interface assumes a trap-like exit, which has already finished >>>> + * desired side effect including vISR and vPPR update. >>>> + */ >>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) >>>> +{ >>>> + struct kvm_lapic *apic = vcpu->arch.apic; >>>> + >>> trace_kvm_eoi() >> Ok. >> >>>> + kvm_ioapic_send_eoi(apic, vector); >>>> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >>>> +} >>>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); >>>> + >>>> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = >>>> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int >>>> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } >>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) >>>> +{ >>>> + struct kvm_lapic *apic = vcpu->arch.apic; >>>> + >>>> + if (!apic || !apic_enabled(apic)) >>> Use kvm_vcpu_has_lapic() instead of checking arch.apic directly. >> Ok. 
>> >>> >>>> + return -1; >>>> + >>>> + return apic_find_highest_irr(apic); >>>> +} >>>> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); >>>> + >>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) >>>> { >>>> u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); >>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h >>>> index c42f111..749661a 100644 >>>> --- a/arch/x86/kvm/lapic.h >>>> +++ b/arch/x86/kvm/lapic.h >>>> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); >>>> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); >>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); >>>> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); >>>> +int kvm_cpu_has_extint(struct kvm_vcpu *v); >>>> +int kvm_cpu_get_extint(struct kvm_vcpu *v); >>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); >>>> void kvm_lapic_reset(struct kvm_vcpu *vcpu); u64 >>>> kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); void >>>> kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); @@ -50,6 >>>> +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); int >>>> kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int >>>> kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int >>>> kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); >>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>> + int need_eoi, int global); >>>> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); >>>> >>>> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic > *src, >>>> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu >>> *vcpu); >>>> void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); >>>> >>>> int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); >>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); >>>> >>>> void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t >>>> vapic_addr); void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >>>> index dcb7952..8f0903b 100644 >>>> --- a/arch/x86/kvm/svm.c >>>> +++ b/arch/x86/kvm/svm.c >>>> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct > kvm_vcpu >>> *vcpu, int tpr, int irr) >>>> set_cr_intercept(svm, INTERCEPT_CR8_WRITE); >>>> } >>>> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >>>> +{ >>>> + return 0; >>>> +} >>>> + >>>> +static void svm_update_irq(struct kvm_vcpu *vcpu) >>>> +{ >>>> + return ; >>>> +} >>>> + >>>> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>> + int trig_mode, int always_set) >>>> +{ >>>> + return ; >>>> +} >>>> + >>>> static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm >>>> *svm = to_svm(vcpu); @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops >>>> svm_x86_ops = { .enable_nmi_window = enable_nmi_window, >>>> .enable_irq_window = enable_irq_window, .update_cr8_intercept = >>>> update_cr8_intercept, >>>> + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, >>>> + .update_irq = svm_update_irq; >>>> + .set_eoi_exitmap = svm_set_eoi_exitmap; >>>> >>>> .set_tss_addr = svm_set_tss_addr, >>>> .get_tdp_level = get_npt_level, >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>> index 6a5f651..909ce90 100644 >>>> --- a/arch/x86/kvm/vmx.c >>>> +++ b/arch/x86/kvm/vmx.c >>>> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO); >>>> static bool __read_mostly enable_apicv_reg; >>>> module_param(enable_apicv_reg, bool, S_IRUGO); >>>> +static bool 
__read_mostly enable_apicv_vid; >>>> +module_param(enable_apicv_vid, bool, S_IRUGO); >>>> + >>>> /* >>>> * If nested=1, nested virtualization is supported, i.e., guests may use >>>> * VMX and be a hypervisor for its own guests. If nested=0, guests may > not >>>> @@ -432,6 +435,9 @@ struct vcpu_vmx { >>>> >>>> bool rdtscp_enabled; >>>> + u8 eoi_exitmap_changed; >>>> + u32 eoi_exit_bitmap[8]; >>>> + >>>> /* Support for a guest hypervisor (nested VMX) */ >>>> struct nested_vmx nested; >>>> }; >>>> @@ -770,6 +776,12 @@ static inline bool >>> cpu_has_vmx_apic_register_virt(void) >>>> SECONDARY_EXEC_APIC_REGISTER_VIRT; >>>> } >>>> +static inline bool cpu_has_vmx_virtual_intr_delivery(void) >>>> +{ >>>> + return vmcs_config.cpu_based_2nd_exec_ctrl & >>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >>>> +} >>>> + >>>> static inline bool cpu_has_vmx_flexpriority(void) >>>> { >>>> return cpu_has_vmx_tpr_shadow() && >>>> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct >>> vmcs_config *vmcs_conf) >>>> SECONDARY_EXEC_PAUSE_LOOP_EXITING | >>>> SECONDARY_EXEC_RDTSCP | >>>> SECONDARY_EXEC_ENABLE_INVPCID | >>>> - SECONDARY_EXEC_APIC_REGISTER_VIRT; >>>> + SECONDARY_EXEC_APIC_REGISTER_VIRT | >>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >>>> if (adjust_vmx_controls(min2, opt2, >>>> MSR_IA32_VMX_PROCBASED_CTLS2, >>>> &_cpu_based_2nd_exec_control) < 0) >>>> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct >>>> vmcs_config *vmcs_conf) >>>> >>>> if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) >>>> _cpu_based_2nd_exec_control &= ~( >>>> - SECONDARY_EXEC_APIC_REGISTER_VIRT); >>>> + SECONDARY_EXEC_APIC_REGISTER_VIRT | >>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); >>>> >>>> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { /* >>>> CR3 accesses and invlpg don't need to cause VM Exits when EPT @@ >>>> -2724,6 +2738,14 @@ static __init int hardware_setup(void) if >>>> (!cpu_has_vmx_apic_register_virt()) enable_apicv_reg = 0; >>>> + if (!cpu_has_vmx_virtual_intr_delivery()) >>>> + enable_apicv_vid = 0; >>>> + >>>> + if (!enable_apicv_vid) { >>>> + kvm_x86_ops->update_irq = NULL; >>> Why setting it to NULL? Either drop this since vmx_update_irq() checks >>> enable_apicv_vid or better set it to function that does nothing and >>> drop enable_apicv_vid check in vmx_update_irq(). Since >>> kvm_x86_ops->update_irq will never be NULL you can drop the check >>> before calling it. >> Sure. >> >>>> + kvm_x86_ops->update_cr8_intercept = NULL; >>> Why? It should be other way around: if apicv is enabled set >>> update_cr8_intercept callback to NULL. >> Yes, this is wrong. > Please test the patches with vid disabled and Windows guests. This bug > should have prevented it from working. 
> >> >>>> + } >>>> + >>>> if (nested) >>>> nested_vmx_setup_ctls_msrs(); >>>> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct >>> vcpu_vmx *vmx) >>>> exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; >>>> if (!enable_apicv_reg) >>>> exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; >>>> + if (!enable_apicv_vid) >>>> + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >>>> return exec_control; >>>> } >>>> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx > *vmx) >>>> vmx_secondary_exec_control(vmx)); >>>> } >>>> + if (enable_apicv_vid) { >>>> + vmcs_write64(EOI_EXIT_BITMAP0, 0); >>>> + vmcs_write64(EOI_EXIT_BITMAP1, 0); >>>> + vmcs_write64(EOI_EXIT_BITMAP2, 0); >>>> + vmcs_write64(EOI_EXIT_BITMAP3, 0); >>>> + >>>> + vmcs_write16(GUEST_INTR_STATUS, 0); >>>> + } >>>> + >>>> if (ple_gap) { >>>> vmcs_write32(PLE_GAP, ple_gap); >>>> vmcs_write32(PLE_WINDOW, ple_window); >>>> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu >>> *vcpu) >>>> return emulate_instruction(vcpu, 0) == EMULATE_DONE; >>>> } >>>> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) >>>> +{ >>>> + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); >>>> + int vector = exit_qualification & 0xff; >>>> + >>>> + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ >>>> + kvm_apic_set_eoi_accelerated(vcpu, vector); >>>> + return 1; >>>> +} >>>> + >>>> static int handle_apic_write(struct kvm_vcpu *vcpu) >>>> { >>>> unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); >>>> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct >>> kvm_vcpu *vcpu) = { >>>> [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, >>>> [EXIT_REASON_APIC_ACCESS] = handle_apic_access, >>>> [EXIT_REASON_APIC_WRITE] = handle_apic_write, >>>> + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, >>>> [EXIT_REASON_WBINVD] = handle_wbinvd, >>>> [EXIT_REASON_XSETBV] = handle_xsetbv, >>>> [EXIT_REASON_TASK_SWITCH] = handle_task_switch, >>>> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu >>>> *vcpu) >>>> >>>> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) >>>> { >>>> + /* no need for tpr_threshold update if APIC virtual >>>> + * interrupt delivery is enabled */ >>>> + if (!enable_apicv_vid) >>>> + return ; >>>> + >>> Since you (will) set ->update_cr8_intercept callback to NULL if vid >>> is enabled this function will never be called with !enable_apicv_vid, >>> so this check can be dropped. >> Ok. 
>> >>>> if (irr == -1 || tpr < irr) { >>>> vmcs_write32(TPR_THRESHOLD, 0); >>>> return; >>>> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct > kvm_vcpu >>> *vcpu, int tpr, int irr) >>>> vmcs_write32(TPR_THRESHOLD, irr); >>>> } >>>> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >>>> +{ >>>> + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid; >>>> +} >>>> + >>>> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index) >>>> +{ >>>> + int tmr; >>>> + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic, >>>> + APIC_TMR + 0x10 * index); >>>> + vmcs_write32(EOI_EXIT_BITMAP0 + index, >>>> + vmx->eoi_exit_bitmap[index] | tmr); >>>> +} >>>> + >>>> +static void vmx_update_rvi(int vector) >>>> +{ >>>> + u16 status; >>>> + u8 old; >>>> + >>>> + status = vmcs_read16(GUEST_INTR_STATUS); >>>> + old = (u8)status & 0xff; >>>> + if ((u8)vector != old) { >>>> + status &= ~0xff; >>>> + status |= (u8)vector; >>>> + vmcs_write16(GUEST_INTR_STATUS, status); >>>> + } >>>> +} >>>> + >>>> +static void vmx_update_irq(struct kvm_vcpu *vcpu) >>>> +{ >>>> + int vector; >>>> + struct vcpu_vmx *vmx = to_vmx(vcpu); >>>> + >>>> + if (!enable_apicv_vid) >>>> + return ; >>>> + >>>> + vector = kvm_apic_get_highest_irr(vcpu); >>>> + if (vector == -1) >>>> + return; >>>> + >>>> + vmx_update_rvi(vector); >>>> + >>>> + if (vmx->eoi_exitmap_changed) { >>>> + int index; >>>> + for_each_set_bit(index, >>>> + (unsigned long *)(&vmx->eoi_exitmap_changed), 8) >>>> + vmx_update_eoi_exitmap(vmx, index); >>>> + vmx->eoi_exitmap_changed = 0; >>>> + } >>>> +} >>>> + >>>> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu, >>>> + int vector, int trig_mode, >>>> + int always_set) >>>> +{ >>>> + struct vcpu_vmx *vmx = to_vmx(vcpu); >>>> + int index, offset, changed; >>>> + struct kvm_lapic *apic; >>>> + >>>> + if (!enable_apicv_vid) >>>> + return ; >>>> + >>>> + if (WARN_ONCE((vector < 0) || (vector > 255), >>>> + "KVM VMX: vector (%d) out of range\n", vector)) >>>> + return; >>>> + >>>> + apic = vcpu->arch.apic; >>>> + index = vector >> 5; >>>> + offset = vector & 31; >>>> + >>>> + if (always_set) >>>> + changed = !test_and_set_bit(offset, >>>> + (unsigned long *)&vmx->eoi_exit_bitmap); >>>> + else if (trig_mode) >>>> + changed = !test_bit(offset, >>>> + apic->regs + APIC_TMR + index * 0x10); >>>> + else >>>> + changed = test_bit(offset, >>>> + apic->regs + APIC_TMR + index * 0x10); >>>> + >>>> + if (changed) >>>> + vmx->eoi_exitmap_changed |= 1 << index; >>>> +} >>>> + >>>> static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) { u32 >>>> exit_intr_info; @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops >>>> vmx_x86_ops = { .enable_nmi_window = enable_nmi_window, >>>> .enable_irq_window = enable_irq_window, .update_cr8_intercept = >>>> update_cr8_intercept, >>>> + .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery, >>>> + .update_irq = vmx_update_irq, >>>> + .set_eoi_exitmap = vmx_set_eoi_exitmap, >>>> >>>> .set_tss_addr = vmx_set_tss_addr, >>>> .get_tdp_level = get_ept_level, >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index >>>> b0b8abe..02fe194 100644 --- a/arch/x86/kvm/x86.c +++ >>>> b/arch/x86/kvm/x86.c @@ -164,6 +164,14 @@ static int >>>> emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt); >>>> >>>> static int kvm_vcpu_reset(struct kvm_vcpu *vcpu); >>>> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) >>>> +{ >>>> + if (kvm_x86_ops->has_virtual_interrupt_delivery) >>> This callback is never NULL. >> Ok. 
>> >>>> + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu) >>>> { >>>> int i; >>>> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct > kvm_vcpu >>> *vcpu) >>>> vcpu->arch.nmi_injected = true; >>>> kvm_x86_ops->set_nmi(vcpu); >>>> } >>>> - } else if (kvm_cpu_has_interrupt(vcpu)) { >>>> - if (kvm_x86_ops->interrupt_allowed(vcpu)) { >>>> - kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), >>>> - false); >>>> + } else if (kvm_cpu_has_interrupt(vcpu) && >>>> + kvm_x86_ops->interrupt_allowed(vcpu)) { >>>> + int vector = -1; >>>> + >>>> + if (kvm_apic_vid_enabled(vcpu)) >>>> + vector = kvm_cpu_get_extint(vcpu); >>>> + else >>>> + vector = kvm_cpu_get_interrupt(vcpu); >>>> + >>>> + if (vector != -1) { >>>> + kvm_queue_interrupt(vcpu, vector, false); >>>> kvm_x86_ops->set_irq(vcpu); >>>> } >>> If vid is enabled kvm_cpu_has_interrupt() should return true only if there >>> is extint interrupt. Similarly kvm_cpu_get_interrupt() will only return >>> extint if vid is enabled. This basically moves kvm_apic_vid_enabled() >>> logic deeper into kvm_cpu_(has|get)_interrupt() functions instead >>> of changing interrupt injection logic here and in vcpu_enter_guest() >>> bellow. We still need kvm_cpu_has_interrupt() variant that always checks >>> both extint and apic for use in kvm_arch_vcpu_runnable() though. >> As you mentioned, we still need to checks both extint and apic interrupt in > some case. So how to do this? Introduce another argument to indicate > whether check both? Yes, we need to check both in > kvm_arch_vcpu_runnable(). Another argument is good option. We can have > two functions: kvm_cpu_has_injectable_interrupt() for use in irq > injection path and kvm_cpu_has_interrupt() for use in > kvm_arch_vcpu_runnable(). They will call common one with additional > argument to avoid code duplication. Ok. will follow this way. >> >>>> + >>>> } >>>> } >>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu >>> *vcpu) >>>> } >>>> >>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { >>>> + /* update archtecture specific hints for APIC >>>> + * virtual interrupt delivery */ >>>> + if (kvm_x86_ops->update_irq) >>>> + kvm_x86_ops->update_irq(vcpu); >>>> + >>> >>> I do not see why this have to be here instead of inside if >>> (kvm_lapic_enabled(vcpu)){} near update_cr8_intercept() a couple of >>> lines bellow. If you move it there you can drop apic enable check in >>> kvm_apic_get_highest_irr(). >> Yes, it seems ok to move it. 
>>>> inject_pending_event(vcpu);
>>>>
>>>> /* enable NMI/IRQ window open exits if needed */
>>>> if (vcpu->arch.nmi_pending)
>>>> kvm_x86_ops->enable_nmi_window(vcpu);
>>>> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>> + else if (kvm_apic_vid_enabled(vcpu)) {
>>>> + if (kvm_cpu_has_extint(vcpu))
>>>> + kvm_x86_ops->enable_irq_window(vcpu);
>>>> + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>> kvm_x86_ops->enable_irq_window(vcpu);
>>>>
>>>> if (kvm_lapic_enabled(vcpu)) {
>>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>>> index 166c450..898aa62 100644
>>>> --- a/virt/kvm/ioapic.c
>>>> +++ b/virt/kvm/ioapic.c
>>>> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
>>>> /* need to read apic_id from apic regiest since
>>>> * it can be rewritten */
>>>> irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
>>>> + kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1);
>>>> }
>>>> #endif
>>>> return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
>>>> --
>>>> 1.7.1
>>>
>>> --
>>> Gleb.
>>
>> Best regards,
>> Yang
>
> --
> Gleb.

Best regards,
Yang
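Yang's point about logical mode can be seen from the shape of the matching logic itself; a rough sketch modeled on the xAPIC rules (the helper form and DFR constants here are illustrative, not a quote of kvm's kvm_apic_match_logical_addr()):

/*
 * Whether a vcpu accepts a logical-mode interrupt depends on that
 * vcpu's own LDR and DFR, so resolving an ioapic entry to a target
 * set means scanning every vcpu's APIC state, and re-resolving
 * whenever any LDR/DFR (or, for lowest priority, PPR) changes.
 */
static bool apic_match_logical(u32 ldr, u32 dfr, u8 mda)
{
    u8 logical_id = ldr >> 24;          /* xAPIC: LDR bits 31:24 */

    if (dfr == 0xffffffffu)             /* flat model */
        return logical_id & mda;

    /* cluster model (dfr == 0x0fffffff): high nibble selects the
     * cluster, low nibble is a 4-bit member bitmap */
    return (logical_id >> 4) == (mda >> 4) && (logical_id & mda & 0xf);
}

Keeping such a resolution current would mean trapping every LDR/DFR write, which is the cost Yang is objecting to.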
On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > manually, which is fully taken care of by the hardware. This needs > some special awareness into existing interrupr injection path: > > - for pending interrupt, instead of direct injection, we may need > update architecture specific indicators before resuming to guest. > > - A pending interrupt, which is masked by ISR, should be also > considered in above update action, since hardware will decide > when to inject it at right time. Current has_interrupt and > get_interrupt only returns a valid vector from injection p.o.v. > > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > --- > arch/x86/include/asm/kvm_host.h | 4 + > arch/x86/include/asm/vmx.h | 11 +++ > arch/x86/kvm/irq.c | 53 ++++++++++----- > arch/x86/kvm/lapic.c | 56 +++++++++++++--- > arch/x86/kvm/lapic.h | 6 ++ > arch/x86/kvm/svm.c | 19 +++++ > arch/x86/kvm/vmx.c | 140 ++++++++++++++++++++++++++++++++++++++- > arch/x86/kvm/x86.c | 34 ++++++++-- > virt/kvm/ioapic.c | 1 + > 9 files changed, 291 insertions(+), 33 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index dc87b65..e5352c8 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -697,6 +697,10 @@ struct kvm_x86_ops { > void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > void (*enable_irq_window)(struct kvm_vcpu *vcpu); > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > + void (*update_irq)(struct kvm_vcpu *vcpu); > + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set); > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > int (*get_tdp_level)(void); > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index 21101b6..1003341 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -62,6 +62,7 @@ > #define EXIT_REASON_MCE_DURING_VMENTRY 41 > #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 > #define EXIT_REASON_APIC_ACCESS 44 > +#define EXIT_REASON_EOI_INDUCED 45 > #define EXIT_REASON_EPT_VIOLATION 48 > #define EXIT_REASON_EPT_MISCONFIG 49 > #define EXIT_REASON_WBINVD 54 > @@ -143,6 +144,7 @@ > #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 > #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 > +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 > #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 > #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 > > @@ -180,6 +182,7 @@ enum vmcs_field { > GUEST_GS_SELECTOR = 0x0000080a, > GUEST_LDTR_SELECTOR = 0x0000080c, > GUEST_TR_SELECTOR = 0x0000080e, > + GUEST_INTR_STATUS = 0x00000810, > HOST_ES_SELECTOR = 0x00000c00, > HOST_CS_SELECTOR = 0x00000c02, > HOST_SS_SELECTOR = 0x00000c04, > @@ -207,6 +210,14 @@ enum vmcs_field { > APIC_ACCESS_ADDR_HIGH = 0x00002015, > EPT_POINTER = 0x0000201a, > EPT_POINTER_HIGH = 0x0000201b, > + EOI_EXIT_BITMAP0 = 0x0000201c, > + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > + EOI_EXIT_BITMAP1 = 0x0000201e, > + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, > + EOI_EXIT_BITMAP2 = 0x00002020, > + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > + EOI_EXIT_BITMAP3 = 0x00002022, > + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > GUEST_PHYSICAL_ADDRESS = 0x00002400, > 
GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > VMCS_LINK_POINTER = 0x00002800, > diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > index 7e06ba1..f782788 100644 > --- a/arch/x86/kvm/irq.c > +++ b/arch/x86/kvm/irq.c > @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > */ > int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > { > - struct kvm_pic *s; > - > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.pending; > > - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ > - if (kvm_apic_accept_pic_intr(v)) { > - s = pic_irqchip(v->kvm); /* PIC */ > - return s->output; > - } else > - return 0; > - } > + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ > + return kvm_cpu_has_extint(v); /* non-APIC */ > return 1; > } > EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > > /* > + * check if there is pending interrupt from > + * non-APIC source without intack. > + */ > +int kvm_cpu_has_extint(struct kvm_vcpu *v) > +{ > + struct kvm_pic *s; > + > + if (kvm_apic_accept_pic_intr(v)) { > + s = pic_irqchip(v->kvm); /* PIC */ > + return s->output; > + } else > + return 0; > +} > + > +/* > * Read pending interrupt vector and intack. > */ > int kvm_cpu_get_interrupt(struct kvm_vcpu *v) > { > - struct kvm_pic *s; > int vector; > > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.nr; > > vector = kvm_get_apic_interrupt(v); /* APIC */ > - if (vector == -1) { > - if (kvm_apic_accept_pic_intr(v)) { > - s = pic_irqchip(v->kvm); > - s->output = 0; /* PIC */ > - vector = kvm_pic_read_irq(v->kvm); > - } > - } > + if (vector == -1) > + return kvm_cpu_get_extint(v); /* non-APIC */ > return vector; > } > EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > > +/* > + * Read pending interrupt(from non-APIC source) > + * vector and intack. > + */ > +int kvm_cpu_get_extint(struct kvm_vcpu *v) > +{ > + struct kvm_pic *s; > + int vector = -1; > + > + if (kvm_apic_accept_pic_intr(v)) { > + s = pic_irqchip(v->kvm); > + s->output = 0; /* PIC */ > + vector = kvm_pic_read_irq(v->kvm); > + } > + return vector; > +} > + > void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > { > kvm_inject_apic_timer_irqs(vcpu); > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 7c96012..400d3ba 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -643,6 +643,14 @@ out: > return ret; > } > > +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set) > +{ > + if (kvm_x86_ops->set_eoi_exitmap) > + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > + trig_mode, always_set); > +} > + > /* > * Add a pending IRQ into lapic. > * Return 1 if successfully added and 0 if discarded. 
> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, > if (unlikely(!apic_enabled(apic))) > break; > > + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); > if (trig_mode) { > apic_debug("level trig mode for vector %d", vector); > apic_set_vector(vector, apic->regs + APIC_TMR); > @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) > return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > } > > +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > +{ > + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > + int trigger_mode; > + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > + trigger_mode = IOAPIC_LEVEL_TRIG; > + else > + trigger_mode = IOAPIC_EDGE_TRIG; > + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > + } > +} > + > static int apic_set_eoi(struct kvm_lapic *apic) > { > int vector = apic_find_highest_isr(apic); > @@ -756,19 +778,24 @@ static int apic_set_eoi(struct kvm_lapic *apic) > apic_clear_isr(vector, apic); > apic_update_ppr(apic); > > - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > - int trigger_mode; > - if (apic_test_vector(vector, apic->regs + APIC_TMR)) > - trigger_mode = IOAPIC_LEVEL_TRIG; > - else > - trigger_mode = IOAPIC_EDGE_TRIG; > - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > - } > + kvm_ioapic_send_eoi(apic, vector); > kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > return vector; > } > > +/* > + * this interface assumes a trap-like exit, which has already finished > + * desired side effect including vISR and vPPR update. 
> + */ > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + kvm_ioapic_send_eoi(apic, vector); > + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > + > static void apic_send_ipi(struct kvm_lapic *apic) > { > u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR); > @@ -1533,6 +1560,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) > return highest_irr; > } > > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + if (!apic || !apic_enabled(apic)) > + return -1; > + > + return apic_find_highest_irr(apic); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); > + > int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) > { > u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index c42f111..749661a 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); > int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); > int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); > int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); > +int kvm_cpu_has_extint(struct kvm_vcpu *v); > +int kvm_cpu_get_extint(struct kvm_vcpu *v); > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); > void kvm_lapic_reset(struct kvm_vcpu *vcpu); > u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); > void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); > @@ -50,6 +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); > int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); > int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); > +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > + int need_eoi, int global); > int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); > > bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, > @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); > void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > > int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > > void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index dcb7952..8f0903b 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > set_cr_intercept(svm, INTERCEPT_CR8_WRITE); > } > > +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > +{ > + return 0; > +} > + > +static void svm_update_irq(struct kvm_vcpu *vcpu) > +{ > + return ; > +} > + > +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set) > +{ > + return ; > +} > + > static int svm_nmi_allowed(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops svm_x86_ops = { > .enable_nmi_window = enable_nmi_window, > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, > + .update_irq = svm_update_irq; > + .set_eoi_exitmap = svm_set_eoi_exitmap; 
> > .set_tss_addr = svm_set_tss_addr, > .get_tdp_level = get_npt_level, > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 6a5f651..909ce90 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO); > static bool __read_mostly enable_apicv_reg; > module_param(enable_apicv_reg, bool, S_IRUGO); > > +static bool __read_mostly enable_apicv_vid; > +module_param(enable_apicv_vid, bool, S_IRUGO); > + > /* > * If nested=1, nested virtualization is supported, i.e., guests may use > * VMX and be a hypervisor for its own guests. If nested=0, guests may not > @@ -432,6 +435,9 @@ struct vcpu_vmx { > > bool rdtscp_enabled; > > + u8 eoi_exitmap_changed; > + u32 eoi_exit_bitmap[8]; > + > /* Support for a guest hypervisor (nested VMX) */ > struct nested_vmx nested; > }; > @@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void) > SECONDARY_EXEC_APIC_REGISTER_VIRT; > } > > +static inline bool cpu_has_vmx_virtual_intr_delivery(void) > +{ > + return vmcs_config.cpu_based_2nd_exec_ctrl & > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > +} > + > static inline bool cpu_has_vmx_flexpriority(void) > { > return cpu_has_vmx_tpr_shadow() && > @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > SECONDARY_EXEC_PAUSE_LOOP_EXITING | > SECONDARY_EXEC_RDTSCP | > SECONDARY_EXEC_ENABLE_INVPCID | > - SECONDARY_EXEC_APIC_REGISTER_VIRT; > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > if (adjust_vmx_controls(min2, opt2, > MSR_IA32_VMX_PROCBASED_CTLS2, > &_cpu_based_2nd_exec_control) < 0) > @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) > > if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) > _cpu_based_2nd_exec_control &= ~( > - SECONDARY_EXEC_APIC_REGISTER_VIRT); > + SECONDARY_EXEC_APIC_REGISTER_VIRT | > + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > > if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { > /* CR3 accesses and invlpg don't need to cause VM Exits when EPT > @@ -2724,6 +2738,14 @@ static __init int hardware_setup(void) > if (!cpu_has_vmx_apic_register_virt()) > enable_apicv_reg = 0; > > + if (!cpu_has_vmx_virtual_intr_delivery()) > + enable_apicv_vid = 0; > + > + if (!enable_apicv_vid) { > + kvm_x86_ops->update_irq = NULL; > + kvm_x86_ops->update_cr8_intercept = NULL; > + } > + > if (nested) > nested_vmx_setup_ctls_msrs(); > > @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) > exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; > if (!enable_apicv_reg) > exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; > + if (!enable_apicv_vid) > + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > return exec_control; > } > > @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > vmx_secondary_exec_control(vmx)); > } > > + if (enable_apicv_vid) { > + vmcs_write64(EOI_EXIT_BITMAP0, 0); > + vmcs_write64(EOI_EXIT_BITMAP1, 0); > + vmcs_write64(EOI_EXIT_BITMAP2, 0); > + vmcs_write64(EOI_EXIT_BITMAP3, 0); > + > + vmcs_write16(GUEST_INTR_STATUS, 0); > + } > + > if (ple_gap) { > vmcs_write32(PLE_GAP, ple_gap); > vmcs_write32(PLE_WINDOW, ple_window); > @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu) > return emulate_instruction(vcpu, 0) == EMULATE_DONE; > } > > +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) > +{ > + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > + int vector = exit_qualification & 0xff; > + 
> + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */
> + kvm_apic_set_eoi_accelerated(vcpu, vector);
> + return 1;
> +}
> +
> static int handle_apic_write(struct kvm_vcpu *vcpu)
> {
> unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
> [EXIT_REASON_APIC_ACCESS] = handle_apic_access,
> [EXIT_REASON_APIC_WRITE] = handle_apic_write,
> + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced,
> [EXIT_REASON_WBINVD] = handle_wbinvd,
> [EXIT_REASON_XSETBV] = handle_xsetbv,
> [EXIT_REASON_TASK_SWITCH] = handle_task_switch,
> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>
> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> {
> + /* no need for tpr_threshold update if APIC virtual
> + * interrupt delivery is enabled */
> + if (!enable_apicv_vid)
> + return ;
> +
> if (irr == -1 || tpr < irr) {
> vmcs_write32(TPR_THRESHOLD, 0);
> return;
> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> vmcs_write32(TPR_THRESHOLD, irr);
> }
>
> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
> +{
> + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid;
> +}
> +
> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index)
> +{
> + int tmr;
> + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic,
> + APIC_TMR + 0x10 * index);
> + vmcs_write32(EOI_EXIT_BITMAP0 + index,
> + vmx->eoi_exit_bitmap[index] | tmr);
> +}
> +
> +static void vmx_update_rvi(int vector)
> +{
> + u16 status;
> + u8 old;
> +
> + status = vmcs_read16(GUEST_INTR_STATUS);
> + old = (u8)status & 0xff;
> + if ((u8)vector != old) {
> + status &= ~0xff;
> + status |= (u8)vector;
> + vmcs_write16(GUEST_INTR_STATUS, status);
> + }
> +}
> +
> +static void vmx_update_irq(struct kvm_vcpu *vcpu)
> +{
> + int vector;
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> + if (!enable_apicv_vid)
> + return ;
> +
> + vector = kvm_apic_get_highest_irr(vcpu);
> + if (vector == -1)
> + return;

1. Is the pseudocode sequence of virtual interrupt delivery in 29.2.2 guaranteed to be atomic (not interruptible by other events)? The question is: when hardware is performing virtual interrupt delivery, is it guaranteed that RVI matches VIRR? (The answer must be yes, just checking.)

2. Section 29.6 mentions that "Use of the posted-interrupt descriptor differs from that of other data structures that are referenced by pointers in a VMCS. There is a general requirement that software ensure that each such data structure is modified only when no logical processor with a current VMCS that references it is in VMX non-root operation. That requirement does not apply to the posted-interrupt descriptor. There is a requirement, however, that such modifications be done using locked read-modify-write instructions."

The APIC virtual page is being modified by a CPU while a logical processor with a current VMCS that references it is in VMX non-root operation; in fact, the APIC virtual page is modified even by EOI virtualization, virtual interrupt delivery, etc. What are the requirements in this case?
> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > } > > if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { > + /* update archtecture specific hints for APIC > + * virtual interrupt delivery */ > + if (kvm_x86_ops->update_irq) > + kvm_x86_ops->update_irq(vcpu); > + > inject_pending_event(vcpu); > > /* enable NMI/IRQ window open exits if needed */ > if (vcpu->arch.nmi_pending) > kvm_x86_ops->enable_nmi_window(vcpu); > - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > + else if (kvm_apic_vid_enabled(vcpu)) { > + if (kvm_cpu_has_extint(vcpu)) > + kvm_x86_ops->enable_irq_window(vcpu); If RVI is non-zero, then the interrupt window should not be enabled, according to 29.2.2: "If a virtual interrupt has been recognized (see Section 29.2.1), it will be delivered at an instruction boundary when the following conditions all hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no blocking by MOV SS or by POP SS; and (4) the “interrupt-window exiting” VM-execution control is 0." > + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > kvm_x86_ops->enable_irq_window(vcpu); > > if (kvm_lapic_enabled(vcpu)) { > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index 166c450..898aa62 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) > /* need to read apic_id from apic regiest since > * it can be rewritten */ > irqe.dest_id = ioapic->kvm->bsp_vcpu_id; > + kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1); > } > #endif > return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
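To make the suggested condition concrete: a minimal sketch of the check in vcpu_enter_guest(), where vmx_rvi_pending() is a hypothetical helper (not part of the posted patch) and the x86/vmx layering is ignored for brevity:

static bool vmx_rvi_pending(void)
{
        /* bits 7:0 of the guest interrupt status VMCS field hold RVI */
        return (vmcs_read16(GUEST_INTR_STATUS) & 0xff) != 0;
}

        /* with vid, open an interrupt window only for ExtINT, and only
         * while no virtual interrupt is pending: delivery of a recognized
         * virtual interrupt requires the "interrupt-window exiting"
         * control to be 0 (SDM 29.2.2), so forcing the window open would
         * delay the very injection hardware is about to do */
        else if (kvm_apic_vid_enabled(vcpu)) {
                if (kvm_cpu_has_extint(vcpu) && !vmx_rvi_pending())
                        kvm_x86_ops->enable_irq_window(vcpu);
        }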
Marcelo Tosatti wrote on 2012-12-05: > On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts >> manually, which is fully taken care of by the hardware. This needs >> some special awareness into existing interrupr injection path: >> >> - for pending interrupt, instead of direct injection, we may need >> update architecture specific indicators before resuming to guest. >> - A pending interrupt, which is masked by ISR, should be also >> considered in above update action, since hardware will decide >> when to inject it at right time. Current has_interrupt and >> get_interrupt only returns a valid vector from injection p.o.v. >> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> >> Signed-off-by: Kevin Tian <kevin.tian@intel.com> >> --- >> arch/x86/include/asm/kvm_host.h | 4 + arch/x86/include/asm/vmx.h >> | 11 +++ arch/x86/kvm/irq.c | 53 ++++++++++----- >> arch/x86/kvm/lapic.c | 56 +++++++++++++--- >> arch/x86/kvm/lapic.h | 6 ++ arch/x86/kvm/svm.c >> | 19 +++++ arch/x86/kvm/vmx.c | 140 >> ++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c >> | 34 ++++++++-- virt/kvm/ioapic.c | 1 + 9 files >> changed, 291 insertions(+), 33 deletions(-) >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h index dc87b65..e5352c8 100644 --- >> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h >> @@ -697,6 +697,10 @@ struct kvm_x86_ops { >> void (*enable_nmi_window)(struct kvm_vcpu *vcpu); >> void (*enable_irq_window)(struct kvm_vcpu *vcpu); >> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); >> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); >> + void (*update_irq)(struct kvm_vcpu *vcpu); >> + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, >> + int trig_mode, int always_set); >> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); >> int (*get_tdp_level)(void); >> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); >> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h >> index 21101b6..1003341 100644 >> --- a/arch/x86/include/asm/vmx.h >> +++ b/arch/x86/include/asm/vmx.h >> @@ -62,6 +62,7 @@ >> #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define >> EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS >> 44 +#define EXIT_REASON_EOI_INDUCED 45 #define >> EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG >> 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 +144,7 @@ >> #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define >> SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 #define >> SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 +#define >> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define >> SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define >> SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 >> @@ -180,6 +182,7 @@ enum vmcs_field { >> GUEST_GS_SELECTOR = 0x0000080a, GUEST_LDTR_SELECTOR >> = 0x0000080c, GUEST_TR_SELECTOR = 0x0000080e, >> + GUEST_INTR_STATUS = 0x00000810, HOST_ES_SELECTOR >> = 0x00000c00, HOST_CS_SELECTOR = 0x00000c02, >> HOST_SS_SELECTOR = 0x00000c04, @@ -207,6 +210,14 @@ >> enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x00002015, EPT_POINTER >> = 0x0000201a, EPT_POINTER_HIGH = >> 0x0000201b, >> + EOI_EXIT_BITMAP0 = 0x0000201c, >> + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, >> + EOI_EXIT_BITMAP1 = 0x0000201e, >> + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, >> + EOI_EXIT_BITMAP2 = 0x00002020, >> + EOI_EXIT_BITMAP2_HIGH = 0x00002021, >> + EOI_EXIT_BITMAP3 = 0x00002022, >> + 
EOI_EXIT_BITMAP3_HIGH = 0x00002023, >> GUEST_PHYSICAL_ADDRESS = 0x00002400, >> GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, >> VMCS_LINK_POINTER = 0x00002800, >> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c >> index 7e06ba1..f782788 100644 >> --- a/arch/x86/kvm/irq.c >> +++ b/arch/x86/kvm/irq.c >> @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); >> */ >> int kvm_cpu_has_interrupt(struct kvm_vcpu *v) >> { >> - struct kvm_pic *s; >> - >> if (!irqchip_in_kernel(v->kvm)) >> return v->arch.interrupt.pending; >> - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ >> - if (kvm_apic_accept_pic_intr(v)) { >> - s = pic_irqchip(v->kvm); /* PIC */ >> - return s->output; >> - } else >> - return 0; >> - } >> + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ >> + return kvm_cpu_has_extint(v); /* non-APIC */ >> return 1; >> } >> EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); >> >> /* >> + * check if there is pending interrupt from >> + * non-APIC source without intack. >> + */ >> +int kvm_cpu_has_extint(struct kvm_vcpu *v) >> +{ >> + struct kvm_pic *s; >> + >> + if (kvm_apic_accept_pic_intr(v)) { >> + s = pic_irqchip(v->kvm); /* PIC */ >> + return s->output; >> + } else >> + return 0; >> +} >> + >> +/* >> * Read pending interrupt vector and intack. >> */ >> int kvm_cpu_get_interrupt(struct kvm_vcpu *v) { - struct kvm_pic *s; >> int vector; >> >> if (!irqchip_in_kernel(v->kvm)) >> return v->arch.interrupt.nr; >> >> vector = kvm_get_apic_interrupt(v); /* APIC */ >> - if (vector == -1) { >> - if (kvm_apic_accept_pic_intr(v)) { >> - s = pic_irqchip(v->kvm); >> - s->output = 0; /* PIC */ >> - vector = kvm_pic_read_irq(v->kvm); >> - } >> - } >> + if (vector == -1) >> + return kvm_cpu_get_extint(v); /* non-APIC */ >> return vector; >> } >> EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); >> +/* >> + * Read pending interrupt(from non-APIC source) >> + * vector and intack. >> + */ >> +int kvm_cpu_get_extint(struct kvm_vcpu *v) >> +{ >> + struct kvm_pic *s; >> + int vector = -1; >> + >> + if (kvm_apic_accept_pic_intr(v)) { >> + s = pic_irqchip(v->kvm); >> + s->output = 0; /* PIC */ >> + vector = kvm_pic_read_irq(v->kvm); >> + } >> + return vector; >> +} >> + >> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) >> { >> kvm_inject_apic_timer_irqs(vcpu); >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index 7c96012..400d3ba 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -643,6 +643,14 @@ out: >> return ret; >> } >> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >> + int trig_mode, int always_set) >> +{ >> + if (kvm_x86_ops->set_eoi_exitmap) >> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >> + trig_mode, always_set); >> +} >> + >> /* >> * Add a pending IRQ into lapic. >> * Return 1 if successfully added and 0 if discarded. 
>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int > delivery_mode, >> if (unlikely(!apic_enabled(apic))) >> break; >> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); >> if (trig_mode) { >> apic_debug("level trig mode for vector %d", vector); >> apic_set_vector(vector, apic->regs + APIC_TMR); >> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, > struct kvm_vcpu *vcpu2) >> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; >> } >> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) >> +{ >> + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && >> + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { >> + int trigger_mode; >> + if (apic_test_vector(vector, apic->regs + APIC_TMR)) >> + trigger_mode = IOAPIC_LEVEL_TRIG; >> + else >> + trigger_mode = IOAPIC_EDGE_TRIG; >> + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >> + } >> +} >> + >> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = >> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int >> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); >> apic_update_ppr(apic); >> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && >> - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { >> - int trigger_mode; >> - if (apic_test_vector(vector, apic->regs + APIC_TMR)) >> - trigger_mode = IOAPIC_LEVEL_TRIG; >> - else >> - trigger_mode = IOAPIC_EDGE_TRIG; >> - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >> - } >> + kvm_ioapic_send_eoi(apic, vector); >> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >> return vector; >> } >> +/* >> + * this interface assumes a trap-like exit, which has already finished >> + * desired side effect including vISR and vPPR update. 
>> + */ >> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) >> +{ >> + struct kvm_lapic *apic = vcpu->arch.apic; >> + >> + kvm_ioapic_send_eoi(apic, vector); >> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >> +} >> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); >> + >> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = >> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int >> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } >> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) >> +{ >> + struct kvm_lapic *apic = vcpu->arch.apic; >> + >> + if (!apic || !apic_enabled(apic)) >> + return -1; >> + >> + return apic_find_highest_irr(apic); >> +} >> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); >> + >> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) >> { >> u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); >> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h >> index c42f111..749661a 100644 >> --- a/arch/x86/kvm/lapic.h >> +++ b/arch/x86/kvm/lapic.h >> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); >> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); >> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); >> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); >> +int kvm_cpu_has_extint(struct kvm_vcpu *v); >> +int kvm_cpu_get_extint(struct kvm_vcpu *v); >> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); >> void kvm_lapic_reset(struct kvm_vcpu *vcpu); u64 >> kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); void >> kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); @@ -50,6 >> +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); int >> kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int >> kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int >> kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); >> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >> + int need_eoi, int global); >> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); >> >> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, >> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu > *vcpu); >> void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); >> >> int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); >> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); >> >> void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); >> void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> index dcb7952..8f0903b 100644 >> --- a/arch/x86/kvm/svm.c >> +++ b/arch/x86/kvm/svm.c >> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu > *vcpu, int tpr, int irr) >> set_cr_intercept(svm, INTERCEPT_CR8_WRITE); >> } >> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >> +{ >> + return 0; >> +} >> + >> +static void svm_update_irq(struct kvm_vcpu *vcpu) >> +{ >> + return ; >> +} >> + >> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >> + int trig_mode, int always_set) >> +{ >> + return ; >> +} >> + >> static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm >> *svm = to_svm(vcpu); @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops >> svm_x86_ops = { .enable_nmi_window = enable_nmi_window, >> .enable_irq_window = enable_irq_window, .update_cr8_intercept = >> update_cr8_intercept, >> + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, 
>> + .update_irq = svm_update_irq; >> + .set_eoi_exitmap = svm_set_eoi_exitmap; >> >> .set_tss_addr = svm_set_tss_addr, >> .get_tdp_level = get_npt_level, >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 6a5f651..909ce90 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO); >> static bool __read_mostly enable_apicv_reg; >> module_param(enable_apicv_reg, bool, S_IRUGO); >> +static bool __read_mostly enable_apicv_vid; >> +module_param(enable_apicv_vid, bool, S_IRUGO); >> + >> /* >> * If nested=1, nested virtualization is supported, i.e., guests may use >> * VMX and be a hypervisor for its own guests. If nested=0, guests may not >> @@ -432,6 +435,9 @@ struct vcpu_vmx { >> >> bool rdtscp_enabled; >> + u8 eoi_exitmap_changed; >> + u32 eoi_exit_bitmap[8]; >> + >> /* Support for a guest hypervisor (nested VMX) */ >> struct nested_vmx nested; >> }; >> @@ -770,6 +776,12 @@ static inline bool > cpu_has_vmx_apic_register_virt(void) >> SECONDARY_EXEC_APIC_REGISTER_VIRT; >> } >> +static inline bool cpu_has_vmx_virtual_intr_delivery(void) >> +{ >> + return vmcs_config.cpu_based_2nd_exec_ctrl & >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >> +} >> + >> static inline bool cpu_has_vmx_flexpriority(void) >> { >> return cpu_has_vmx_tpr_shadow() && >> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct > vmcs_config *vmcs_conf) >> SECONDARY_EXEC_PAUSE_LOOP_EXITING | >> SECONDARY_EXEC_RDTSCP | >> SECONDARY_EXEC_ENABLE_INVPCID | >> - SECONDARY_EXEC_APIC_REGISTER_VIRT; >> + SECONDARY_EXEC_APIC_REGISTER_VIRT | >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >> if (adjust_vmx_controls(min2, opt2, >> MSR_IA32_VMX_PROCBASED_CTLS2, >> &_cpu_based_2nd_exec_control) < 0) >> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct >> vmcs_config *vmcs_conf) >> >> if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) >> _cpu_based_2nd_exec_control &= ~( >> - SECONDARY_EXEC_APIC_REGISTER_VIRT); >> + SECONDARY_EXEC_APIC_REGISTER_VIRT | >> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); >> >> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { /* >> CR3 accesses and invlpg don't need to cause VM Exits when EPT @@ >> -2724,6 +2738,14 @@ static __init int hardware_setup(void) if >> (!cpu_has_vmx_apic_register_virt()) enable_apicv_reg = 0; >> + if (!cpu_has_vmx_virtual_intr_delivery()) >> + enable_apicv_vid = 0; >> + >> + if (!enable_apicv_vid) { >> + kvm_x86_ops->update_irq = NULL; >> + kvm_x86_ops->update_cr8_intercept = NULL; >> + } >> + >> if (nested) >> nested_vmx_setup_ctls_msrs(); >> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct > vcpu_vmx *vmx) >> exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; >> if (!enable_apicv_reg) >> exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; >> + if (!enable_apicv_vid) >> + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; >> return exec_control; >> } >> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) >> vmx_secondary_exec_control(vmx)); >> } >> + if (enable_apicv_vid) { >> + vmcs_write64(EOI_EXIT_BITMAP0, 0); >> + vmcs_write64(EOI_EXIT_BITMAP1, 0); >> + vmcs_write64(EOI_EXIT_BITMAP2, 0); >> + vmcs_write64(EOI_EXIT_BITMAP3, 0); >> + >> + vmcs_write16(GUEST_INTR_STATUS, 0); >> + } >> + >> if (ple_gap) { >> vmcs_write32(PLE_GAP, ple_gap); >> vmcs_write32(PLE_WINDOW, ple_window); >> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu > *vcpu) >> return emulate_instruction(vcpu, 0) == EMULATE_DONE; >> } >> +static int 
handle_apic_eoi_induced(struct kvm_vcpu *vcpu) >> +{ >> + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); >> + int vector = exit_qualification & 0xff; >> + >> + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ >> + kvm_apic_set_eoi_accelerated(vcpu, vector); >> + return 1; >> +} >> + >> static int handle_apic_write(struct kvm_vcpu *vcpu) >> { >> unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); >> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct > kvm_vcpu *vcpu) = { >> [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, >> [EXIT_REASON_APIC_ACCESS] = handle_apic_access, >> [EXIT_REASON_APIC_WRITE] = handle_apic_write, >> + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, >> [EXIT_REASON_WBINVD] = handle_wbinvd, >> [EXIT_REASON_XSETBV] = handle_xsetbv, >> [EXIT_REASON_TASK_SWITCH] = handle_task_switch, >> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) >> >> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) >> { >> + /* no need for tpr_threshold update if APIC virtual >> + * interrupt delivery is enabled */ >> + if (!enable_apicv_vid) >> + return ; >> + >> if (irr == -1 || tpr < irr) { >> vmcs_write32(TPR_THRESHOLD, 0); >> return; >> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu > *vcpu, int tpr, int irr) >> vmcs_write32(TPR_THRESHOLD, irr); >> } >> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >> +{ >> + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid; >> +} >> + >> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index) >> +{ >> + int tmr; >> + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic, >> + APIC_TMR + 0x10 * index); >> + vmcs_write32(EOI_EXIT_BITMAP0 + index, >> + vmx->eoi_exit_bitmap[index] | tmr); >> +} >> + >> +static void vmx_update_rvi(int vector) >> +{ >> + u16 status; >> + u8 old; >> + >> + status = vmcs_read16(GUEST_INTR_STATUS); >> + old = (u8)status & 0xff; >> + if ((u8)vector != old) { >> + status &= ~0xff; >> + status |= (u8)vector; >> + vmcs_write16(GUEST_INTR_STATUS, status); >> + } >> +} >> + >> +static void vmx_update_irq(struct kvm_vcpu *vcpu) >> +{ >> + int vector; >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + >> + if (!enable_apicv_vid) >> + return ; >> + >> + vector = kvm_apic_get_highest_irr(vcpu); >> + if (vector == -1) >> + return; > > 1. Is the pseudocode sequence of virtual interrupt delivery in 29.2.2 > guaranteed to be atomic (not interruptible by other events)? > The question is: when hardware is performing virtual interrupt > delivery, is it guaranteed that RVI matches the VIRR? (The answer must be > yes, just checking.) I don't think so. Consider that another vcpu may update the VIRR while the target vcpu is doing a vmentry. The only consequence is that the new interrupt will not be handled until the previous interrupt is handled. Even without this patch we hit the same corner case. > 2. Section 29.6 mentions that "Use of the posted-interrupt descriptor > differs from that of other data structures that are referenced by > pointers in a VMCS. There is a general requirement that software ensure > that each such data structure is modified only when no logical processor > with a current VMCS that references it is in VMX non-root operation. > That requirement does not apply to the posted-interrupt descriptor. > There is a requirement, however, that such modifications be done using > locked read-modify-write instructions."
> > The APIC virtual page is being modified by a CPU while a logical > processor with a current VMCS that references it is in VMX non-root > operation; in fact the hardware itself modifies it through EOI > virtualization, virtual interrupt delivery, etc. What are the > requirements in this case? It should be the same as for the posted-interrupt descriptor: software must use atomic (locked) accesses to the virtual-APIC page. >> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu > *vcpu) >> } >> >> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { >> + /* update archtecture specific hints for APIC >> + * virtual interrupt delivery */ >> + if (kvm_x86_ops->update_irq) >> + kvm_x86_ops->update_irq(vcpu); >> + >> inject_pending_event(vcpu); >> >> /* enable NMI/IRQ window open exits if needed */ >> if (vcpu->arch.nmi_pending) >> kvm_x86_ops->enable_nmi_window(vcpu); >> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) >> + else if (kvm_apic_vid_enabled(vcpu)) { >> + if (kvm_cpu_has_extint(vcpu)) >> + kvm_x86_ops->enable_irq_window(vcpu); > > If RVI is non-zero, then the interrupt window should not be enabled, > according to 29.2.2: > > "If a virtual interrupt has been recognized (see Section 29.2.1), it will > be delivered at an instruction boundary when the following conditions all > hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no > blocking by MOV SS or by POP SS; and (4) the “interrupt-window exiting” > VM-execution control is 0." Right. We must check RVI here. >> + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) >> kvm_x86_ops->enable_irq_window(vcpu); >> >> if (kvm_lapic_enabled(vcpu)) { >> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >> index 166c450..898aa62 100644 >> --- a/virt/kvm/ioapic.c >> +++ b/virt/kvm/ioapic.c >> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int > irq) >> /* need to read apic_id from apic regiest since * it can be >> rewritten */ irqe.dest_id = ioapic->kvm->bsp_vcpu_id; >> + kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1); } >> #endif return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe); Best regards, Yang
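As an illustration of the locked accesses meant here, a sketch modeled on lapic.c's apic_set_vector(), with a hypothetical _atomic name; set_bit() compiles to a lock-prefixed read-modify-write on x86:

static inline void apic_set_vector_atomic(int vec, void *bitmap)
{
        /* each 32-bit APIC register slice sits at a 16-byte stride;
         * the locked RMW keeps the update safe even while the vcpu
         * that owns this virtual-APIC page is in VMX non-root mode */
        set_bit(vec & 31, (unsigned long *)(bitmap + ((vec >> 5) << 4)));
}

        /* e.g. marking an IRR bit pending from another vcpu's context */
        apic_set_vector_atomic(vector, apic->regs + APIC_IRR);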
On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2012-12-04: > > On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: > >> Gleb Natapov wrote on 2012-12-03: > >>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > >>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > >>>> manually, which is fully taken care of by the hardware. This needs > >>>> some special awareness into existing interrupr injection path: > >>>> > >>>> - for pending interrupt, instead of direct injection, we may need > >>>> update architecture specific indicators before resuming to guest. - > >>>> A pending interrupt, which is masked by ISR, should be also > >>>> considered in above update action, since hardware will decide when > >>>> to inject it at right time. Current has_interrupt and get_interrupt > >>>> only returns a valid vector from injection p.o.v. > >>> Most of my previous comments still apply. > >>> > >>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >>>> + int trig_mode, int always_set) > >>>> +{ > >>>> + if (kvm_x86_ops->set_eoi_exitmap) > >>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > >>>> + trig_mode, always_set); > >>>> +} > >>>> + > >>>> /* > >>>> * Add a pending IRQ into lapic. > >>>> * Return 1 if successfully added and 0 if discarded. > >>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, > > int > >>> delivery_mode, > >>>> if (unlikely(!apic_enabled(apic))) > >>>> break; > >>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); > >>> As I said in the last review rebuild the bitmap when ioapic or irq > >>> notifier configuration changes, user request bit to notify vcpus to > >>> reload the bitmap. > >> It is too complicated. When the guest programs an ioapic entry, we cannot get the target vcpu > > easily. We need to read the destination format register and the logical destination > > register to find the target vcpu when using logical mode. Also, we must trap every > > modification to the two registers to update the eoi bitmap. > No need to check the target vcpu. Enable the exit on all vcpus for the vector > This is wrong. As we know, modern OSes use per-VCPU vectors. We cannot ensure all vectors have the same trigger mode. And what's worse, if the vector in another vcpu is used to handle high-frequency interrupts (like a 10G NIC), it will hurt performance. > I have never seen an OS reuse a vector used by the ioapic; as far as I can see, this is not how the Linux code works. Furthermore, it will not work with KVM currently, since an apic EOI is redirected to the ioapic based on the vector alone, not a vector/vcpu pair, and as far as I am aware this is how real HW works. > > programmed into the ioapic. Which two registers? All accesses to the ioapic are > > trapped and reconfiguration is rare. > In logical mode, the destination VCPU depends on each CPU's destination format register and logical destination register. So we must also trap the two registers. > And if it uses lowest-priority delivery mode, the PPR needs to be trapped too. Since the PPR will change on each interrupt injection, the cost would be higher than with the current approach. No need for all of that if the bitmask is global. > > >> For the irq notifier, only the PIT is special: it is edge-triggered but needs an > >> EOI notifier. So, just treat it specially. And the TMR can cover the others. > >> > > We shouldn't assume that. If another notifier is added it will be > > easy to forget to update the apicv code to exclude another vector too.
> At this point the guest is not running (devices are still being initialized), so we cannot know the vector. As you mentioned, the best point would be when the guest programs the ioapic entry, but it is also impossible to get the vector there (see above). > I can add comments on the function to remind callers to update the eoi bitmap when the interrupt is edge-triggered and they still want an EOI vmexit. > > >>> > >>>> if (trig_mode) { > >>>> apic_debug("level trig mode for vector %d", vector); > >>>> apic_set_vector(vector, apic->regs + APIC_TMR); > >>>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu > *vcpu1, > >>> struct kvm_vcpu *vcpu2) > >>>> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > >>>> } > >>>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > >>>> +{ + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & > >>>> APIC_SPIV_DIRECTED_EOI) && + > >>>> kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { + int > >>>> trigger_mode; + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > >>>> + trigger_mode = IOAPIC_LEVEL_TRIG; + else + trigger_mode = > >>>> IOAPIC_EDGE_TRIG; + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, > >>>> trigger_mode); + } +} + > >>>> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = > >>>> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int > >>>> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); > >>>> apic_update_ppr(apic); > >>>> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) > >>>> && - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { - int > >>>> trigger_mode; - if (apic_test_vector(vector, apic->regs + APIC_TMR)) > >>>> - trigger_mode = IOAPIC_LEVEL_TRIG; - else - trigger_mode = > >>>> IOAPIC_EDGE_TRIG; - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, > >>>> trigger_mode); - } + kvm_ioapic_send_eoi(apic, vector); > >>>> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > >>>> return vector; > >>>> } > >>>> +/* > >>>> + * this interface assumes a trap-like exit, which has already finished > >>>> + * desired side effect including vISR and vPPR update. > >>>> + */ > >>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > >>>> +{ > >>>> + struct kvm_lapic *apic = vcpu->arch.apic; > >>>> + > >>> trace_kvm_eoi() > >> Ok. > >> > >>>> + kvm_ioapic_send_eoi(apic, vector); > >>>> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > >>>> +} > >>>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > >>>> + > >>>> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = > >>>> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int > >>>> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } > >>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + struct kvm_lapic *apic = vcpu->arch.apic; > >>>> + > >>>> + if (!apic || !apic_enabled(apic)) > >>> Use kvm_vcpu_has_lapic() instead of checking arch.apic directly. > >> Ok.
> >> > >>> > >>>> + return -1; > >>>> + > >>>> + return apic_find_highest_irr(apic); > >>>> +} > >>>> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); > >>>> + > >>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) > >>>> { > >>>> u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); > >>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > >>>> index c42f111..749661a 100644 > >>>> --- a/arch/x86/kvm/lapic.h > >>>> +++ b/arch/x86/kvm/lapic.h > >>>> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); > >>>> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); > >>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); > >>>> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); > >>>> +int kvm_cpu_has_extint(struct kvm_vcpu *v); > >>>> +int kvm_cpu_get_extint(struct kvm_vcpu *v); > >>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); > >>>> void kvm_lapic_reset(struct kvm_vcpu *vcpu); u64 > >>>> kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); void > >>>> kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); @@ -50,6 > >>>> +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); int > >>>> kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int > >>>> kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int > >>>> kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); > >>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >>>> + int need_eoi, int global); > >>>> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); > >>>> > >>>> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic > > *src, > >>>> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu > >>> *vcpu); > >>>> void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > >>>> > >>>> int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > >>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > >>>> > >>>> void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t > >>>> vapic_addr); void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); > >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > >>>> index dcb7952..8f0903b 100644 > >>>> --- a/arch/x86/kvm/svm.c > >>>> +++ b/arch/x86/kvm/svm.c > >>>> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct > > kvm_vcpu > >>> *vcpu, int tpr, int irr) > >>>> set_cr_intercept(svm, INTERCEPT_CR8_WRITE); > >>>> } > >>>> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static void svm_update_irq(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + return ; > >>>> +} > >>>> + > >>>> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >>>> + int trig_mode, int always_set) > >>>> +{ > >>>> + return ; > >>>> +} > >>>> + > >>>> static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm > >>>> *svm = to_svm(vcpu); @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops > >>>> svm_x86_ops = { .enable_nmi_window = enable_nmi_window, > >>>> .enable_irq_window = enable_irq_window, .update_cr8_intercept = > >>>> update_cr8_intercept, > >>>> + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, > >>>> + .update_irq = svm_update_irq; > >>>> + .set_eoi_exitmap = svm_set_eoi_exitmap; > >>>> > >>>> .set_tss_addr = svm_set_tss_addr, > >>>> .get_tdp_level = get_npt_level, > >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>> index 6a5f651..909ce90 100644 > >>>> --- a/arch/x86/kvm/vmx.c > >>>> +++ b/arch/x86/kvm/vmx.c > >>>> @@ 
-86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO); > >>>> static bool __read_mostly enable_apicv_reg; > >>>> module_param(enable_apicv_reg, bool, S_IRUGO); > >>>> +static bool __read_mostly enable_apicv_vid; > >>>> +module_param(enable_apicv_vid, bool, S_IRUGO); > >>>> + > >>>> /* > >>>> * If nested=1, nested virtualization is supported, i.e., guests may use > >>>> * VMX and be a hypervisor for its own guests. If nested=0, guests may > > not > >>>> @@ -432,6 +435,9 @@ struct vcpu_vmx { > >>>> > >>>> bool rdtscp_enabled; > >>>> + u8 eoi_exitmap_changed; > >>>> + u32 eoi_exit_bitmap[8]; > >>>> + > >>>> /* Support for a guest hypervisor (nested VMX) */ > >>>> struct nested_vmx nested; > >>>> }; > >>>> @@ -770,6 +776,12 @@ static inline bool > >>> cpu_has_vmx_apic_register_virt(void) > >>>> SECONDARY_EXEC_APIC_REGISTER_VIRT; > >>>> } > >>>> +static inline bool cpu_has_vmx_virtual_intr_delivery(void) > >>>> +{ > >>>> + return vmcs_config.cpu_based_2nd_exec_ctrl & > >>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > >>>> +} > >>>> + > >>>> static inline bool cpu_has_vmx_flexpriority(void) > >>>> { > >>>> return cpu_has_vmx_tpr_shadow() && > >>>> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct > >>> vmcs_config *vmcs_conf) > >>>> SECONDARY_EXEC_PAUSE_LOOP_EXITING | > >>>> SECONDARY_EXEC_RDTSCP | > >>>> SECONDARY_EXEC_ENABLE_INVPCID | > >>>> - SECONDARY_EXEC_APIC_REGISTER_VIRT; > >>>> + SECONDARY_EXEC_APIC_REGISTER_VIRT | > >>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > >>>> if (adjust_vmx_controls(min2, opt2, > >>>> MSR_IA32_VMX_PROCBASED_CTLS2, > >>>> &_cpu_based_2nd_exec_control) < 0) > >>>> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct > >>>> vmcs_config *vmcs_conf) > >>>> > >>>> if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) > >>>> _cpu_based_2nd_exec_control &= ~( > >>>> - SECONDARY_EXEC_APIC_REGISTER_VIRT); > >>>> + SECONDARY_EXEC_APIC_REGISTER_VIRT | > >>>> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); > >>>> > >>>> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) { /* > >>>> CR3 accesses and invlpg don't need to cause VM Exits when EPT @@ > >>>> -2724,6 +2738,14 @@ static __init int hardware_setup(void) if > >>>> (!cpu_has_vmx_apic_register_virt()) enable_apicv_reg = 0; > >>>> + if (!cpu_has_vmx_virtual_intr_delivery()) > >>>> + enable_apicv_vid = 0; > >>>> + > >>>> + if (!enable_apicv_vid) { > >>>> + kvm_x86_ops->update_irq = NULL; > >>> Why setting it to NULL? Either drop this since vmx_update_irq() checks > >>> enable_apicv_vid or better set it to function that does nothing and > >>> drop enable_apicv_vid check in vmx_update_irq(). Since > >>> kvm_x86_ops->update_irq will never be NULL you can drop the check > >>> before calling it. > >> Sure. > >> > >>>> + kvm_x86_ops->update_cr8_intercept = NULL; > >>> Why? It should be other way around: if apicv is enabled set > >>> update_cr8_intercept callback to NULL. > >> Yes, this is wrong. > > Please test the patches with vid disabled and Windows guests. This bug > > should have prevented it from working. 
> > > >> > >>>> + } > >>>> + > >>>> if (nested) > >>>> nested_vmx_setup_ctls_msrs(); > >>>> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct > >>> vcpu_vmx *vmx) > >>>> exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; > >>>> if (!enable_apicv_reg) > >>>> exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT; > >>>> + if (!enable_apicv_vid) > >>>> + exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY; > >>>> return exec_control; > >>>> } > >>>> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx > > *vmx) > >>>> vmx_secondary_exec_control(vmx)); > >>>> } > >>>> + if (enable_apicv_vid) { > >>>> + vmcs_write64(EOI_EXIT_BITMAP0, 0); > >>>> + vmcs_write64(EOI_EXIT_BITMAP1, 0); > >>>> + vmcs_write64(EOI_EXIT_BITMAP2, 0); > >>>> + vmcs_write64(EOI_EXIT_BITMAP3, 0); > >>>> + > >>>> + vmcs_write16(GUEST_INTR_STATUS, 0); > >>>> + } > >>>> + > >>>> if (ple_gap) { > >>>> vmcs_write32(PLE_GAP, ple_gap); > >>>> vmcs_write32(PLE_WINDOW, ple_window); > >>>> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu > >>> *vcpu) > >>>> return emulate_instruction(vcpu, 0) == EMULATE_DONE; > >>>> } > >>>> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > >>>> + int vector = exit_qualification & 0xff; > >>>> + > >>>> + /* EOI-induced VM exit is trap-like and thus no need to adjust IP */ > >>>> + kvm_apic_set_eoi_accelerated(vcpu, vector); > >>>> + return 1; > >>>> +} > >>>> + > >>>> static int handle_apic_write(struct kvm_vcpu *vcpu) > >>>> { > >>>> unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > >>>> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct > >>> kvm_vcpu *vcpu) = { > >>>> [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, > >>>> [EXIT_REASON_APIC_ACCESS] = handle_apic_access, > >>>> [EXIT_REASON_APIC_WRITE] = handle_apic_write, > >>>> + [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, > >>>> [EXIT_REASON_WBINVD] = handle_wbinvd, > >>>> [EXIT_REASON_XSETBV] = handle_xsetbv, > >>>> [EXIT_REASON_TASK_SWITCH] = handle_task_switch, > >>>> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu > >>>> *vcpu) > >>>> > >>>> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) > >>>> { > >>>> + /* no need for tpr_threshold update if APIC virtual > >>>> + * interrupt delivery is enabled */ > >>>> + if (!enable_apicv_vid) > >>>> + return ; > >>>> + > >>> Since you (will) set ->update_cr8_intercept callback to NULL if vid > >>> is enabled this function will never be called with !enable_apicv_vid, > >>> so this check can be dropped. > >> Ok. 
> >> > >>>> if (irr == -1 || tpr < irr) { > >>>> vmcs_write32(TPR_THRESHOLD, 0); > >>>> return; > >>>> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct > > kvm_vcpu > >>> *vcpu, int tpr, int irr) > >>>> vmcs_write32(TPR_THRESHOLD, irr); > >>>> } > >>>> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid; > >>>> +} > >>>> + > >>>> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index) > >>>> +{ > >>>> + int tmr; > >>>> + tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic, > >>>> + APIC_TMR + 0x10 * index); > >>>> + vmcs_write32(EOI_EXIT_BITMAP0 + index, > >>>> + vmx->eoi_exit_bitmap[index] | tmr); > >>>> +} > >>>> + > >>>> +static void vmx_update_rvi(int vector) > >>>> +{ > >>>> + u16 status; > >>>> + u8 old; > >>>> + > >>>> + status = vmcs_read16(GUEST_INTR_STATUS); > >>>> + old = (u8)status & 0xff; > >>>> + if ((u8)vector != old) { > >>>> + status &= ~0xff; > >>>> + status |= (u8)vector; > >>>> + vmcs_write16(GUEST_INTR_STATUS, status); > >>>> + } > >>>> +} > >>>> + > >>>> +static void vmx_update_irq(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + int vector; > >>>> + struct vcpu_vmx *vmx = to_vmx(vcpu); > >>>> + > >>>> + if (!enable_apicv_vid) > >>>> + return ; > >>>> + > >>>> + vector = kvm_apic_get_highest_irr(vcpu); > >>>> + if (vector == -1) > >>>> + return; > >>>> + > >>>> + vmx_update_rvi(vector); > >>>> + > >>>> + if (vmx->eoi_exitmap_changed) { > >>>> + int index; > >>>> + for_each_set_bit(index, > >>>> + (unsigned long *)(&vmx->eoi_exitmap_changed), 8) > >>>> + vmx_update_eoi_exitmap(vmx, index); > >>>> + vmx->eoi_exitmap_changed = 0; > >>>> + } > >>>> +} > >>>> + > >>>> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu, > >>>> + int vector, int trig_mode, > >>>> + int always_set) > >>>> +{ > >>>> + struct vcpu_vmx *vmx = to_vmx(vcpu); > >>>> + int index, offset, changed; > >>>> + struct kvm_lapic *apic; > >>>> + > >>>> + if (!enable_apicv_vid) > >>>> + return ; > >>>> + > >>>> + if (WARN_ONCE((vector < 0) || (vector > 255), > >>>> + "KVM VMX: vector (%d) out of range\n", vector)) > >>>> + return; > >>>> + > >>>> + apic = vcpu->arch.apic; > >>>> + index = vector >> 5; > >>>> + offset = vector & 31; > >>>> + > >>>> + if (always_set) > >>>> + changed = !test_and_set_bit(offset, > >>>> + (unsigned long *)&vmx->eoi_exit_bitmap); > >>>> + else if (trig_mode) > >>>> + changed = !test_bit(offset, > >>>> + apic->regs + APIC_TMR + index * 0x10); > >>>> + else > >>>> + changed = test_bit(offset, > >>>> + apic->regs + APIC_TMR + index * 0x10); > >>>> + > >>>> + if (changed) > >>>> + vmx->eoi_exitmap_changed |= 1 << index; > >>>> +} > >>>> + > >>>> static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) { u32 > >>>> exit_intr_info; @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops > >>>> vmx_x86_ops = { .enable_nmi_window = enable_nmi_window, > >>>> .enable_irq_window = enable_irq_window, .update_cr8_intercept = > >>>> update_cr8_intercept, > >>>> + .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery, > >>>> + .update_irq = vmx_update_irq, > >>>> + .set_eoi_exitmap = vmx_set_eoi_exitmap, > >>>> > >>>> .set_tss_addr = vmx_set_tss_addr, > >>>> .get_tdp_level = get_ept_level, > >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index > >>>> b0b8abe..02fe194 100644 --- a/arch/x86/kvm/x86.c +++ > >>>> b/arch/x86/kvm/x86.c @@ -164,6 +164,14 @@ static int > >>>> emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt); > >>>> > >>>> static int kvm_vcpu_reset(struct 
kvm_vcpu *vcpu); > >>>> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + if (kvm_x86_ops->has_virtual_interrupt_delivery) > >>> This callback is never NULL. > >> Ok. > >> > >>>> + return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu) > >>>> { > >>>> int i; > >>>> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct > > kvm_vcpu > >>> *vcpu) > >>>> vcpu->arch.nmi_injected = true; > >>>> kvm_x86_ops->set_nmi(vcpu); > >>>> } > >>>> - } else if (kvm_cpu_has_interrupt(vcpu)) { > >>>> - if (kvm_x86_ops->interrupt_allowed(vcpu)) { > >>>> - kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), > >>>> - false); > >>>> + } else if (kvm_cpu_has_interrupt(vcpu) && > >>>> + kvm_x86_ops->interrupt_allowed(vcpu)) { > >>>> + int vector = -1; > >>>> + > >>>> + if (kvm_apic_vid_enabled(vcpu)) > >>>> + vector = kvm_cpu_get_extint(vcpu); > >>>> + else > >>>> + vector = kvm_cpu_get_interrupt(vcpu); > >>>> + > >>>> + if (vector != -1) { > >>>> + kvm_queue_interrupt(vcpu, vector, false); > >>>> kvm_x86_ops->set_irq(vcpu); > >>>> } > >>> If vid is enabled kvm_cpu_has_interrupt() should return true only if there > >>> is extint interrupt. Similarly kvm_cpu_get_interrupt() will only return > >>> extint if vid is enabled. This basically moves kvm_apic_vid_enabled() > >>> logic deeper into kvm_cpu_(has|get)_interrupt() functions instead > >>> of changing interrupt injection logic here and in vcpu_enter_guest() > >>> bellow. We still need kvm_cpu_has_interrupt() variant that always checks > >>> both extint and apic for use in kvm_arch_vcpu_runnable() though. > >> As you mentioned, we still need to checks both extint and apic interrupt in > > some case. So how to do this? Introduce another argument to indicate > > whether check both? Yes, we need to check both in > > kvm_arch_vcpu_runnable(). Another argument is good option. We can have > > two functions: kvm_cpu_has_injectable_interrupt() for use in irq > > injection path and kvm_cpu_has_interrupt() for use in > > kvm_arch_vcpu_runnable(). They will call common one with additional > > argument to avoid code duplication. > Ok. will follow this way. > > >> > >>>> + > >>>> } > >>>> } > >>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu > >>> *vcpu) > >>>> } > >>>> > >>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { > >>>> + /* update archtecture specific hints for APIC > >>>> + * virtual interrupt delivery */ > >>>> + if (kvm_x86_ops->update_irq) > >>>> + kvm_x86_ops->update_irq(vcpu); > >>>> + > >>> > >>> I do not see why this have to be here instead of inside if > >>> (kvm_lapic_enabled(vcpu)){} near update_cr8_intercept() a couple of > >>> lines bellow. If you move it there you can drop apic enable check in > >>> kvm_apic_get_highest_irr(). > >> Yes, it seems ok to move it. 
> >> >>>> inject_pending_event(vcpu); > >>>> > >>>> /* enable NMI/IRQ window open exits if needed */ > >>>> if (vcpu->arch.nmi_pending) > >>>> kvm_x86_ops->enable_nmi_window(vcpu); > >>>> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > >>>> + else if (kvm_apic_vid_enabled(vcpu)) { > >>>> + if (kvm_cpu_has_extint(vcpu)) > >>>> + kvm_x86_ops->enable_irq_window(vcpu); > >>>> + } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > >>>> kvm_x86_ops->enable_irq_window(vcpu); > >>>> > >>>> if (kvm_lapic_enabled(vcpu)) { > >>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > >>>> index 166c450..898aa62 100644 > >>>> --- a/virt/kvm/ioapic.c > >>>> +++ b/virt/kvm/ioapic.c > >>>> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int > >>> irq) > >>>> /* need to read apic_id from apic regiest since * it can be > >>>> rewritten */ irqe.dest_id = ioapic->kvm->bsp_vcpu_id; > >>>> + kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1); } > >>>> #endif return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe); > >>>> -- > >>>> 1.7.1 > >>> > >>> -- > >>> Gleb. > >> > >> > >> Best regards, > >> Yang > > > > -- > > Gleb. > > > Best regards, > Yang > -- Gleb.
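A rough sketch of the has_interrupt split agreed on in the exchange above; kvm_cpu_has_injectable_intr() and __kvm_cpu_has_interrupt() are the proposed (not yet merged) names, and kvm_apic_vid_enabled() is assumed to be made visible outside x86.c:

static int __kvm_cpu_has_interrupt(struct kvm_vcpu *v, bool skip_apic_if_vid)
{
        if (!irqchip_in_kernel(v->kvm))
                return v->arch.interrupt.pending;
        if (kvm_cpu_has_extint(v))      /* PIC */
                return 1;
        if (skip_apic_if_vid && kvm_apic_vid_enabled(v))
                return 0;       /* hardware will inject APIC interrupts */
        return kvm_apic_has_interrupt(v) != -1;         /* LAPIC */
}

/* injection path: with vid, only ExtINT still needs software injection */
int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
{
        return __kvm_cpu_has_interrupt(v, true);
}

/* kvm_arch_vcpu_runnable(): any pending interrupt must wake the vcpu */
int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
{
        return __kvm_cpu_has_interrupt(v, false);
}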
Gleb Natapov wrote on 2012-12-05: > On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: >> Gleb Natapov wrote on 2012-12-04: >>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: >>>> Gleb Natapov wrote on 2012-12-03: >>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts >>>>>> manually, which is fully taken care of by the hardware. This needs >>>>>> some special awareness into existing interrupr injection path: >>>>>> >>>>>> - for pending interrupt, instead of direct injection, we may need >>>>>> update architecture specific indicators before resuming to guest. - >>>>>> A pending interrupt, which is masked by ISR, should be also >>>>>> considered in above update action, since hardware will decide when >>>>>> to inject it at right time. Current has_interrupt and get_interrupt >>>>>> only returns a valid vector from injection p.o.v. >>>>> Most of my previous comments still apply. >>>>> >>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>>>> + int trig_mode, int always_set) >>>>>> +{ >>>>>> + if (kvm_x86_ops->set_eoi_exitmap) >>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >>>>>> + trig_mode, always_set); >>>>>> +} >>>>>> + >>>>>> /* >>>>>> * Add a pending IRQ into lapic. >>>>>> * Return 1 if successfully added and 0 if discarded. >>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic > *apic, >>> int >>>>> delivery_mode, >>>>>> if (unlikely(!apic_enabled(apic))) >>>>>> break; >>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); >>>>> As I said in the last review rebuild the bitmap when ioapic or irq >>>>> notifier configuration changes, user request bit to notify vcpus to >>>>> reload the bitmap. >>>> It is too complicated. When the guest programs an ioapic entry, we cannot get the target > vcpu >>> easily. We need to read the destination format register and the logical destination >>> register to find the target vcpu when using logical mode. Also, we must trap every >>> modification to the two registers to update the eoi bitmap. >>> No need to check the target vcpu. Enable the exit on all vcpus for the vector >> This is wrong. As we know, modern OSes use per-VCPU vectors. We cannot > ensure all vectors have the same trigger mode. And what's worse, if the vector in > another vcpu is used to handle high-frequency interrupts (like a 10G NIC), it > will hurt performance. >> > I have never seen an OS reuse a vector used by the ioapic; as far as I can see, this > is not how the Linux code works. Could you point out which code does this check in the Linux kernel? I don't see any special checks when the Linux kernel allocates a vector. > Furthermore, it will not work with KVM > currently, since an apic EOI is redirected to the ioapic based on the vector alone, > not a vector/vcpu pair, and as far as I am aware this is how real HW works. Yes, real HW works this way. But why is it helpful in this case? >>> programmed into the ioapic. Which two registers? All accesses to the ioapic are >>> trapped and reconfiguration is rare. >> In logical mode, the destination VCPU depends on each CPU's destination > format register and logical destination register. So we must also trap the two > registers. >> And if it uses lowest-priority delivery mode, the PPR needs to be trapped too. > Since the PPR will change on each interrupt injection, the cost would be higher than > with the current approach. > No need for all of that if the bitmask is global. No, the bitmask is per VCPU. Also, why would it work if the bitmask were global?
>> >>>> For irq notifier, only PIT is special which is edge trigger but need an >>>> EOI notifier. So, just treat it specially. And TMR can cover others. >>>> >>> We shouldn't assume that. If another notifier will be added it will be >>> easy to forget to update apicv code to exclude another vector too. >> At this point, guest is not running(in device initializing), we cannot not know the > vector. As you mentioned, the best point is when guest program ioapic entry. But > it also is impossible to get the vector(see above). >> I can give some comments on the function to remind the caller to update >> eoi bitmap when the interrupt is edge and they still want to get EOI >> vmexit. >> >>>>> >>>>>> if (trig_mode) { >>>>>> apic_debug("level trig mode for vector %d", vector); >>>>>> apic_set_vector(vector, apic->regs + APIC_TMR); >>>>>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu >>> *vcpu1, >>>>> struct kvm_vcpu *vcpu2) >>>>>> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; >>>>>> } >>>>>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int >>>>>> vector) +{ + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & >>>>>> APIC_SPIV_DIRECTED_EOI) && + >>>>>> kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { + int >>>>>> trigger_mode; + if (apic_test_vector(vector, apic->regs + >>>>>> APIC_TMR)) + trigger_mode = IOAPIC_LEVEL_TRIG; + else + >>>>>> trigger_mode = IOAPIC_EDGE_TRIG; >>>>>> + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >>>>>> + } +} + >>>>>> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = >>>>>> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int >>>>>> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); >>>>>> apic_update_ppr(apic); >>>>>> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) >>>>>> && - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { - >>>>>> int trigger_mode; - if (apic_test_vector(vector, apic->regs + >>>>>> APIC_TMR)) - trigger_mode = IOAPIC_LEVEL_TRIG; - else - >>>>>> trigger_mode = IOAPIC_EDGE_TRIG; >>>>>> - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >>>>>> - } + kvm_ioapic_send_eoi(apic, vector); >>>>>> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >>>>>> return vector; >>>>>> } >>>>>> +/* >>>>>> + * this interface assumes a trap-like exit, which has already finished >>>>>> + * desired side effect including vISR and vPPR update. >>>>>> + */ >>>>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) >>>>>> +{ >>>>>> + struct kvm_lapic *apic = vcpu->arch.apic; >>>>>> + >>>>> trace_kvm_eoi() >>>> Ok. >>>> >>>>>> + kvm_ioapic_send_eoi(apic, vector); >>>>>> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >>>>>> +} >>>>>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); >>>>>> + >>>>>> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = >>>>>> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int >>>>>> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } >>>>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) >>>>>> +{ >>>>>> + struct kvm_lapic *apic = vcpu->arch.apic; >>>>>> + >>>>>> + if (!apic || !apic_enabled(apic)) >>>>> Use kvm_vcpu_has_lapic() instead of checking arch.apic directly. >>>> Ok. 
>>>> >>>>> >>>>>> + return -1; >>>>>> + >>>>>> + return apic_find_highest_irr(apic); >>>>>> +} >>>>>> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr); >>>>>> + >>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) >>>>>> { >>>>>> u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0); >>>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h >>>>>> index c42f111..749661a 100644 >>>>>> --- a/arch/x86/kvm/lapic.h >>>>>> +++ b/arch/x86/kvm/lapic.h >>>>>> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu); >>>>>> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); >>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); >>>>>> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); >>>>>> +int kvm_cpu_has_extint(struct kvm_vcpu *v); >>>>>> +int kvm_cpu_get_extint(struct kvm_vcpu *v); >>>>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu); >>>>>> void kvm_lapic_reset(struct kvm_vcpu *vcpu); u64 >>>>>> kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); void >>>>>> kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); @@ >>>>>> -50,6 +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); >>>>>> int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 >>>>>> dest); int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 >>>>>> mda); int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct >>>>>> kvm_lapic_irq *irq); >>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>>>> + int need_eoi, int global); >>>>>> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); >>>>>> >>>>>> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic >>> *src, >>>>>> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct > kvm_vcpu >>>>> *vcpu); >>>>>> void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 > data); >>>>>> >>>>>> int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); >>>>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); >>>>>> >>>>>> void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t >>>>>> vapic_addr); void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); >>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >>>>>> index dcb7952..8f0903b 100644 >>>>>> --- a/arch/x86/kvm/svm.c >>>>>> +++ b/arch/x86/kvm/svm.c >>>>>> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct >>> kvm_vcpu >>>>> *vcpu, int tpr, int irr) >>>>>> set_cr_intercept(svm, INTERCEPT_CR8_WRITE); >>>>>> } >>>>>> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu) >>>>>> +{ >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static void svm_update_irq(struct kvm_vcpu *vcpu) >>>>>> +{ >>>>>> + return ; >>>>>> +} >>>>>> + >>>>>> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>>>> + int trig_mode, int always_set) >>>>>> +{ >>>>>> + return ; >>>>>> +} >>>>>> + >>>>>> static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct >>>>>> vcpu_svm *svm = to_svm(vcpu); @@ -4292,6 +4308,9 @@ static struct >>>>>> kvm_x86_ops svm_x86_ops = { .enable_nmi_window = >>>>>> enable_nmi_window, .enable_irq_window = enable_irq_window, >>>>>> .update_cr8_intercept = update_cr8_intercept, >>>>>> + .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery, >>>>>> + .update_irq = svm_update_irq; >>>>>> + .set_eoi_exitmap = svm_set_eoi_exitmap; >>>>>> >>>>>> .set_tss_addr = svm_set_tss_addr, >>>>>> .get_tdp_level = get_npt_level, >>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>>>> index 6a5f651..909ce90 100644 >>>>>> --- a/arch/x86/kvm/vmx.c >>>>>> +++ b/arch/x86/kvm/vmx.c 
>>>>>> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO);
>>>>>>  static bool __read_mostly enable_apicv_reg;
>>>>>>  module_param(enable_apicv_reg, bool, S_IRUGO);
>>>>>>
>>>>>> +static bool __read_mostly enable_apicv_vid;
>>>>>> +module_param(enable_apicv_vid, bool, S_IRUGO);
>>>>>> +
>>>>>>  /*
>>>>>>   * If nested=1, nested virtualization is supported, i.e., guests may use
>>>>>>   * VMX and be a hypervisor for its own guests. If nested=0, guests may not
>>>>>> @@ -432,6 +435,9 @@ struct vcpu_vmx {
>>>>>>
>>>>>>  	bool rdtscp_enabled;
>>>>>>
>>>>>> +	u8 eoi_exitmap_changed;
>>>>>> +	u32 eoi_exit_bitmap[8];
>>>>>> +
>>>>>>  	/* Support for a guest hypervisor (nested VMX) */
>>>>>>  	struct nested_vmx nested;
>>>>>>  };
>>>>>> @@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
>>>>>>  		SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>>  }
>>>>>>
>>>>>> +static inline bool cpu_has_vmx_virtual_intr_delivery(void)
>>>>>> +{
>>>>>> +	return vmcs_config.cpu_based_2nd_exec_ctrl &
>>>>>> +		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>> +}
>>>>>> +
>>>>>>  static inline bool cpu_has_vmx_flexpriority(void)
>>>>>>  {
>>>>>>  	return cpu_has_vmx_tpr_shadow() &&
>>>>>> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>>>>  			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
>>>>>>  			SECONDARY_EXEC_RDTSCP |
>>>>>>  			SECONDARY_EXEC_ENABLE_INVPCID |
>>>>>> -			SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>> +			SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>>> +			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>>  		if (adjust_vmx_controls(min2, opt2,
>>>>>>  					MSR_IA32_VMX_PROCBASED_CTLS2,
>>>>>>  					&_cpu_based_2nd_exec_control) < 0)
>>>>>> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>>>>
>>>>>>  	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
>>>>>>  		_cpu_based_2nd_exec_control &= ~(
>>>>>> -				SECONDARY_EXEC_APIC_REGISTER_VIRT);
>>>>>> +				SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>>> +				SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>>>>>>
>>>>>>  	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
>>>>>>  		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
>>>>>> @@ -2724,6 +2738,14 @@ static __init int hardware_setup(void)
>>>>>>  	if (!cpu_has_vmx_apic_register_virt())
>>>>>>  		enable_apicv_reg = 0;
>>>>>>
>>>>>> +	if (!cpu_has_vmx_virtual_intr_delivery())
>>>>>> +		enable_apicv_vid = 0;
>>>>>> +
>>>>>> +	if (!enable_apicv_vid) {
>>>>>> +		kvm_x86_ops->update_irq = NULL;
>>>>> Why set it to NULL? Either drop this, since vmx_update_irq() checks
>>>>> enable_apicv_vid, or better, set it to a function that does nothing and
>>>>> drop the enable_apicv_vid check in vmx_update_irq(). Since
>>>>> kvm_x86_ops->update_irq will never be NULL you can then drop the check
>>>>> before calling it.
>>>> Sure.
>>>>
>>>>>> +		kvm_x86_ops->update_cr8_intercept = NULL;
>>>>> Why? It should be the other way around: if apicv is enabled, set the
>>>>> update_cr8_intercept callback to NULL.
>>>> Yes, this is wrong.
>>> Please test the patches with vid disabled and with Windows guests. This bug
>>> should have prevented it from working.
>>>
>>>>
>>>>>> +	}
>>>>>> +
>>>>>>  	if (nested)
>>>>>>  		nested_vmx_setup_ctls_msrs();
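To make the suggested gating concrete, a rough sketch of hardware_setup()
along the lines described above (this is not the posted code): ->update_irq
keeps pointing at a callable function so callers need no NULL check, and the
TPR-threshold callback is dropped only when vid is enabled, where the
hardware makes it redundant:

	/* sketch only */
	if (!cpu_has_vmx_virtual_intr_delivery())
		enable_apicv_vid = 0;

	if (enable_apicv_vid)
		/* hardware maintains the TPR threshold for us */
		kvm_x86_ops->update_cr8_intercept = NULL;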
>>>>>> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>>>>>>  		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>>>>>>  	if (!enable_apicv_reg)
>>>>>>  		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>> +	if (!enable_apicv_vid)
>>>>>> +		exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>>  	return exec_control;
>>>>>>  }
>>>>>> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>>>>>>  				vmx_secondary_exec_control(vmx));
>>>>>>  	}
>>>>>>
>>>>>> +	if (enable_apicv_vid) {
>>>>>> +		vmcs_write64(EOI_EXIT_BITMAP0, 0);
>>>>>> +		vmcs_write64(EOI_EXIT_BITMAP1, 0);
>>>>>> +		vmcs_write64(EOI_EXIT_BITMAP2, 0);
>>>>>> +		vmcs_write64(EOI_EXIT_BITMAP3, 0);
>>>>>> +
>>>>>> +		vmcs_write16(GUEST_INTR_STATUS, 0);
>>>>>> +	}
>>>>>> +
>>>>>>  	if (ple_gap) {
>>>>>>  		vmcs_write32(PLE_GAP, ple_gap);
>>>>>>  		vmcs_write32(PLE_WINDOW, ple_window);
>>>>>> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
>>>>>>  	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
>>>>>>  }
>>>>>>
>>>>>> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>>>>> +	int vector = exit_qualification & 0xff;
>>>>>> +
>>>>>> +	/* EOI-induced VM exit is trap-like and thus no need to adjust IP */
>>>>>> +	kvm_apic_set_eoi_accelerated(vcpu, vector);
>>>>>> +	return 1;
>>>>>> +}
>>>>>> +
>>>>>>  static int handle_apic_write(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>  	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>>>>> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>>>>>  	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
>>>>>>  	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
>>>>>>  	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
>>>>>> +	[EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
>>>>>>  	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
>>>>>>  	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
>>>>>>  	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
>>>>>> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>>>>>
>>>>>>  static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>>  {
>>>>>> +	/* no need for tpr_threshold update if APIC virtual
>>>>>> +	 * interrupt delivery is enabled */
>>>>>> +	if (!enable_apicv_vid)
>>>>>> +		return;
>>>>>> +
>>>>> Since you (will) set ->update_cr8_intercept callback to NULL if vid
>>>>> is enabled this function will never be called with !enable_apicv_vid,
>>>>> so this check can be dropped.
>>>> Ok.
>>>>
>>>>>>  	if (irr == -1 || tpr < irr) {
>>>>>>  		vmcs_write32(TPR_THRESHOLD, 0);
>>>>>>  		return;
>>>>>> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>>  	vmcs_write32(TPR_THRESHOLD, irr);
>>>>>>  }
>>>>>>
>>>>>> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +	return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid;
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index)
>>>>>> +{
>>>>>> +	int tmr;
>>>>>> +	tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic,
>>>>>> +			APIC_TMR + 0x10 * index);
>>>>>> +	vmcs_write32(EOI_EXIT_BITMAP0 + index,
>>>>>> +			vmx->eoi_exit_bitmap[index] | tmr);
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_rvi(int vector)
>>>>>> +{
>>>>>> +	u16 status;
>>>>>> +	u8 old;
>>>>>> +
>>>>>> +	status = vmcs_read16(GUEST_INTR_STATUS);
>>>>>> +	old = (u8)status & 0xff;
>>>>>> +	if ((u8)vector != old) {
>>>>>> +		status &= ~0xff;
>>>>>> +		status |= (u8)vector;
>>>>>> +		vmcs_write16(GUEST_INTR_STATUS, status);
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_irq(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +	int vector;
>>>>>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>>> +
>>>>>> +	if (!enable_apicv_vid)
>>>>>> +		return;
>>>>>> +
>>>>>> +	vector = kvm_apic_get_highest_irr(vcpu);
>>>>>> +	if (vector == -1)
>>>>>> +		return;
>>>>>> +
>>>>>> +	vmx_update_rvi(vector);
>>>>>> +
>>>>>> +	if (vmx->eoi_exitmap_changed) {
>>>>>> +		int index;
>>>>>> +		for_each_set_bit(index,
>>>>>> +				(unsigned long *)(&vmx->eoi_exitmap_changed), 8)
>>>>>> +			vmx_update_eoi_exitmap(vmx, index);
>>>>>> +		vmx->eoi_exitmap_changed = 0;
>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu,
>>>>>> +				int vector, int trig_mode,
>>>>>> +				int always_set)
>>>>>> +{
>>>>>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>>> +	int index, offset, changed;
>>>>>> +	struct kvm_lapic *apic;
>>>>>> +
>>>>>> +	if (!enable_apicv_vid)
>>>>>> +		return;
>>>>>> +
>>>>>> +	if (WARN_ONCE((vector < 0) || (vector > 255),
>>>>>> +		"KVM VMX: vector (%d) out of range\n", vector))
>>>>>> +		return;
>>>>>> +
>>>>>> +	apic = vcpu->arch.apic;
>>>>>> +	index = vector >> 5;
>>>>>> +	offset = vector & 31;
>>>>>> +
>>>>>> +	if (always_set)
>>>>>> +		changed = !test_and_set_bit(offset,
>>>>>> +				(unsigned long *)&vmx->eoi_exit_bitmap);
>>>>>> +	else if (trig_mode)
>>>>>> +		changed = !test_bit(offset,
>>>>>> +				apic->regs + APIC_TMR + index * 0x10);
>>>>>> +	else
>>>>>> +		changed = test_bit(offset,
>>>>>> +				apic->regs + APIC_TMR + index * 0x10);
>>>>>> +
>>>>>> +	if (changed)
>>>>>> +		vmx->eoi_exitmap_changed |= 1 << index;
>>>>>> +}
>>>>>> +
>>>>>>  static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>>>>>>  {
>>>>>>  	u32 exit_intr_info;
>>>>>> @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
>>>>>>  	.enable_nmi_window = enable_nmi_window,
>>>>>>  	.enable_irq_window = enable_irq_window,
>>>>>>  	.update_cr8_intercept = update_cr8_intercept,
>>>>>> +	.has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
>>>>>> +	.update_irq = vmx_update_irq,
>>>>>> +	.set_eoi_exitmap = vmx_set_eoi_exitmap,
>>>>>>
>>>>>>  	.set_tss_addr = vmx_set_tss_addr,
>>>>>>  	.get_tdp_level = get_ept_level,
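A note for readers on vmx_update_rvi() above: GUEST_INTR_STATUS is a 16-bit
field that, per the SDM, packs RVI (the requesting vector) in bits 7:0 and SVI
(the in-service vector) in bits 15:8, which is why only the low byte is masked
and rewritten. Illustration only, not code from the patch:

	u16 status = vmcs_read16(GUEST_INTR_STATUS);
	u8 rvi = status & 0xff;	/* highest pending virtual interrupt */
	u8 svi = status >> 8;	/* highest in-service virtual interrupt */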
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index b0b8abe..02fe194 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -164,6 +164,14 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>>>>>
>>>>>>  static int kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>>>>>> +
>>>>>> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +	if (kvm_x86_ops->has_virtual_interrupt_delivery)
>>>>> This callback is never NULL.
>>>> Ok.
>>>>
>>>>>> +		return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu);
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>  	int i;
>>>>>> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>>>>>>  			vcpu->arch.nmi_injected = true;
>>>>>>  			kvm_x86_ops->set_nmi(vcpu);
>>>>>>  		}
>>>>>> -	} else if (kvm_cpu_has_interrupt(vcpu)) {
>>>>>> -		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
>>>>>> -			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
>>>>>> -					false);
>>>>>> +	} else if (kvm_cpu_has_interrupt(vcpu) &&
>>>>>> +		kvm_x86_ops->interrupt_allowed(vcpu)) {
>>>>>> +		int vector = -1;
>>>>>> +
>>>>>> +		if (kvm_apic_vid_enabled(vcpu))
>>>>>> +			vector = kvm_cpu_get_extint(vcpu);
>>>>>> +		else
>>>>>> +			vector = kvm_cpu_get_interrupt(vcpu);
>>>>>> +
>>>>>> +		if (vector != -1) {
>>>>>> +			kvm_queue_interrupt(vcpu, vector, false);
>>>>>>  			kvm_x86_ops->set_irq(vcpu);
>>>>>>  		}
>>>>> If vid is enabled, kvm_cpu_has_interrupt() should return true only if there
>>>>> is an extint interrupt. Similarly, kvm_cpu_get_interrupt() will only return
>>>>> an extint if vid is enabled. This basically moves the kvm_apic_vid_enabled()
>>>>> logic deeper into the kvm_cpu_(has|get)_interrupt() functions instead
>>>>> of changing the interrupt injection logic here and in vcpu_enter_guest()
>>>>> below. We still need a kvm_cpu_has_interrupt() variant that always checks
>>>>> both extint and apic for use in kvm_arch_vcpu_runnable() though.
>>>> As you mentioned, we still need to check both extint and apic interrupts in
>>>> some cases. So how to do this? Introduce another argument to indicate
>>>> whether to check both?
>>> Yes, we need to check both in kvm_arch_vcpu_runnable(). Another argument
>>> is a good option. We can have two functions:
>>> kvm_cpu_has_injectable_interrupt() for use in the irq injection path and
>>> kvm_cpu_has_interrupt() for use in kvm_arch_vcpu_runnable(). They will
>>> call a common one with an additional argument to avoid code duplication.
>> Ok, will follow this way.
>>
>>>>
>>>>>> +
>>>>>>  	}
>>>>>>  }
>>>>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>>>  	}
>>>>>>
>>>>>>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>>>> +		/* update architecture specific hints for APIC
>>>>>> +		 * virtual interrupt delivery */
>>>>>> +		if (kvm_x86_ops->update_irq)
>>>>>> +			kvm_x86_ops->update_irq(vcpu);
>>>>>> +
>>>>>
>>>>> I do not see why this has to be here instead of inside the
>>>>> if (kvm_lapic_enabled(vcpu)) {} near update_cr8_intercept() a couple of
>>>>> lines below. If you move it there you can drop the apic enabled check in
>>>>> kvm_apic_get_highest_irr().
>>>> Yes, it seems ok to move it.
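For illustration, the two-function split agreed on above could look roughly
like this; the names (e.g. kvm_cpu_has_injectable_intr()) are assumed here
rather than taken from the thread:

	static int __kvm_cpu_has_interrupt(struct kvm_vcpu *v, bool apic_too)
	{
		if (!irqchip_in_kernel(v->kvm))
			return v->arch.interrupt.pending;

		if (kvm_cpu_has_extint(v))	/* PIC: highest precedence */
			return 1;

		/* with vid, pending APIC interrupts are delivered by
		 * hardware and are not injectable by software */
		if (!apic_too)
			return 0;

		return kvm_apic_has_interrupt(v) != -1;
	}

	/* injection path: with vid enabled only an extint matters */
	int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
	{
		return __kvm_cpu_has_interrupt(v, !kvm_apic_vid_enabled(v));
	}

	/* kvm_arch_vcpu_runnable(): always consider both sources */
	int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
	{
		return __kvm_cpu_has_interrupt(v, true);
	}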
>>>>
>>>>>>  		inject_pending_event(vcpu);
>>>>>>
>>>>>>  		/* enable NMI/IRQ window open exits if needed */
>>>>>>  		if (vcpu->arch.nmi_pending)
>>>>>>  			kvm_x86_ops->enable_nmi_window(vcpu);
>>>>>> -		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>>>> +		else if (kvm_apic_vid_enabled(vcpu)) {
>>>>>> +			if (kvm_cpu_has_extint(vcpu))
>>>>>> +				kvm_x86_ops->enable_irq_window(vcpu);
>>>>>> +		} else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>>>>  			kvm_x86_ops->enable_irq_window(vcpu);
>>>>>>
>>>>>>  		if (kvm_lapic_enabled(vcpu)) {
>>>>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>>>>> index 166c450..898aa62 100644
>>>>>> --- a/virt/kvm/ioapic.c
>>>>>> +++ b/virt/kvm/ioapic.c
>>>>>> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
>>>>>>  		/* need to read apic_id from apic register since
>>>>>>  		 * it can be rewritten */
>>>>>>  		irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
>>>>>> +		kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1);
>>>>>>  	}
>>>>>>  #endif
>>>>>>  	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
>>>>>> --
>>>>>> 1.7.1
>>>>>
>>>>> --
>>>>> Gleb.
>>>>
>>>> Best regards,
>>>> Yang
>>>
>>> --
>>> Gleb.
>>
>> Best regards,
>> Yang
>
> --
> Gleb.

Best regards,
Yang
On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2012-12-05: > > On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: > >> Gleb Natapov wrote on 2012-12-04: > >>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: > >>>> Gleb Natapov wrote on 2012-12-03: > >>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > >>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > >>>>>> manually, which is fully taken care of by the hardware. This needs > >>>>>> some special awareness into existing interrupr injection path: > >>>>>> > >>>>>> - for pending interrupt, instead of direct injection, we may need > >>>>>> update architecture specific indicators before resuming to guest. - > >>>>>> A pending interrupt, which is masked by ISR, should be also > >>>>>> considered in above update action, since hardware will decide when > >>>>>> to inject it at right time. Current has_interrupt and get_interrupt > >>>>>> only returns a valid vector from injection p.o.v. > >>>>> Most of my previous comments still apply. > >>>>> > >>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >>>>>> + int trig_mode, int always_set) > >>>>>> +{ > >>>>>> + if (kvm_x86_ops->set_eoi_exitmap) > >>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > >>>>>> + trig_mode, always_set); > >>>>>> +} > >>>>>> + > >>>>>> /* > >>>>>> * Add a pending IRQ into lapic. > >>>>>> * Return 1 if successfully added and 0 if discarded. > >>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic > > *apic, > >>> int > >>>>> delivery_mode, > >>>>>> if (unlikely(!apic_enabled(apic))) > >>>>>> break; > >>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); > >>>>> As I said in the last review rebuild the bitmap when ioapic or irq > >>>>> notifier configuration changes, user request bit to notify vcpus to > >>>>> reload the bitmap. > >>>> It is too complicated. When program ioapic entry, we cannot get the target > > vcpu > >>> easily. We need to read destination format register and logical destination > >>> register to find out target vcpu if using logical mode. Also, we must trap every > >>> modification to the two registers to update eoi bitmap. > >>> No need to check target vcpu. Enable exit on all vcpus for the vector > >> This is wrong. As we known, modern OS uses per VCPU vector. We cannot > > ensure all vectors have same trigger mode. And what's worse, the vector in > > another vcpu is used to handle high frequency interrupts(like 10G NIC), then it > > will hurt performance. > >> > > I never saw OSes reuse vector used by ioapic, as far as I see this > Could you point out which code does this check in Linux kernel? I don't see any special checks when Linux kernel allocates a vector. > arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses apic->target_cpus() to get cpu mask. target_cpus() return mask of all online cpus. Actually you wrote arch_pi_create_irq() in PI patches to workaround this behaviour and allocated vector per cpu, no? Are you aware of any guest that I can run, examine ioapic/apic configuration and see that the same vector is used on different vcpus for different devices? Can you point me to it? > > is not how Linux code works. Furthermore it will not work with KVM > > currently since apic eoi redirected to ioapic based on vector alone, > > not vector/vcpu pair and as far as I am aware this is how real HW works. > yes, real HW works in this way. But why it is helpful in this case? 
It makes it impossible to use the same vector for different devices on
different cpus if the vector is delivered to at least one cpu through the
ioapic. It may cause spurious interrupts, and it will bring havoc to our ack
notifiers (albeit this is KVM's implementation problem). Also look at the
various comments in arch/x86/kernel/apic/io_apic.c; it looks like ioapics
tend to misbehave if you look at them funny. Who knows what troubles EOIing
the same vector twice on real HW may bring.

> >>> programmed into ioapic. Which two registers? All accesses to ioapic are
> >>> trapped and reconfiguration is rare.
> >> In logical mode, the destination VCPU depends on each CPU's destination
> >> format register and logical destination register. So we must also trap the
> >> two registers. And if it uses lowest priority delivery mode, the PPR needs
> >> to be trapped too. Since the PPR will change on each interrupt injection,
> >> the cost should be higher than the current approach.
> > No need for all of that if the bitmask is global.
> No, the bitmask is per VCPU. Also, why will it work if the bitmask is global?

Make it global. Why what will work?

And we need to trap the format/logical destination/id registers anyway, since
we need to build the kvm->arch.apic_map table that is used to deliver
interrupts. BTW you can use this table to build a per VCPU eoi bitmask too,
but I am not convinced it is needed in practice.

--
	Gleb.
On Wed, Dec 05, 2012 at 03:43:41AM +0000, Zhang, Yang Z wrote:
> >> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >>  	}
> >>
> >>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >> +		/* update architecture specific hints for APIC
> >> +		 * virtual interrupt delivery */
> >> +		if (kvm_x86_ops->update_irq)
> >> +			kvm_x86_ops->update_irq(vcpu);
> >> +
> >>  		inject_pending_event(vcpu);
> >>
> >>  		/* enable NMI/IRQ window open exits if needed */
> >>  		if (vcpu->arch.nmi_pending)
> >>  			kvm_x86_ops->enable_nmi_window(vcpu);
> >> -		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >> +		else if (kvm_apic_vid_enabled(vcpu)) {
> >> +			if (kvm_cpu_has_extint(vcpu))
> >> +				kvm_x86_ops->enable_irq_window(vcpu);
> >
> > If RVI is non-zero, then the interrupt window should not be enabled,
> > according to 29.2.2:
> >
> > "If a virtual interrupt has been recognized (see Section 29.2.1), it will
> > be delivered at an instruction boundary when the following conditions all
> > hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no
> > blocking by MOV SS or by POP SS; and (4) the “interrupt-window exiting”
> > VM-execution control is 0."
> Right. Must check RVI here.
>
Why? We request an interrupt window here because there is an ExtINT interrupt
pending. ExtINT interrupts have precedence over APIC interrupts (our current
code is incorrect!), so we want a vmexit as soon as interrupts are allowed, to
inject the ExtINT, and we do not want a virtual interrupt to be delivered. I
think condition (4) there is exactly for this situation.

--
	Gleb.
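Sketched out, the consequence of this point for the injection path is that
with vid only a pending ExtINT asks for a window, and the window-exiting
control itself holds off virtual-interrupt delivery until the
higher-precedence ExtINT goes in. A sketch consistent with the discussion,
not code from the thread:

	if (vcpu->arch.nmi_pending)
		kvm_x86_ops->enable_nmi_window(vcpu);
	else if (kvm_cpu_has_extint(vcpu) || req_int_win)
		/* while "interrupt-window exiting" is set the CPU will
		 * not deliver a recognized virtual interrupt (SDM
		 * 29.2.2, condition 4), so the ExtINT is injected first */
		kvm_x86_ops->enable_irq_window(vcpu);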
Gleb Natapov wrote on 2012-12-05: > On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote: >> Gleb Natapov wrote on 2012-12-05: >>> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: >>>> Gleb Natapov wrote on 2012-12-04: >>>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: >>>>>> Gleb Natapov wrote on 2012-12-03: >>>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >>>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts >>>>>>>> manually, which is fully taken care of by the hardware. This needs >>>>>>>> some special awareness into existing interrupr injection path: >>>>>>>> >>>>>>>> - for pending interrupt, instead of direct injection, we may need >>>>>>>> update architecture specific indicators before resuming to guest. - >>>>>>>> A pending interrupt, which is masked by ISR, should be also >>>>>>>> considered in above update action, since hardware will decide when >>>>>>>> to inject it at right time. Current has_interrupt and get_interrupt >>>>>>>> only returns a valid vector from injection p.o.v. >>>>>>> Most of my previous comments still apply. >>>>>>> >>>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>>>>>> + int trig_mode, int always_set) >>>>>>>> +{ >>>>>>>> + if (kvm_x86_ops->set_eoi_exitmap) >>>>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >>>>>>>> + trig_mode, always_set); >>>>>>>> +} >>>>>>>> + >>>>>>>> /* >>>>>>>> * Add a pending IRQ into lapic. >>>>>>>> * Return 1 if successfully added and 0 if discarded. >>>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic >>> *apic, >>>>> int >>>>>>> delivery_mode, >>>>>>>> if (unlikely(!apic_enabled(apic))) >>>>>>>> break; >>>>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); >>>>>>> As I said in the last review rebuild the bitmap when ioapic or irq >>>>>>> notifier configuration changes, user request bit to notify vcpus to >>>>>>> reload the bitmap. >>>>>> It is too complicated. When program ioapic entry, we cannot get the > target >>> vcpu >>>>> easily. We need to read destination format register and logical >>>>> destination register to find out target vcpu if using logical mode. >>>>> Also, we must trap every modification to the two registers to update >>>>> eoi bitmap. No need to check target vcpu. Enable exit on all vcpus >>>>> for the vector >>>> This is wrong. As we known, modern OS uses per VCPU vector. We cannot >>> ensure all vectors have same trigger mode. And what's worse, the >>> vector in another vcpu is used to handle high frequency >>> interrupts(like 10G NIC), then it will hurt performance. >>>> >>> I never saw OSes reuse vector used by ioapic, as far as I see this >> Could you point out which code does this check in Linux kernel? I don't >> see any special checks when Linux kernel allocates a vector. >> > arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses > apic->target_cpus() to get cpu mask. target_cpus() return mask of all > online cpus. Actually you wrote arch_pi_create_irq() in PI patches to > workaround this behaviour and allocated vector per cpu, no? Yes, when create an new irq, it will allocate vector from all online cpus. But after user changes the irq affinity, then the vector will reallocate with new cpumask. And this will leave the vector available on other cpus. > Are you aware of any guest that I can run, examine ioapic/apic > configuration and see that the same vector is used on different vcpus > for different devices? Can you point me to it? > >>> is not how Linux code works. 
Furthermore it will not work with KVM >>> currently since apic eoi redirected to ioapic based on vector alone, >>> not vector/vcpu pair and as far as I am aware this is how real HW works. >> yes, real HW works in this way. But why it is helpful in this case? > It makes it impossible to use the same vector for different devices on > different cpus if the vector is delivered to at least one cpu through ioapic. > It may cause spurious interrupts, it will bring havoc to our ack > notifiers (albeit this is KVM's implementation problem). Also look at > various comment in arch/x86/kernel/apic/io_apic.c, it looks like ioapics > tend to misbehave if you look at them funny. Who knows what troubles EOIing > the same vector twice on real HW may bring. > >> >>>>> programmed into ioapic. Which two registers? All accesses to ioapic are >>>>> trapped and reconfiguration is rare. >>>> In logical mode, the destination VCPU is depend on each CPU's destination >>> format register and logical destination register. So we must also trap the two >>> registers. >>>> And if it uses lowest priority delivery mode, the PPR need to be trapped too. >>> Since PPR will change on each interrupt injection, the cost should be >>> higher than current approach. No need for all of that if bitmask it >>> global. >> No, the bitmask is per VCPU. Also, why it will work if bitmask is global? > Make in global. Why what will work? > > And we need to trap format/logical destination/id registers anyway since > we need to build kvm->arch.apic_map table that is used to deliver > interrupts. BTW you can use this table to build per VCPU eoi bitmask > too, but I am not convinced it is needed in practice. Even KVM uses a simple way to implement the lowest priority delivery mode, we still need to trap all interrupts that use the lowest priority delivery mode. Because each interrupt will change CPU's priority and we need to recalculate the priority and iterate the whole ioapic entry to renew the eoi exiting bitmap. The cost should be worse than current way. I don't think it worth to do. Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 05, 2012 at 01:51:36PM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2012-12-05: > > On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote: > >> Gleb Natapov wrote on 2012-12-05: > >>> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: > >>>> Gleb Natapov wrote on 2012-12-04: > >>>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: > >>>>>> Gleb Natapov wrote on 2012-12-03: > >>>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > >>>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > >>>>>>>> manually, which is fully taken care of by the hardware. This needs > >>>>>>>> some special awareness into existing interrupr injection path: > >>>>>>>> > >>>>>>>> - for pending interrupt, instead of direct injection, we may need > >>>>>>>> update architecture specific indicators before resuming to guest. - > >>>>>>>> A pending interrupt, which is masked by ISR, should be also > >>>>>>>> considered in above update action, since hardware will decide when > >>>>>>>> to inject it at right time. Current has_interrupt and get_interrupt > >>>>>>>> only returns a valid vector from injection p.o.v. > >>>>>>> Most of my previous comments still apply. > >>>>>>> > >>>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > >>>>>>>> + int trig_mode, int always_set) > >>>>>>>> +{ > >>>>>>>> + if (kvm_x86_ops->set_eoi_exitmap) > >>>>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > >>>>>>>> + trig_mode, always_set); > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> /* > >>>>>>>> * Add a pending IRQ into lapic. > >>>>>>>> * Return 1 if successfully added and 0 if discarded. > >>>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic > >>> *apic, > >>>>> int > >>>>>>> delivery_mode, > >>>>>>>> if (unlikely(!apic_enabled(apic))) > >>>>>>>> break; > >>>>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); > >>>>>>> As I said in the last review rebuild the bitmap when ioapic or irq > >>>>>>> notifier configuration changes, user request bit to notify vcpus to > >>>>>>> reload the bitmap. > >>>>>> It is too complicated. When program ioapic entry, we cannot get the > > target > >>> vcpu > >>>>> easily. We need to read destination format register and logical > >>>>> destination register to find out target vcpu if using logical mode. > >>>>> Also, we must trap every modification to the two registers to update > >>>>> eoi bitmap. No need to check target vcpu. Enable exit on all vcpus > >>>>> for the vector > >>>> This is wrong. As we known, modern OS uses per VCPU vector. We cannot > >>> ensure all vectors have same trigger mode. And what's worse, the > >>> vector in another vcpu is used to handle high frequency > >>> interrupts(like 10G NIC), then it will hurt performance. > >>>> > >>> I never saw OSes reuse vector used by ioapic, as far as I see this > >> Could you point out which code does this check in Linux kernel? I don't > >> see any special checks when Linux kernel allocates a vector. > >> > > arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses > > apic->target_cpus() to get cpu mask. target_cpus() return mask of all > > online cpus. Actually you wrote arch_pi_create_irq() in PI patches to > > workaround this behaviour and allocated vector per cpu, no? > Yes, when create an new irq, it will allocate vector from all online cpus. But after user changes the irq affinity, then the vector will reallocate with new cpumask. And this will leave the vector available on other cpus. 
> Since during vector allocation all cpus are checked vector will not be reused if it is allocated on any cpu. > > Are you aware of any guest that I can run, examine ioapic/apic > > configuration and see that the same vector is used on different vcpus > > for different devices? Can you point me to it? > > Can you answer this? > >>> is not how Linux code works. Furthermore it will not work with KVM > >>> currently since apic eoi redirected to ioapic based on vector alone, > >>> not vector/vcpu pair and as far as I am aware this is how real HW works. > >> yes, real HW works in this way. But why it is helpful in this case? > > It makes it impossible to use the same vector for different devices on > > different cpus if the vector is delivered to at least one cpu through ioapic. > > It may cause spurious interrupts, it will bring havoc to our ack > > notifiers (albeit this is KVM's implementation problem). Also look at > > various comment in arch/x86/kernel/apic/io_apic.c, it looks like ioapics > > tend to misbehave if you look at them funny. Who knows what troubles EOIing > > the same vector twice on real HW may bring. > > > >> > >>>>> programmed into ioapic. Which two registers? All accesses to ioapic are > >>>>> trapped and reconfiguration is rare. > >>>> In logical mode, the destination VCPU is depend on each CPU's destination > >>> format register and logical destination register. So we must also trap the two > >>> registers. > >>>> And if it uses lowest priority delivery mode, the PPR need to be trapped too. > >>> Since PPR will change on each interrupt injection, the cost should be > >>> higher than current approach. No need for all of that if bitmask it > >>> global. > >> No, the bitmask is per VCPU. Also, why it will work if bitmask is global? > > Make in global. Why what will work? > > > > And we need to trap format/logical destination/id registers anyway since > > we need to build kvm->arch.apic_map table that is used to deliver > > interrupts. BTW you can use this table to build per VCPU eoi bitmask > > too, but I am not convinced it is needed in practice. > Even KVM uses a simple way to implement the lowest priority delivery mode, we still need to trap all interrupts that use the lowest priority delivery mode. Because each interrupt will change CPU's priority and we need to recalculate the priority and iterate the whole ioapic entry to renew the eoi exiting bitmap. The cost should be worse than current way. I don't think it worth to do. > Just set the bit on every vcpu that can get interrupt with lowest priority. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
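Gleb's closing suggestion, sketched (not code from the thread): rather than
chasing the lowest-priority arbitration, set the bit on every vcpu the entry
could target:

	static void ioapic_set_eoi_exitmap_all(struct kvm *kvm, int vector)
	{
		int i;
		struct kvm_vcpu *vcpu;

		/* request an EOI exit for this vector on every vcpu; a
		 * few spurious exit bits are cheaper than recomputing
		 * the destination on every PPR change */
		kvm_for_each_vcpu(i, vcpu, kvm)
			kvm_x86_ops->set_eoi_exitmap(vcpu, vector, 1, 1);
	}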
Gleb Natapov wrote on 2012-12-05: > On Wed, Dec 05, 2012 at 03:43:41AM +0000, Zhang, Yang Z wrote: >>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu >>> *vcpu) >>>> } >>>> >>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { >>>> + /* update archtecture specific hints for APIC >>>> + * virtual interrupt delivery */ >>>> + if (kvm_x86_ops->update_irq) >>>> + kvm_x86_ops->update_irq(vcpu); >>>> + >>>> inject_pending_event(vcpu); >>>> >>>> /* enable NMI/IRQ window open exits if needed */ >>>> if (vcpu->arch.nmi_pending) >>>> kvm_x86_ops->enable_nmi_window(vcpu); >>>> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) >>>> + else if (kvm_apic_vid_enabled(vcpu)) { >>>> + if (kvm_cpu_has_extint(vcpu)) >>>> + kvm_x86_ops->enable_irq_window(vcpu); >>> >>> If RVI is non-zero, then interrupt window should not be enabled, >>> accordingly to 29.2.2: >>> >>> "If a virtual interrupt has been recognized (see Section 29.2.1), it will >>> be delivered at an instruction boundary when the following conditions all >>> hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no >>> blocking by MOV SS or by POP SS; and (4) the "interrupt-window exiting" >>> VM-execution control is 0." >> Right. Must check RVI here. >> > Why? We request interrupt window here because there is ExtINT interrupt > pending. ExtINT interrupt has a precedence over APIC interrupts (our > current code is incorrect!), so we want vmexit as soon as interrupts are > > allowed to inject ExtINT and we do not want virtual interrupt to be > delivered. I think the (4) there is exactly for this situation. One queston: kvm_cpu_has_extint() function check the interrupt from PIC. If PIC is working, APIC must in virtual wire mode. According to spec, when APIC is virtual wire mode, then APIC is totally bypassing. So apic_has_interrupt() and pic_has_interrupt() are mutually exclusive. If kvm_apic_has_interrupt() return true, then kvm_cpu_has_extint will never return true, and vice versa. Am I right? If answer is yes, then we don't check RVI here. > -- > Gleb. Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 05, 2012 at 02:16:52PM +0000, Zhang, Yang Z wrote: > Gleb Natapov wrote on 2012-12-05: > > On Wed, Dec 05, 2012 at 03:43:41AM +0000, Zhang, Yang Z wrote: > >>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu > >>> *vcpu) > >>>> } > >>>> > >>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { > >>>> + /* update archtecture specific hints for APIC > >>>> + * virtual interrupt delivery */ > >>>> + if (kvm_x86_ops->update_irq) > >>>> + kvm_x86_ops->update_irq(vcpu); > >>>> + > >>>> inject_pending_event(vcpu); > >>>> > >>>> /* enable NMI/IRQ window open exits if needed */ > >>>> if (vcpu->arch.nmi_pending) > >>>> kvm_x86_ops->enable_nmi_window(vcpu); > >>>> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > >>>> + else if (kvm_apic_vid_enabled(vcpu)) { > >>>> + if (kvm_cpu_has_extint(vcpu)) > >>>> + kvm_x86_ops->enable_irq_window(vcpu); > >>> > >>> If RVI is non-zero, then interrupt window should not be enabled, > >>> accordingly to 29.2.2: > >>> > >>> "If a virtual interrupt has been recognized (see Section 29.2.1), it will > >>> be delivered at an instruction boundary when the following conditions all > >>> hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no > >>> blocking by MOV SS or by POP SS; and (4) the "interrupt-window exiting" > >>> VM-execution control is 0." > >> Right. Must check RVI here. > >> > > Why? We request interrupt window here because there is ExtINT interrupt > > pending. ExtINT interrupt has a precedence over APIC interrupts (our > > current code is incorrect!), so we want vmexit as soon as interrupts are > > > > allowed to inject ExtINT and we do not want virtual interrupt to be > > delivered. I think the (4) there is exactly for this situation. > One queston: kvm_cpu_has_extint() function check the interrupt from PIC. If PIC is working, APIC must in virtual wire mode. According to spec, when APIC is virtual wire mode, then APIC is totally bypassing. So apic_has_interrupt() and pic_has_interrupt() are mutually exclusive. If kvm_apic_has_interrupt() return true, then kvm_cpu_has_extint will never return true, and vice versa. Am I right? If answer is yes, then we don't check RVI here. According to what spec "when APIC is virtual wire mode, then APIC is totally bypassed"? SDP volume 3 section 10.8.2 says differently, but it gives priority to ExtINT interrupts. And we do it other way around currently, but your apicv patches actually do it correct by injecting ExtINT if it is asserted without considering apic state. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > manually, which is fully taken care of by the hardware. This needs > some special awareness into existing interrupr injection path: > > - for pending interrupt, instead of direct injection, we may need > update architecture specific indicators before resuming to guest. > > - A pending interrupt, which is masked by ISR, should be also > considered in above update action, since hardware will decide > when to inject it at right time. Current has_interrupt and > get_interrupt only returns a valid vector from injection p.o.v. > > Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > --- > arch/x86/include/asm/kvm_host.h | 4 + > arch/x86/include/asm/vmx.h | 11 +++ > arch/x86/kvm/irq.c | 53 ++++++++++----- > arch/x86/kvm/lapic.c | 56 +++++++++++++--- > arch/x86/kvm/lapic.h | 6 ++ > arch/x86/kvm/svm.c | 19 +++++ > arch/x86/kvm/vmx.c | 140 ++++++++++++++++++++++++++++++++++++++- > arch/x86/kvm/x86.c | 34 ++++++++-- > virt/kvm/ioapic.c | 1 + > 9 files changed, 291 insertions(+), 33 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index dc87b65..e5352c8 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -697,6 +697,10 @@ struct kvm_x86_ops { > void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > void (*enable_irq_window)(struct kvm_vcpu *vcpu); > void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > + void (*update_irq)(struct kvm_vcpu *vcpu); > + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set); > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > int (*get_tdp_level)(void); > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index 21101b6..1003341 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -62,6 +62,7 @@ > #define EXIT_REASON_MCE_DURING_VMENTRY 41 > #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 > #define EXIT_REASON_APIC_ACCESS 44 > +#define EXIT_REASON_EOI_INDUCED 45 > #define EXIT_REASON_EPT_VIOLATION 48 > #define EXIT_REASON_EPT_MISCONFIG 49 > #define EXIT_REASON_WBINVD 54 > @@ -143,6 +144,7 @@ > #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 > #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 > #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 > +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 > #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 > #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 > > @@ -180,6 +182,7 @@ enum vmcs_field { > GUEST_GS_SELECTOR = 0x0000080a, > GUEST_LDTR_SELECTOR = 0x0000080c, > GUEST_TR_SELECTOR = 0x0000080e, > + GUEST_INTR_STATUS = 0x00000810, > HOST_ES_SELECTOR = 0x00000c00, > HOST_CS_SELECTOR = 0x00000c02, > HOST_SS_SELECTOR = 0x00000c04, > @@ -207,6 +210,14 @@ enum vmcs_field { > APIC_ACCESS_ADDR_HIGH = 0x00002015, > EPT_POINTER = 0x0000201a, > EPT_POINTER_HIGH = 0x0000201b, > + EOI_EXIT_BITMAP0 = 0x0000201c, > + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > + EOI_EXIT_BITMAP1 = 0x0000201e, > + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, > + EOI_EXIT_BITMAP2 = 0x00002020, > + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > + EOI_EXIT_BITMAP3 = 0x00002022, > + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > GUEST_PHYSICAL_ADDRESS = 0x00002400, > 
GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > VMCS_LINK_POINTER = 0x00002800, > diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > index 7e06ba1..f782788 100644 > --- a/arch/x86/kvm/irq.c > +++ b/arch/x86/kvm/irq.c > @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > */ > int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > { > - struct kvm_pic *s; > - > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.pending; > > - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ > - if (kvm_apic_accept_pic_intr(v)) { > - s = pic_irqchip(v->kvm); /* PIC */ > - return s->output; > - } else > - return 0; > - } > + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ > + return kvm_cpu_has_extint(v); /* non-APIC */ > return 1; > } > EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > > /* > + * check if there is pending interrupt from > + * non-APIC source without intack. > + */ > +int kvm_cpu_has_extint(struct kvm_vcpu *v) > +{ > + struct kvm_pic *s; > + > + if (kvm_apic_accept_pic_intr(v)) { > + s = pic_irqchip(v->kvm); /* PIC */ > + return s->output; > + } else > + return 0; > +} > + > +/* > * Read pending interrupt vector and intack. > */ > int kvm_cpu_get_interrupt(struct kvm_vcpu *v) > { > - struct kvm_pic *s; > int vector; > > if (!irqchip_in_kernel(v->kvm)) > return v->arch.interrupt.nr; > > vector = kvm_get_apic_interrupt(v); /* APIC */ > - if (vector == -1) { > - if (kvm_apic_accept_pic_intr(v)) { > - s = pic_irqchip(v->kvm); > - s->output = 0; /* PIC */ > - vector = kvm_pic_read_irq(v->kvm); > - } > - } > + if (vector == -1) > + return kvm_cpu_get_extint(v); /* non-APIC */ > return vector; > } > EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > > +/* > + * Read pending interrupt(from non-APIC source) > + * vector and intack. > + */ > +int kvm_cpu_get_extint(struct kvm_vcpu *v) > +{ > + struct kvm_pic *s; > + int vector = -1; > + > + if (kvm_apic_accept_pic_intr(v)) { > + s = pic_irqchip(v->kvm); > + s->output = 0; /* PIC */ > + vector = kvm_pic_read_irq(v->kvm); > + } > + return vector; > +} > + > void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > { > kvm_inject_apic_timer_irqs(vcpu); > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 7c96012..400d3ba 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -643,6 +643,14 @@ out: > return ret; > } > > +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > + int trig_mode, int always_set) > +{ > + if (kvm_x86_ops->set_eoi_exitmap) > + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > + trig_mode, always_set); > +} > + > /* > * Add a pending IRQ into lapic. > * Return 1 if successfully added and 0 if discarded. 
> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, > if (unlikely(!apic_enabled(apic))) > break; > > + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); > if (trig_mode) { > apic_debug("level trig mode for vector %d", vector); > apic_set_vector(vector, apic->regs + APIC_TMR); > @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) > return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > } > > +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > +{ > + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > + int trigger_mode; > + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > + trigger_mode = IOAPIC_LEVEL_TRIG; > + else > + trigger_mode = IOAPIC_EDGE_TRIG; > + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > + } > +} > + > static int apic_set_eoi(struct kvm_lapic *apic) > { > int vector = apic_find_highest_isr(apic); > @@ -756,19 +778,24 @@ static int apic_set_eoi(struct kvm_lapic *apic) > apic_clear_isr(vector, apic); > apic_update_ppr(apic); > > - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > - int trigger_mode; > - if (apic_test_vector(vector, apic->regs + APIC_TMR)) > - trigger_mode = IOAPIC_LEVEL_TRIG; > - else > - trigger_mode = IOAPIC_EDGE_TRIG; > - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > - } > + kvm_ioapic_send_eoi(apic, vector); > kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > return vector; > } > > +/* > + * this interface assumes a trap-like exit, which has already finished > + * desired side effect including vISR and vPPR update. > + */ > +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + kvm_ioapic_send_eoi(apic, vector); > + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > +} > +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > + > static void apic_send_ipi(struct kvm_lapic *apic) > { > u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR); > @@ -1533,6 +1560,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) > return highest_irr; > } > > +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + if (!apic || !apic_enabled(apic)) > + return -1; > + > + return apic_find_highest_irr(apic); > +} irr_pending of apic_find_highest_irr() is meaningless (stale) if HW is updating VIRR. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
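One possible shape of the fix Marcelo is pointing at, sketched under the
assumption that apic_search_irr() and kvm_vcpu_has_lapic() from lapic.c are
used (the actual fix is the author's call): scan the vIRR itself instead of
trusting the irr_pending shortcut, which hardware does not keep in sync when
it clears VIRR bits during virtual interrupt delivery:

	int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
	{
		struct kvm_lapic *apic = vcpu->arch.apic;

		if (!kvm_vcpu_has_lapic(vcpu) || !apic_enabled(apic))
			return -1;

		/* bypass apic->irr_pending; read APIC_IRR directly */
		return apic_search_irr(apic);
	}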
On Wed, Dec 05, 2012 at 01:14:38PM +0200, Gleb Natapov wrote:
> On Wed, Dec 05, 2012 at 03:43:41AM +0000, Zhang, Yang Z wrote:
> > >> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >>  	}
> > >>
> > >>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > >> +		/* update architecture specific hints for APIC
> > >> +		 * virtual interrupt delivery */
> > >> +		if (kvm_x86_ops->update_irq)
> > >> +			kvm_x86_ops->update_irq(vcpu);
> > >> +
> > >>  		inject_pending_event(vcpu);
> > >>
> > >>  		/* enable NMI/IRQ window open exits if needed */
> > >>  		if (vcpu->arch.nmi_pending)
> > >>  			kvm_x86_ops->enable_nmi_window(vcpu);
> > >> -		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> > >> +		else if (kvm_apic_vid_enabled(vcpu)) {
> > >> +			if (kvm_cpu_has_extint(vcpu))
> > >> +				kvm_x86_ops->enable_irq_window(vcpu);
> > >
> > > If RVI is non-zero, then the interrupt window should not be enabled,
> > > according to 29.2.2:
> > >
> > > "If a virtual interrupt has been recognized (see Section 29.2.1), it will
> > > be delivered at an instruction boundary when the following conditions all
> > > hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no
> > > blocking by MOV SS or by POP SS; and (4) the "interrupt-window exiting"
> > > VM-execution control is 0."
> > Right. Must check RVI here.
> >
> Why? We request an interrupt window here because there is an ExtINT interrupt
> pending. ExtINT interrupts have precedence over APIC interrupts (our current
> code is incorrect!), so we want a vmexit as soon as interrupts are allowed,
> to inject the ExtINT, and we do not want a virtual interrupt to be delivered.
> I think condition (4) there is exactly for this situation.
>
> --
> Gleb.

Right. BTW, delivery of ExtINT has no EOI, so there is no evaluation of
pending virtual interrupts. Therefore, shouldn't the interrupt window be
enabled when injecting ExtINT so that evaluation of pending virtual
interrupts is performed on the next vm-entry?
Gleb Natapov wrote on 2012-12-05: > On Wed, Dec 05, 2012 at 01:51:36PM +0000, Zhang, Yang Z wrote: >> Gleb Natapov wrote on 2012-12-05: >>> On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote: >>>> Gleb Natapov wrote on 2012-12-05: >>>>> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: >>>>>> Gleb Natapov wrote on 2012-12-04: >>>>>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: >>>>>>>> Gleb Natapov wrote on 2012-12-03: >>>>>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >>>>>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts >>>>>>>>>> manually, which is fully taken care of by the hardware. This needs >>>>>>>>>> some special awareness into existing interrupr injection path: >>>>>>>>>> >>>>>>>>>> - for pending interrupt, instead of direct injection, we may need >>>>>>>>>> update architecture specific indicators before resuming to >>>>>>>>>> guest. - A pending interrupt, which is masked by ISR, should >>>>>>>>>> be also considered in above update action, since hardware >>>>>>>>>> will decide when to inject it at right time. Current >>>>>>>>>> has_interrupt and get_interrupt only returns a valid vector >>>>>>>>>> from injection p.o.v. >>>>>>>>> Most of my previous comments still apply. >>>>>>>>> >>>>>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>>>>>>>> + int trig_mode, int always_set) >>>>>>>>>> +{ >>>>>>>>>> + if (kvm_x86_ops->set_eoi_exitmap) >>>>>>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >>>>>>>>>> + trig_mode, always_set); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> /* >>>>>>>>>> * Add a pending IRQ into lapic. >>>>>>>>>> * Return 1 if successfully added and 0 if discarded. >>>>>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic >>>>> *apic, >>>>>>> int >>>>>>>>> delivery_mode, >>>>>>>>>> if (unlikely(!apic_enabled(apic))) >>>>>>>>>> break; >>>>>>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); >>>>>>>>> As I said in the last review rebuild the bitmap when ioapic or irq >>>>>>>>> notifier configuration changes, user request bit to notify vcpus to >>>>>>>>> reload the bitmap. >>>>>>>> It is too complicated. When program ioapic entry, we cannot get the >>> target >>>>> vcpu >>>>>>> easily. We need to read destination format register and logical >>>>>>> destination register to find out target vcpu if using logical mode. >>>>>>> Also, we must trap every modification to the two registers to update >>>>>>> eoi bitmap. No need to check target vcpu. Enable exit on all vcpus >>>>>>> for the vector >>>>>> This is wrong. As we known, modern OS uses per VCPU vector. We cannot >>>>> ensure all vectors have same trigger mode. And what's worse, the >>>>> vector in another vcpu is used to handle high frequency >>>>> interrupts(like 10G NIC), then it will hurt performance. >>>>>> >>>>> I never saw OSes reuse vector used by ioapic, as far as I see this >>>> Could you point out which code does this check in Linux kernel? I don't >>>> see any special checks when Linux kernel allocates a vector. >>>> >>> arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses >>> apic->target_cpus() to get cpu mask. target_cpus() return mask of all >>> online cpus. Actually you wrote arch_pi_create_irq() in PI patches to >>> workaround this behaviour and allocated vector per cpu, no? >> Yes, when create an new irq, it will allocate vector from all online cpus. But after > user changes the irq affinity, then the vector will reallocate with new cpumask. 
> And this will leave the vector available on other cpus. >> > Since during vector allocation all cpus are checked vector will not be > reused if it is allocated on any cpu. Sorry, I still cannot find this check in kernel. Can you point me out to it? Also, I do see the vector reused by MSI and IOAPIC in my system. >>> Are you aware of any guest that I can run, examine ioapic/apic >>> configuration and see that the same vector is used on different vcpus >>> for different devices? Can you point me to it? >>> > Can you answer this? I am sure Xen will reused the IOAPIC vector. >>>>> is not how Linux code works. Furthermore it will not work with KVM >>>>> currently since apic eoi redirected to ioapic based on vector alone, >>>>> not vector/vcpu pair and as far as I am aware this is how real HW works. >>>> yes, real HW works in this way. But why it is helpful in this case? >>> It makes it impossible to use the same vector for different devices on >>> different cpus if the vector is delivered to at least one cpu through >>> ioapic. It may cause spurious interrupts, it will bring havoc to our >>> ack notifiers (albeit this is KVM's implementation problem). Also look >>> at various comment in arch/x86/kernel/apic/io_apic.c, it looks like >>> ioapics tend to misbehave if you look at them funny. Who knows what >>> troubles EOIing the same vector twice on real HW may bring. >>> >>>> >>>>>>> programmed into ioapic. Which two registers? All accesses to ioapic are >>>>>>> trapped and reconfiguration is rare. >>>>>> In logical mode, the destination VCPU is depend on each CPU's > destination >>>>> format register and logical destination register. So we must also >>>>> trap the two registers. >>>>>> And if it uses lowest priority delivery mode, the PPR need to be trapped > too. >>>>> Since PPR will change on each interrupt injection, the cost should be >>>>> higher than current approach. No need for all of that if bitmask it >>>>> global. >>>> No, the bitmask is per VCPU. Also, why it will work if bitmask is global? >>> Make in global. Why what will work? >>> >>> And we need to trap format/logical destination/id registers anyway since >>> we need to build kvm->arch.apic_map table that is used to deliver >>> interrupts. BTW you can use this table to build per VCPU eoi bitmask >>> too, but I am not convinced it is needed in practice. >> Even KVM uses a simple way to implement the lowest priority delivery mode, > we still need to trap all interrupts that use the lowest priority delivery mode. > Because each interrupt will change CPU's priority and we need to recalculate the > priority and iterate the whole ioapic entry to renew the eoi exiting bitmap. The > cost should be worse than current way. I don't think it worth to do. >> > Just set the bit on every vcpu that can get interrupt with lowest priority. > > -- > Gleb. Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Gleb Natapov wrote on 2012-12-05: > On Wed, Dec 05, 2012 at 02:16:52PM +0000, Zhang, Yang Z wrote: >> Gleb Natapov wrote on 2012-12-05: >>> On Wed, Dec 05, 2012 at 03:43:41AM +0000, Zhang, Yang Z wrote: >>>>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu >>>>> *vcpu) >>>>>> } >>>>>> >>>>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { >>>>>> + /* update archtecture specific hints for APIC >>>>>> + * virtual interrupt delivery */ >>>>>> + if (kvm_x86_ops->update_irq) >>>>>> + kvm_x86_ops->update_irq(vcpu); >>>>>> + >>>>>> inject_pending_event(vcpu); >>>>>> >>>>>> /* enable NMI/IRQ window open exits if needed */ >>>>>> if (vcpu->arch.nmi_pending) >>>>>> kvm_x86_ops->enable_nmi_window(vcpu); >>>>>> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) >>>>>> + else if (kvm_apic_vid_enabled(vcpu)) { >>>>>> + if (kvm_cpu_has_extint(vcpu)) >>>>>> + kvm_x86_ops->enable_irq_window(vcpu); >>>>> >>>>> If RVI is non-zero, then interrupt window should not be enabled, >>>>> accordingly to 29.2.2: >>>>> >>>>> "If a virtual interrupt has been recognized (see Section 29.2.1), it will >>>>> be delivered at an instruction boundary when the following conditions all >>>>> hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no >>>>> blocking by MOV SS or by POP SS; and (4) the "interrupt-window exiting" >>>>> VM-execution control is 0." >>>> Right. Must check RVI here. >>>> >>> Why? We request interrupt window here because there is ExtINT interrupt >>> pending. ExtINT interrupt has a precedence over APIC interrupts (our >>> current code is incorrect!), so we want vmexit as soon as interrupts are >>> >>> allowed to inject ExtINT and we do not want virtual interrupt to be >>> delivered. I think the (4) there is exactly for this situation. >> One queston: kvm_cpu_has_extint() function check the interrupt from PIC. If > PIC is working, APIC must in virtual wire mode. According to spec, when APIC is > virtual wire mode, then APIC is totally bypassing. So apic_has_interrupt() and > pic_has_interrupt() are mutually exclusive. If kvm_apic_has_interrupt() return > true, then kvm_cpu_has_extint will never return true, and vice versa. Am I right? > If answer is yes, then we don't check RVI here. > According to what spec "when APIC is virtual wire mode, then APIC is > totally bypassed"? SDP volume 3 section 10.8.2 says differently, but it > gives priority to ExtINT interrupts. And we do it other way around > currently, but your apicv patches actually do it correct by injecting > ExtINT if it is asserted without considering apic state. Thanks for correct me. I misunderstood the spec. :( > -- > Gleb. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti wrote on 2012-12-06:
> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
>> [... commit message and full patch quoted here in the original mail;
>> trimmed, see the patch at the top of the thread ...]
>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>> +
>> +	if (!apic || !apic_enabled(apic))
>> +		return -1;
>> +
>> +	return apic_find_highest_irr(apic);
>> +}
>
> irr_pending of apic_find_highest_irr() is meaningless (stale) if
> HW is updating VIRR.
I don't think the target vcpu is running when we call this function, so it
is safe to check irr_pending and read the IRR.

Best regards,
Yang
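For reference while reading this exchange, the fast path under discussion in apic_find_highest_irr() looks roughly like this in this era's arch/x86/kvm/lapic.c (paraphrased from memory, so treat it as a sketch rather than authoritative source):

	static int apic_find_highest_irr(struct kvm_lapic *apic)
	{
		int result;

		/* fast path: the hint says no bit is set in the (V)IRR */
		if (!apic->irr_pending)
			return -1;

		result = apic_search_irr(apic);	/* scan the 256-bit IRR */
		ASSERT(result == -1 || result >= 16);

		return result;
	}

A stale true hint only costs a wasted scan of the IRR; a stale false hint would hide pending interrupts, which is exactly what the rest of the thread worries about.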
Zhang, Yang Z wrote on 2012-12-06:
> Marcelo Tosatti wrote on 2012-12-06:
>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
>>> [... commit message and full patch quote trimmed ...]
>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>>> +
>>> +	if (!apic || !apic_enabled(apic))
>>> +		return -1;
>>> +
>>> +	return apic_find_highest_irr(apic);
>>> +}
>>
>> irr_pending of apic_find_highest_irr() is meaningless (stale) if
>> HW is updating VIRR.
> I don't think the target vcpu is running when we call this function. So
> it is safe to check irr_pending and read the irr.
One problem is that when HW clears VIRR, after the vmexit irr_pending will
still be true if more than one bit was set in VIRR at the last vmentry.
How about recalculating irr_pending according to the VIRR on each vmexit?

Best regards,
Yang
On Thu, Dec 06, 2012 at 05:02:15AM +0000, Zhang, Yang Z wrote:
> Zhang, Yang Z wrote on 2012-12-06:
> > Marcelo Tosatti wrote on 2012-12-06:
> > >> [... commit message and full patch quote trimmed ...]
> >> irr_pending of apic_find_highest_irr() is meaningless (stale) if
> >> HW is updating VIRR.
> > I don't think the target vcpu is running when we call this function. So
> > it is safe to check irr_pending and read the irr.
> One problem is that when HW clears VIRR, after the vmexit irr_pending
> will still be true if more than one bit was set in VIRR at the last
> vmentry.
It doesn't matter how many bits were set in VIRR before entry; as long as
some bit was set, irr_pending will be true.

> How about recalculating irr_pending according to the VIRR on each vmexit?
>
No need, really. Since HW can only clear VIRR, the only situation that may
happen is that irr_pending is true while VIRR is empty, and
apic_find_highest_irr() will return the correct result in that case.
If we see a lot of unneeded IRR scans because of a stale irr_pending value,
we can do irr_pending = (rvi != 0) on vmexit.

--
Gleb.
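A minimal sketch of that refresh, assuming the GUEST_INTR_STATUS field this patch adds (RVI lives in its low byte); the hook itself and where it is called from are assumptions of the sketch:

	/*
	 * Sketch only: resynchronize the software irr_pending hint from RVI
	 * after a vmexit.  Hardware keeps RVI equal to the highest vector set
	 * in VIRR, so rvi == 0 means VIRR is empty.  GUEST_INTR_STATUS is the
	 * 16-bit VMCS field added by this patch.
	 */
	static void vmx_sync_irr_pending(struct kvm_vcpu *vcpu)
	{
		u8 rvi = vmcs_read16(GUEST_INTR_STATUS) & 0xff;

		vcpu->arch.apic->irr_pending = (rvi != 0);
	}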
Gleb Natapov wrote on 2012-12-06:
> On Thu, Dec 06, 2012 at 05:02:15AM +0000, Zhang, Yang Z wrote:
>> [... earlier discussion and full patch quote trimmed ...]
> No need, really. Since HW can only clear VIRR, the only situation that
> may happen is that irr_pending is true while VIRR is empty, and
> apic_find_highest_irr() will return the correct result in that case.
> If we see a lot of unneeded IRR scans because of a stale irr_pending
> value, we can do irr_pending = (rvi != 0) on vmexit.
OK. I don't think we need to add this now; I will collect some data to see
whether it is needed in the future.

> --
> Gleb.

Best regards,
Yang
On Thu, Dec 06, 2012 at 02:55:16AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2012-12-05:
> > [... earlier vector-allocation discussion quoted in full; trimmed ...]
> >> Yes, when creating a new irq, it will allocate a vector from all
> >> online cpus. But after the user changes the irq affinity, the vector
> >> will be reallocated with the new cpumask, and this will leave the
> >> vector available on other cpus.
> >>
> > Since during vector allocation all cpus are checked, a vector will not
> > be reused if it is allocated on any cpu.
> Sorry, I still cannot find this check in the kernel. Can you point it
> out to me?
You pointed to it above yourself: "when creating a new irq, it will
allocate a vector from all online cpus". So if a vector is allocated on at
least one online cpu, it cannot be reused during allocation.

> Also, I do see vectors reused by MSI and the IOAPIC on my system.
>
What is your system? What is your qemu command line? We only care if MSI
uses the same vector as an IOAPIC level-triggered interrupt but on a
different cpu. If this happens we can use apic_map to calculate the
per-cpu eoi exit bitmap.

> >>> Are you aware of any guest that I can run, examine ioapic/apic
> >>> configuration and see that the same vector is used on different vcpus
> >>> for different devices? Can you point me to it?
> >>>
> > Can you answer this?
> I am sure Xen will reuse the IOAPIC vector.
>
What configuration does it create exactly? Also, if Xen running as a KVM
guest takes a small performance hit, it is not a big problem. Remember
there is no correctness issue here.

--
Gleb.
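Gleb's apic_map fallback can be sketched as follows: rebuild each vcpu's EOI-exit bitmap from the ioapic redirection table whenever the ioapic (or the apic registers feeding kvm->arch.apic_map) is reprogrammed. kvm_for_each_vcpu, the kvm_ioapic structures, and kvm_apic_match_dest() exist in KVM of this era; the per-vcpu eoi_exit_bitmap field is an assumption of the sketch.

	/*
	 * Sketch only: recompute each vcpu's EOI-exit bitmap from the ioapic
	 * redirection table, using the same destination matching that
	 * interrupt delivery (via kvm->arch.apic_map) uses.
	 * vcpu->arch.eoi_exit_bitmap is a hypothetical per-vcpu field loaded
	 * into EOI_EXIT_BITMAP0-3 on the next entry.
	 */
	static void kvm_recalc_eoi_exitmap(struct kvm *kvm)
	{
		struct kvm_ioapic *ioapic = kvm->arch.vioapic;
		struct kvm_vcpu *vcpu;
		int i, pin;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			DECLARE_BITMAP(bitmap, 256);

			bitmap_zero(bitmap, 256);
			for (pin = 0; pin < IOAPIC_NUM_PINS; pin++) {
				union kvm_ioapic_redirect_entry *e =
					&ioapic->redirtbl[pin];

				if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG &&
				    kvm_apic_match_dest(vcpu, NULL, 0,
							e->fields.dest_id,
							e->fields.dest_mode))
					__set_bit(e->fields.vector, bitmap);
			}
			bitmap_copy(vcpu->arch.eoi_exit_bitmap, bitmap, 256);
			kvm_make_request(KVM_REQ_EVENT, vcpu); /* reload on entry */
		}
	}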
Gleb Natapov wrote on 2012-12-06:
> On Thu, Dec 06, 2012 at 02:55:16AM +0000, Zhang, Yang Z wrote:
>> [... earlier discussion quoted in full; trimmed ...]
>> I am sure Xen will reuse the IOAPIC vector.
>>
> What configuration does it create exactly? Also, if Xen running as a KVM
> guest takes a small performance hit, it is not a big problem. Remember
> there is no correctness issue here.
Anyway, it doesn't matter which OS reuses the vector. So using apic_map is
the final decision?

Best regards,
Yang
On Thu, Dec 06, 2012 at 07:16:07AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2012-12-06:
> > [... earlier discussion quoted in full; trimmed ...]
> > What configuration does it create exactly? Also, if Xen running as a
> > KVM guest takes a small performance hit, it is not a big problem.
> > Remember there is no correctness issue here.
> Anyway, it doesn't matter which OS reuses the vector.
> So using apic_map is the final decision?
>
If you are firmly committed to a per-vcpu bitmap then yes. I tried to make
it simpler for you :)

--
Gleb.
Gleb Natapov wrote on 2012-12-06: > On Thu, Dec 06, 2012 at 07:16:07AM +0000, Zhang, Yang Z wrote: >> Gleb Natapov wrote on 2012-12-06: >>> On Thu, Dec 06, 2012 at 02:55:16AM +0000, Zhang, Yang Z wrote: >>>> Gleb Natapov wrote on 2012-12-05: >>>>> On Wed, Dec 05, 2012 at 01:51:36PM +0000, Zhang, Yang Z wrote: >>>>>> Gleb Natapov wrote on 2012-12-05: >>>>>>> On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote: >>>>>>>> Gleb Natapov wrote on 2012-12-05: >>>>>>>>> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: >>>>>>>>>> Gleb Natapov wrote on 2012-12-04: >>>>>>>>>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: >>>>>>>>>>>> Gleb Natapov wrote on 2012-12-03: >>>>>>>>>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >>>>>>>>>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC >>>>>>>>>>>>>> interrupts manually, which is fully taken care of by the >>>>>>>>>>>>>> hardware. This needs some special awareness into existing >>>>>>>>>>>>>> interrupr injection path: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - for pending interrupt, instead of direct injection, we may > need >>>>>>>>>>>>>> update architecture specific indicators before resuming to >>>>>>>>>>>>>> guest. - A pending interrupt, which is masked by ISR, should >>>>>>>>>>>>>> be also considered in above update action, since hardware >>>>>>>>>>>>>> will decide when to inject it at right time. Current >>>>>>>>>>>>>> has_interrupt and get_interrupt only returns a valid vector >>>>>>>>>>>>>> from injection p.o.v. >>>>>>>>>>>>> Most of my previous comments still apply. >>>>>>>>>>>>> >>>>>>>>>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int >>>>>>>>>>>>>> vector, + int trig_mode, int always_set) +{ + if >>>>>>>>>>>>>> (kvm_x86_ops->set_eoi_exitmap) >>>>>>>>>>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >>>>>>>>>>>>>> + trig_mode, always_set); +} + >>>>>>>>>>>>>> /* >>>>>>>>>>>>>> * Add a pending IRQ into lapic. >>>>>>>>>>>>>> * Return 1 if successfully added and 0 if discarded. >>>>>>>>>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct >>> kvm_lapic >>>>>>>>> *apic, >>>>>>>>>>> int >>>>>>>>>>>>> delivery_mode, >>>>>>>>>>>>>> if (unlikely(!apic_enabled(apic))) >>>>>>>>>>>>>> break; >>>>>>>>>>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); >>>>>>>>>>>>> As I said in the last review rebuild the bitmap when ioapic >>>>>>>>>>>>> or irq notifier configuration changes, user request bit to >>>>>>>>>>>>> notify vcpus to reload the bitmap. >>>>>>>>>>>> It is too complicated. When program ioapic entry, we cannot get > the >>>>>>> target >>>>>>>>> vcpu >>>>>>>>>>> easily. We need to read destination format register and >>>>>>>>>>> logical destination register to find out target vcpu if using >>>>>>>>>>> logical mode. Also, we must trap every modification to the two >>>>>>>>>>> registers to update eoi bitmap. No need to check target vcpu. >>>>>>>>>>> Enable exit on all vcpus for the vector >>>>>>>>>> This is wrong. As we known, modern OS uses per VCPU vector. We >>> cannot >>>>>>>>> ensure all vectors have same trigger mode. And what's worse, the >>>>>>>>> vector in another vcpu is used to handle high frequency >>>>>>>>> interrupts(like 10G NIC), then it will hurt performance. >>>>>>>>>> >>>>>>>>> I never saw OSes reuse vector used by ioapic, as far as I see this >>>>>>>> Could you point out which code does this check in Linux kernel? I don't >>>>>>>> see any special checks when Linux kernel allocates a vector. 
>>>>>>>> >>>>>>> arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses >>>>>>> apic->target_cpus() to get the cpu mask. target_cpus() returns a mask of all >>>>>>> online cpus. Actually you wrote arch_pi_create_irq() in the PI patches to >>>>>>> work around this behaviour and allocated a vector per cpu, no? >>>>>> Yes, when creating a new irq, it will allocate a vector from all online cpus. > But >>> after >>>>> the user changes the irq affinity, the vector will be reallocated with the >>>>> new cpumask. And this will leave the vector available on other cpus. >>>>>> >>>>> Since during vector allocation all cpus are checked, a vector will not be >>>>> reused if it is allocated on any cpu. >>>> Sorry, I still cannot find this check in the kernel. Can you point it out to me? >>> You pointed to it above by yourself: >>> "Yes, when creating a new irq, it will allocate a vector from all online >>> cpus" >>> So if a vector is allocated on at least one online cpu it cannot be reused >>> during allocation. >>> >>>> Also, I do see the vector reused by MSI and IOAPIC in my system. >>>> >>> What is your system? What is your qemu command line? We only care if MSI >>> uses the same vector as an IOAPIC level interrupt but on a different cpu. If >>> this happens we can use apic_map to calculate the per cpu eoi exit bitmap. >>> >>>>>>> Are you aware of any guest that I can run, examine the ioapic/apic >>>>>>> configuration and see that the same vector is used on different vcpus >>>>>>> for different devices? Can you point me to it? >>>>>>> >>>>> Can you answer this? >>>> I am sure Xen will reuse the IOAPIC vector. >>>> >>> What configuration does it create exactly? Also, if Xen running as a KVM guest >>> takes a small performance hit it is not a big problem. Remember there >>> is no correctness issue here. >> Anyway, it doesn't matter which OS will reuse the vector. >> So using apic_map is the final decision? >> > If you are firmly committed to a per vcpu bitmap then yes. I tried to make > it simpler for you :) Ok. Got it. Thanks. :) Best regards, Yang
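To make the approach agreed on above concrete: rebuilding the per-vcpu EOI exit bitmaps from the IOAPIC configuration, rather than updating them at interrupt-delivery time, can be sketched as below. This is a minimal, self-contained userspace illustration, not KVM code; the redir_entry layout, NR_VCPUS, and the pre-resolved dest_cpus mask are simplifying assumptions standing in for what KVM would derive from its apic_map and the real redirection table. In the spirit of the thread, only level-triggered entries get an exit bit.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Simplified stand-ins for KVM's types. */
    #define NR_VCPUS 4
    #define NR_PINS  24

    struct redir_entry {
            uint8_t vector;
            uint8_t trig_mode;   /* 1 = level triggered */
            uint32_t dest_cpus;  /* destination vcpus, pre-resolved to a mask */
    };

    /* 256 vectors -> 8 x 32-bit words per vcpu, mirroring EOI_EXIT_BITMAP0..3 */
    static uint32_t eoi_exit_bitmap[NR_VCPUS][8];

    static void rebuild_eoi_exit_bitmaps(const struct redir_entry *rtbl, int pins)
    {
            memset(eoi_exit_bitmap, 0, sizeof(eoi_exit_bitmap));
            for (int pin = 0; pin < pins; pin++) {
                    const struct redir_entry *e = &rtbl[pin];
                    if (!e->trig_mode)  /* only level-triggered irqs need EOI exits */
                            continue;
                    for (int cpu = 0; cpu < NR_VCPUS; cpu++)
                            if (e->dest_cpus & (1u << cpu))
                                    eoi_exit_bitmap[cpu][e->vector >> 5] |=
                                            1u << (e->vector & 31);
            }
    }

    int main(void)
    {
            struct redir_entry rtbl[NR_PINS] = { 0 };

            /* level-triggered vector 0x3e routed to vcpus 0 and 1 */
            rtbl[10] = (struct redir_entry){ .vector = 0x3e, .trig_mode = 1,
                                             .dest_cpus = 0x3 };
            rebuild_eoi_exit_bitmaps(rtbl, NR_PINS);
            printf("vcpu0 word1 = 0x%08x\n", eoi_exit_bitmap[0][1]); /* 0x40000000 */
            return 0;
    }

The point of the design choice is visible in the shape of the code: the rebuild only needs to run when the IOAPIC or irq-notifier configuration changes, after which vcpus reload their bitmaps, instead of doing per-interrupt bookkeeping on the hot path.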
On Wed, Dec 05, 2012 at 08:38:59PM -0200, Marcelo Tosatti wrote: > On Wed, Dec 05, 2012 at 01:14:38PM +0200, Gleb Natapov wrote: > > On Wed, Dec 05, 2012 at 03:43:41AM +0000, Zhang, Yang Z wrote: > > > >> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu > > > > *vcpu) > > > >> } > > > >> > > > >> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { > > > >> + /* update architecture specific hints for APIC > > > >> + * virtual interrupt delivery */ > > > >> + if (kvm_x86_ops->update_irq) > > > >> + kvm_x86_ops->update_irq(vcpu); > > > >> + > > > >> inject_pending_event(vcpu); > > > >> > > > >> /* enable NMI/IRQ window open exits if needed */ > > > >> if (vcpu->arch.nmi_pending) > > > >> kvm_x86_ops->enable_nmi_window(vcpu); > > > >> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) > > > >> + else if (kvm_apic_vid_enabled(vcpu)) { > > > >> + if (kvm_cpu_has_extint(vcpu)) > > > >> + kvm_x86_ops->enable_irq_window(vcpu); > > > > > > > > If RVI is non-zero, then the interrupt window should not be enabled, according to 29.2.2: > > > > > > > > "If a virtual interrupt has been recognized (see Section 29.2.1), it will > > > > be delivered at an instruction boundary when the following conditions all > > > > hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no > > > > blocking by MOV SS or by POP SS; and (4) the “interrupt-window exiting” > > > > VM-execution control is 0." > > > Right. Must check RVI here. > > > > > Why? We request the interrupt window here because there is an ExtINT interrupt > > pending. An ExtINT interrupt has precedence over APIC interrupts (our > > current code is incorrect!), so we want a vmexit as soon as interrupts are > > allowed, to inject the ExtINT, and we do not want the virtual interrupt to be > > delivered. I think condition (4) there is exactly for this situation. > > > > -- > > Gleb. > > Right. BTW, delivery of ExtINT has no EOI, so there is no evaluation > of pending virtual interrupts. Therefore, shouldn't the interrupt window be > enabled when injecting ExtINT so that evaluation of pending virtual > interrupts is performed on the next vm-entry? > Good question and I think, luckily for us, the answer is no. The spec uses two different terms when it talks about virtual interrupts, "Evaluation of Pending Virtual Interrupts" and "Virtual-Interrupt Delivery". As far as my reading of the spec goes, they do not necessarily happen at the same time. So during ExtINT injection "evaluation" will happen (due to vmentry) and the virtual interrupt will be recognized, but not "delivered". It will be delivered when the conditions described in section 29.2.2 are met, i.e. when interrupts are enabled. Yang, can you confirm this? -- Gleb.
Gleb Natapov wrote on 2012-12-06: > On Wed, Dec 05, 2012 at 08:38:59PM -0200, Marcelo Tosatti wrote: >> On Wed, Dec 05, 2012 at 01:14:38PM +0200, Gleb Natapov wrote: >>> On Wed, Dec 05, 2012 at 03:43:41AM +0000, Zhang, Yang Z wrote: >>>>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu >>>>> *vcpu) >>>>>> } >>>>>> >>>>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) > { >>>>>> + /* update architecture specific hints for APIC >>>>>> + * virtual interrupt delivery */ >>>>>> + if (kvm_x86_ops->update_irq) >>>>>> + kvm_x86_ops->update_irq(vcpu); >>>>>> + >>>>>> inject_pending_event(vcpu); >>>>>> >>>>>> /* enable NMI/IRQ window open exits if needed */ >>>>>> if (vcpu->arch.nmi_pending) >>>>>> kvm_x86_ops->enable_nmi_window(vcpu); >>>>>> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) >>>>>> + else if (kvm_apic_vid_enabled(vcpu)) { >>>>>> + if (kvm_cpu_has_extint(vcpu)) >>>>>> + kvm_x86_ops->enable_irq_window(vcpu); >>>>> >>>>> If RVI is non-zero, then the interrupt window should not be enabled, according to 29.2.2: >>>>> >>>>> "If a virtual interrupt has been recognized (see Section 29.2.1), it will >>>>> be delivered at an instruction boundary when the following conditions all >>>>> hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no >>>>> blocking by MOV SS or by POP SS; and (4) the "interrupt-window exiting" >>>>> VM-execution control is 0." >>>> Right. Must check RVI here. >>>> >>> Why? We request the interrupt window here because there is an ExtINT interrupt >>> pending. An ExtINT interrupt has precedence over APIC interrupts (our >>> current code is incorrect!), so we want a vmexit as soon as interrupts are >>> allowed, to inject the ExtINT, and we do not want the virtual interrupt to be >>> delivered. I think condition (4) there is exactly for this situation. >>> >>> -- >>> Gleb. >> >> Right. BTW, delivery of ExtINT has no EOI, so there is no evaluation >> of pending virtual interrupts. Therefore, shouldn't the interrupt window be >> enabled when injecting ExtINT so that evaluation of pending virtual >> interrupts is performed on the next vm-entry? >> > Good question and I think, luckily for us, the answer is no. The spec uses > two different terms when it talks about virtual interrupts, "Evaluation > of Pending Virtual Interrupts" and "Virtual-Interrupt Delivery". As far > as my reading of the spec goes, they do not necessarily happen at the same > time. So during ExtINT injection "evaluation" will happen (due to vmentry) > and the virtual interrupt will be recognized, but not "delivered". It will > be delivered when the conditions described in section 29.2.2 are met, i.e. > when interrupts are enabled. > > Yang, can you confirm this? Right. Vmentry causes the evaluation of pending virtual interrupts even during ExtINT injection. If RVI[7:4] > VPPR[7:4], the logical processor recognizes a pending virtual interrupt. It will then be delivered when the conditions are met. Best regards, Yang
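For reference, the recognition rule Yang confirms compares only the upper priority nibbles of RVI and the virtual PPR. A tiny standalone C illustration of that check (not KVM code; the inputs are just the two byte-wide fields named in the thread):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Recognition of a pending virtual interrupt per the rule quoted above:
     * RVI is the low byte of the guest interrupt status field, VPPR the
     * virtual processor priority register. */
    static bool virt_intr_recognized(uint8_t rvi, uint8_t vppr)
    {
            return (rvi >> 4) > (vppr >> 4);  /* RVI[7:4] > VPPR[7:4] */
    }

    int main(void)
    {
            /* vector 0x61 pending with PPR 0x50: 0x6 > 0x5, recognized */
            printf("%d\n", virt_intr_recognized(0x61, 0x50)); /* prints 1 */
            /* same vector with PPR 0x60: equal priority class, not recognized */
            printf("%d\n", virt_intr_recognized(0x61, 0x60)); /* prints 0 */
            return 0;
    }

Recognition is only the first half; as the exchange above stresses, delivery then additionally requires the section 29.2.2 conditions (IF set, no STI/MOV SS blocking, interrupt-window exiting clear).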
On Thu, Dec 06, 2012 at 08:36:52AM +0200, Gleb Natapov wrote: > On Thu, Dec 06, 2012 at 05:02:15AM +0000, Zhang, Yang Z wrote: > > Zhang, Yang Z wrote on 2012-12-06: > > > Marcelo Tosatti wrote on 2012-12-06: > > >> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: > > >>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts > > >>> manually, which is fully taken care of by the hardware. This needs > > >>> some special awareness into existing interrupr injection path: > > >>> > > >>> - for pending interrupt, instead of direct injection, we may need > > >>> update architecture specific indicators before resuming to guest. - > > >>> A pending interrupt, which is masked by ISR, should be also > > >>> considered in above update action, since hardware will decide when > > >>> to inject it at right time. Current has_interrupt and get_interrupt > > >>> only returns a valid vector from injection p.o.v. > > >>> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> > > >>> Signed-off-by: Kevin Tian <kevin.tian@intel.com> > > >>> --- > > >>> arch/x86/include/asm/kvm_host.h | 4 + arch/x86/include/asm/vmx.h > > >>> | 11 +++ arch/x86/kvm/irq.c | 53 ++++++++++----- > > >>> arch/x86/kvm/lapic.c | 56 +++++++++++++--- > > >>> arch/x86/kvm/lapic.h | 6 ++ arch/x86/kvm/svm.c > > >>> | 19 +++++ arch/x86/kvm/vmx.c | 140 > > >>> ++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c > > >>> | 34 ++++++++-- virt/kvm/ioapic.c | 1 + 9 files > > >>> changed, 291 insertions(+), 33 deletions(-) > > >>> diff --git a/arch/x86/include/asm/kvm_host.h > > >>> b/arch/x86/include/asm/kvm_host.h index dc87b65..e5352c8 100644 --- > > >>> a/arch/x86/include/asm/kvm_host.h +++ > > >>> b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,10 @@ struct > > >>> kvm_x86_ops { > > >>> void (*enable_nmi_window)(struct kvm_vcpu *vcpu); > > >>> void (*enable_irq_window)(struct kvm_vcpu *vcpu); > > >>> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); > > >>> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); > > >>> + void (*update_irq)(struct kvm_vcpu *vcpu); > > >>> + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, > > >>> + int trig_mode, int always_set); > > >>> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > > >>> int (*get_tdp_level)(void); > > >>> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool > > > is_mmio); > > >>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > > >>> index 21101b6..1003341 100644 > > >>> --- a/arch/x86/include/asm/vmx.h > > >>> +++ b/arch/x86/include/asm/vmx.h > > >>> @@ -62,6 +62,7 @@ > > >>> #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define > > >>> EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define > > > EXIT_REASON_APIC_ACCESS > > >>> 44 +#define EXIT_REASON_EOI_INDUCED 45 #define > > >>> EXIT_REASON_EPT_VIOLATION 48 #define > > > EXIT_REASON_EPT_MISCONFIG > > >>> 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 > > > +144,7 @@ > > >>> #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define > > >>> SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 #define > > >>> SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 +#define > > >>> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define > > >>> SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define > > >>> SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 @@ -180,6 +182,7 @@ enum > > >>> vmcs_field { GUEST_GS_SELECTOR = 0x0000080a, > > > GUEST_LDTR_SELECTOR > > >>> = 0x0000080c, GUEST_TR_SELECTOR = > > > 0x0000080e, > > >>> + GUEST_INTR_STATUS = 0x00000810, > > > HOST_ES_SELECTOR > > >>> 
= 0x00000c00, HOST_CS_SELECTOR = > > > 0x00000c02, > > >>> HOST_SS_SELECTOR = 0x00000c04, @@ -207,6 +210,14 @@ > > >>> enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x00002015, > > > EPT_POINTER > > >>> = 0x0000201a, EPT_POINTER_HIGH > > > = > > >>> 0x0000201b, > > >>> + EOI_EXIT_BITMAP0 = 0x0000201c, > > >>> + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, > > >>> + EOI_EXIT_BITMAP1 = 0x0000201e, > > >>> + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, > > >>> + EOI_EXIT_BITMAP2 = 0x00002020, > > >>> + EOI_EXIT_BITMAP2_HIGH = 0x00002021, > > >>> + EOI_EXIT_BITMAP3 = 0x00002022, > > >>> + EOI_EXIT_BITMAP3_HIGH = 0x00002023, > > >>> GUEST_PHYSICAL_ADDRESS = 0x00002400, > > >>> GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, > > >>> VMCS_LINK_POINTER = 0x00002800, > > >>> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c > > >>> index 7e06ba1..f782788 100644 > > >>> --- a/arch/x86/kvm/irq.c > > >>> +++ b/arch/x86/kvm/irq.c > > >>> @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); > > >>> */ > > >>> int kvm_cpu_has_interrupt(struct kvm_vcpu *v) > > >>> { > > >>> - struct kvm_pic *s; > > >>> - > > >>> if (!irqchip_in_kernel(v->kvm)) > > >>> return v->arch.interrupt.pending; > > >>> - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ > > >>> - if (kvm_apic_accept_pic_intr(v)) { > > >>> - s = pic_irqchip(v->kvm); /* PIC */ > > >>> - return s->output; > > >>> - } else > > >>> - return 0; > > >>> - } > > >>> + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ > > >>> + return kvm_cpu_has_extint(v); /* non-APIC */ > > >>> return 1; > > >>> } > > >>> EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); > > >>> > > >>> /* > > >>> + * check if there is pending interrupt from > > >>> + * non-APIC source without intack. > > >>> + */ > > >>> +int kvm_cpu_has_extint(struct kvm_vcpu *v) > > >>> +{ > > >>> + struct kvm_pic *s; > > >>> + > > >>> + if (kvm_apic_accept_pic_intr(v)) { > > >>> + s = pic_irqchip(v->kvm); /* PIC */ > > >>> + return s->output; > > >>> + } else > > >>> + return 0; > > >>> +} > > >>> + > > >>> +/* > > >>> * Read pending interrupt vector and intack. > > >>> */ > > >>> int kvm_cpu_get_interrupt(struct kvm_vcpu *v) { - struct kvm_pic *s; > > >>> int vector; > > >>> > > >>> if (!irqchip_in_kernel(v->kvm)) > > >>> return v->arch.interrupt.nr; > > >>> > > >>> vector = kvm_get_apic_interrupt(v); /* APIC */ > > >>> - if (vector == -1) { > > >>> - if (kvm_apic_accept_pic_intr(v)) { > > >>> - s = pic_irqchip(v->kvm); > > >>> - s->output = 0; /* PIC */ > > >>> - vector = kvm_pic_read_irq(v->kvm); > > >>> - } > > >>> - } > > >>> + if (vector == -1) > > >>> + return kvm_cpu_get_extint(v); /* non-APIC */ > > >>> return vector; > > >>> } > > >>> EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); > > >>> +/* > > >>> + * Read pending interrupt(from non-APIC source) > > >>> + * vector and intack. 
> > >>> + */ > > >>> +int kvm_cpu_get_extint(struct kvm_vcpu *v) > > >>> +{ > > >>> + struct kvm_pic *s; > > >>> + int vector = -1; > > >>> + > > >>> + if (kvm_apic_accept_pic_intr(v)) { > > >>> + s = pic_irqchip(v->kvm); > > >>> + s->output = 0; /* PIC */ > > >>> + vector = kvm_pic_read_irq(v->kvm); > > >>> + } > > >>> + return vector; > > >>> +} > > >>> + > > >>> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) > > >>> { > > >>> kvm_inject_apic_timer_irqs(vcpu); > > >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > >>> index 7c96012..400d3ba 100644 > > >>> --- a/arch/x86/kvm/lapic.c > > >>> +++ b/arch/x86/kvm/lapic.c > > >>> @@ -643,6 +643,14 @@ out: > > >>> return ret; > > >>> } > > >>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, > > >>> + int trig_mode, int always_set) > > >>> +{ > > >>> + if (kvm_x86_ops->set_eoi_exitmap) > > >>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, > > >>> + trig_mode, always_set); > > >>> +} > > >>> + > > >>> /* > > >>> * Add a pending IRQ into lapic. > > >>> * Return 1 if successfully added and 0 if discarded. > > >>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int > > >> delivery_mode, > > >>> if (unlikely(!apic_enabled(apic))) break; > > >>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); if (trig_mode) > > >>> { apic_debug("level trig mode for vector %d", vector); > > >>> apic_set_vector(vector, apic->regs + APIC_TMR); > > >>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, > > >> struct kvm_vcpu *vcpu2) > > >>> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; > > >>> } > > >>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > > >>> +{ > > >>> + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > > >>> + kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > > >>> + int trigger_mode; > > >>> + if (apic_test_vector(vector, apic->regs + APIC_TMR)) > > >>> + trigger_mode = IOAPIC_LEVEL_TRIG; > > >>> + else > > >>> + trigger_mode = IOAPIC_EDGE_TRIG; > > >>> + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > > >>> + } > > >>> +} > > >>> + > > >>> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = > > >>> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int > > >>> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); > > >>> apic_update_ppr(apic); > > >>> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && > > >>> - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { > > >>> - int trigger_mode; > > >>> - if (apic_test_vector(vector, apic->regs + APIC_TMR)) > > >>> - trigger_mode = IOAPIC_LEVEL_TRIG; > > >>> - else > > >>> - trigger_mode = IOAPIC_EDGE_TRIG; > > >>> - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); > > >>> - } > > >>> + kvm_ioapic_send_eoi(apic, vector); > > >>> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > > >>> return vector; > > >>> } > > >>> +/* > > >>> + * this interface assumes a trap-like exit, which has already finished > > >>> + * desired side effect including vISR and vPPR update. 
> > >>> + */ > > >>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) > > >>> +{ > > >>> + struct kvm_lapic *apic = vcpu->arch.apic; > > >>> + > > >>> + kvm_ioapic_send_eoi(apic, vector); > > >>> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); > > >>> +} > > >>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); > > >>> + > > >>> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = > > >>> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int > > >>> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } > > >>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) > > >>> +{ > > >>> + struct kvm_lapic *apic = vcpu->arch.apic; > > >>> + > > >>> + if (!apic || !apic_enabled(apic)) > > >>> + return -1; > > >>> + > > >>> + return apic_find_highest_irr(apic); > > >>> +} > > >> > > >> irr_pending of apic_find_highest_irr() is meaningless (stale) if > > >> HW is updating VIRR. > > > I don't think the target vcpu is running when we call this function. So it is safe to > > > check irr_pending and read the irr. > > One problem is that when HW clears VIRR, after vmexit, irr_pending will still be true if more than one bit was set in VIRR at the last vmentry. > It doesn't matter how many bits were set in VIRR before entry; as long > as some bit was set irr_pending will be true. > > > How about recalculating irr_pending according to the VIRR on each vmexit? > > > No need really. Since HW can only clear VIRR the only situation that may > happen is that irr_pending will be true but VIRR is empty and > apic_find_highest_irr() will return the correct result in this case. > > If we see a lot of unneeded irr scans because of a stale irr_pending > value we can do irr_pending = rvi != 0 on vmexit. > > -- > Gleb. Yes, it is harmless ATM. But it's clearer if irr_pending is not used when virtual interrupt delivery is in effect (that is, just skip irr_pending if virtual interrupt delivery is enabled).
Marcelo Tosatti wrote on 2012-12-07: > On Thu, Dec 06, 2012 at 08:36:52AM +0200, Gleb Natapov wrote: >> On Thu, Dec 06, 2012 at 05:02:15AM +0000, Zhang, Yang Z wrote: >>> Zhang, Yang Z wrote on 2012-12-06: >>>> Marcelo Tosatti wrote on 2012-12-06: >>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts >>>>>> manually, which is fully taken care of by the hardware. This needs >>>>>> some special awareness into existing interrupr injection path: >>>>>> >>>>>> - for pending interrupt, instead of direct injection, we may need >>>>>> update architecture specific indicators before resuming to guest. - >>>>>> A pending interrupt, which is masked by ISR, should be also >>>>>> considered in above update action, since hardware will decide when >>>>>> to inject it at right time. Current has_interrupt and get_interrupt >>>>>> only returns a valid vector from injection p.o.v. >>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> >>>>>> Signed-off-by: Kevin Tian <kevin.tian@intel.com> >>>>>> --- >>>>>> arch/x86/include/asm/kvm_host.h | 4 + > arch/x86/include/asm/vmx.h >>>>>> | 11 +++ arch/x86/kvm/irq.c | 53 > ++++++++++----- >>>>>> arch/x86/kvm/lapic.c | 56 +++++++++++++--- >>>>>> arch/x86/kvm/lapic.h | 6 ++ arch/x86/kvm/svm.c >>>>>> | 19 +++++ arch/x86/kvm/vmx.c | 140 >>>>>> ++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c >>>>>> | 34 ++++++++-- virt/kvm/ioapic.c | 1 + 9 files >>>>>> changed, 291 insertions(+), 33 deletions(-) >>>>>> diff --git a/arch/x86/include/asm/kvm_host.h >>>>>> b/arch/x86/include/asm/kvm_host.h index dc87b65..e5352c8 100644 --- >>>>>> a/arch/x86/include/asm/kvm_host.h +++ >>>>>> b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,10 @@ struct >>>>>> kvm_x86_ops { >>>>>> void (*enable_nmi_window)(struct kvm_vcpu *vcpu); >>>>>> void (*enable_irq_window)(struct kvm_vcpu *vcpu); >>>>>> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); >>>>>> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); >>>>>> + void (*update_irq)(struct kvm_vcpu *vcpu); >>>>>> + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector, >>>>>> + int trig_mode, int always_set); >>>>>> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); >>>>>> int (*get_tdp_level)(void); >>>>>> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool >>>> is_mmio); >>>>>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h >>>>>> index 21101b6..1003341 100644 >>>>>> --- a/arch/x86/include/asm/vmx.h >>>>>> +++ b/arch/x86/include/asm/vmx.h >>>>>> @@ -62,6 +62,7 @@ >>>>>> #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define >>>>>> EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define >>>> EXIT_REASON_APIC_ACCESS >>>>>> 44 +#define EXIT_REASON_EOI_INDUCED 45 #define >>>>>> EXIT_REASON_EPT_VIOLATION 48 #define >>>> EXIT_REASON_EPT_MISCONFIG >>>>>> 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 >>>> +144,7 @@ >>>>>> #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define >>>>>> SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 #define >>>>>> SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 +#define >>>>>> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define >>>>>> SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define >>>>>> SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 @@ -180,6 +182,7 @@ enum >>>>>> vmcs_field { GUEST_GS_SELECTOR = 0x0000080a, >>>> GUEST_LDTR_SELECTOR >>>>>> = 0x0000080c, GUEST_TR_SELECTOR = >>>> 0x0000080e, >>>>>> + GUEST_INTR_STATUS = 0x00000810, >>>> HOST_ES_SELECTOR >>>>>> = 0x00000c00, 
HOST_CS_SELECTOR = >>>> 0x00000c02, >>>>>> HOST_SS_SELECTOR = 0x00000c04, @@ -207,6 +210,14 >>>>>> @@ enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x00002015, >>>> EPT_POINTER >>>>>> = 0x0000201a, EPT_POINTER_HIGH >>>> = >>>>>> 0x0000201b, >>>>>> + EOI_EXIT_BITMAP0 = 0x0000201c, >>>>>> + EOI_EXIT_BITMAP0_HIGH = 0x0000201d, >>>>>> + EOI_EXIT_BITMAP1 = 0x0000201e, >>>>>> + EOI_EXIT_BITMAP1_HIGH = 0x0000201f, >>>>>> + EOI_EXIT_BITMAP2 = 0x00002020, >>>>>> + EOI_EXIT_BITMAP2_HIGH = 0x00002021, >>>>>> + EOI_EXIT_BITMAP3 = 0x00002022, >>>>>> + EOI_EXIT_BITMAP3_HIGH = 0x00002023, >>>>>> GUEST_PHYSICAL_ADDRESS = 0x00002400, >>>>>> GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, >>>>>> VMCS_LINK_POINTER = 0x00002800, >>>>>> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c >>>>>> index 7e06ba1..f782788 100644 >>>>>> --- a/arch/x86/kvm/irq.c >>>>>> +++ b/arch/x86/kvm/irq.c >>>>>> @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer); >>>>>> */ >>>>>> int kvm_cpu_has_interrupt(struct kvm_vcpu *v) >>>>>> { >>>>>> - struct kvm_pic *s; >>>>>> - >>>>>> if (!irqchip_in_kernel(v->kvm)) >>>>>> return v->arch.interrupt.pending; >>>>>> - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ >>>>>> - if (kvm_apic_accept_pic_intr(v)) { >>>>>> - s = pic_irqchip(v->kvm); /* PIC */ >>>>>> - return s->output; >>>>>> - } else >>>>>> - return 0; >>>>>> - } >>>>>> + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */ >>>>>> + return kvm_cpu_has_extint(v); /* non-APIC */ >>>>>> return 1; >>>>>> } >>>>>> EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); >>>>>> >>>>>> /* >>>>>> + * check if there is pending interrupt from >>>>>> + * non-APIC source without intack. >>>>>> + */ >>>>>> +int kvm_cpu_has_extint(struct kvm_vcpu *v) >>>>>> +{ >>>>>> + struct kvm_pic *s; >>>>>> + >>>>>> + if (kvm_apic_accept_pic_intr(v)) { >>>>>> + s = pic_irqchip(v->kvm); /* PIC */ >>>>>> + return s->output; >>>>>> + } else >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +/* >>>>>> * Read pending interrupt vector and intack. >>>>>> */ >>>>>> int kvm_cpu_get_interrupt(struct kvm_vcpu *v) { - struct kvm_pic >>>>>> *s; int vector; >>>>>> >>>>>> if (!irqchip_in_kernel(v->kvm)) >>>>>> return v->arch.interrupt.nr; >>>>>> >>>>>> vector = kvm_get_apic_interrupt(v); /* APIC */ >>>>>> - if (vector == -1) { >>>>>> - if (kvm_apic_accept_pic_intr(v)) { >>>>>> - s = pic_irqchip(v->kvm); >>>>>> - s->output = 0; /* PIC */ >>>>>> - vector = kvm_pic_read_irq(v->kvm); >>>>>> - } >>>>>> - } >>>>>> + if (vector == -1) >>>>>> + return kvm_cpu_get_extint(v); /* non-APIC */ >>>>>> return vector; >>>>>> } >>>>>> EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt); >>>>>> +/* >>>>>> + * Read pending interrupt(from non-APIC source) >>>>>> + * vector and intack. 
>>>>>> + */ >>>>>> +int kvm_cpu_get_extint(struct kvm_vcpu *v) >>>>>> +{ >>>>>> + struct kvm_pic *s; >>>>>> + int vector = -1; >>>>>> + >>>>>> + if (kvm_apic_accept_pic_intr(v)) { >>>>>> + s = pic_irqchip(v->kvm); >>>>>> + s->output = 0; /* PIC */ >>>>>> + vector = kvm_pic_read_irq(v->kvm); >>>>>> + } >>>>>> + return vector; >>>>>> +} >>>>>> + >>>>>> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) >>>>>> { >>>>>> kvm_inject_apic_timer_irqs(vcpu); >>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >>>>>> index 7c96012..400d3ba 100644 >>>>>> --- a/arch/x86/kvm/lapic.c >>>>>> +++ b/arch/x86/kvm/lapic.c >>>>>> @@ -643,6 +643,14 @@ out: >>>>>> return ret; >>>>>> } >>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>>>> + int trig_mode, int always_set) >>>>>> +{ >>>>>> + if (kvm_x86_ops->set_eoi_exitmap) >>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >>>>>> + trig_mode, always_set); >>>>>> +} >>>>>> + >>>>>> /* >>>>>> * Add a pending IRQ into lapic. >>>>>> * Return 1 if successfully added and 0 if discarded. >>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, > int >>>>> delivery_mode, >>>>>> if (unlikely(!apic_enabled(apic))) break; >>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); if >>>>>> (trig_mode) { apic_debug("level trig mode for vector %d", >>>>>> vector); apic_set_vector(vector, apic->regs + APIC_TMR); >>>>>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu > *vcpu1, >>>>> struct kvm_vcpu *vcpu2) >>>>>> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; >>>>>> } >>>>>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int >>>>>> vector) +{ + if (!(kvm_apic_get_reg(apic, APIC_SPIV) & >>>>>> APIC_SPIV_DIRECTED_EOI) && + >>>>>> kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { + int >>>>>> trigger_mode; + if (apic_test_vector(vector, apic->regs + >>>>>> APIC_TMR)) + trigger_mode = IOAPIC_LEVEL_TRIG; + else >>>>>> + trigger_mode = IOAPIC_EDGE_TRIG; >>>>>> + kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >>>>>> + } +} + >>>>>> static int apic_set_eoi(struct kvm_lapic *apic) { int vector = >>>>>> apic_find_highest_isr(apic); @@ -756,19 +778,24 @@ static int >>>>>> apic_set_eoi(struct kvm_lapic *apic) apic_clear_isr(vector, apic); >>>>>> apic_update_ppr(apic); >>>>>> - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) >>>>>> && - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) { >>>>>> - int trigger_mode; - if (apic_test_vector(vector, apic->regs + >>>>>> APIC_TMR)) - trigger_mode = IOAPIC_LEVEL_TRIG; - else >>>>>> - trigger_mode = IOAPIC_EDGE_TRIG; >>>>>> - kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode); >>>>>> - } + kvm_ioapic_send_eoi(apic, vector); >>>>>> kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >>>>>> return vector; >>>>>> } >>>>>> +/* >>>>>> + * this interface assumes a trap-like exit, which has already finished >>>>>> + * desired side effect including vISR and vPPR update. 
>>>>>> + */ >>>>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector) >>>>>> +{ >>>>>> + struct kvm_lapic *apic = vcpu->arch.apic; >>>>>> + >>>>>> + kvm_ioapic_send_eoi(apic, vector); >>>>>> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu); >>>>>> +} >>>>>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); >>>>>> + >>>>>> static void apic_send_ipi(struct kvm_lapic *apic) { u32 icr_low = >>>>>> kvm_apic_get_reg(apic, APIC_ICR); @@ -1533,6 +1560,17 @@ int >>>>>> kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } >>>>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu) >>>>>> +{ >>>>>> + struct kvm_lapic *apic = vcpu->arch.apic; >>>>>> + >>>>>> + if (!apic || !apic_enabled(apic)) >>>>>> + return -1; >>>>>> + >>>>>> + return apic_find_highest_irr(apic); >>>>>> +} >>>>> >>>>> irr_pending of apic_find_highest_irr() is meaningless (stale) if >>>>> HW is updating VIRR. >>>> I don't think the target vcpu is running when we call this function. So it is safe to >>>> check irr_pending and read the irr. >>> One problem is that when HW clears VIRR, after vmexit, irr_pending will >>> still be true if more than one bit was set in VIRR at the last vmentry. >> It doesn't matter how many bits were set in VIRR before entry; as long >> as some bit was set irr_pending will be true. >> >>> How about recalculating irr_pending according to the VIRR on each vmexit? >>> >> No need really. Since HW can only clear VIRR the only situation that may >> happen is that irr_pending will be true but VIRR is empty and >> apic_find_highest_irr() will return the correct result in this case. >> >> If we see a lot of unneeded irr scans because of a stale irr_pending >> value we can do irr_pending = rvi != 0 on vmexit. >> >> -- >> Gleb. > > Yes, it is harmless ATM. But it's clearer if irr_pending is not used when > virtual interrupt delivery is in effect (that is, just skip irr_pending > if virtual interrupt delivery is enabled). irr_pending is still useful in the virtual interrupt delivery case. Otherwise, as Gleb said, there may be lots of unneeded irr scans. Best regards, Yang
On Fri, Dec 07, 2012 at 01:00:18AM +0000, Zhang, Yang Z wrote: > Marcelo Tosatti wrote on 2012-12-07: > >>> How about recalculating irr_pending according to the VIRR on each vmexit? > >>> > >> No need really. Since HW can only clear VIRR the only situation that may > >> happen is that irr_pending will be true but VIRR is empty and > >> apic_find_highest_irr() will return the correct result in this case. > >> > >> If we see a lot of unneeded irr scans because of a stale irr_pending > >> value we can do irr_pending = rvi != 0 on vmexit. > >> > >> -- > >> Gleb. > > > > Yes, it is harmless ATM. But it's clearer if irr_pending is not used when > > virtual interrupt delivery is in effect (that is, just skip irr_pending > > if virtual interrupt delivery is enabled). > irr_pending is still useful in the virtual interrupt delivery case. Otherwise, as Gleb said, there may be lots of unneeded irr scans. > Actually, looking at it closely, irr_pending will always be true (and thus effectively disabled without any additional checks) since it is cleared only in kvm_get_apic_interrupt(), which will not be called with vid enabled. It is also racy to simply do irr_pending = rvi != 0 on vmexit. The code should be something like this: irr_pending = (rvi != 0) ? : apic_search_irr(apic) != -1; But we do not want to do that on each exit since rvi will be mostly zero and irr is, more often than not, empty. -- Gleb.
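Note that Gleb's one-liner uses the GNU "a ?: b" shorthand. Spelled out, the vmexit-time resync amounts to the sketch below; toy_lapic and this apic_search_irr() are simplified stand-ins for KVM's lapic state and its IRR scan, illustrative only:

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified stand-in for the lapic state discussed above; the real
     * apic_search_irr() scans the 256-bit VIRR in the vAPIC page. */
    struct toy_lapic {
            uint32_t virr[8];
            bool irr_pending;
    };

    static int apic_search_irr(const struct toy_lapic *apic)
    {
            for (int i = 7; i >= 0; i--)
                    if (apic->virr[i])  /* highest set bit = highest vector */
                            return i * 32 + 31 - __builtin_clz(apic->virr[i]);
            return -1;
    }

    /* Gleb's suggested resync with the ?: shorthand expanded: a non-zero RVI
     * proves a pending interrupt; only a zero RVI forces a VIRR scan. */
    static void sync_irr_pending(struct toy_lapic *apic, uint8_t rvi)
    {
            apic->irr_pending = (rvi != 0) ? true : (apic_search_irr(apic) != -1);
    }

    int main(void)
    {
            struct toy_lapic apic = { .virr = { 0 }, .irr_pending = true };

            sync_irr_pending(&apic, 0);  /* VIRR empty: clears the stale flag */
            return apic.irr_pending ? 1 : 0;
    }

As the message above notes, hardware can only clear VIRR bits, so a stale true irr_pending is harmless (the scan just returns -1); the resync would only be an optimization, and doing it on every exit would itself cost a scan whenever RVI is zero.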
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dc87b65..e5352c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -697,6 +697,10 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+	int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
+	void (*update_irq)(struct kvm_vcpu *vcpu);
+	void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector,
+				int trig_mode, int always_set);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 21101b6..1003341 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -62,6 +62,7 @@
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
+#define EXIT_REASON_EOI_INDUCED         45
 #define EXIT_REASON_EPT_VIOLATION       48
 #define EXIT_REASON_EPT_MISCONFIG       49
 #define EXIT_REASON_WBINVD              54
@@ -143,6 +144,7 @@
 #define SECONDARY_EXEC_WBINVD_EXITING        0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST    0x00000080
 #define SECONDARY_EXEC_APIC_REGISTER_VIRT    0x00000100
+#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING    0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID        0x00001000
@@ -180,6 +182,7 @@ enum vmcs_field {
 	GUEST_GS_SELECTOR           = 0x0000080a,
 	GUEST_LDTR_SELECTOR         = 0x0000080c,
 	GUEST_TR_SELECTOR           = 0x0000080e,
+	GUEST_INTR_STATUS           = 0x00000810,
 	HOST_ES_SELECTOR            = 0x00000c00,
 	HOST_CS_SELECTOR            = 0x00000c02,
 	HOST_SS_SELECTOR            = 0x00000c04,
@@ -207,6 +210,14 @@ enum vmcs_field {
 	APIC_ACCESS_ADDR_HIGH       = 0x00002015,
 	EPT_POINTER                 = 0x0000201a,
 	EPT_POINTER_HIGH            = 0x0000201b,
+	EOI_EXIT_BITMAP0            = 0x0000201c,
+	EOI_EXIT_BITMAP0_HIGH       = 0x0000201d,
+	EOI_EXIT_BITMAP1            = 0x0000201e,
+	EOI_EXIT_BITMAP1_HIGH       = 0x0000201f,
+	EOI_EXIT_BITMAP2            = 0x00002020,
+	EOI_EXIT_BITMAP2_HIGH       = 0x00002021,
+	EOI_EXIT_BITMAP3            = 0x00002022,
+	EOI_EXIT_BITMAP3_HIGH       = 0x00002023,
 	GUEST_PHYSICAL_ADDRESS      = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
 	VMCS_LINK_POINTER           = 0x00002800,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 7e06ba1..f782788 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
  */
 int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 {
-	struct kvm_pic *s;
-
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.pending;
-	if (kvm_apic_has_interrupt(v) == -1) {	/* LAPIC */
-		if (kvm_apic_accept_pic_intr(v)) {
-			s = pic_irqchip(v->kvm);	/* PIC */
-			return s->output;
-		} else
-			return 0;
-	}
+	if (kvm_apic_has_interrupt(v) == -1)	/* LAPIC */
+		return kvm_cpu_has_extint(v);	/* non-APIC */
 	return 1;
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
 
 /*
+ * check if there is a pending interrupt from a
+ * non-APIC source, without intack.
+ */
+int kvm_cpu_has_extint(struct kvm_vcpu *v)
+{
+	struct kvm_pic *s;
+
+	if (kvm_apic_accept_pic_intr(v)) {
+		s = pic_irqchip(v->kvm);	/* PIC */
+		return s->output;
+	} else
+		return 0;
+}
+
+/*
  * Read pending interrupt vector and intack.
  */
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 {
-	struct kvm_pic *s;
 	int vector;
 
 	if (!irqchip_in_kernel(v->kvm))
 		return v->arch.interrupt.nr;
 
 	vector = kvm_get_apic_interrupt(v);	/* APIC */
-	if (vector == -1) {
-		if (kvm_apic_accept_pic_intr(v)) {
-			s = pic_irqchip(v->kvm);
-			s->output = 0;		/* PIC */
-			vector = kvm_pic_read_irq(v->kvm);
-		}
-	}
+	if (vector == -1)
+		return kvm_cpu_get_extint(v);	/* non-APIC */
 	return vector;
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 
+/*
+ * Read pending interrupt (from a non-APIC source)
+ * vector and intack.
+ */
+int kvm_cpu_get_extint(struct kvm_vcpu *v)
+{
+	struct kvm_pic *s;
+	int vector = -1;
+
+	if (kvm_apic_accept_pic_intr(v)) {
+		s = pic_irqchip(v->kvm);
+		s->output = 0;			/* PIC */
+		vector = kvm_pic_read_irq(v->kvm);
+	}
+	return vector;
+}
+
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
 	kvm_inject_apic_timer_irqs(vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7c96012..400d3ba 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -643,6 +643,14 @@ out:
 	return ret;
 }
 
+void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
+			 int trig_mode, int always_set)
+{
+	if (kvm_x86_ops->set_eoi_exitmap)
+		kvm_x86_ops->set_eoi_exitmap(vcpu, vector,
+					     trig_mode, always_set);
+}
+
 /*
  * Add a pending IRQ into lapic.
  * Return 1 if successfully added and 0 if discarded.
@@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 		if (unlikely(!apic_enabled(apic)))
 			break;
 
+		kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0);
 		if (trig_mode) {
 			apic_debug("level trig mode for vector %d", vector);
 			apic_set_vector(vector, apic->regs + APIC_TMR);
@@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }
 
+static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
+{
+	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
+		int trigger_mode;
+		if (apic_test_vector(vector, apic->regs + APIC_TMR))
+			trigger_mode = IOAPIC_LEVEL_TRIG;
+		else
+			trigger_mode = IOAPIC_EDGE_TRIG;
+		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
+	}
+}
+
 static int apic_set_eoi(struct kvm_lapic *apic)
 {
 	int vector = apic_find_highest_isr(apic);
@@ -756,19 +778,24 @@ static int apic_set_eoi(struct kvm_lapic *apic)
 	apic_clear_isr(vector, apic);
 	apic_update_ppr(apic);
 
-	if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
-	    kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
-		int trigger_mode;
-		if (apic_test_vector(vector, apic->regs + APIC_TMR))
-			trigger_mode = IOAPIC_LEVEL_TRIG;
-		else
-			trigger_mode = IOAPIC_EDGE_TRIG;
-		kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
-	}
+	kvm_ioapic_send_eoi(apic, vector);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 	return vector;
 }
 
+/*
+ * this interface assumes a trap-like exit, which has already finished
+ * the desired side effects including vISR and vPPR update.
+ */
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	kvm_ioapic_send_eoi(apic, vector);
+	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
+
 static void apic_send_ipi(struct kvm_lapic *apic)
 {
 	u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
@@ -1533,6 +1560,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	return highest_irr;
 }
 
+int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	if (!apic || !apic_enabled(apic))
+		return -1;
+
+	return apic_find_highest_irr(apic);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr);
+
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 {
 	u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index c42f111..749661a 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_extint(struct kvm_vcpu *v);
+int kvm_cpu_get_extint(struct kvm_vcpu *v);
+int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
@@ -50,6 +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu);
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
+void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
+			 int need_eoi, int global);
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
@@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
 
 int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dcb7952..8f0903b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
+static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+static void svm_update_irq(struct kvm_vcpu *vcpu)
+{
+	return;
+}
+
+static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
+				int trig_mode, int always_set)
+{
+	return;
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4292,6 +4308,9 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.enable_nmi_window = enable_nmi_window,
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
+	.has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
+	.update_irq = svm_update_irq,
+	.set_eoi_exitmap = svm_set_eoi_exitmap,
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a5f651..909ce90 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO);
 static bool __read_mostly enable_apicv_reg;
 module_param(enable_apicv_reg, bool, S_IRUGO);
 
+static bool __read_mostly enable_apicv_vid;
+module_param(enable_apicv_vid, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -432,6 +435,9 @@ struct vcpu_vmx {
 
 	bool rdtscp_enabled;
 
+	u8 eoi_exitmap_changed;
+	u32 eoi_exit_bitmap[8];
+
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
 };
@@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
 		SECONDARY_EXEC_APIC_REGISTER_VIRT;
 }
 
+static inline bool cpu_has_vmx_virtual_intr_delivery(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
 	return cpu_has_vmx_tpr_shadow() &&
@@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
 			SECONDARY_EXEC_RDTSCP |
 			SECONDARY_EXEC_ENABLE_INVPCID |
-			SECONDARY_EXEC_APIC_REGISTER_VIRT;
+			SECONDARY_EXEC_APIC_REGISTER_VIRT |
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 
 	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
 		_cpu_based_2nd_exec_control &= ~(
-				SECONDARY_EXEC_APIC_REGISTER_VIRT);
+				SECONDARY_EXEC_APIC_REGISTER_VIRT |
+				SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
 
 	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
 		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
@@ -2724,6 +2738,14 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_apic_register_virt())
 		enable_apicv_reg = 0;
 
+	if (!cpu_has_vmx_virtual_intr_delivery())
+		enable_apicv_vid = 0;
+
+	if (!enable_apicv_vid) {
+		kvm_x86_ops->update_irq = NULL;
+		kvm_x86_ops->update_cr8_intercept = NULL;
+	}
+
 	if (nested)
 		nested_vmx_setup_ctls_msrs();
 
@@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 	if (!enable_apicv_reg)
 		exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
+	if (!enable_apicv_vid)
+		exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 	return exec_control;
 }
 
@@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				vmx_secondary_exec_control(vmx));
 	}
 
+	if (enable_apicv_vid) {
+		vmcs_write64(EOI_EXIT_BITMAP0, 0);
+		vmcs_write64(EOI_EXIT_BITMAP1, 0);
+		vmcs_write64(EOI_EXIT_BITMAP2, 0);
+		vmcs_write64(EOI_EXIT_BITMAP3, 0);
+
+		vmcs_write16(GUEST_INTR_STATUS, 0);
+	}
+
 	if (ple_gap) {
 		vmcs_write32(PLE_GAP, ple_gap);
 		vmcs_write32(PLE_WINDOW, ple_window);
@@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
+static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	int vector = exit_qualification & 0xff;
+
+	/* EOI-induced VM exit is trap-like and thus no need to adjust IP */
+	kvm_apic_set_eoi_accelerated(vcpu, vector);
+	return 1;
+}
+
 static int handle_apic_write(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
+	[EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
 	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
 	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
@@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
+	/* no need for tpr_threshold update if APIC virtual
+	 * interrupt delivery is enabled */
+	if (!enable_apicv_vid)
+		return;
+
 	if (irr == -1 || tpr < irr) {
 		vmcs_write32(TPR_THRESHOLD, 0);
 		return;
@@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 	vmcs_write32(TPR_THRESHOLD, irr);
 }
 
+static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
+{
+	return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid;
+}
+
+static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index)
+{
+	int tmr;
+	tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic,
+			APIC_TMR + 0x10 * index);
+	vmcs_write32(EOI_EXIT_BITMAP0 + index,
+			vmx->eoi_exit_bitmap[index] | tmr);
+}
+
+static void vmx_update_rvi(int vector)
+{
+	u16 status;
+	u8 old;
+
+	status = vmcs_read16(GUEST_INTR_STATUS);
+	old = (u8)status & 0xff;
+	if ((u8)vector != old) {
+		status &= ~0xff;
+		status |= (u8)vector;
+		vmcs_write16(GUEST_INTR_STATUS, status);
+	}
+}
+
+static void vmx_update_irq(struct kvm_vcpu *vcpu)
+{
+	int vector;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!enable_apicv_vid)
+		return;
+
+	vector = kvm_apic_get_highest_irr(vcpu);
+	if (vector == -1)
+		return;
+
+	vmx_update_rvi(vector);
+
+	if (vmx->eoi_exitmap_changed) {
+		int index;
+		for_each_set_bit(index,
+				(unsigned long *)(&vmx->eoi_exitmap_changed), 8)
+			vmx_update_eoi_exitmap(vmx, index);
+		vmx->eoi_exitmap_changed = 0;
+	}
+}
+
+static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu,
+				int vector, int trig_mode,
+				int always_set)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int index, offset, changed;
+	struct kvm_lapic *apic;
+
+	if (!enable_apicv_vid)
+		return;
+
+	if (WARN_ONCE((vector < 0) || (vector > 255),
+		"KVM VMX: vector (%d) out of range\n", vector))
+		return;
+
+	apic = vcpu->arch.apic;
+	index = vector >> 5;
+	offset = vector & 31;
+
+	if (always_set)
+		changed = !test_and_set_bit(offset,
+				(unsigned long *)&vmx->eoi_exit_bitmap);
+	else if (trig_mode)
+		changed = !test_bit(offset,
+				apic->regs + APIC_TMR + index * 0x10);
+	else
+		changed = test_bit(offset,
+				apic->regs + APIC_TMR + index * 0x10);
+
+	if (changed)
+		vmx->eoi_exitmap_changed |= 1 << index;
+}
+
 static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -7364,6 +7497,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_nmi_window = enable_nmi_window,
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
+	.has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
+	.update_irq = vmx_update_irq,
+	.set_eoi_exitmap = vmx_set_eoi_exitmap,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0b8abe..02fe194 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -164,6 +164,14 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
 
 static int kvm_vcpu_reset(struct kvm_vcpu *vcpu);
 
+static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops->has_virtual_interrupt_delivery)
+		return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu);
+
+	return 0;
+}
+
 static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
 			vcpu->arch.nmi_injected = true;
 			kvm_x86_ops->set_nmi(vcpu);
 		}
-	} else if (kvm_cpu_has_interrupt(vcpu)) {
-		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
-			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
-					    false);
+	} else if (kvm_cpu_has_interrupt(vcpu) &&
+		   kvm_x86_ops->interrupt_allowed(vcpu)) {
+		int vector = -1;
+
+		if (kvm_apic_vid_enabled(vcpu))
+			vector = kvm_cpu_get_extint(vcpu);
+		else
+			vector = kvm_cpu_get_interrupt(vcpu);
+
+		if (vector != -1) {
+			kvm_queue_interrupt(vcpu, vector, false);
 			kvm_x86_ops->set_irq(vcpu);
 		}
+	}
 }
@@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
+		/* update architecture specific hints for APIC
+		 * virtual interrupt delivery */
+		if (kvm_x86_ops->update_irq)
+			kvm_x86_ops->update_irq(vcpu);
+
 		inject_pending_event(vcpu);
 
 		/* enable NMI/IRQ window open exits if needed */
 		if (vcpu->arch.nmi_pending)
			kvm_x86_ops->enable_nmi_window(vcpu);
-		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+		else if (kvm_apic_vid_enabled(vcpu)) {
+			if (kvm_cpu_has_extint(vcpu))
+				kvm_x86_ops->enable_irq_window(vcpu);
+		} else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
 			kvm_x86_ops->enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 166c450..898aa62 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 		/* need to read apic_id from apic register since
 		 * it can be rewritten */
 		irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
+		kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1);
 	}
 #endif
 	return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
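As a closing illustration of the bookkeeping done by vmx_set_eoi_exitmap() and vmx_update_eoi_exitmap() in the patch above: each vector maps to one of eight 32-bit words (index = vector >> 5) and a bit within that word (offset = vector & 31), and a per-word dirty mask defers the actual VMCS writes to the next vm-entry. The following standalone C demonstration of that mapping is illustrative only, not KVM code:

    #include <stdint.h>
    #include <stdio.h>

    /* The patch tracks the 256 possible vectors as 8 x 32-bit words, both in
     * vmx->eoi_exit_bitmap[] and in the EOI_EXIT_BITMAP0..3 VMCS fields
     * (each 64-bit VMCS field holds two of these words). */
    int main(void)
    {
            uint32_t eoi_exit_bitmap[8] = { 0 };
            uint8_t eoi_exitmap_changed = 0;
            int vector = 0xb1;  /* arbitrary example vector */

            int index = vector >> 5;   /* which 32-bit word: 5 */
            int offset = vector & 31;  /* which bit within it: 17 */

            if (!(eoi_exit_bitmap[index] & (1u << offset))) {
                    eoi_exit_bitmap[index] |= 1u << offset;
                    eoi_exitmap_changed |= 1 << index;  /* flushed on next entry */
            }
            printf("word %d, bit %d, changed mask 0x%02x\n",
                   index, offset, eoi_exitmap_changed);  /* word 5, bit 17, 0x20 */
            return 0;
    }

The deferred-write design keeps set_eoi_exitmap() cheap on the interrupt-delivery path; only words whose contents actually changed are merged with the TMR and written to the VMCS in vmx_update_irq() before entering the guest.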