Message ID | 1456216452-3745-2-git-send-email-feng.wu@intel.com (mailing list archive)
---|---
State | New, archived
On 23/02/16 08:34, Feng Wu wrote:
> This is the core logic handling for VT-d posted-interrupts. Basically it
> deals with how and when to update posted-interrupts during the following
> scenarios:
> - vCPU is preempted
> - vCPU is put to sleep
> - vCPU is blocked
>
> When a vCPU is preempted or put to sleep, we update the posted-interrupts
> during scheduling by introducing two new architectural scheduler hooks:
> vmx_pi_switch_from() and vmx_pi_switch_to(). When a vCPU is blocked, we
> introduce a new architectural hook, arch_vcpu_block(), to update the
> posted-interrupts descriptor.
>
> Besides that, before VM-entry, we make sure the 'NV' field is set to
> 'posted_intr_vector' and the vCPU is not on any blocking list, which is
> needed while the vCPU is running in non-root mode. The reason we do this
> check is that we change the posted-interrupts descriptor in vcpu_block();
> however, we don't change it back in vcpu_unblock() or when vcpu_block()
> returns directly due to event delivery (in fact, we don't need to do it
> in those two places, which is why we do it before VM-entry).
>
> When we handle the lazy context switch for the following two scenarios:
> - Preempted by a tasklet, which runs in an idle context.
> - The prev vcpu is offline and there are no runnable vcpus in the run queue.
> we don't change the 'SN' bit in the posted-interrupt descriptor. This may
> incur spurious PI notification events, but since a PI notification event
> is only sent when 'ON' is clear, and once the PI notification is sent,
> ON is set by hardware, there are no more notification events before 'ON'
> is cleared. Besides that, spurious PI notification events happen from time
> to time in the Xen hypervisor anyway; for example, when a guest traps to
> Xen and a PI notification event arrives, there is nothing Xen actually
> needs to do about it, and the interrupts will be delivered to the guest
> the next time we do a VM entry.
>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

And you can retain my Acked-by wrt the scheduler bits if you make any
further changes to the non-scheduler parts that would require dropping
the Reviewed-by.
>>> On 23.02.16 at 09:34, <feng.wu@intel.com> wrote:
> +static void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    unsigned int dest;
> +    spinlock_t *old_lock = pi_blocking_list_lock(v);
> +    spinlock_t *pi_blocking_list_lock = &vmx_pi_blocking_list_lock(v->processor);
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    spin_lock_irqsave(pi_blocking_list_lock, flags);
> +    old_lock = cmpxchg(&pi_blocking_list_lock(v), old_lock,
> +                       &vmx_pi_blocking_list_lock(v->processor));

See my comment on v12.

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2283,9 +2283,17 @@ static int reassign_device_ownership(
>      if ( ret )
>          return ret;
>
> +    if ( !target->arch.hvm_domain.vmx.vcpu_block )
> +        vmx_pi_hooks_assign(target);

Why not just if ( !has_arch_pdevs(target) )?

>      ret = domain_context_mapping(target, devfn, pdev);
>      if ( ret )
> +    {
> +        if ( target->arch.hvm_domain.vmx.vcpu_block && !has_arch_pdevs(target) )
> +            vmx_pi_hooks_deassign(target);

Same here.

> @@ -2293,6 +2301,9 @@ static int reassign_device_ownership(
>          pdev->domain = target;
>      }
>
> +    if ( source->arch.hvm_domain.vmx.vcpu_block && !has_arch_pdevs(source) )
> +        vmx_pi_hooks_deassign(source);

And here.

> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -565,6 +565,12 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
>                             signed int cr0_pg);
>  unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
>
> +#define arch_vcpu_block(v) ({                                               \
> +    void (*func) (struct vcpu *) = (v)->domain->arch.hvm_domain.vmx.vcpu_block;\
> +    if ( func )                                                             \
> +        func(v);                                                            \
> +})

See my comment on v12. The code structure actually was better
there, and all you needed to do is introduce a local variable.

> @@ -101,6 +160,17 @@ struct pi_desc {
>
>  #define NR_PML_ENTRIES   512
>
> +#define pi_blocking_vcpu_list(v)   \
> +    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_vcpu_list)
> +
> +#define pi_blocking_list_lock(v)   \
> +    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_list_lock)

At the latest when writing this it should have occurred to you that
there are too many pi_blocking_ prefixes. Please strive to name
things such that macros like these aren't really necessary. The
same naturally applies to struct vmx_pi_blocking_vcpu, albeit
there the VMX maintainers have the final say.

Jan
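[For reference, a minimal sketch of what the simplification Jan asks for might
look like inside reassign_device_ownership(). This is only an illustration of
the suggestion, not code from the posted patch; it assumes has_arch_pdevs()
accurately reflects, at each of the three call sites, whether the domain still
has (or is about to gain) assigned devices.]

    /* Illustration only: key hook setup/teardown purely off has_arch_pdevs(). */
    if ( !has_arch_pdevs(target) )
        vmx_pi_hooks_assign(target);

    ret = domain_context_mapping(target, devfn, pdev);
    if ( ret )
    {
        if ( !has_arch_pdevs(target) )
            vmx_pi_hooks_deassign(target);

        return ret;
    }

    /* ... device is moved from source to target ... */

    if ( !has_arch_pdevs(source) )
        vmx_pi_hooks_deassign(source);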
> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@citrix.com]
> Sent: Tuesday, February 23, 2016 7:07 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; Tian, Kevin <kevin.tian@intel.com>;
> George Dunlap <george.dunlap@eu.citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>
> Subject: Re: [PATCH v13 1/2] vmx: VT-d posted-interrupt core logic handling
>
> On 23/02/16 08:34, Feng Wu wrote:
>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
>
> And you can retain my Acked-by wrt the scheduler bits if you make any
> further changes to the non-scheduler parts that would require dropping
> the Reviewed-by.

Thanks for the effort on this series, George!

Thanks,
Feng
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, February 24, 2016 12:34 AM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>; George Dunlap <george.dunlap@eu.citrix.com>;
> Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xen.org; Keir Fraser
> <keir@xen.org>
> Subject: Re: [PATCH v13 1/2] vmx: VT-d posted-interrupt core logic handling
>
> >>> On 23.02.16 at 09:34, <feng.wu@intel.com> wrote:
> > +static void vmx_vcpu_block(struct vcpu *v)
> > +{
> > +    unsigned long flags;
> > +    unsigned int dest;
> > +    spinlock_t *old_lock = pi_blocking_list_lock(v);
> > +    spinlock_t *pi_blocking_list_lock = &vmx_pi_blocking_list_lock(v->processor);
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    spin_lock_irqsave(pi_blocking_list_lock, flags);
> > +    old_lock = cmpxchg(&pi_blocking_list_lock(v), old_lock,
> > +                       &vmx_pi_blocking_list_lock(v->processor));
>
> See my comment on v12.

Here is your comment on v12: "Why don't you use the local variable here?".
Here I need to assign a new value to 'v->arch.hvm_vmx.pi_blocking_list_lock',
and I am not sure how to use the "local variable" here. Could you please
elaborate a bit more? Thanks a lot!

> > --- a/xen/include/asm-x86/hvm/hvm.h
> > +++ b/xen/include/asm-x86/hvm/hvm.h
> > @@ -565,6 +565,12 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
> >                             signed int cr0_pg);
> >  unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
> >
> > +#define arch_vcpu_block(v) ({                                               \
> > +    void (*func) (struct vcpu *) = (v)->domain->arch.hvm_domain.vmx.vcpu_block;\
> > +    if ( func )                                                             \
> > +        func(v);                                                            \
> > +})
>
> See my comment on v12. The code structure actually was better
> there, and all you needed to do is introduce a local variable.

Do you mean something like the following:

+#define arch_vcpu_block(v) ({                                               \
+    struct vcpu *vcpu = v;                                                  \
+    if ( (vcpu)->domain->arch.hvm_domain.vmx.vcpu_block )                   \
+        (vcpu)->domain->arch.hvm_domain.vmx.vcpu_block((vcpu));             \
+})

Why is this better than the one in v12? Thanks!

> > @@ -101,6 +160,17 @@ struct pi_desc {
> >
> >  #define NR_PML_ENTRIES   512
> >
> > +#define pi_blocking_vcpu_list(v)   \
> > +    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_vcpu_list)
> >
> > +#define pi_blocking_list_lock(v)   \
> > +    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_list_lock)
>
> At the latest when writing this it should have occurred to you that
> there are too many pi_blocking_ prefixes. Please strive to name
> things such that macros like these aren't really necessary. The
> same naturally applies to struct vmx_pi_blocking_vcpu, albeit
> there the VMX maintainers have the final say.

Using these macros shortens the code; the original expressions are hard to
read, such as
'v->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_vcpu_list'.
Even if we change the member names in the structure, they are still very
long, such as
'v->arch.hvm_vmx.pi_blocking_vcpu_info.vcpu_list'
'v->arch.hvm_vmx.pi_blocking_vcpu_info.list_lock'
In most cases this still runs beyond the 80-character limit, which makes
the code a little hard to read.

Thanks,
Feng

>
> Jan
On 2/23/16 10:34 AM, Jan Beulich wrote:
>>>> On 23.02.16 at 09:34, <feng.wu@intel.com> wrote:
>
>> --- a/xen/include/asm-x86/hvm/hvm.h
>> +++ b/xen/include/asm-x86/hvm/hvm.h
>> @@ -565,6 +565,12 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
>>                             signed int cr0_pg);
>>  unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
>>
>> +#define arch_vcpu_block(v) ({                                               \
>> +    void (*func) (struct vcpu *) = (v)->domain->arch.hvm_domain.vmx.vcpu_block;\
>> +    if ( func )                                                             \
>> +        func(v);                                                            \
>> +})
>
> See my comment on v12. The code structure actually was better
> there, and all you needed to do is introduce a local variable.

Wouldn't this be a bit cleaner (and type-safier (inventing a word here))
to do with a static inline function?
> -----Original Message-----
> From: Doug Goldstein [mailto:cardoe@cardoe.com]
> Sent: Wednesday, February 24, 2016 11:02 AM
> To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Dario Faggioli <dario.faggioli@citrix.com>;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v13 1/2] vmx: VT-d posted-interrupt core logic handling
>
> On 2/23/16 10:34 AM, Jan Beulich wrote:
> >>>> On 23.02.16 at 09:34, <feng.wu@intel.com> wrote:
> >
> >> --- a/xen/include/asm-x86/hvm/hvm.h
> >> +++ b/xen/include/asm-x86/hvm/hvm.h
> >> @@ -565,6 +565,12 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
> >>                             signed int cr0_pg);
> >>  unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
> >>
> >> +#define arch_vcpu_block(v) ({                                               \
> >> +    void (*func) (struct vcpu *) = (v)->domain->arch.hvm_domain.vmx.vcpu_block;\
> >> +    if ( func )                                                             \
> >> +        func(v);                                                            \
> >> +})
> >
> > See my comment on v12. The code structure actually was better
> > there, and all you needed to do is introduce a local variable.
>
> Wouldn't this be a bit cleaner (and type-safier (inventing a word here))
> to do with a static inline function?

As I mentioned in an earlier version, after making it an inline function I
encountered build failures, because '(v)->domain->arch.hvm_domain.vmx.vcpu_block'
refers to a data structure that is not easy to pull in at this point in the
header, so I changed it to a macro, just like the other macros in this file
which refer to '(v)->arch.hvm_vcpu.....'.

Thanks,
Feng

>
> --
> Doug Goldstein
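[For comparison, a rough sketch of the static-inline variant Doug suggests.
It is hypothetical and not part of the posted patch: it only compiles if the
full definitions of struct domain and its arch.hvm_domain.vmx member are
visible at this point in hvm.h, which is exactly the header-dependency
problem Feng describes above.]

    /* Hypothetical sketch only; assumes struct domain is fully defined here. */
    static inline void arch_vcpu_block(struct vcpu *v)
    {
        /* Cache the hook so it is read (and v evaluated) exactly once. */
        void (*func)(struct vcpu *) = v->domain->arch.hvm_domain.vmx.vcpu_block;

        if ( func )
            func(v);
    }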
>>> On 24.02.16 at 02:32, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, February 24, 2016 12:34 AM
>> >>> On 23.02.16 at 09:34, <feng.wu@intel.com> wrote:
>> > +static void vmx_vcpu_block(struct vcpu *v)
>> > +{
>> > +    unsigned long flags;
>> > +    unsigned int dest;
>> > +    spinlock_t *old_lock = pi_blocking_list_lock(v);
>> > +    spinlock_t *pi_blocking_list_lock = &vmx_pi_blocking_list_lock(v->processor);
>> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> > +
>> > +    spin_lock_irqsave(pi_blocking_list_lock, flags);
>> > +    old_lock = cmpxchg(&pi_blocking_list_lock(v), old_lock,
>> > +                       &vmx_pi_blocking_list_lock(v->processor));
>>
>> See my comment on v12.
>
> Here is your comment on v12: "Why don't you use the local variable here?".
> Here I need to assign a new value to 'v->arch.hvm_vmx.pi_blocking_list_lock',
> and I am not sure how to use the "local variable" here. Could you please
> elaborate a bit more? Thanks a lot!

Why can't the last argument to cmpxchg() be pi_blocking_list_lock?

>> > --- a/xen/include/asm-x86/hvm/hvm.h
>> > +++ b/xen/include/asm-x86/hvm/hvm.h
>> > @@ -565,6 +565,12 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
>> >                             signed int cr0_pg);
>> >  unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
>> >
>> > +#define arch_vcpu_block(v) ({                                               \
>> > +    void (*func) (struct vcpu *) = (v)->domain->arch.hvm_domain.vmx.vcpu_block;\
>> > +    if ( func )                                                             \
>> > +        func(v);                                                            \
>> > +})
>>
>> See my comment on v12. The code structure actually was better
>> there, and all you needed to do is introduce a local variable.
>
> Do you mean something like the following:
>
> +#define arch_vcpu_block(v) ({                                               \
> +    struct vcpu *vcpu = v;                                                  \
> +    if ( (vcpu)->domain->arch.hvm_domain.vmx.vcpu_block )                   \
> +        (vcpu)->domain->arch.hvm_domain.vmx.vcpu_block((vcpu));             \
> +})
>
> Why is this better than the one in v12? Thanks!

Because, as I said, it results in the macro argument being evaluated just
once. But note that "vcpu" is not a good name here; we would normally use
e.g. "v_". And note further that you now again have the pointless double
parentheses in the function call, and instead lack any around the now
single use of the macro parameter.

>> > @@ -101,6 +160,17 @@ struct pi_desc {
>> >
>> >  #define NR_PML_ENTRIES   512
>> >
>> > +#define pi_blocking_vcpu_list(v)   \
>> > +    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_vcpu_list)
>> >
>> > +#define pi_blocking_list_lock(v)   \
>> > +    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_list_lock)
>>
>> At the latest when writing this it should have occurred to you that
>> there are too many pi_blocking_ prefixes. Please strive to name
>> things such that macros like these aren't really necessary. The
>> same naturally applies to struct vmx_pi_blocking_vcpu, albeit
>> there the VMX maintainers have the final say.
>
> Using these macros shortens the code; the original expressions are hard to
> read, such as
> 'v->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_vcpu_list'.
> Even if we change the member names in the structure, they are still very
> long, such as
> 'v->arch.hvm_vmx.pi_blocking_vcpu_info.vcpu_list'
> 'v->arch.hvm_vmx.pi_blocking_vcpu_info.list_lock'
> In most cases this still runs beyond the 80-character limit, which makes
> the code a little hard to read.

Right, because - as you see - names are _still_ too long after dropping
those prefixes. I don't see why the above couldn't become as short as

v->arch.hvm_vmx.pi_blocking.list
v->arch.hvm_vmx.pi_blocking.lock

without losing any necessary information. The whole idea of using a
container struct here is to have the name of the struct field in the
containing struct convey what basic aspect the access is about, and to have
the leaf struct field names convey which specific piece thereof it is. No
need for any redundancy in naming.

Jan
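[Putting Jan's points together, one possible shape for a follow-up revision
is sketched below. This is an illustration only, not the actual next
version: the container field name 'pi_blocking' follows Jan's naming example
above, and the shortened per-CPU variable name 'vmx_pi_blocking' is an
assumption of this sketch rather than anything from the posted patch.]

    /* arch_vcpu_block(): macro argument evaluated exactly once via a local. */
    #define arch_vcpu_block(v) ({                                        \
        struct vcpu *v_ = (v);                                           \
        if ( v_->domain->arch.hvm_domain.vmx.vcpu_block )                \
            v_->domain->arch.hvm_domain.vmx.vcpu_block(v_);              \
    })

    /* vmx_vcpu_block(): reuse the already-computed lock pointer in cmpxchg(). */
    static void vmx_vcpu_block(struct vcpu *v)
    {
        unsigned long flags;
        spinlock_t *old_lock = v->arch.hvm_vmx.pi_blocking.lock;
        spinlock_t *pi_blocking_list_lock =
            &per_cpu(vmx_pi_blocking, v->processor).lock;

        spin_lock_irqsave(pi_blocking_list_lock, flags);
        old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, old_lock,
                           pi_blocking_list_lock);

        /* The vCPU was running, so it cannot already be on a blocking list. */
        ASSERT(old_lock == NULL);

        list_add_tail(&v->arch.hvm_vmx.pi_blocking.list,
                      &per_cpu(vmx_pi_blocking, v->processor).list);
        spin_unlock_irqrestore(pi_blocking_list_lock, flags);

        /* NDST/SN assertions and the NV update stay as in the v13 patch. */
    }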
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index edd4c8d..2e535de 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -676,6 +676,8 @@ int vmx_cpu_up(void)
     if ( cpu_has_vmx_vpid )
         vpid_sync_all();
 
+    vmx_pi_per_cpu_init(cpu);
+
     return 0;
 }
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7917fb7..87d668e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -83,7 +83,154 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
+struct vmx_pi_blocking_vcpu {
+    struct list_head     pi_blocking_vcpu_list;
+    spinlock_t           pi_blocking_list_lock;
+};
+
+/*
+ * We maintain a per-CPU linked-list of vCPUs, so in PI wakeup
+ * handler we can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct vmx_pi_blocking_vcpu, vmx_pi_blocking_vcpu_info);
+
+#define vmx_pi_blocking_vcpu_list(cpu)   \
+    per_cpu(vmx_pi_blocking_vcpu_info, cpu).pi_blocking_vcpu_list
+
+#define vmx_pi_blocking_list_lock(cpu)   \
+    per_cpu(vmx_pi_blocking_vcpu_info, cpu).pi_blocking_list_lock
+
 uint8_t __read_mostly posted_intr_vector;
+static uint8_t __read_mostly pi_wakeup_vector;
+
+void vmx_pi_per_cpu_init(unsigned int cpu)
+{
+    INIT_LIST_HEAD(&vmx_pi_blocking_vcpu_list(cpu));
+    spin_lock_init(&vmx_pi_blocking_list_lock(cpu));
+}
+
+static void vmx_vcpu_block(struct vcpu *v)
+{
+    unsigned long flags;
+    unsigned int dest;
+    spinlock_t *old_lock = pi_blocking_list_lock(v);
+    spinlock_t *pi_blocking_list_lock = &vmx_pi_blocking_list_lock(v->processor);
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    spin_lock_irqsave(pi_blocking_list_lock, flags);
+    old_lock = cmpxchg(&pi_blocking_list_lock(v), old_lock,
+                       &vmx_pi_blocking_list_lock(v->processor));
+
+    /*
+     * 'v->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_list_lock' should
+     * be NULL before being assigned to a new value, since the vCPU is
+     * currently running and it cannot be on any blocking list.
+     */
+    ASSERT(old_lock == NULL);
+
+    list_add_tail(&pi_blocking_vcpu_list(v),
+                  &vmx_pi_blocking_vcpu_list(v->processor));
+    spin_unlock_irqrestore(pi_blocking_list_lock, flags);
+
+    ASSERT(!pi_test_sn(pi_desc));
+
+    dest = cpu_physical_id(v->processor);
+
+    ASSERT(pi_desc->ndst ==
+           (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
+
+    write_atomic(&pi_desc->nv, pi_wakeup_vector);
+}
+
+static void vmx_pi_switch_from(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( test_bit(_VPF_blocked, &v->pause_flags) )
+        return;
+
+    pi_set_sn(pi_desc);
+}
+
+static void vmx_pi_switch_to(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    unsigned int dest = cpu_physical_id(v->processor);
+
+    write_atomic(&pi_desc->ndst,
+                 x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+
+    pi_clear_sn(pi_desc);
+}
+
+static void vmx_pi_do_resume(struct vcpu *v)
+{
+    unsigned long flags;
+    spinlock_t *pi_blocking_list_lock;
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
+
+    /*
+     * Set 'NV' field back to posted_intr_vector, so the
+     * Posted-Interrupts can be delivered to the vCPU when
+     * it is running in non-root mode.
+     */
+    write_atomic(&pi_desc->nv, posted_intr_vector);
+
+    /* The vCPU is not on any blocking list. */
+    pi_blocking_list_lock = pi_blocking_list_lock(v);
+
+    /* Prevent the compiler from eliminating the local variable. */
+    smp_rmb();
+
+    if ( pi_blocking_list_lock == NULL )
+        return;
+
+    spin_lock_irqsave(pi_blocking_list_lock, flags);
+
+    /*
+     * v->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_list_lock == NULL
+     * here means the vCPU was removed from the blocking list while we are
+     * acquiring the lock.
+     */
+    if ( pi_blocking_list_lock(v) != NULL )
+    {
+        ASSERT(pi_blocking_list_lock(v) == pi_blocking_list_lock);
+        list_del(&pi_blocking_vcpu_list(v));
+        pi_blocking_list_lock(v) = NULL;
+    }
+
+    spin_unlock_irqrestore(pi_blocking_list_lock, flags);
+}
+
+/* This function is called when pcidevs_lock is held */
+void vmx_pi_hooks_assign(struct domain *d)
+{
+    if ( !iommu_intpost || !has_hvm_container_domain(d) )
+        return;
+
+    ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
+
+    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
+    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
+    d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
+    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
+}
+
+/* This function is called when pcidevs_lock is held */
+void vmx_pi_hooks_deassign(struct domain *d)
+{
+    if ( !iommu_intpost || !has_hvm_container_domain(d) )
+        return;
+
+    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
+
+    d->arch.hvm_domain.vmx.vcpu_block = NULL;
+    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
+    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
+    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
+}
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -112,6 +259,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
 
+    INIT_LIST_HEAD(&pi_blocking_vcpu_list(v));
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
@@ -740,6 +889,9 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
     vmx_save_dr(v);
+
+    if ( v->domain->arch.hvm_domain.vmx.pi_switch_from )
+        v->domain->arch.hvm_domain.vmx.pi_switch_from(v);
 }
 
 static void vmx_ctxt_switch_to(struct vcpu *v)
@@ -752,6 +904,9 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
+
+    if ( v->domain->arch.hvm_domain.vmx.pi_switch_to )
+        v->domain->arch.hvm_domain.vmx.pi_switch_to(v);
 }
 
@@ -2010,6 +2165,38 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is blocked. */
+static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+    struct arch_vmx_struct *vmx, *tmp;
+    spinlock_t *lock = &vmx_pi_blocking_list_lock(smp_processor_id());
+    struct list_head *blocked_vcpus = &vmx_pi_blocking_vcpu_list(smp_processor_id());
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    spin_lock(lock);
+
+    /*
+     * XXX: The length of the list depends on how many vCPU is current
+     * blocked on this specific pCPU. This may hurt the interrupt latency
+     * if the list grows to too many entries.
+     */
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
+                             pi_blocking_vcpu_info.pi_blocking_vcpu_list)
+    {
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocking_vcpu_info.pi_blocking_vcpu_list);
+            ASSERT(vmx->pi_blocking_vcpu_info.pi_blocking_list_lock == lock);
+            vmx->pi_blocking_vcpu_info.pi_blocking_list_lock = NULL;
+            vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
+        }
+    }
+
+    spin_unlock(lock);
+}
+
 /* Handle VT-d posted-interrupt when VCPU is running. */
 static void pi_notification_interrupt(struct cpu_user_regs *regs)
 {
@@ -2096,7 +2283,10 @@ const struct hvm_function_table * __init start_vmx(void)
     if ( cpu_has_vmx_posted_intr_processing )
     {
         if ( iommu_intpost )
+        {
             alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+        }
         else
             alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
     }
@@ -3574,6 +3764,9 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     struct hvm_vcpu_asid *p_asid;
     bool_t need_flush;
 
+    if ( curr->domain->arch.hvm_domain.vmx.pi_do_resume )
+        curr->domain->arch.hvm_domain.vmx.pi_do_resume(curr);
+
     if ( !cpu_has_vmx_vpid )
         goto out;
     if ( nestedhvm_vcpu_in_guestmode(curr) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d121896..2d87021 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -802,6 +802,8 @@ void vcpu_block(void)
 
     set_bit(_VPF_blocked, &v->pause_flags);
 
+    arch_vcpu_block(v);
+
     /* Check for events /after/ blocking: avoids wakeup waiting race. */
     if ( local_events_need_delivery() )
     {
@@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
     v->poll_evtchn = -1;
     set_bit(v->vcpu_id, d->poll_mask);
 
+    arch_vcpu_block(v);
+
 #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
     /* Check for events /after/ setting flags: avoids wakeup waiting race. */
     smp_mb();
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index ec31c6b..14223db 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2283,9 +2283,17 @@ static int reassign_device_ownership(
     if ( ret )
         return ret;
 
+    if ( !target->arch.hvm_domain.vmx.vcpu_block )
+        vmx_pi_hooks_assign(target);
+
     ret = domain_context_mapping(target, devfn, pdev);
     if ( ret )
+    {
+        if ( target->arch.hvm_domain.vmx.vcpu_block && !has_arch_pdevs(target) )
+            vmx_pi_hooks_deassign(target);
+
         return ret;
+    }
 
     if ( devfn == pdev->devfn )
     {
@@ -2293,6 +2301,9 @@ static int reassign_device_ownership(
         pdev->domain = target;
     }
 
+    if ( source->arch.hvm_domain.vmx.vcpu_block && !has_arch_pdevs(source) )
+        vmx_pi_hooks_deassign(source);
+
     return ret;
 }
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index aa7f283..37afa80 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -310,6 +310,8 @@ static inline void free_vcpu_guest_context(struct vcpu_guest_context *vgc)
     xfree(vgc);
 }
 
+static inline void arch_vcpu_block(struct vcpu *v) {}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index b9d893d..b90ad1c 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -565,6 +565,12 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t value,
                            signed int cr0_pg);
 unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
 
+#define arch_vcpu_block(v) ({                                               \
+    void (*func) (struct vcpu *) = (v)->domain->arch.hvm_domain.vmx.vcpu_block;\
+    if ( func )                                                             \
+        func(v);                                                            \
+})
+
 #endif /* __ASM_X86_HVM_HVM_H__ */
 
 /*
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index d1496b8..72a17c6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -77,6 +77,65 @@ struct vmx_domain {
     unsigned long apic_access_mfn;
     /* VMX_DOMAIN_* */
     unsigned int status;
+
+    /*
+     * To handle posted interrupts correctly, we need to set the following
+     * state:
+     *
+     * * The PI notification vector (NV)
+     * * The PI notification destination processor (NDST)
+     * * The PI "suppress notification" bit (SN)
+     * * The vcpu pi "blocked" list
+     *
+     * If a VM is currently running, we want the PI delivered to the guest vcpu
+     * on the proper pcpu (NDST = v->processor, SN clear).
+     *
+     * If the vm is blocked, we want the PI delivered to Xen so that it can
+     * wake it up (SN clear, NV = pi_wakeup_vector, vcpu on block list).
+     *
+     * If the VM is currently either preempted or offline (i.e., not running
+     * because of some reason other than blocking waiting for an interrupt),
+     * there's nothing Xen can do -- we want the interrupt pending bit set in
+     * the guest, but we don't want to bother Xen with an interrupt (SN clear).
+     *
+     * There's a brief window of time between vmx_intr_assist() and checking
+     * softirqs where if an interrupt comes in it may be lost; so we need Xen
+     * to get an interrupt and raise a softirq so that it will go through the
+     * vmx_intr_assist() path again (SN clear, NV = posted_interrupt).
+     *
+     * The way we implement this now is by looking at what needs to happen on
+     * the following runstate transitions:
+     *
+     * A: runnable -> running
+     *  - SN = 0
+     *  - NDST = v->processor
+     * B: running -> runnable
+     *  - SN = 1
+     * C: running -> blocked
+     *  - NV = pi_wakeup_vector
+     *  - Add vcpu to blocked list
+     * D: blocked -> runnable
+     *  - NV = posted_intr_vector
+     *  - Take vcpu off blocked list
+     *
+     * For transitions A and B, we add hooks into vmx_ctxt_switch_{from,to}
+     * paths.
+     *
+     * For transition C, we add a new arch hook, arch_vcpu_block(), which is
+     * called from vcpu_block() and vcpu_do_poll().
+     *
+     * For transition D, rather than add an extra arch hook on vcpu_wake, we
+     * add a hook on the vmentry path which checks to see if either of the two
+     * actions need to be taken.
+     *
+     * These hooks only need to be called when the domain in question actually
+     * has a physical device assigned to it, so we set and clear the callbacks
+     * as appropriate when device assignment changes.
+     */
+    void (*vcpu_block) (struct vcpu *);
+    void (*pi_switch_from) (struct vcpu *v);
+    void (*pi_switch_to) (struct vcpu *v);
+    void (*pi_do_resume) (struct vcpu *v);
 };
 
 struct pi_desc {
@@ -101,6 +160,17 @@ struct pi_desc {
 
 #define NR_PML_ENTRIES   512
 
+#define pi_blocking_vcpu_list(v)   \
+    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_vcpu_list)
+
+#define pi_blocking_list_lock(v)   \
+    ((v)->arch.hvm_vmx.pi_blocking_vcpu_info.pi_blocking_list_lock)
+
+struct pi_blocking_vcpu {
+    struct list_head     pi_blocking_vcpu_list;
+    spinlock_t           *pi_blocking_list_lock;
+};
+
 struct arch_vmx_struct {
     /* Physical address of VMCS. */
     paddr_t              vmcs_pa;
@@ -160,6 +230,13 @@ struct arch_vmx_struct {
     struct page_info     *vmwrite_bitmap;
 
     struct page_info     *pml_pg;
+
+    /*
+     * Before it is blocked, vCPU is added to the per-cpu list.
+     * VT-d engine can send wakeup notification event to the
+     * pCPU and wakeup the related vCPU.
+     */
+    struct pi_blocking_vcpu   pi_blocking_vcpu_info;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 1719965..359b2a9 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -563,6 +563,11 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
+void vmx_pi_per_cpu_init(unsigned int cpu);
+
+void vmx_pi_hooks_assign(struct domain *d);
+void vmx_pi_hooks_deassign(struct domain *d);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
This is the core logic handling for VT-d posted-interrupts. Basically it
deals with how and when to update posted-interrupts during the following
scenarios:
- vCPU is preempted
- vCPU is put to sleep
- vCPU is blocked

When a vCPU is preempted or put to sleep, we update the posted-interrupts
during scheduling by introducing two new architectural scheduler hooks:
vmx_pi_switch_from() and vmx_pi_switch_to(). When a vCPU is blocked, we
introduce a new architectural hook, arch_vcpu_block(), to update the
posted-interrupts descriptor.

Besides that, before VM-entry, we make sure the 'NV' field is set to
'posted_intr_vector' and the vCPU is not on any blocking list, which is
needed while the vCPU is running in non-root mode. The reason we do this
check is that we change the posted-interrupts descriptor in vcpu_block();
however, we don't change it back in vcpu_unblock() or when vcpu_block()
returns directly due to event delivery (in fact, we don't need to do it
in those two places, which is why we do it before VM-entry).

When we handle the lazy context switch for the following two scenarios:
- Preempted by a tasklet, which runs in an idle context.
- The prev vcpu is offline and there are no runnable vcpus in the run queue.
we don't change the 'SN' bit in the posted-interrupt descriptor. This may
incur spurious PI notification events, but since a PI notification event
is only sent when 'ON' is clear, and once the PI notification is sent,
ON is set by hardware, there are no more notification events before 'ON'
is cleared. Besides that, spurious PI notification events happen from time
to time in the Xen hypervisor anyway; for example, when a guest traps to
Xen and a PI notification event arrives, there is nothing Xen actually
needs to do about it, and the interrupts will be delivered to the guest
the next time we do a VM entry.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v13:
- Define the blocking vcpu list and lock in a structure
- Define the two local per-CPU variables in a structure
- Some adjustment to vmx_pi_hooks_assign() and vmx_pi_hooks_deassign()
- Use smp_rmb() instead of barrier(), and put it a little earlier
- Minor changes to macro arch_vcpu_block() to make 'v' evaluated only once
- Remove the pointless parentheses in the function arguments in macro
  arch_vcpu_block()
- Coding style

v12:
- Move the ASSERT to the locked region in vmx_vcpu_block()
- Add barrier() before using the local variable in vmx_pi_do_resume()
- Split vmx_pi_hooks_reassign() into two functions:
  * vmx_pi_hooks_assign()
  * vmx_pi_hooks_deassign()
- Add more comments about how PI works during vCPU state transitions
- Coding style

v11:
- Add ASSERT() in vmx_vcpu_block()
- Add some comments in vmx_pi_switch_from()
- Remove some comments which should have been removed when the related
  code was removed during v9 -> v10
- Rename 'vmx_pi_state_to_normal' to 'vmx_pi_do_resume'
- Coding style
- Make arch_vcpu_block() a macro
- Make 'pi_wakeup_vector' static
- Move hook 'vcpu_block' to 'struct hvm_vcpu'
- Initialise hook 'vcpu_block' when assigning the first pci device and
  zap it on removal of the last device
- Save a pointer to the block list lock instead of the processor id in
  'struct arch_vmx_struct'
- Implement the following functions as hooks, so we can eliminate lots of
  checks and spinlocks in scheduling-related code paths, which is good
  for performance:
    vmx_pi_switch_from
    vmx_pi_switch_to
    vmx_pi_do_resume

v10:
- Check iommu_intpost first
- Remove pointless checking of has_hvm_container_vcpu(v)
- Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
- Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we don't
  need another list to save the vCPUs with 'ON' set; just directly call
  vcpu_unblock(v).

v9:
- Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
- Add vmx_pi_state_change() and call it before VM Entry

v8:
- Remove the lazy context switch handling for PI state transition
- Change PI state in vcpu_block() and do_poll() when the vCPU is going to
  be blocked

v7:
- Merge "[PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
  interrupts" and "[PATCH v6 14/18] vmx: posted-interrupt handling when
  vCPU is blocked" into this patch, so it is self-contained and more
  convenient for code review.
- Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
- Coding style
- Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
- Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
- Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
- Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
- Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
- Use 'spin_lock' and 'spin_unlock' when the interrupt has already been
  disabled
- Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
- Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
- Call .pi_ctxt_switch_to() in __context_switch() instead of directly
  calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
- Make .pi_block_cpu unsigned int
- Use list_del() instead of list_del_init()
- Coding style

One remaining item in v7:
Jan has a concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi();
need Dario's or George's input about this.

v6:
- Add two static inline functions for pi context switch
- Fix typos

v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hooks together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if PI is
  disabled or the vCPU is not in HVM
- Coding style

v4:
- Newly added

Changelog for "vmx: posted-interrupt handling when vCPU is blocked"
v6:
- Fix some typos
- Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()

v4:
- Use local variables in pi_wakeup_interrupt()
- Remove vcpu from the blocked list when pi_desc.on==1; this avoids
  kicking the vcpu multiple times.
- Remove tasklet

v3:
- This patch is generated by merging the following three patches in v2:
  [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
  [RFC v2 10/15] vmx: Define two per-cpu variables
  [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
- Rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
- Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
- Rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
- Make pi_wakeup_interrupt() static
- Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
- Move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
- Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
- Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'

 xen/arch/x86/hvm/vmx/vmcs.c         |   2 +
 xen/arch/x86/hvm/vmx/vmx.c          | 193 ++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c               |   4 +
 xen/drivers/passthrough/vtd/iommu.c |  11 ++
 xen/include/asm-arm/domain.h        |   2 +
 xen/include/asm-x86/hvm/hvm.h       |   6 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h  |  77 ++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h   |   5 +
 8 files changed, 300 insertions(+)