diff mbox

[3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination

Message ID 1463734431-22353-4-git-send-email-feng.wu@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wu, Feng May 20, 2016, 8:53 a.m. UTC
We need to make sure the blocking vcpu is not on any per-cpu blocking list
when the associated domain is going to be destroyed.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Tian, Kevin May 23, 2016, 5:19 a.m. UTC | #1
> From: Wu, Feng
> Sent: Friday, May 20, 2016 4:54 PM
> 
> We need to make sure the bocking vcpu is not in any per-cpu blocking list
> when the associated domain is going to be destroyed.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 32
> ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 4862b13..e74b3e7 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>  }
> 
> +static void vmx_pi_blocking_list_cleanup(struct domain *d)

Is it more natural to move such cleanup under vcpu destroy?

> +{
> +    unsigned int cpu;
> +
> +    for_each_online_cpu ( cpu )
> +    {
> +        struct vcpu *v;
> +        unsigned long flags;
> +        struct arch_vmx_struct *vmx, *tmp;
> +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> +
> +        spin_lock_irqsave(lock, flags);
> +
> +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> +        {
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +
> +            if (v->domain == d)
> +            {
> +                list_del(&vmx->pi_blocking.list);
> +                ASSERT(vmx->pi_blocking.lock == lock);
> +                vmx->pi_blocking.lock = NULL;
> +            }
> +        }
> +
> +        spin_unlock_irqrestore(lock, flags);
> +    }
> +}
> +
>  static int vmx_domain_initialise(struct domain *d)
>  {
>      int rc;
> @@ -265,6 +295,8 @@ static int vmx_domain_initialise(struct domain *d)
> 
>  static void vmx_domain_destroy(struct domain *d)
>  {
> +    vmx_pi_blocking_list_cleanup(d);
> +
>      if ( !has_vlapic(d) )
>          return;
> 
> --
> 2.1.0
Wu, Feng May 23, 2016, 5:48 a.m. UTC | #2
> -----Original Message-----
> From: Tian, Kevin
> Sent: Monday, May 23, 2016 1:19 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> konrad.wilk@oracle.com
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> > From: Wu, Feng
> > Sent: Friday, May 20, 2016 4:54 PM
> >
> > We need to make sure the bocking vcpu is not in any per-cpu blocking list
> > when the associated domain is going to be destroyed.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c | 32
> > ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 4862b13..e74b3e7 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >  }
> >
> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> 
> Is it more natural to move such cleanup under vcpu destroy?

Yes, indeed it is more natural to do this cleanup when the vcpu is destroyed;
however, the reason I add it here is that I still have some concerns about the timing.
When the VM is destroyed, here is the calling path:

- vmx_pi_hooks_deassign() -->
......
- vmx_vcpu_destroy() --> 
......
- vmx_domain_destroy()
......

As I replied in the previous mail, when we remove the vcpus from the blocking
list, there might be some _in-flight_ calls to the hooks, so I put the cleanup
code in vmx_domain_destroy(), which is a bit further from vmx_pi_hooks_deassign()
and hence safer. If you have any other good ideas, I am all ears! :)

Thanks,
Feng
Tian, Kevin May 23, 2016, 6:54 a.m. UTC | #3
> From: Wu, Feng
> Sent: Monday, May 23, 2016 1:48 PM
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Monday, May 23, 2016 1:19 PM
> > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> > george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> > konrad.wilk@oracle.com
> > Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> > after domain termination
> >
> > > From: Wu, Feng
> > > Sent: Friday, May 20, 2016 4:54 PM
> > >
> > > We need to make sure the bocking vcpu is not in any per-cpu blocking list
> > > when the associated domain is going to be destroyed.
> > >
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > ---
> > >  xen/arch/x86/hvm/vmx/vmx.c | 32
> > > ++++++++++++++++++++++++++++++++
> > >  1 file changed, 32 insertions(+)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > index 4862b13..e74b3e7 100644
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> > >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > >  }
> > >
> > > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> >
> > Is it more natural to move such cleanup under vcpu destroy?
> 
> Yes, indeed it is more natural to add this function when vcpu is destroyed,
> however, the reason I add it here is I still have some concerns about the timing.
> When the VM is destroyed, here is the calling path:
> 
> - vmx_pi_hooks_deassign() -->
> ......
> - vmx_vcpu_destroy() -->
> ......
> - vmx_domain_destroy()
> ......
> 
> As I replied in the previous mail, when we remove the vcpus from the blocking
> list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> code in the vmx_domain_destroy(), which is a bit more far from vmx_pi_hooks_deassign,
> and hence safer. If you have any other good ideas, I am all ears!:)
> 

If we don't need to reset the callbacks in the deassign path, as commented for
patch 1/3, would that make the above puzzle go away? :-)

Thanks
Kevin
Jan Beulich May 23, 2016, 9:08 a.m. UTC | #4
>>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
>> From: Tian, Kevin
>> Sent: Monday, May 23, 2016 1:19 PM
>> > From: Wu, Feng
>> > Sent: Friday, May 20, 2016 4:54 PM
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>> >  }
>> >
>> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
>> 
>> Is it more natural to move such cleanup under vcpu destroy?
> 
> Yes, indeed it is more natural to add this function when vcpu is destroyed,
> however, the reason I add it here is I still have some concerns about the 
> timing.
> When the VM is destroyed, here is the calling path:
> 
> - vmx_pi_hooks_deassign() -->
> ......
> - vmx_vcpu_destroy() --> 
> ......
> - vmx_domain_destroy()
> ......
> 
> As I replied in the previous mail, when we remove the vcpus from the 
> blocking
> list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> code in the vmx_domain_destroy(), which is a bit more far from 
> vmx_pi_hooks_deassign,
> and hence safer. If you have any other good ideas, I am all ears!:)

Well, either there is a possible race (then moving the addition
later just reduces the chances, but doesn't eliminate it), or there
isn't (in which case Kevin's suggestion should probably be followed).

Jan
Wu, Feng May 23, 2016, 9:17 a.m. UTC | #5
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 5:08 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> >> From: Tian, Kevin
> >> Sent: Monday, May 23, 2016 1:19 PM
> >> > From: Wu, Feng
> >> > Sent: Friday, May 20, 2016 4:54 PM
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >> >  }
> >> >
> >> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> >>
> >> Is it more natural to move such cleanup under vcpu destroy?
> >
> > Yes, indeed it is more natural to add this function when vcpu is destroyed,
> > however, the reason I add it here is I still have some concerns about the
> > timing.
> > When the VM is destroyed, here is the calling path:
> >
> > - vmx_pi_hooks_deassign() -->
> > ......
> > - vmx_vcpu_destroy() -->
> > ......
> > - vmx_domain_destroy()
> > ......
> >
> > As I replied in the previous mail, when we remove the vcpus from the
> > blocking
> > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> > code in the vmx_domain_destroy(), which is a bit more far from
> > vmx_pi_hooks_deassign,
> > and hence safer. If you have any other good ideas, I am all ears!:)
> 
> Well, either there is a possible race (then moving the addition
> later just reduces the chances, but doesn't eliminate it), or there
> isn't (in which case Kevin's suggestion should probably be followed).

Yes, I agree: adding the cleanup code at domain destroy rather than at
vcpu destroy just reduces the risk, but doesn't eliminate it. So far I don't
have a perfect solution for this possible race condition.

Thanks,
Feng

> 
> Jan
Wu, Feng May 23, 2016, 10:35 a.m. UTC | #6
> -----Original Message-----
> From: Wu, Feng
> Sent: Monday, May 23, 2016 5:18 PM
> To: Jan Beulich <JBeulich@suse.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org; Wu, Feng
> <feng.wu@intel.com>
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> 
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Monday, May 23, 2016 5:08 PM
> > To: Wu, Feng <feng.wu@intel.com>
> > Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> > george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> > devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> > Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> > after domain termination
> >
> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> > >> From: Tian, Kevin
> > >> Sent: Monday, May 23, 2016 1:19 PM
> > >> > From: Wu, Feng
> > >> > Sent: Friday, May 20, 2016 4:54 PM
> > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > >> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> > >> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > >> >  }
> > >> >
> > >> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > >>
> > >> Is it more natural to move such cleanup under vcpu destroy?
> > >
> > > Yes, indeed it is more natural to add this function when vcpu is destroyed,
> > > however, the reason I add it here is I still have some concerns about the
> > > timing.
> > > When the VM is destroyed, here is the calling path:
> > >
> > > - vmx_pi_hooks_deassign() -->
> > > ......
> > > - vmx_vcpu_destroy() -->
> > > ......
> > > - vmx_domain_destroy()
> > > ......
> > >
> > > As I replied in the previous mail, when we remove the vcpus from the
> > > blocking
> > > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> > > code in the vmx_domain_destroy(), which is a bit more far from
> > > vmx_pi_hooks_deassign,
> > > and hence safer. If you have any other good ideas, I am all ears!:)
> >
> > Well, either there is a possible race (then moving the addition
> > later just reduces the chances, but doesn't eliminate it), or there
> > isn't (in which case Kevin's suggestion should probably be followed).
> 
> Yes, I agree, adding the cleanup code in domain destroy other than
> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
> get a perfect solution to solve this possible race condition.

After more thinking about this, I think this race condition can be resolved
in the following way:
1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_blocking_cleaned_up'
2. In vmx_pi_blocking_list_cleanup(), when we find a vCPU of the domain on a
blocking list, set the flag to 1 after removing it
3. In vmx_vcpu_block(), add the following check:

     spin_lock_irqsave(pi_blocking_list_lock, flags);
+    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
+    {
+        /*
+         * The vcpu is to be destroyed and it has already been removed
+         * from the per-CPU list if it is blocking, we shouldn't add
+         * new vCPUs to the list.
+         */
+        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
+        return;
+    }
+
     old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
                        pi_blocking_list_lock);

Then we can follow Kevin's suggestion and move the cleanup call
to vmx_vcpu_destroy().

Any ideas?

Thanks,
Feng


> 
> Thanks,
> Feng
> 
> >
> > Jan
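The flag-based handshake proposed above can be modelled in miniature. Everything below is an illustrative sketch, not the real Xen code: the structure and function names are stand-ins, and the per-CPU list lock is reduced to comments (both sides running under the same lock is what makes the flag check race-free).

```c
/*
 * Illustrative sketch of the proposed handshake; names are stand-ins,
 * not the real Xen structures, and locking is reduced to comments.
 */
#include <assert.h>
#include <stddef.h>

struct pi_blocking_vcpu {
    int on_list;       /* stands in for membership of a per-CPU list */
    int cleaned_up;    /* the proposed per-vCPU flag */
    void *lock;        /* stands in for pi_blocking.lock */
};

/* Cleanup side: dequeue the vCPU and raise the flag under the lock. */
static void pi_blocking_cleanup(struct pi_blocking_vcpu *v, void *list_lock)
{
    /* spin_lock_irqsave(list_lock) in the real code */
    if (v->on_list) {
        v->on_list = 0;
        assert(v->lock == list_lock);
        v->lock = NULL;
    }
    v->cleaned_up = 1;
    /* spin_unlock_irqrestore(list_lock) */
}

/* Blocking side: a late block attempt sees the flag and backs off. */
static int pi_vcpu_block(struct pi_blocking_vcpu *v, void *list_lock)
{
    int queued = 0;

    /* spin_lock_irqsave(list_lock) in the real code */
    if (!v->cleaned_up) {
        v->lock = list_lock;
        v->on_list = 1;
        queued = 1;
    }
    /* spin_unlock_irqrestore(list_lock) */
    return queued;
}
```

Because both sides serialise on the same per-CPU lock, cleanup either finds the vCPU on the list and removes it, or runs first and the flag prevents it from ever being added again, which is the property the in-flight-call concern is about.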
Jan Beulich May 23, 2016, 11:11 a.m. UTC | #7
>>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
>> From: Wu, Feng
>> Sent: Monday, May 23, 2016 5:18 PM
>> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> > Sent: Monday, May 23, 2016 5:08 PM
>> > To: Wu, Feng <feng.wu@intel.com>
>> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
>> > > Yes, indeed it is more natural to add this function when vcpu is destroyed,
>> > > however, the reason I add it here is I still have some concerns about the
>> > > timing.
>> > > When the VM is destroyed, here is the calling path:
>> > >
>> > > - vmx_pi_hooks_deassign() -->
>> > > ......
>> > > - vmx_vcpu_destroy() -->
>> > > ......
>> > > - vmx_domain_destroy()
>> > > ......
>> > >
>> > > As I replied in the previous mail, when we remove the vcpus from the
>> > > blocking
>> > > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
>> > > code in the vmx_domain_destroy(), which is a bit more far from
>> > > vmx_pi_hooks_deassign,
>> > > and hence safer. If you have any other good ideas, I am all ears!:)
>> >
>> > Well, either there is a possible race (then moving the addition
>> > later just reduces the chances, but doesn't eliminate it), or there
>> > isn't (in which case Kevin's suggestion should probably be followed).
>> 
>> Yes, I agree, adding the cleanup code in domain destroy other than
>> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
>> get a perfect solution to solve this possible race condition.
> 
> After more thinking about this, I think this race condition can be resolve
> in the following way:
> 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
> 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
> blocking list, after removing it, set the flag to 1
> 3. In vmx_vcpu_block(), add the following check:
> 
>      spin_lock_irqsave(pi_blocking_list_lock, flags);
> +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
> +    {
> +        /*
> +         * The vcpu is to be destroyed and it has already been removed
> +         * from the per-CPU list if it is blocking, we shouldn't add
> +         * new vCPUs to the list.
> +         */
> +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> +        return;
> +    }
> +
>      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
>                         pi_blocking_list_lock);
> 
> Then we can following Kevin's suggestion to move the addition
> to vmx_vcpu_destory().

Before adding yet another PI-related field, I'd really like to see other
alternatives explored. In particular - can determination be based on
some other state (considering the subject, e.g. per-domain one)?

Jan
Wu, Feng May 23, 2016, 12:24 p.m. UTC | #8
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 7:11 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
> >> From: Wu, Feng
> >> Sent: Monday, May 23, 2016 5:18 PM
> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> > Sent: Monday, May 23, 2016 5:08 PM
> >> > To: Wu, Feng <feng.wu@intel.com>
> >> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> >> > > Yes, indeed it is more natural to add this function when vcpu is destroyed,
> >> > > however, the reason I add it here is I still have some concerns about the
> >> > > timing.
> >> > > When the VM is destroyed, here is the calling path:
> >> > >
> >> > > - vmx_pi_hooks_deassign() -->
> >> > > ......
> >> > > - vmx_vcpu_destroy() -->
> >> > > ......
> >> > > - vmx_domain_destroy()
> >> > > ......
> >> > >
> >> > > As I replied in the previous mail, when we remove the vcpus from the
> >> > > blocking
> >> > > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> >> > > code in the vmx_domain_destroy(), which is a bit more far from
> >> > > vmx_pi_hooks_deassign,
> >> > > and hence safer. If you have any other good ideas, I am all ears!:)
> >> >
> >> > Well, either there is a possible race (then moving the addition
> >> > later just reduces the chances, but doesn't eliminate it), or there
> >> > isn't (in which case Kevin's suggestion should probably be followed).
> >>
> >> Yes, I agree, adding the cleanup code in domain destroy other than
> >> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
> >> get a perfect solution to solve this possible race condition.
> >
> > After more thinking about this, I think this race condition can be resolve
> > in the following way:
> > 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
> > 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
> > blocking list, after removing it, set the flag to 1
> > 3. In vmx_vcpu_block(), add the following check:
> >
> >      spin_lock_irqsave(pi_blocking_list_lock, flags);
> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
> > +    {
> > +        /*
> > +         * The vcpu is to be destroyed and it has already been removed
> > +         * from the per-CPU list if it is blocking, we shouldn't add
> > +         * new vCPUs to the list.
> > +         */
> > +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> > +        return;
> > +    }
> > +
> >      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
> >                         pi_blocking_list_lock);
> >
> > Then we can following Kevin's suggestion to move the addition
> > to vmx_vcpu_destory().
> 
> Before adding yet another PI-related field, I'd really like to see other
> alternatives explored. In particular - can determination be based on
> some other state (considering the subject, e.g. per-domain one)?

I think the point is that we need to set some flag inside the
spin_lock_irqsave()/spin_unlock_irqrestore() section in
vmx_pi_blocking_list_cleanup() and check it after acquiring the lock
in vmx_vcpu_block(), so the race condition can be eliminated, right?
If that is the case, I am not sure how we could use other existing state.

Thanks,
Feng

> 
> Jan
Dario Faggioli May 23, 2016, 12:30 p.m. UTC | #9
On Fri, 2016-05-20 at 16:53 +0800, Feng Wu wrote:
> We need to make sure the bocking vcpu is not in any per-cpu blocking
> list
> when the associated domain is going to be destroyed.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>  }
>  
> +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> +{
> +    unsigned int cpu;
> +
> +    for_each_online_cpu ( cpu )
> +    {
> +        struct vcpu *v;
> +        unsigned long flags;
> +        struct arch_vmx_struct *vmx, *tmp;
> +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking,
> cpu).list;
> +
> +        spin_lock_irqsave(lock, flags);
> +
> +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
> pi_blocking.list)
> +        {
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +
> +            if (v->domain == d)
> +            {
> +                list_del(&vmx->pi_blocking.list);
> +                ASSERT(vmx->pi_blocking.lock == lock);
> +                vmx->pi_blocking.lock = NULL;
> +            }
> +        }
> +
> +        spin_unlock_irqrestore(lock, flags);
> +    }
>
So, I'm probably missing something very, very basic, but I don't see
why we need this loop... can't we arrange for
checking

 list_empty(&v->arch.hvm_vmx.pi_blocking.list)

?

:-O

Regards,
Dario
Jan Beulich May 23, 2016, 12:35 p.m. UTC | #10
>>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>  }
>  
> +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> +{
> +    unsigned int cpu;
> +
> +    for_each_online_cpu ( cpu )
> +    {
> +        struct vcpu *v;
> +        unsigned long flags;
> +        struct arch_vmx_struct *vmx, *tmp;
> +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> +
> +        spin_lock_irqsave(lock, flags);
> +
> +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> +        {

Did you consider how long these two nested loops may take on a
large system?

Jan
Jan Beulich May 23, 2016, 12:46 p.m. UTC | #11
>>> On 23.05.16 at 14:24, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, May 23, 2016 7:11 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
>> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
>> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org 
>> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
>> after domain termination
>> 
>> >>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
>> >> From: Wu, Feng
>> >> Sent: Monday, May 23, 2016 5:18 PM
>> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> > Sent: Monday, May 23, 2016 5:08 PM
>> >> > To: Wu, Feng <feng.wu@intel.com>
>> >> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
>> >> > > Yes, indeed it is more natural to add this function when vcpu is 
> destroyed,
>> >> > > however, the reason I add it here is I still have some concerns about the
>> >> > > timing.
>> >> > > When the VM is destroyed, here is the calling path:
>> >> > >
>> >> > > - vmx_pi_hooks_deassign() -->
>> >> > > ......
>> >> > > - vmx_vcpu_destroy() -->
>> >> > > ......
>> >> > > - vmx_domain_destroy()
>> >> > > ......
>> >> > >
>> >> > > As I replied in the previous mail, when we remove the vcpus from the
>> >> > > blocking
>> >> > > list, there might be some _in-flight_ call to the hooks, so I put the 
> cleanup
>> >> > > code in the vmx_domain_destroy(), which is a bit more far from
>> >> > > vmx_pi_hooks_deassign,
>> >> > > and hence safer. If you have any other good ideas, I am all ears!:)
>> >> >
>> >> > Well, either there is a possible race (then moving the addition
>> >> > later just reduces the chances, but doesn't eliminate it), or there
>> >> > isn't (in which case Kevin's suggestion should probably be followed).
>> >>
>> >> Yes, I agree, adding the cleanup code in domain destroy other than
>> >> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
>> >> get a perfect solution to solve this possible race condition.
>> >
>> > After more thinking about this, I think this race condition can be resolve
>> > in the following way:
>> > 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
>> > 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
>> > blocking list, after removing it, set the flag to 1
>> > 3. In vmx_vcpu_block(), add the following check:
>> >
>> >      spin_lock_irqsave(pi_blocking_list_lock, flags);
>> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
>> > +    {
>> > +        /*
>> > +         * The vcpu is to be destroyed and it has already been removed
>> > +         * from the per-CPU list if it is blocking, we shouldn't add
>> > +         * new vCPUs to the list.
>> > +         */
>> > +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
>> > +        return;
>> > +    }
>> > +
>> >      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
>> >                         pi_blocking_list_lock);
>> >
>> > Then we can following Kevin's suggestion to move the addition
>> > to vmx_vcpu_destory().
>> 
>> Before adding yet another PI-related field, I'd really like to see other
>> alternatives explored. In particular - can determination be based on
>> some other state (considering the subject, e.g. per-domain one)?
> 
> I think the point is we need to set some flag inside the
> spin_lock_irqsave()/spin_unlock_irqrestore() section in
> vmx_pi_blocking_list_cleanup() and check it after acquiring the lock
> in vmx_vcpu_block(), so the case condition can be eliminated, right?
> If that is the case, I am not sure how we can use other state.

Since you only need this during domain shutdown, I'm not sure. For
example, can't you simply use d->is_dying or d->is_shutting_down?

Jan
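A rough sketch of what gating on existing per-domain state might look like. The structures here are simplified stand-ins, not the real Xen definitions; only 'is_dying' mirrors the field Jan mentions, and in the real vmx_vcpu_block() the check would sit after taking the per-CPU blocking-list lock.

```c
/*
 * Sketch of Jan's alternative: reuse existing per-domain teardown state
 * (d->is_dying) instead of adding a new per-vCPU flag. Simplified
 * stand-in structures; locking reduced to comments.
 */
#include <assert.h>

struct domain { int is_dying; };
struct vcpu   { struct domain *domain; int on_blocking_list; };

static void vcpu_block(struct vcpu *v)
{
    /* per-CPU blocking-list lock taken here in the real code */
    if (v->domain->is_dying)
        return;                 /* dying domain: never (re-)queue */
    v->on_blocking_list = 1;
    /* lock released here */
}
```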
Wu, Feng May 23, 2016, 1:32 p.m. UTC | #12
> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Monday, May 23, 2016 8:31 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; Tian, Kevin <kevin.tian@intel.com>; jbeulich@suse.com;
> andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com;
> konrad.wilk@oracle.com
> Subject: Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> On Fri, 2016-05-20 at 16:53 +0800, Feng Wu wrote:
> > We need to make sure the bocking vcpu is not in any per-cpu blocking
> > list
> > when the associated domain is going to be destroyed.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >  }
> >
> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > +{
> > +    unsigned int cpu;
> > +
> > +    for_each_online_cpu ( cpu )
> > +    {
> > +        struct vcpu *v;
> > +        unsigned long flags;
> > +        struct arch_vmx_struct *vmx, *tmp;
> > +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> > +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking,
> > cpu).list;
> > +
> > +        spin_lock_irqsave(lock, flags);
> > +
> > +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
> > pi_blocking.list)
> > +        {
> > +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> > +
> > +            if (v->domain == d)
> > +            {
> > +                list_del(&vmx->pi_blocking.list);
> > +                ASSERT(vmx->pi_blocking.lock == lock);
> > +                vmx->pi_blocking.lock = NULL;
> > +            }
> > +        }
> > +
> > +        spin_unlock_irqrestore(lock, flags);
> > +    }
> >
> So, I'm probably missing something very ver basic, but I don't see
> what's the reason why we need this loop... can't we arrange for
> checking
> 
>  list_empty(&v->arch.hvm_vmx.pi_blocking.list)

Yes, I also cannot find a reason why we can't use this good
suggestion, except that we need to use list_del_init() instead of
list_del() in the current code. Or we can just check whether
'vmx->pi_blocking.lock' is NULL? I totally don't know why I
missed it! :)

Thanks,
Feng

> 
> ?
> 
> :-O
> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
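The list_del_init() idea discussed above can be sketched with a minimal userspace doubly-linked list, modelled loosely on the kernel-style list used by Xen. This is an assumption-laden illustration: the helper names mirror the real API, but the real code would use the existing <xen/list.h> helpers, hold the per-CPU list lock, and (as noted) must initialise the node at vcpu init so the list_empty() test is meaningful.

```c
/*
 * Minimal model of why list_del_init() enables a per-vCPU check:
 * unlike list_del(), it re-initialises the removed node, so the node
 * itself can later be tested with list_empty(). Not the real Xen code.
 */
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    INIT_LIST_HEAD(n);          /* node now tests as empty */
}

static int list_empty(const struct list_head *h) { return h->next == h; }

/*
 * Per-vCPU cleanup at destroy time: no loop over all online CPUs,
 * just this vCPU's own node (per-CPU list lock held in the real code).
 */
static void pi_blocking_cleanup_one(struct list_head *node)
{
    if (!list_empty(node))
        list_del_init(node);
}
```

This removes Jan's concern about the nested loops over all online CPUs: each vCPU cleans up only its own node, and the operation is naturally idempotent.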
Wu, Feng May 23, 2016, 1:33 p.m. UTC | #13
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 8:36 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >  }
> >
> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > +{
> > +    unsigned int cpu;
> > +
> > +    for_each_online_cpu ( cpu )
> > +    {
> > +        struct vcpu *v;
> > +        unsigned long flags;
> > +        struct arch_vmx_struct *vmx, *tmp;
> > +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> > +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> > +
> > +        spin_lock_irqsave(lock, flags);
> > +
> > +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> > +        {
> 
> Did you consider how long these two nested loops may take on a
> large system?
> 

As Dario just mentioned, we may not need this loop at all.

Thanks,
Feng

> Jan
Wu, Feng May 23, 2016, 1:41 p.m. UTC | #14
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 8:47 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 23.05.16 at 14:24, <feng.wu@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Monday, May 23, 2016 7:11 PM
> >> To: Wu, Feng <feng.wu@intel.com>
> >> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> >> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> >> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> >> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking
> list
> >> after domain termination
> >>
> >> >>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
> >> >> From: Wu, Feng
> >> >> Sent: Monday, May 23, 2016 5:18 PM
> >> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> > Sent: Monday, May 23, 2016 5:08 PM
> >> >> > To: Wu, Feng <feng.wu@intel.com>
> >> >> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> >> >> > > Yes, indeed it is more natural to add this function when vcpu is
> > destroyed,
> >> >> > > however, the reason I add it here is I still have some concerns about
> the
> >> >> > > timing.
> >> >> > > When the VM is destroyed, here is the calling path:
> >> >> > >
> >> >> > > - vmx_pi_hooks_deassign() -->
> >> >> > > ......
> >> >> > > - vmx_vcpu_destroy() -->
> >> >> > > ......
> >> >> > > - vmx_domain_destroy()
> >> >> > > ......
> >> >> > >
> >> >> > > As I replied in the previous mail, when we remove the vcpus from the
> >> >> > > blocking
> >> >> > > list, there might be some _in-flight_ call to the hooks, so I put the
> > cleanup
> >> >> > > code in the vmx_domain_destroy(), which is a bit more far from
> >> >> > > vmx_pi_hooks_deassign,
> >> >> > > and hence safer. If you have any other good ideas, I am all ears!:)
> >> >> >
> >> >> > Well, either there is a possible race (then moving the addition
> >> >> > later just reduces the chances, but doesn't eliminate it), or there
> >> >> > isn't (in which case Kevin's suggestion should probably be followed).
> >> >>
> >> >> Yes, I agree, adding the cleanup code in domain destroy other than
> >> >> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
> >> >> get a perfect solution to solve this possible race condition.
> >> >
> >> > After thinking more about this, I think this race condition can be resolved
> >> > in the following way:
> >> > 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_blocking_cleaned_up'
> >> > 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
> >> > blocking list, after removing it, set the flag to 1
> >> > 3. In vmx_vcpu_block(), add the following check:
> >> >
> >> >      spin_lock_irqsave(pi_blocking_list_lock, flags);
> >> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
> >> > +    {
> >> > +        /*
> >> > +         * The vcpu is to be destroyed and it has already been removed
> >> > +         * from the per-CPU list if it is blocking, we shouldn't add
> >> > +         * new vCPUs to the list.
> >> > +         */
> >> > +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> >> > +        return;
> >> > +    }
> >> > +
> >> >      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
> >> >                         pi_blocking_list_lock);
> >> >
> >> > Then we can follow Kevin's suggestion and move the addition
> >> > to vmx_vcpu_destroy().
> >>
> >> Before adding yet another PI-related field, I'd really like to see other
> >> alternatives explored. In particular - can determination be based on
> >> some other state (considering the subject, e.g. per-domain one)?
> >
> > I think the point is we need to set some flag inside the
> > spin_lock_irqsave()/spin_unlock_irqrestore() section in
> > vmx_pi_blocking_list_cleanup() and check it after acquiring the lock
> > in vmx_vcpu_block(), so the race condition can be eliminated, right?
> > If that is the case, I am not sure how we can use some other state.
> 
> Since you only need this during domain shutdown, I'm not sure. For
> example, can't you simply use d->is_dying or d->is_shutting_down?

I am not sure those flags can be used for this case, since I think we
should set the flag inside the spinlock-protected region, as mentioned
above. Anyway, I will think more about this.

Thanks,
Feng

> 
> Jan
Dario Faggioli May 23, 2016, 2:45 p.m. UTC | #15
On Mon, 2016-05-23 at 13:32 +0000, Wu, Feng wrote:

> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> > >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > >  }
> > > 
> > > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > > +{
> > > +    unsigned int cpu;
> > > +
> > > +    for_each_online_cpu ( cpu )
> > > +    {
> > > +        struct vcpu *v;
> > > +        unsigned long flags;
> > > +        struct arch_vmx_struct *vmx, *tmp;
> > > +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> > > +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> > > +
> > > +        spin_lock_irqsave(lock, flags);
> > > +
> > > +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> > > +        {
> > > +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> > > +
> > > +            if (v->domain == d)
> > > +            {
> > > +                list_del(&vmx->pi_blocking.list);
> > > +                ASSERT(vmx->pi_blocking.lock == lock);
> > > +                vmx->pi_blocking.lock = NULL;
> > > +            }
> > > +        }
> > > +
> > > +        spin_unlock_irqrestore(lock, flags);
> > > +    }
> > > 
> > So, I'm probably missing something very, very basic, but I don't see
> > what's the reason why we need this loop... can't we arrange for
> > checking
> > 
> >  list_empty(&v->arch.hvm_vmx.pi_blocking.list)
> Yes, I also cannot find a reason why we can't use this good
> suggestion, except that we need to use list_del_init() instead of
> list_del() in the current code.
>
Yes, I saw that, and it's well worth doing that, to get rid of the
loop. :-)

> Or we can just check whether
> 'vmx->pi_blocking.lock' is NULL?
>
I guess that will work as well. Still, if it were me doing this, I'd go
for the list_del_init()/list_empty() approach.

> I totally don't know why I
> missed it! :)
> 
:-)

Regards,
Dario

Patch

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4862b13..e74b3e7 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -248,6 +248,36 @@  void vmx_pi_hooks_deassign(struct domain *d)
     d->arch.hvm_domain.vmx.pi_switch_to = NULL;
 }
 
+static void vmx_pi_blocking_list_cleanup(struct domain *d)
+{
+    unsigned int cpu;
+
+    for_each_online_cpu ( cpu )
+    {
+        struct vcpu *v;
+        unsigned long flags;
+        struct arch_vmx_struct *vmx, *tmp;
+        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+        spin_lock_irqsave(lock, flags);
+
+        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+        {
+            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
+
+            if (v->domain == d)
+            {
+                list_del(&vmx->pi_blocking.list);
+                ASSERT(vmx->pi_blocking.lock == lock);
+                vmx->pi_blocking.lock = NULL;
+            }
+        }
+
+        spin_unlock_irqrestore(lock, flags);
+    }
+}
+
 static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
@@ -265,6 +295,8 @@  static int vmx_domain_initialise(struct domain *d)
 
 static void vmx_domain_destroy(struct domain *d)
 {
+    vmx_pi_blocking_list_cleanup(d);
+
     if ( !has_vlapic(d) )
         return;