diff mbox

[v2] x86/vpmu: add cpu hot unplug notifier for vpmu

Message ID 82D7661F83C1A047AF7DC287873BF1E167D01B03@SHSMSX101.ccr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Luwei Kang May 18, 2017, 11:51 a.m. UTC
> >>> On 17.05.17 at 17:57, <luwei.kang@intel.com> wrote:
> > @@ -581,9 +582,14 @@ static void vpmu_arch_destroy(struct vcpu *v)
> >
> >      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
> >      {
> > -        /* Unload VPMU first. This will stop counters */
> > -        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
> > -                         vpmu_save_force, v, 1);
> > +        /*
> > +         * Unload VPMU first if VPMU_CONTEXT_LOADED being set.
> > +         * This will stop counters.
> > +         */
> > +        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
> > +            on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
> > +                             vpmu_save_force, v, 1);
> > +
> >           vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
> >      }
> >  }
> 
> So this is a good step towards what was requested during v1 review, provided it is correct (I'll let Boris comment). You didn't,
> however, do anything about the other unguarded last_pcpu uses (in vpmu_load() and upwards from the code above in
> vpmu_arch_destroy()). These _may_ be implicitly fine, but if so please at least add suitable ASSERT()s.
> 

Hi Jan,
    Thanks for your reply. I think I understand the issue you mentioned. But sorry,  I am not very clear what is your solution from your description.
    At first, I want to change like this:
    In summary, I think it is enough to solve the issue in vpmu_load() and vpmu_arch_destroy().
    After cpu_callback() function, per_cpu(last_vcpu, vpmu->last_pcpu) will be NULL and VPMU_CONTEXT_LOADED will be clear.
    In vpmu_arch_destroy(), there will not make remote call to clear last.
    In vpmu_load(), remote call will guarded by VPMU_CONTEXT_LOADED flag check. As for vpmu->last_pcpu, we can't use some random online one to produce false.
    What is your opinion?

Thanks,
Luwei Kang

Comments

Jan Beulich May 18, 2017, 1:06 p.m. UTC | #1
>>> On 18.05.17 at 13:51, <luwei.kang@intel.com> wrote:
>> >>> On 17.05.17 at 17:57, <luwei.kang@intel.com> wrote:
>> > @@ -581,9 +582,14 @@ static void vpmu_arch_destroy(struct vcpu *v)
>> >
>> >      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
>> >      {
>> > -        /* Unload VPMU first. This will stop counters */
>> > -        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
>> > -                         vpmu_save_force, v, 1);
>> > +        /*
>> > +         * Unload VPMU first if VPMU_CONTEXT_LOADED being set.
>> > +         * This will stop counters.
>> > +         */
>> > +        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>> > +            on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
>> > +                             vpmu_save_force, v, 1);
>> > +
>> >           vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>> >      }
>> >  }
>> 
>> So this is a good step towards what was requested during v1 review, provided 
> it is correct (I'll let Boris comment). You didn't,
>> however, do anything about the other unguarded last_pcpu uses (in 
> vpmu_load() and upwards from the code above in
>> vpmu_arch_destroy()). These _may_ be implicitly fine, but if so please at 
> least add suitable ASSERT()s.
>> 
> 
> Hi Jan,
>     Thanks for your reply. I think I understand the issue you mentioned. But 
> sorry,  I am not very clear what is your solution from your description.
>     At first, I want to change like this:
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -859,6 +859,7 @@ static int cpu_callback(
>      {
>          vpmu_save_force(vcpu);
>          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> +        per_cpu(last_vcpu, cpu) = NULL;        // OR: this_cpu(last_vcpu) = NULL;
>      }
>     As you mentioned in before comments, it has been done in vpmu_save_force(). So this change is unnecessary.

Indeed. But all I was talking is last_pcpu (whereas you once again
talk about last_vcpu).

>     In summary, I think it is enough to solve the issue in vpmu_load() and vpmu_arch_destroy().

That's what I alluded to in my reply.

>     After cpu_callback() function, per_cpu(last_vcpu, vpmu->last_pcpu)
> will be NULL

No. per_cpu(..., <offlined-pcpu>) simply is invalid.

> and VPMU_CONTEXT_LOADED will be clear.
>     In vpmu_arch_destroy(), there will not make remote call to clear last.

I don't understand this sentence.

>     In vpmu_load(), remote call will guarded by VPMU_CONTEXT_LOADED flag check. As for vpmu->last_pcpu, we can't use some random online one to produce false.
>     What is your opinion?

I continue to think that it needs to be made sure last_pcpu is valid
before using it for anything. Agreed, my previous suggestion of
simply storing an invalid value was not very useful, as the
questionable comparison is != (when making the suggestion I
did wrongly rememeber it to be == ), but that doesn't eliminate
the need to sanity check the value before use. Perhaps all that's
needed are a couple of cpu_online() checks.

Jan
Jan Beulich May 18, 2017, 1:19 p.m. UTC | #2
>>> On 18.05.17 at 15:06, <JBeulich@suse.com> wrote:
>>>> On 18.05.17 at 13:51, <luwei.kang@intel.com> wrote:
>>     In vpmu_load(), remote call will guarded by VPMU_CONTEXT_LOADED flag 
> check. As for vpmu->last_pcpu, we can't use some random online one to produce 
> false.
>>     What is your opinion?
> 
> I continue to think that it needs to be made sure last_pcpu is valid
> before using it for anything. Agreed, my previous suggestion of
> simply storing an invalid value was not very useful, as the
> questionable comparison is != (when making the suggestion I
> did wrongly rememeber it to be == ),

And I was wrong here and right originally:

int vpmu_load(struct vcpu *v, bool_t from_guest)
{
    ...
    /* First time this VCPU is running here */
    if ( vpmu->last_pcpu != pcpu )
    {

If last_pcpu was NR_CPUS or nr_cpu_ids or any other always
invalid value, this condition would be true, which is exactly what
we want. Whereas if you leave the field alone, and another (or
the same) CPU comes (back) up with that number, we may
wrongly not enter the if() body.

Jan
Luwei Kang May 18, 2017, 2:23 p.m. UTC | #3
> > --- a/xen/arch/x86/cpu/vpmu.c
> > +++ b/xen/arch/x86/cpu/vpmu.c
> > @@ -859,6 +859,7 @@ static int cpu_callback(
> >      {
> >          vpmu_save_force(vcpu);
> >          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> > +        per_cpu(last_vcpu, cpu) = NULL;        // OR: this_cpu(last_vcpu) = NULL;
> >      }
> >     As you mentioned in before comments, it has been done in vpmu_save_force(). So this change is unnecessary.
> 
> Indeed. But all I was talking is last_pcpu (whereas you once again talk about last_vcpu).
> 
> >     In summary, I think it is enough to solve the issue in vpmu_load() and vpmu_arch_destroy().
> 
> That's what I alluded to in my reply.
> 
> >     After cpu_callback() function, per_cpu(last_vcpu, vpmu->last_pcpu)
> > will be NULL
> 
> No. per_cpu(..., <offlined-pcpu>) simply is invalid.
> 
> > and VPMU_CONTEXT_LOADED will be clear.
> >     In vpmu_arch_destroy(), there will not make remote call to clear last.
> 
> I don't understand this sentence.

I mean per_cpu(..., <offlined-pcpu>) will be NULL after cpu_callback(), so that "per_cpu(last_vcpu, vpmu->last_pcpu) == v" check in 
vpmu_arch_destroy() will be fail when last_pcpu is the offlined pCPU. Or, it may make a remote call to clear the last_vpcu (vpmu_clear_last()).
But I don't understand why simply is invalid, last_vcpu set to NULL is presented in source code.  How to comprehend it?

Thanks,
Luwei Kang

> 
> >     In vpmu_load(), remote call will guarded by VPMU_CONTEXT_LOADED flag check. As for vpmu->last_pcpu, we can't use some
> random online one to produce false.
> >     What is your opinion?
> 
> I continue to think that it needs to be made sure last_pcpu is valid before using it for anything. Agreed, my previous suggestion of
> simply storing an invalid value was not very useful, as the questionable comparison is != (when making the suggestion I did wrongly
> rememeber it to be == ), but that doesn't eliminate the need to sanity check the value before use. Perhaps all that's needed are a
> couple of cpu_online() checks.
> 
> Jan
Jan Beulich May 19, 2017, 6:21 a.m. UTC | #4
>>> On 18.05.17 at 16:23, <luwei.kang@intel.com> wrote:
>> > --- a/xen/arch/x86/cpu/vpmu.c
>> > +++ b/xen/arch/x86/cpu/vpmu.c
>> > @@ -859,6 +859,7 @@ static int cpu_callback(
>> >      {
>> >          vpmu_save_force(vcpu);
>> >          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>> > +        per_cpu(last_vcpu, cpu) = NULL;        // OR: this_cpu(last_vcpu) 
> = NULL;
>> >      }
>> >     As you mentioned in before comments, it has been done in 
> vpmu_save_force(). So this change is unnecessary.
>> 
>> Indeed. But all I was talking is last_pcpu (whereas you once again talk 
> about last_vcpu).
>> 
>> >     In summary, I think it is enough to solve the issue in vpmu_load() and 
> vpmu_arch_destroy().
>> 
>> That's what I alluded to in my reply.
>> 
>> >     After cpu_callback() function, per_cpu(last_vcpu, vpmu->last_pcpu)
>> > will be NULL
>> 
>> No. per_cpu(..., <offlined-pcpu>) simply is invalid.
>> 
>> > and VPMU_CONTEXT_LOADED will be clear.
>> >     In vpmu_arch_destroy(), there will not make remote call to clear last.
>> 
>> I don't understand this sentence.
> 
> I mean per_cpu(..., <offlined-pcpu>) will be NULL after cpu_callback(), so that 
> "per_cpu(last_vcpu, vpmu->last_pcpu) == v" check in 
> vpmu_arch_destroy() will be fail when last_pcpu is the offlined pCPU. Or, it 
> may make a remote call to clear the last_vpcu (vpmu_clear_last()).
> But I don't understand why simply is invalid, last_vcpu set to NULL is 
> presented in source code.  How to comprehend it?

per_cpu(..., <offlined-pcpu>) will fault once the CPU is actually
offline. See free_percpu_area().

Jan
Luwei Kang May 22, 2017, 2:08 a.m. UTC | #5
> >>> On 18.05.17 at 16:23, <luwei.kang@intel.com> wrote:
> >> > --- a/xen/arch/x86/cpu/vpmu.c
> >> > +++ b/xen/arch/x86/cpu/vpmu.c
> >> > @@ -859,6 +859,7 @@ static int cpu_callback(
> >> >      {
> >> >          vpmu_save_force(vcpu);
> >> >          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> >> > +        per_cpu(last_vcpu, cpu) = NULL;        // OR: this_cpu(last_vcpu)
> > = NULL;
> >> >      }
> >> >     As you mentioned in before comments, it has been done in
> > vpmu_save_force(). So this change is unnecessary.
> >>
> >> Indeed. But all I was talking is last_pcpu (whereas you once again
> >> talk
> > about last_vcpu).
> >>
> >> >     In summary, I think it is enough to solve the issue in
> >> > vpmu_load() and
> > vpmu_arch_destroy().
> >>
> >> That's what I alluded to in my reply.
> >>
> >> >     After cpu_callback() function, per_cpu(last_vcpu,
> >> > vpmu->last_pcpu) will be NULL
> >>
> >> No. per_cpu(..., <offlined-pcpu>) simply is invalid.
> >>
> >> > and VPMU_CONTEXT_LOADED will be clear.
> >> >     In vpmu_arch_destroy(), there will not make remote call to clear last.
> >>
> >> I don't understand this sentence.
> >
> > I mean per_cpu(..., <offlined-pcpu>) will be NULL after
> > cpu_callback(), so that "per_cpu(last_vcpu, vpmu->last_pcpu) == v"
> > check in
> > vpmu_arch_destroy() will be fail when last_pcpu is the offlined pCPU.
> > Or, it may make a remote call to clear the last_vpcu (vpmu_clear_last()).
> > But I don't understand why simply is invalid, last_vcpu set to NULL is
> > presented in source code.  How to comprehend it?
> 
> per_cpu(..., <offlined-pcpu>) will fault once the CPU is actually offline. See free_percpu_area().

Oh, yes. Thanks for your clarification.

Thanks,
Luwei Kang
diff mbox

Patch

--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -859,6 +859,7 @@  static int cpu_callback(
     {
         vpmu_save_force(vcpu);
         vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+        per_cpu(last_vcpu, cpu) = NULL;        // OR: this_cpu(last_vcpu) = NULL;
     }
    As you mentioned in before comments, it has been done in vpmu_save_force(). So this change is unnecessary.