
[2/3] x86/xsaves: fix overwriting between non-lazy/lazy xsave[sc]

Message ID 1456119321-10384-3-git-send-email-shuai.ruan@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Shuai Ruan Feb. 22, 2016, 5:35 a.m. UTC
The offsets at which components are saved by xsave[sc] are not fixed.
So when a save with v->fpu_dirtied set is followed by one with
v->fpu_dirtied clear, the non-lazy xsave[sc] may overwrite data
written by the lazy one.

Therefore, when xsave[sc] is enabled, make vcpu_xsave_mask return
XSTATE_ALL when v->fpu_dirtied is clear and v->arch.nonlazy_xstate_used
is set.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/i387.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
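
To make the hazard concrete, here is a small standalone illustration (the component sizes below are hypothetical, not real XSTATE layout data) of how the compacted format used by XSAVEC/XSAVES moves a component's offset when the save mask shrinks, so a smaller non-lazy save can land on top of data a previous, larger lazy save wrote:

```c
/*
 * Illustration only: with the compacted format, a component's offset
 * depends on which lower-numbered components are requested, so two saves
 * with different masks place the same component at different offsets.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Index 0 stands for the legacy 512-byte region plus 64-byte header;
     * components 2..4 get made-up sizes. */
    const unsigned int size[] = { 512 + 64, 0, 256, 128, 64 };
    const uint64_t masks[] = { 0x1f, 0x13 };  /* "everything" vs. FP/SSE + comp 4 */

    for ( unsigned int m = 0; m < 2; m++ )
    {
        unsigned int off = size[0];

        printf("mask %#llx:", (unsigned long long)masks[m]);
        for ( unsigned int c = 2; c <= 4; c++ )
            if ( masks[m] & (1ull << c) )
            {
                printf(" comp%u@%u", c, off);
                off += size[c];
            }
        printf("\n");
    }
    return 0;  /* comp4 lands at offset 960 for the first mask, 576 for the second */
}
```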

Comments

Jan Beulich Feb. 22, 2016, 5:08 p.m. UTC | #1
>>> On 22.02.16 at 06:35, <shuai.ruan@linux.intel.com> wrote:

First of all I wonder on what basis you collect your Cc lists on
patches.

> --- a/xen/arch/x86/i387.c
> +++ b/xen/arch/x86/i387.c
> @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>      if ( v->fpu_dirtied )
>          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>  
> -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
>  }

The description lacks any mention of the performance impact,
and what investigation was done to find ways to perhaps
overcome this. For example, regardless of cpu_has_xsaves,
do we really always need to _use_ XSAVES?

Also - coding style (stray spaces inside parentheses).

Jan
Shuai Ruan Feb. 24, 2016, 7:05 a.m. UTC | #2
On Mon, Feb 22, 2016 at 10:08:52AM -0700, Jan Beulich wrote:
> >>> On 22.02.16 at 06:35, <shuai.ruan@linux.intel.com> wrote:
> 
> First of all I wonder on what basis you collect your Cc lists on
> patches.
> 
I sent the bug-fix patches as a whole series and generated the Cc lists
with the script based on the whole patchset. Maybe I will send the
patches separately.
> > --- a/xen/arch/x86/i387.c
> > +++ b/xen/arch/x86/i387.c
> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
> >      if ( v->fpu_dirtied )
> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
> >  
> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
> >  }
> 
> The description lacks any mention of the performance impact,
> and what investigation was done to find ways to perhaps
> overcome this. For example, regardless of cpu_has_xsaves,
> do we really always need to _use_ XSAVES?
> 
Currently no supervisor xstates are enabled in Xen or even in the guest
OS, so using xsaves is a little premature (though xsavec may be used).
xsavec can cause the same overwriting problem as xsaves.
I will add the performance impact to the description.
I am still thinking about a better way to avoid the xsave overhead
(but have not found a better solution yet).

Thanks
Jan Beulich Feb. 24, 2016, 9:16 a.m. UTC | #3
>>> On 24.02.16 at 08:05, <shuai.ruan@linux.intel.com> wrote:
> On Mon, Feb 22, 2016 at 10:08:52AM -0700, Jan Beulich wrote:
>> >>> On 22.02.16 at 06:35, <shuai.ruan@linux.intel.com> wrote:
>> 
>> First of all I wonder on what basis you collect your Cc lists on
>> patches.
>> 
> I send the bugs-fix patch as whole. I just get the Cc lists using the
> script based on the whole patchset. May be I will send the patch
> seperately.

Thank you. Please also see
http://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches
and in particular also Konrad's recent suggested adjustments
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02877.html
(perhaps including the other parts of the thread).

>> > --- a/xen/arch/x86/i387.c
>> > +++ b/xen/arch/x86/i387.c
>> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>> >      if ( v->fpu_dirtied )
>> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>> >  
>> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
>> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
>> >  }
>> 
>> The description lacks any mention of the performance impact,
>> and what investigation was done to find ways to perhaps
>> overcome this. For example, regardless of cpu_has_xsaves,
>> do we really always need to _use_ XSAVES?
>> 
> Currently no supervisor xstates is enabled in xen or even in
> guest os. Using xsaves is a little ahead (xsavec may used).  
> xsavec may also cause overwriting problem like xsaves.
> I will add performance impact in the description. 
> I am still thinking of a better way to overcome the overhead xsave
> (But have not get a better solution yet).

I was thinking that taking into consideration the features a guest
has ever used (i.e. v->arch.xcr0_accum) to decide which variant
to use may be a worthwhile avenue to explore.

Jan
Shuai Ruan Feb. 26, 2016, 7:41 a.m. UTC | #4
On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
> > I send the bugs-fix patch as whole. I just get the Cc lists using the
> > script based on the whole patchset. May be I will send the patch
> > seperately.
> 
> Thank you. Please also see
> http://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches
> and in particular also Konrad's recent suggested adjustments
> http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02877.html
> (perhaps including the other parts of the thread).
Thanks.
> 
> >> > --- a/xen/arch/x86/i387.c
> >> > +++ b/xen/arch/x86/i387.c
> >> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
> >> >      if ( v->fpu_dirtied )
> >> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
> >> >  
> >> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> >> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
> >> >  }
> >> 
> >> The description lacks any mention of the performance impact,
> >> and what investigation was done to find ways to perhaps
> >> overcome this. For example, regardless of cpu_has_xsaves,
> >> do we really always need to _use_ XSAVES?
> >> 
> > Currently no supervisor xstates is enabled in xen or even in
> > guest os. Using xsaves is a little ahead (xsavec may used).  
> > xsavec may also cause overwriting problem like xsaves.
> > I will add performance impact in the description. 
> > I am still thinking of a better way to overcome the overhead xsave
> > (But have not get a better solution yet).
> 
> I was thinking that taking into consideration the features a guest
> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
> to use may be a worthwhile avenue to explore.
> 
Oh, thanks for your suggestion.
You mean that when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE))
evaluates to false, vcpu_xsave_mask can return XSTATE_NONLAZY when using
xsave[cs], and otherwise return XSTATE_ALL.
That means we can save the area safely if the guest has only ever used
XSTATE_NONLAZY | XSTATE_FP_SSE.
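
A minimal sketch of the check being discussed here, assuming the existing definitions in Xen's xstate headers; this is the proposed idea, not the code from the patch under review:

```c
static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
{
    if ( v->fpu_dirtied )
        return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;

    if ( !(cpu_has_xsaves || cpu_has_xsavec) )
        return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;

    /*
     * With the compacting XSAVEC/XSAVES, a partial save is only safe if
     * the guest has never used a lazy component other than FP/SSE;
     * otherwise the whole state must be saved.
     */
    return (v->arch.xcr0_accum & XSTATE_LAZY & ~XSTATE_FP_SSE)
           ? XSTATE_ALL : XSTATE_NONLAZY;
}
```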
> Jan
Jan Beulich Feb. 26, 2016, 8:42 a.m. UTC | #5
>>> On 26.02.16 at 08:41, <shuai.ruan@linux.intel.com> wrote:
> On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
>> >> > --- a/xen/arch/x86/i387.c
>> >> > +++ b/xen/arch/x86/i387.c
>> >> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>> >> >      if ( v->fpu_dirtied )
>> >> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>> >> >  
>> >> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
>> >> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
>> >> >  }
>> >> 
>> >> The description lacks any mention of the performance impact,
>> >> and what investigation was done to find ways to perhaps
>> >> overcome this. For example, regardless of cpu_has_xsaves,
>> >> do we really always need to _use_ XSAVES?
>> >> 
>> > Currently no supervisor xstates is enabled in xen or even in
>> > guest os. Using xsaves is a little ahead (xsavec may used).  
>> > xsavec may also cause overwriting problem like xsaves.
>> > I will add performance impact in the description. 
>> > I am still thinking of a better way to overcome the overhead xsave
>> > (But have not get a better solution yet).
>> 
>> I was thinking that taking into consideration the features a guest
>> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
>> to use may be a worthwhile avenue to explore.
>> 
> Oh, Thanks for your suggestion.
> You mean when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE)) return 
> false,
> we can return XSTATE_NONLAZY in vcpu_xsave_mask when using xsave[cs]
> otherwise return XSTATE_ALL.
> It means we can save the area safely, if the guest has ever use 
> XSTATE_NONLAZY | XSTATE_FP_SSE only. 

That's one step in the right direction. But the main difference
between XSAVE/XSAVEOPT and XSAVEC/XSAVES is that the former
can be used incrementally, while the latter can't. And I highly
doubt the modified optimization the latter two support will kick in
very often, since there's quite good a chance that the guest
itself executed another one of these between two of our instances.
Which to me means it should at least be investigated whether using
XSAVEOPT in favor of XSAVEC wouldn't produce better performance,
and whether XSAVES wouldn't better be used only if the guest uses
a component which can only be saved that way.
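
A rough sketch of that last idea; XSTATE_XSAVES_ONLY (the set of components only XSAVES can handle) is an assumed name here, not an existing Xen definition at this point:

```c
/* Prefer XSAVE/XSAVEOPT and the standard format unless the guest has
 * ever touched a component that can only be saved with XSAVES. */
static bool xsave_area_needs_xsaves(const struct vcpu *v)
{
    return cpu_has_xsaves && (v->arch.xcr0_accum & XSTATE_XSAVES_ONLY);
}
```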

Jan
Shuai Ruan Feb. 29, 2016, 9:06 a.m. UTC | #6
On Fri, Feb 26, 2016 at 01:42:35AM -0700, Jan Beulich wrote:
> >>> On 26.02.16 at 08:41, <shuai.ruan@linux.intel.com> wrote:
> > On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
> >> >> The description lacks any mention of the performance impact,
> >> >> and what investigation was done to find ways to perhaps
> >> >> overcome this. For example, regardless of cpu_has_xsaves,
> >> >> do we really always need to _use_ XSAVES?
> >> >> 
> >> > Currently no supervisor xstates is enabled in xen or even in
> >> > guest os. Using xsaves is a little ahead (xsavec may used).  
> >> > xsavec may also cause overwriting problem like xsaves.
> >> > I will add performance impact in the description. 
> >> > I am still thinking of a better way to overcome the overhead xsave
> >> > (But have not get a better solution yet).
> >> 
> >> I was thinking that taking into consideration the features a guest
> >> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
> >> to use may be a worthwhile avenue to explore.
> >> 
> > Oh, Thanks for your suggestion.
> > You mean when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE)) return 
> > false,
> > we can return XSTATE_NONLAZY in vcpu_xsave_mask when using xsave[cs]
> > otherwise return XSTATE_ALL.
> > It means we can save the area safely, if the guest has ever use 
> > XSTATE_NONLAZY | XSTATE_FP_SSE only. 
> 
> That's one step in the right direction. But the main difference
> between XSAVE/XSAVEOPT and XSAVEC/XSAVES is that the former
> can be used incrementally, while the latter can't. And I highly
> doubt the modified optimization the latter two support will kick in
> very often, since there's quite good a chance that the guest
> itself executed another one of these between two of our instances.
> Which to me means it should at least be investigated whether using
> XSAVEOPT in favor of XSAVEC wouldn't produce better performance,
> and whether XSAVES wouldn't better be used only if the guest uses
> a component which can only be saved that way.
> 
Thanks.

OK, I will do the performance test. Can you suggest a workload/benchmark
that can be used here for the xsave-related performance test?

Another point: from the text above, I guess the best way to solve the
xsave[cs] problem is:
1. Use xsaveopt instead of xsave[cs] for now.
2. Use xsaves only when a component can only be saved that way (i.e. once
   supervisor components are supported in Xen).
3. No xsavec support.
4. Expose the xsavec/xsaves features to the guest OS, provided point 2 is OK.

If the above 4 points are OK with you, then I will write a patch to do
this.

> Jan
Jan Beulich Feb. 29, 2016, 9:33 a.m. UTC | #7
>>> On 29.02.16 at 10:06, <shuai.ruan@linux.intel.com> wrote:
> On Fri, Feb 26, 2016 at 01:42:35AM -0700, Jan Beulich wrote:
>> >>> On 26.02.16 at 08:41, <shuai.ruan@linux.intel.com> wrote:
>> > On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
>> >> >> The description lacks any mention of the performance impact,
>> >> >> and what investigation was done to find ways to perhaps
>> >> >> overcome this. For example, regardless of cpu_has_xsaves,
>> >> >> do we really always need to _use_ XSAVES?
>> >> >> 
>> >> > Currently no supervisor xstates is enabled in xen or even in
>> >> > guest os. Using xsaves is a little ahead (xsavec may used).  
>> >> > xsavec may also cause overwriting problem like xsaves.
>> >> > I will add performance impact in the description. 
>> >> > I am still thinking of a better way to overcome the overhead xsave
>> >> > (But have not get a better solution yet).
>> >> 
>> >> I was thinking that taking into consideration the features a guest
>> >> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
>> >> to use may be a worthwhile avenue to explore.
>> >> 
>> > Oh, Thanks for your suggestion.
>> > You mean when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE)) return 
>> > false,
>> > we can return XSTATE_NONLAZY in vcpu_xsave_mask when using xsave[cs]
>> > otherwise return XSTATE_ALL.
>> > It means we can save the area safely, if the guest has ever use 
>> > XSTATE_NONLAZY | XSTATE_FP_SSE only. 
>> 
>> That's one step in the right direction. But the main difference
>> between XSAVE/XSAVEOPT and XSAVEC/XSAVES is that the former
>> can be used incrementally, while the latter can't. And I highly
>> doubt the modified optimization the latter two support will kick in
>> very often, since there's quite good a chance that the guest
>> itself executed another one of these between two of our instances.
>> Which to me means it should at least be investigated whether using
>> XSAVEOPT in favor of XSAVEC wouldn't produce better performance,
>> and whether XSAVES wouldn't better be used only if the guest uses
>> a component which can only be saved that way.
>> 
> Thanks. 
> 
> Ok , I will do the performace test. And can you suggest me some 
> workload/benchmark 
> can be used here to the xsave related performance test ?

Measuring just instruction execution time should be fine for the
purpose here, I think.

> Other thing is from the text above, I guess that the best way to solve 
> xsave[cs]
> problem is:
> 1. use xsaveopt instead of xsave[cs] nowdays.
> 2. use xsaves whenever a component can only be saved that way( when some
>    supervised components are supported in xen).
> 3. no xsavec support.
> 4. xsavec/xsaves feature will expose guest os if point 2 is ok.

Provided this results in better performance than the alternative(s),
yes.

Jan
Shuai Ruan March 2, 2016, 10:31 a.m. UTC | #8
On Mon, Feb 29, 2016 at 02:33:49AM -0700, Jan Beulich wrote:
> > Thanks. 
> > 
> > Ok , I will do the performace test. And can you suggest me some 
> > workload/benchmark 
> > can be used here to the xsave related performance test ?
> 
> Measuring just instruction execution time should be fine for the
> purpose here, I think.
> 
I did the test as follows:

Xsave time measurement method:
call get_s_time() before and after the xsave and take the difference.

1. Only xsaveopt enabled in Xen, two guests started (no workload running
in the guests).

xsave times fall into two ranges, [28 - 40] and [110 - 140] (mostly
110 - 120).

2. Only xsavec enabled in Xen.

xsave times fall into two ranges, [30 - 50] and [120 - 140].

I used a fragment of the test results to make a rough estimate.

I also tried computing the average execution time (calculated over every
10 xsave invocations); the result is almost the same as above.

The results show that xsaveopt performs better than xsavec (if the
test/measurement method is correct).
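
A rough sketch of the instrumentation described above, assuming Xen's get_s_time() (nanosecond-granularity system time) and the internal xsave() helper; the exact code used for these numbers is not shown in the thread:

```c
s_time_t t0, t1;

t0 = get_s_time();
xsave(v, mask);          /* the save variant under test */
t1 = get_s_time();
printk(XENLOG_DEBUG "xsave took %"PRId64" ns\n", t1 - t0);
```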

> > Other thing is from the text above, I guess that the best way to solve 
> > xsave[cs]
> > problem is:
> > 1. use xsaveopt instead of xsave[cs] nowdays.
> > 2. use xsaves whenever a component can only be saved that way( when some
> >    supervised components are supported in xen).
> > 3. no xsavec support.
> > 4. xsavec/xsaves feature will expose guest os if point 2 is ok.
> 
> Provided this results in better performance than the alternative(s),
> yes.
> 
> Jan

Patch

diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 67016c9..e3a7bc0 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -118,7 +118,7 @@  static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
     if ( v->fpu_dirtied )
         return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
 
-    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
+    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
 }
 
 /* Save x87 extended state */
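
For context, a simplified sketch (based on the surrounding i387.c code, with error handling omitted) of the function that consumes the mask computed above:

```c
/* Simplified sketch, not the verbatim Xen code: the mask returned by
 * vcpu_xsave_mask() is handed straight to the xsave() helper, so returning
 * XSTATE_ALL above forces a full compacted save and prevents a partial
 * XSAVEC/XSAVES save from clobbering lazily saved components. */
static inline void fpu_xsave(struct vcpu *v)
{
    uint64_t mask = vcpu_xsave_mask(v);

    ASSERT(mask);
    xsave(v, mask);
}
```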