Message ID | 1457090525-22468-1-git-send-email-shuai.ruan@linux.intel.com (mailing list archive)
---|---
State | New, archived
Headers | show
>>> On 04.03.16 at 12:22, <shuai.ruan@linux.intel.com> wrote:
> The offsets at which components are saved by xsave[sc] are not fixed.
> So when a save with v->fpu_dirtied set is followed by one with
> v->fpu_dirtied clear, the non-lazy xsave[sc] may overwrite data
> written by the lazy one.
>
> The solution is, when xsave[sc] is enabled, to take xcr0_accum into
> consideration: if the guest has ever used lazy states, vcpu_xsave_mask
> will return XSTATE_ALL; otherwise it will return XSTATE_NONLAZY.
> This may cause redundant saves of lazy states, with a corresponding
> performance impact. As the xsavec support code will be cleaned up
> (the reason is listed below), this patch only takes xsaves into
> consideration.
>
> After doing some performance tests on xsavec and xsaveopt (suggested
> by Jan), the results show that xsaveopt performs better than xsavec.
> So the next step is to clean up the xsavec support code in Xen and
> disable xsaves (xsaves should be used once supervisor states are
> introduced).

I'd rather see this being done right away instead of just getting
mentioned here; I don't expect this to be too difficult a change.

> --- a/xen/arch/x86/i387.c
> +++ b/xen/arch/x86/i387.c
> @@ -118,7 +118,8 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>      if ( v->fpu_dirtied )
>          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>
> -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> +    return cpu_has_xsaves && (v->arch.xcr0_accum & XSTATE_LAZY & ~XSTATE_FP_SSE) ?

If (here or in the suggested future patch) such an expression is
needed, I'd highly recommend accompanying it by a comment explaining
what this is about (i.e. specifically why the low two bits can be
ignored for the purpose here). In any event you will need to make the
commit description (which currently just says "if guest has ever used
lazy states") match this.

Jan
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index c29d0fa..0084578 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -118,7 +118,8 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
     if ( v->fpu_dirtied )
         return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
 
-    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
+    return cpu_has_xsaves && (v->arch.xcr0_accum & XSTATE_LAZY & ~XSTATE_FP_SSE) ?
+           XSTATE_ALL : XSTATE_NONLAZY;
 }
 
 /* Save x87 extended state */
The offsets at which components are saved by xsave[sc] are not fixed.
So when a save with v->fpu_dirtied set is followed by one with
v->fpu_dirtied clear, the non-lazy xsave[sc] may overwrite data
written by the lazy one.

The solution is, when xsave[sc] is enabled, to take xcr0_accum into
consideration: if the guest has ever used lazy states, vcpu_xsave_mask
will return XSTATE_ALL; otherwise it will return XSTATE_NONLAZY. This
may cause redundant saves of lazy states, with a corresponding
performance impact. As the xsavec support code will be cleaned up (the
reason is listed below), this patch only takes xsaves into
consideration.

After doing some performance tests on xsavec and xsaveopt (suggested
by Jan), the results show that xsaveopt performs better than xsavec.
So the next step is to clean up the xsavec support code in Xen and
disable xsaves (xsaves should be used once supervisor states are
introduced).

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
---
V2: Address comments from Jan:
    Add performance impact and future steps in description.

 xen/arch/x86/i387.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)