Message ID | 1456119321-10384-3-git-send-email-shuai.ruan@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 22.02.16 at 06:35, <shuai.ruan@linux.intel.com> wrote:

First of all I wonder on what basis you collect your Cc lists on patches.

> --- a/xen/arch/x86/i387.c
> +++ b/xen/arch/x86/i387.c
> @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>      if ( v->fpu_dirtied )
>          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>
> -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
>  }

The description lacks any mention of the performance impact, and what
investigation was done to find ways to perhaps overcome this. For example,
regardless of cpu_has_xsaves, do we really always need to _use_ XSAVES?

Also - coding style (stray spaces inside parentheses).

Jan
On Mon, Feb 22, 2016 at 10:08:52AM -0700, Jan Beulich wrote:
> >>> On 22.02.16 at 06:35, <shuai.ruan@linux.intel.com> wrote:
>
> First of all I wonder on what basis you collect your Cc lists on
> patches.
>
I sent the bug-fix patches as a whole, and just generated the Cc lists
with the script based on the whole patchset. Maybe I will send the
patches separately.

> > --- a/xen/arch/x86/i387.c
> > +++ b/xen/arch/x86/i387.c
> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
> >      if ( v->fpu_dirtied )
> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
> >
> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
> >  }
>
> The description lacks any mention of the performance impact,
> and what investigation was done to find ways to perhaps
> overcome this. For example, regardless of cpu_has_xsaves,
> do we really always need to _use_ XSAVES?
>
Currently no supervisor xstates are enabled in Xen or even in the guest
OS, so using xsaves is a little ahead of its time (xsavec may be used,
though). xsavec can also cause the same overwriting problem as xsaves.
I will add the performance impact to the description. I am still
thinking of a better way to overcome the xsave overhead (but have not
found a better solution yet).

Thanks
>>> On 24.02.16 at 08:05, <shuai.ruan@linux.intel.com> wrote:
> On Mon, Feb 22, 2016 at 10:08:52AM -0700, Jan Beulich wrote:
>> >>> On 22.02.16 at 06:35, <shuai.ruan@linux.intel.com> wrote:
>>
>> First of all I wonder on what basis you collect your Cc lists on
>> patches.
>>
> I sent the bug-fix patches as a whole, and just generated the Cc lists
> with the script based on the whole patchset. Maybe I will send the
> patches separately.

Thank you. Please also see
http://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches
and in particular also Konrad's recent suggested adjustments
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02877.html
(perhaps including the other parts of the thread).

>> > --- a/xen/arch/x86/i387.c
>> > +++ b/xen/arch/x86/i387.c
>> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>> >      if ( v->fpu_dirtied )
>> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>> >
>> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
>> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
>> >  }
>>
>> The description lacks any mention of the performance impact,
>> and what investigation was done to find ways to perhaps
>> overcome this. For example, regardless of cpu_has_xsaves,
>> do we really always need to _use_ XSAVES?
>>
> Currently no supervisor xstates are enabled in Xen or even in the guest
> OS, so using xsaves is a little ahead of its time (xsavec may be used,
> though). xsavec can also cause the same overwriting problem as xsaves.
> I will add the performance impact to the description. I am still
> thinking of a better way to overcome the xsave overhead (but have not
> found a better solution yet).

I was thinking that taking into consideration the features a guest
has ever used (i.e. v->arch.xcr0_accum) to decide which variant
to use may be a worthwhile avenue to explore.

Jan
On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
> > I sent the bug-fix patches as a whole, and just generated the Cc lists
> > with the script based on the whole patchset. Maybe I will send the
> > patches separately.
>
> Thank you. Please also see
> http://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches
> and in particular also Konrad's recent suggested adjustments
> http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02877.html
> (perhaps including the other parts of the thread).

Thanks.

> >> > --- a/xen/arch/x86/i387.c
> >> > +++ b/xen/arch/x86/i387.c
> >> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
> >> >      if ( v->fpu_dirtied )
> >> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
> >> >
> >> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> >> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
> >> >  }
> >>
> >> The description lacks any mention of the performance impact,
> >> and what investigation was done to find ways to perhaps
> >> overcome this. For example, regardless of cpu_has_xsaves,
> >> do we really always need to _use_ XSAVES?
> >>
> > Currently no supervisor xstates are enabled in Xen or even in the guest
> > OS, so using xsaves is a little ahead of its time (xsavec may be used,
> > though). xsavec can also cause the same overwriting problem as xsaves.
> > I will add the performance impact to the description. I am still
> > thinking of a better way to overcome the xsave overhead (but have not
> > found a better solution yet).
>
> I was thinking that taking into consideration the features a guest
> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
> to use may be a worthwhile avenue to explore.
>
Oh, thanks for the suggestion.
You mean when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE)) is
false, we can return XSTATE_NONLAZY in vcpu_xsave_mask when using xsave[cs],
and otherwise return XSTATE_ALL.
That means we can save the area safely if the guest has only ever used
XSTATE_NONLAZY | XSTATE_FP_SSE.

> Jan
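A minimal sketch of the refinement Shuai describes, built on top of the patched vcpu_xsave_mask from this thread (illustrative only, not a submitted patch; the mask and feature names are the ones used above):

```c
static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
{
    if ( v->fpu_dirtied )
        return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;

    /*
     * With XSAVEC/XSAVES the component offsets depend on the requested
     * mask, so a non-lazy-only save is safe only if the guest has never
     * used a lazy component beyond FP/SSE (tracked in xcr0_accum).
     */
    if ( (cpu_has_xsaves || cpu_has_xsavec) &&
         (v->arch.xcr0_accum & XSTATE_LAZY & ~XSTATE_FP_SSE) )
        return XSTATE_ALL;

    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
}
```

This keeps the original behaviour on hardware without XSAVEC/XSAVES and avoids the full save when the guest has only ever touched FP/SSE lazy state.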
>>> On 26.02.16 at 08:41, <shuai.ruan@linux.intel.com> wrote:
> On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
>> >> > --- a/xen/arch/x86/i387.c
>> >> > +++ b/xen/arch/x86/i387.c
>> >> > @@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>> >> >      if ( v->fpu_dirtied )
>> >> >          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>> >> >
>> >> > -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
>> >> > +    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
>> >> >  }
>> >>
>> >> The description lacks any mention of the performance impact,
>> >> and what investigation was done to find ways to perhaps
>> >> overcome this. For example, regardless of cpu_has_xsaves,
>> >> do we really always need to _use_ XSAVES?
>> >>
>> > Currently no supervisor xstates are enabled in Xen or even in the guest
>> > OS, so using xsaves is a little ahead of its time (xsavec may be used,
>> > though). xsavec can also cause the same overwriting problem as xsaves.
>> > I will add the performance impact to the description. I am still
>> > thinking of a better way to overcome the xsave overhead (but have not
>> > found a better solution yet).
>>
>> I was thinking that taking into consideration the features a guest
>> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
>> to use may be a worthwhile avenue to explore.
>>
> Oh, thanks for the suggestion.
> You mean when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE)) is
> false, we can return XSTATE_NONLAZY in vcpu_xsave_mask when using xsave[cs],
> and otherwise return XSTATE_ALL.
> That means we can save the area safely if the guest has only ever used
> XSTATE_NONLAZY | XSTATE_FP_SSE.

That's one step in the right direction. But the main difference
between XSAVE/XSAVEOPT and XSAVEC/XSAVES is that the former
can be used incrementally, while the latter can't. And I highly
doubt the modified optimization the latter two support will kick in
very often, since there's quite a good chance that the guest
itself executed another one of these between two of our instances.
Which to me means it should at least be investigated whether using
XSAVEOPT in favor of XSAVEC wouldn't produce better performance,
and whether XSAVES wouldn't better be used only if the guest uses
a component which can only be saved that way.

Jan
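To make the alternative Jan outlines concrete, a hypothetical selection helper might look like this. It is a sketch, not existing Xen code: cpu_has_xsaveopt is assumed to be available as a feature test, and XSTATE_XSAVES_ONLY is a placeholder for "components only saveable via XSAVES", not an existing constant.

```c
enum xsave_variant { XSAVE_PLAIN, XSAVE_OPT, XSAVE_S };

/* Placeholder: no supervisor states are enabled in Xen at this point. */
#define XSTATE_XSAVES_ONLY 0

static enum xsave_variant choose_xsave_variant(const struct vcpu *v)
{
    /* The guest has used a component which only XSAVES can save. */
    if ( cpu_has_xsaves && (v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) )
        return XSAVE_S;

    /*
     * Otherwise prefer XSAVEOPT: it keeps the incremental-save property
     * that the compacted-format instructions give up.
     */
    if ( cpu_has_xsaveopt )
        return XSAVE_OPT;

    return XSAVE_PLAIN;
}
```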
On Fri, Feb 26, 2016 at 01:42:35AM -0700, Jan Beulich wrote:
> >>> On 26.02.16 at 08:41, <shuai.ruan@linux.intel.com> wrote:
> > On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
> >> >> The description lacks any mention of the performance impact,
> >> >> and what investigation was done to find ways to perhaps
> >> >> overcome this. For example, regardless of cpu_has_xsaves,
> >> >> do we really always need to _use_ XSAVES?
> >> >>
> >> > Currently no supervisor xstates are enabled in Xen or even in the guest
> >> > OS, so using xsaves is a little ahead of its time (xsavec may be used,
> >> > though). xsavec can also cause the same overwriting problem as xsaves.
> >> > I will add the performance impact to the description. I am still
> >> > thinking of a better way to overcome the xsave overhead (but have not
> >> > found a better solution yet).
> >>
> >> I was thinking that taking into consideration the features a guest
> >> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
> >> to use may be a worthwhile avenue to explore.
> >>
> > Oh, thanks for the suggestion.
> > You mean when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE)) is
> > false, we can return XSTATE_NONLAZY in vcpu_xsave_mask when using xsave[cs],
> > and otherwise return XSTATE_ALL.
> > That means we can save the area safely if the guest has only ever used
> > XSTATE_NONLAZY | XSTATE_FP_SSE.
>
> That's one step in the right direction. But the main difference
> between XSAVE/XSAVEOPT and XSAVEC/XSAVES is that the former
> can be used incrementally, while the latter can't. And I highly
> doubt the modified optimization the latter two support will kick in
> very often, since there's quite a good chance that the guest
> itself executed another one of these between two of our instances.
> Which to me means it should at least be investigated whether using
> XSAVEOPT in favor of XSAVEC wouldn't produce better performance,
> and whether XSAVES wouldn't better be used only if the guest uses
> a component which can only be saved that way.
>
Thanks.

OK, I will do the performance test. Can you suggest a workload/benchmark
that can be used here for the xsave-related performance test?

The other thing, from the text above, is that I guess the best way to solve
the xsave[cs] problem is:
1. use xsaveopt instead of xsave[cs] for now.
2. use xsaves whenever a component can only be saved that way (when some
   supervisor components are supported in Xen).
3. no xsavec support.
4. the xsavec/xsaves features will be exposed to the guest OS if point 2 is ok.

If the above 4 points are ok to you, then I will write a patch to do this.

> Jan
>>> On 29.02.16 at 10:06, <shuai.ruan@linux.intel.com> wrote:
> On Fri, Feb 26, 2016 at 01:42:35AM -0700, Jan Beulich wrote:
>> >>> On 26.02.16 at 08:41, <shuai.ruan@linux.intel.com> wrote:
>> > On Wed, Feb 24, 2016 at 02:16:38AM -0700, Jan Beulich wrote:
>> >> >> The description lacks any mention of the performance impact,
>> >> >> and what investigation was done to find ways to perhaps
>> >> >> overcome this. For example, regardless of cpu_has_xsaves,
>> >> >> do we really always need to _use_ XSAVES?
>> >> >>
>> >> > Currently no supervisor xstates are enabled in Xen or even in the guest
>> >> > OS, so using xsaves is a little ahead of its time (xsavec may be used,
>> >> > though). xsavec can also cause the same overwriting problem as xsaves.
>> >> > I will add the performance impact to the description. I am still
>> >> > thinking of a better way to overcome the xsave overhead (but have not
>> >> > found a better solution yet).
>> >>
>> >> I was thinking that taking into consideration the features a guest
>> >> has ever used (i.e. v->arch.xcr0_accum) to decide which variant
>> >> to use may be a worthwhile avenue to explore.
>> >>
>> > Oh, thanks for the suggestion.
>> > You mean when (v->arch.xcr0_accum & (XSTATE_LAZY & ~XSTATE_FP_SSE)) is
>> > false, we can return XSTATE_NONLAZY in vcpu_xsave_mask when using xsave[cs],
>> > and otherwise return XSTATE_ALL.
>> > That means we can save the area safely if the guest has only ever used
>> > XSTATE_NONLAZY | XSTATE_FP_SSE.
>>
>> That's one step in the right direction. But the main difference
>> between XSAVE/XSAVEOPT and XSAVEC/XSAVES is that the former
>> can be used incrementally, while the latter can't. And I highly
>> doubt the modified optimization the latter two support will kick in
>> very often, since there's quite a good chance that the guest
>> itself executed another one of these between two of our instances.
>> Which to me means it should at least be investigated whether using
>> XSAVEOPT in favor of XSAVEC wouldn't produce better performance,
>> and whether XSAVES wouldn't better be used only if the guest uses
>> a component which can only be saved that way.
>>
> Thanks.
>
> OK, I will do the performance test. Can you suggest a workload/benchmark
> that can be used here for the xsave-related performance test?

Measuring just instruction execution time should be fine for the
purpose here, I think.

> The other thing, from the text above, is that I guess the best way to solve
> the xsave[cs] problem is:
> 1. use xsaveopt instead of xsave[cs] for now.
> 2. use xsaves whenever a component can only be saved that way (when some
>    supervisor components are supported in Xen).
> 3. no xsavec support.
> 4. the xsavec/xsaves features will be exposed to the guest OS if point 2 is ok.

Provided this results in better performance than the alternative(s),
yes.

Jan
On Mon, Feb 29, 2016 at 02:33:49AM -0700, Jan Beulich wrote:
> > Thanks.
> >
> > OK, I will do the performance test. Can you suggest a workload/benchmark
> > that can be used here for the xsave-related performance test?
>
> Measuring just instruction execution time should be fine for the
> purpose here, I think.
>
I did the test as follows:

Xsave time measurement method: use the function get_s_time before/after
xsave to get the difference.

1. Only enable xsaveopt in Xen, and start two guests (no workload running
   in the guests). The xsave times land in two ranges, [28 - 40] and
   [110 - 140] (most in 110-120).
2. Only enable xsavec in Xen. The xsave times land in two ranges,
   [30 - 50] and [120 - 140].

I used a fragment of the test results and did a rough estimation. I also
tried to get the average execution time (calculated every 10 xsave times);
the result is almost the same as above.

The result shows xsaveopt has better performance than xsavec (if the
test/measurement method is correct).

> > The other thing, from the text above, is that I guess the best way to solve
> > the xsave[cs] problem is:
> > 1. use xsaveopt instead of xsave[cs] for now.
> > 2. use xsaves whenever a component can only be saved that way (when some
> >    supervisor components are supported in Xen).
> > 3. no xsavec support.
> > 4. the xsavec/xsaves features will be exposed to the guest OS if point 2 is ok.
>
> Provided this results in better performance than the alternative(s),
> yes.
>
> Jan
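For reference, a measurement along the lines Shuai describes might look roughly like the sketch below. This is an assumed illustration built around Xen's get_s_time() and the xsave() helper, not the actual test code used for the numbers above.

```c
/*
 * Rough sketch of the timing method described above: wrap a single
 * xsave of the given mask with get_s_time() and log the difference.
 * s_time_t values are in nanoseconds.
 */
static void time_one_xsave(struct vcpu *v, uint64_t mask)
{
    s_time_t start, end;

    start = get_s_time();
    xsave(v, mask);                 /* the variant under test */
    end = get_s_time();

    printk("xsave took %ld ns\n", (long)(end - start));
}
```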
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 67016c9..e3a7bc0 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -118,7 +118,7 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
     if ( v->fpu_dirtied )
         return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
 
-    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
+    return ( cpu_has_xsaves || cpu_has_xsavec ) ? XSTATE_ALL : XSTATE_NONLAZY;
 }
 
 /* Save x87 extended state */
The offsets at which components are saved by xsave[sc] are not fixed. So
when a save with v->fpu_dirtied set is followed by one with v->fpu_dirtied
clear, the non-lazy xsave[sc] may overwrite data written by the lazy one.
When xsave[sc] is enabled, vcpu_xsave_mask will return XSTATE_ALL when
v->fpu_dirtied is clear and v->arch.nonlazy_xstate_used is set.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/i387.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
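The "offsets are not fixed" point can be made concrete: with xsave[sc] the extended components are packed according to the requested feature bitmap, so the same component can land at a different offset when the save mask changes. The sketch below is purely illustrative (placeholder component sizes, 64-byte alignment of some components ignored) and is not Xen code.

```c
#include <stdint.h>

#define XSTATE_AREA_MIN_SIZE 576   /* 512-byte legacy region + 64-byte header */

/* Hypothetical per-component sizes, indexed by xstate component number. */
static const unsigned int comp_size[] = { 160, 256, 256, 64, 64, 64, 512, 1024 };

/*
 * Offset of component 'idx' in the compacted format for a given requested
 * feature bitmap: only lower-numbered components present in the bitmap
 * contribute, so the offset moves when the mask changes.
 */
static unsigned int compacted_offset(uint64_t rfbm, unsigned int idx)
{
    unsigned int off = XSTATE_AREA_MIN_SIZE, i;

    for ( i = 2; i < idx; i++ )
        if ( rfbm & (1ULL << i) )
            off += comp_size[i];

    return off;
}
```

A save with the lazy mask followed by one with only the non-lazy mask therefore writes the same component at different offsets in the same area, which is why vcpu_xsave_mask has to request XSTATE_ALL for the second save.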