Message ID | 1458270080-19493-1-git-send-email-shuai.ruan@linux.intel.com (mailing list archive)
State      | New, archived
>>> On 18.03.16 at 04:01, <shuai.ruan@linux.intel.com> wrote:
> v5: Address comments from Jan
> 1. Add XSTATE_XSAVES_ONLY and make the use of xsaves depend on whether
>    these bits are set in xcr0_accum.
> 2. Change the compress logic in compress_xsave_states() to depend on
>    !(v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) && !xsave_area_compressed(src).
> 3. XSTATE_COMPACTION_ENABLED is only set in xrstor().
> 4. Rebase the code on
>    [V4] x86/xsaves: calculate the xstate_comp_offsets base on xstate_bv
>    (already sent out), as they both change the same code.
>    (I am not sure whether this rebase is ok or not.)

Such re-basing is okay, but the dependency on the other patch
shouldn't get hidden in the revision log. Best would really be if this
was a series (consisting of both patches).

> @@ -222,22 +222,21 @@ void compress_xsave_states(struct vcpu *v, const void *src, unsigned int size)
>      u64 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
>      u64 valid;
>
> -    if ( !cpu_has_xsaves && !cpu_has_xsavec )
> +    if ( !(v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) &&
> +         !xsave_area_compressed(src) )
>      {
>          memcpy(xsave, src, size);
>          return;
>      }
>
> -    ASSERT(!xsave_area_compressed(src));

This is bogus: The function here gets called only after validate_xstate()
already succeeded. Hence the ASSERT() should imo simply get moved
ahead of the if().

>      /*
>       * Copy legacy XSAVE area, to avoid complications with CPUID
>       * leaves 0 and 1 in the loop below.
>       */
>      memcpy(xsave, src, FXSAVE_SIZE);
>
> -    /* Set XSTATE_BV and XCOMP_BV. */
> +    /* Set XSTATE_BV. */
>      xsave->xsave_hdr.xstate_bv = xstate_bv;
> -    xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | XSTATE_COMPACTION_ENABLED;
>      setup_xstate_comp(xstate_comp_offsets, xstate_bv);

I see you set xcomp_bv (and hence the compaction bit) in xrstor()
now, but afaict that doesn't allow you to completely drop initializing
the field here, as the code there looks at the compaction bit.

> @@ -267,31 +266,35 @@ void xsave(struct vcpu *v, uint64_t mask)
>      uint32_t hmask = mask >> 32;
>      uint32_t lmask = mask;
>      unsigned int fip_width = v->domain->arch.x87_fip_width;
> -#define XSAVE(pfx) \
> -    alternative_io_3(".byte " pfx "0x0f,0xae,0x27\n", /* xsave */ \
> -                     ".byte " pfx "0x0f,0xae,0x37\n", /* xsaveopt */ \
> -                     X86_FEATURE_XSAVEOPT, \
> -                     ".byte " pfx "0x0f,0xc7,0x27\n", /* xsavec */ \
> -                     X86_FEATURE_XSAVEC, \
> -                     ".byte " pfx "0x0f,0xc7,0x2f\n", /* xsaves */ \
> -                     X86_FEATURE_XSAVES, \
> -                     "=m" (*ptr), \
> -                     "a" (lmask), "d" (hmask), "D" (ptr))
> +#define XSAVE(pfx, xsave_ins) \
> +    asm volatile ( ".byte " pfx xsave_ins \
> +                   : "=m" (*ptr) \
> +                   : "a" (lmask), "d" (hmask), "D" (ptr) )
>
>      if ( fip_width == 8 || !(mask & XSTATE_FP) )
>      {
> -        XSAVE("0x48,");
> +        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
> +            XSAVE("0x48,", "0x0f,0xc7,0x2f"); /* xsaves */
> +        else if ( cpu_has_xsaveopt )
> +            XSAVE("0x48,", "0x0f,0xae,0x37"); /* xsaveopt */
> +        else
> +            XSAVE("0x48,", "0x0f,0xae,0x27"); /* xsave */

The latter two should still use alternative insn patching.

>      }
>      else if ( fip_width == 4 )
>      {
> -        XSAVE("");
> +        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
> +            XSAVE("", "0x0f,0xc7,0x2f");
> +        else if ( cpu_has_xsaveopt )
> +            XSAVE("", "0x0f,0xae,0x37");
> +        else
> +            XSAVE("", "0x0f,0xae,0x27");

And this logic being repeated here (and another time below) should
have made obvious that you want to leave the code here untouched
and do all of your changes just to the XSAVE() macro definition.
> @@ -378,25 +387,42 @@ void xrstor(struct vcpu *v, uint64_t mask)
>      switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
>      {
>          BUILD_BUG_ON(sizeof(faults) != 4); /* Clang doesn't support %z in asm. */
> -#define XRSTOR(pfx) \
> -    alternative_io("1: .byte " pfx "0x0f,0xae,0x2f\n" \
> +#define XRSTOR(pfx, xrstor_ins) \
> +    asm volatile ( "1: .byte " pfx xrstor_ins"\n" \
>                     "3:\n" \
>                     "   .section .fixup,\"ax\"\n" \
>                     "2: incl %[faults]\n" \
>                     "   jmp 3b\n" \
>                     "   .previous\n" \
> -                   _ASM_EXTABLE(1b, 2b), \
> -                   ".byte " pfx "0x0f,0xc7,0x1f\n", \
> -                   X86_FEATURE_XSAVES, \
> -                   ASM_OUTPUT2([mem] "+m" (*ptr), [faults] "+g" (faults)), \
> -                   [lmask] "a" (lmask), [hmask] "d" (hmask), \
> -                   [ptr] "D" (ptr))
> +                   _ASM_EXTABLE(1b, 2b) \
> +                   : [mem] "+m" (*ptr), [faults] "+g" (faults) \
> +                   : [lmask] "a" (lmask), [hmask] "d" (hmask), \
> +                     [ptr] "D" (ptr) )
>
>      default:
> -        XRSTOR("0x48,");
> +        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
> +        {
> +            if ( unlikely(!(ptr->xsave_hdr.xcomp_bv
> +                            & XSTATE_COMPACTION_ENABLED)) )
> +                ptr->xsave_hdr.xcomp_bv = ptr->xsave_hdr.xstate_bv
> +                                          | XSTATE_COMPACTION_ENABLED;
> +
> +            XRSTOR("0x48,","0x0f,0xc7,0x1f"); /* xrstors */
> +        }
> +        else
> +            XRSTOR("0x48,","0x0f,0xae,0x2f"); /* xrstor */

At this point, what guarantees that xcomp_bv is zero, no matter
where the state to be loaded originates from? I think at least in
arch_set_info_guest(), hvm_load_cpu_ctxt(), and
hvm_vcpu_reset_state() you went too far deleting code, and you
really need to keep the storing of zero there. Did you draw, just
for yourself, mentally or on a sheet of paper, a diagram illustrating
the various state transitions?

>          break;
>      case 4: case 2:
> -        XRSTOR("");
> +        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
> +        {
> +            if ( unlikely(!(ptr->xsave_hdr.xcomp_bv
> +                            & XSTATE_COMPACTION_ENABLED)) )
> +                ptr->xsave_hdr.xcomp_bv = ptr->xsave_hdr.xstate_bv
> +                                          | XSTATE_COMPACTION_ENABLED;
> +            XRSTOR("","0x0f,0xc7,0x1f");
> +        }
> +        else
> +            XRSTOR("","0x0f,0xae,0x2f");
>          break;

Since again you repeat the same logic twice, this should again have
been a signal that all your changes should go into the XRSTOR()
macro. Or alternatively, since the exception fixup also differs, you
may want to convert the whole logic into an XSAVES and an XSAVE
path. My only really sincere request here is - as little redundancy as
possible, since having to change the same thing twice in more than
one place is always calling for trouble.

Jan
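For concreteness, a minimal sketch of the shape Jan is asking for on the
save side (an assumed illustration reusing the names from the patch, not
the committed code): the XSTATE_XSAVES_ONLY check and the
alternatives-based xsave/xsaveopt patching both move inside XSAVE(), so
every call site can remain a plain XSAVE(pfx):

#define XSAVE(pfx) \
do { \
    if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY ) \
        /* xsaves: gated at run time, no patching needed */ \
        asm volatile ( ".byte " pfx "0x0f,0xc7,0x2f" \
                       : "=m" (*ptr) \
                       : "a" (lmask), "d" (hmask), "D" (ptr) ); \
    else \
        /* xsave, patched to xsaveopt where available */ \
        alternative_io(".byte " pfx "0x0f,0xae,0x27\n", \
                       ".byte " pfx "0x0f,0xae,0x37\n", \
                       X86_FEATURE_XSAVEOPT, \
                       "=m" (*ptr), \
                       "a" (lmask), "d" (hmask), "D" (ptr)); \
} while ( false )

With that, the three call sites collapse back to XSAVE("0x48,") and
XSAVE(""), and the instruction-selection logic lives in exactly one place.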
On Tue, Mar 22, 2016 at 08:34:33AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 04:01, <shuai.ruan@linux.intel.com> wrote:
> >       * Copy legacy XSAVE area, to avoid complications with CPUID
> >       * leaves 0 and 1 in the loop below.
> >       */
> >      memcpy(xsave, src, FXSAVE_SIZE);
> >
> > -    /* Set XSTATE_BV and XCOMP_BV. */
> > +    /* Set XSTATE_BV. */
> >      xsave->xsave_hdr.xstate_bv = xstate_bv;
> > -    xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | XSTATE_COMPACTION_ENABLED;
> >      setup_xstate_comp(xstate_comp_offsets, xstate_bv);
>
> I see you set xcomp_bv (and hence the compaction bit) in xrstor()
> now, but afaict that doesn't allow you to completely drop initializing
> the field here, as the code there looks at the compaction bit.
>
> > +        else
> > +            XRSTOR("0x48,","0x0f,0xae,0x2f"); /* xrstor */
>
> At this point, what guarantees that xcomp_bv is zero, no matter
> where the state to be loaded originates from? I think at least in
> arch_set_info_guest(), hvm_load_cpu_ctxt(), and
> hvm_vcpu_reset_state() you went too far deleting code, and you
> really need to keep the storing of zero there. Did you draw, just
> for yourself, mentally or on a sheet of paper, a diagram illustrating
> the various state transitions?
>
For the above two comments:

The patch is based on "[v4] x86/xsaves: calculate the xstate_comp_offsets
base on xstate_bv", and I answered your question on why to calculate
xstate_comp_offsets based on xstate_bv in the previous thread. If that is
right, dropping the initialization of xcomp_bv is ok, and xcomp_bv is then
guaranteed to be zero for arch_set_info_guest() and hvm_load_cpu_ctxt().
If the drop is wrong (due to my misunderstanding of the SDM), I will
change the if () here.

For hvm_vcpu_reset_state(), we should depend on whether xsaves is used
to decide whether to init xcomp_bv or not. And currently we use
xcr0_accum to indicate the use of xsaves; when hvm_vcpu_reset_state()
is called, can vcpu->xcr0_accum indicate the use of xsaves?
I think in hvm_vcpu_reset_state(), we should leave xcomp_bv zero.
On Wed, Mar 23, 2016 at 10:02:24AM +0800, Shuai Ruan wrote:
> > > -    /* Set XSTATE_BV and XCOMP_BV. */
> > > +    /* Set XSTATE_BV. */
> > >      xsave->xsave_hdr.xstate_bv = xstate_bv;
> > > -    xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | XSTATE_COMPACTION_ENABLED;
> > >      setup_xstate_comp(xstate_comp_offsets, xstate_bv);
> >
> > I see you set xcomp_bv (and hence the compaction bit) in xrstor()
> > now, but afaict that doesn't allow you to completely drop initializing
> > the field here, as the code there looks at the compaction bit.
> >
> > > +            if ( unlikely(!(ptr->xsave_hdr.xcomp_bv &
> > > +                            XSTATE_COMPACTION_ENABLED)) )
> > > +                ptr->xsave_hdr.xcomp_bv = ptr->xsave_hdr.xstate_bv
> > > +                                          | XSTATE_COMPACTION_ENABLED;
> > > +
> > > +            XRSTOR("0x48,","0x0f,0xc7,0x1f"); /* xrstors */
> > > +        }
> > > +        else
> > > +            XRSTOR("0x48,","0x0f,0xae,0x2f"); /* xrstor */
> >
> > At this point, what guarantees that xcomp_bv is zero, no matter
> > where the state to be loaded originates from? I think at least in
> > arch_set_info_guest(), hvm_load_cpu_ctxt(), and
> > hvm_vcpu_reset_state() you went too far deleting code, and you
> > really need to keep the storing of zero there. Did you draw, just
> > for yourself, mentally or on a sheet of paper, a diagram illustrating
> > the various state transitions?
> >
> For the above two comments:
>
> The patch is based on "[v4] x86/xsaves: calculate the xstate_comp_offsets
> base on xstate_bv", and I answered your question on why to calculate
> xstate_comp_offsets based on xstate_bv in the previous thread. If that is
> right, dropping the initialization of xcomp_bv is ok, and xcomp_bv is then
> guaranteed to be zero for arch_set_info_guest() and hvm_load_cpu_ctxt().
> If the drop is wrong (due to my misunderstanding of the SDM), I will
> change the if () here.
>
Ignore the above paragraph: I realized that "[v4] x86/xsaves: calculate
the xstate_comp_offsets base on xstate_bv" is wrong (it should be based
on xcomp_bv, as in the previous version). So I should not delete the
code initializing xcomp_bv in compress_xsave_states().

But for hvm_vcpu_reset_state(), I think we should delete the code
initializing xcomp_bv, as said below.

> For hvm_vcpu_reset_state(), we should depend on whether xsaves is used
> to decide whether to init xcomp_bv or not. And currently we use
> xcr0_accum to indicate the use of xsaves; when hvm_vcpu_reset_state()
> is called, can vcpu->xcr0_accum indicate the use of xsaves?
> I think in hvm_vcpu_reset_state(), we should leave xcomp_bv zero.

> > Since again you repeat the same logic twice, this should again have
> > been a signal that all your changes should go into the XRSTOR()
> > macro. Or alternatively, since the exception fixup also differs, you
> > may want to convert the whole logic into an XSAVES and an XSAVE
> > path. My only really sincere request here is - as little redundancy as
> > possible, since having to change the same thing twice in more than
> > one place is always calling for trouble.

I will do all changes only in XRSTOR().
Code like:

#define _XRSTOR(pfx, xrstor_ins) \
    asm volatile ( "1: .byte " pfx xrstor_ins"\n" \
                   "3:\n" \
                   "   .section .fixup,\"ax\"\n" \
                   "2: incl %[faults]\n" \
                   "   jmp 3b\n" \
                   "   .previous\n" \
                   _ASM_EXTABLE(1b, 2b) \
                   : [mem] "+m" (*ptr), [faults] "+g" (faults) \
                   : [lmask] "a" (lmask), [hmask] "d" (hmask), \
                     [ptr] "D" (ptr) )

#define XRSTOR(pfx) \
    if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY ) \
    { \
        if ( unlikely(!(ptr->xsave_hdr.xcomp_bv \
                        & XSTATE_COMPACTION_ENABLED)) ) \
            ptr->xsave_hdr.xcomp_bv |= (ptr->xsave_hdr.xstate_bv \
                                        | XSTATE_COMPACTION_ENABLED); \
        _XRSTOR("0x48, ", "0x0f,0xc7,0x1f"); \
    } \
    else \
    { \
        _XRSTOR("0x48, ", "0x0f,0xae,0x2f"); \
    }

....

#undef XRSTOR
#undef _XRSTOR

A new wrapper is introduced as "_XRSTOR" (maybe the macro name is not
good).
>>> On 23.03.16 at 07:14, <shuai.ruan@linux.intel.com> wrote:
> On Wed, Mar 23, 2016 at 10:02:24AM +0800, Shuai Ruan wrote:
> But for hvm_vcpu_reset_state(), I think we should delete the code
> initializing xcomp_bv, as said below.
>> For hvm_vcpu_reset_state(), we should depend on whether xsaves is used
>> to decide whether to init xcomp_bv or not. And currently we use
>> xcr0_accum to indicate the use of xsaves; when hvm_vcpu_reset_state()
>> is called, can vcpu->xcr0_accum indicate the use of xsaves?
>> I think in hvm_vcpu_reset_state(), we should leave xcomp_bv zero.

Leaving it to be zero would be fine, but is it guaranteed to be
zero?

>> > Since again you repeat the same logic twice, this should again have
>> > been a signal that all your changes should go into the XRSTOR()
>> > macro. Or alternatively, since the exception fixup also differs, you
>> > may want to convert the whole logic into an XSAVES and an XSAVE
>> > path. My only really sincere request here is - as little redundancy as
>> > possible, since having to change the same thing twice in more than
>> > one place is always calling for trouble.
> I will do all changes only in XRSTOR(). Code like:
>
> #define _XRSTOR(pfx, xrstor_ins) \
>     asm volatile ( "1: .byte " pfx xrstor_ins"\n" \
>                    "3:\n" \
>                    "   .section .fixup,\"ax\"\n" \
>                    "2: incl %[faults]\n" \
>                    "   jmp 3b\n" \
>                    "   .previous\n" \
>                    _ASM_EXTABLE(1b, 2b) \
>                    : [mem] "+m" (*ptr), [faults] "+g" (faults) \
>                    : [lmask] "a" (lmask), [hmask] "d" (hmask), \
>                      [ptr] "D" (ptr) )
>
> #define XRSTOR(pfx) \
>     if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY ) \
>     { \
>         if ( unlikely(!(ptr->xsave_hdr.xcomp_bv \
>                         & XSTATE_COMPACTION_ENABLED)) ) \
>             ptr->xsave_hdr.xcomp_bv |= (ptr->xsave_hdr.xstate_bv \
>                                         | XSTATE_COMPACTION_ENABLED); \
>         _XRSTOR("0x48, ", "0x0f,0xc7,0x1f"); \

I think you mean to use pfx here. Also I don't see the point of
passing two string literals to the auxiliary macro - just pass them
as a single argument.

> A new wrapper is introduced as "_XRSTOR" (maybe the macro name is not
> good).

Indeed an underscore followed by an uppercase letter starts a
reserved identifier. Maybe DO_XRSTOR() or _xrstor()?

Jan
On Wed, Mar 23, 2016 at 04:21:32AM -0600, Jan Beulich wrote:
> >>> On 23.03.16 at 07:14, <shuai.ruan@linux.intel.com> wrote:
> > But for hvm_vcpu_reset_state(), I think we should delete the code
> > initializing xcomp_bv, as said below.
> >> For hvm_vcpu_reset_state(), we should depend on whether xsaves is used
> >> to decide whether to init xcomp_bv or not. And currently we use
> >> xcr0_accum to indicate the use of xsaves; when hvm_vcpu_reset_state()
> >> is called, can vcpu->xcr0_accum indicate the use of xsaves?
> >> I think in hvm_vcpu_reset_state(), we should leave xcomp_bv zero.
>
> Leaving it to be zero would be fine, but is it guaranteed to be
> zero?

Oh, yeah. I will add v->arch.xsave_area->xsave_hdr.xcomp_bv = 0 in
hvm_vcpu_reset_state() to guarantee that.

> > A new wrapper is introduced as "_XRSTOR" (maybe the macro name is not
> > good).
>
> Indeed an underscore followed by an uppercase letter starts a
> reserved identifier. Maybe DO_XRSTOR() or _xrstor()?
>
Thanks for your suggestion. I will use _xrstor().
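Putting Jan's three points together (use pfx instead of hard-coding the
REX prefix, pass a single opcode string, and pick a non-reserved helper
name), the restore side could end up roughly as below - a sketch under
those assumptions using the variable names from the patch, not the
committed code:

#define _xrstor(insn) \
    asm volatile ( "1: .byte " insn "\n" \
                   "3:\n" \
                   "   .section .fixup,\"ax\"\n" \
                   "2: incl %[faults]\n" \
                   "   jmp 3b\n" \
                   "   .previous\n" \
                   _ASM_EXTABLE(1b, 2b) \
                   : [mem] "+m" (*ptr), [faults] "+g" (faults) \
                   : [lmask] "a" (lmask), [hmask] "d" (hmask), \
                     [ptr] "D" (ptr) )

#define XRSTOR(pfx) \
do { \
    if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY ) \
    { \
        /* xrstors requires the compaction bit to be set */ \
        if ( unlikely(!(ptr->xsave_hdr.xcomp_bv & \
                        XSTATE_COMPACTION_ENABLED)) ) \
            ptr->xsave_hdr.xcomp_bv = ptr->xsave_hdr.xstate_bv | \
                                      XSTATE_COMPACTION_ENABLED; \
        _xrstor(pfx "0x0f,0xc7,0x1f"); /* xrstors */ \
    } \
    else \
        _xrstor(pfx "0x0f,0xae,0x2f"); /* xrstor */ \
} while ( false )

Since pfx is itself a string literal ("0x48," or ""), pfx "0x0f,0xc7,0x1f"
concatenates into the single opcode string Jan asked for, and the call
sites stay XRSTOR("0x48,") and XRSTOR(""), keeping the compaction-bit
fixup in exactly one place.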
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a6d721b..26dd1d3 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -948,14 +948,6 @@ int arch_set_info_guest(
         fpu_sse->fcw = FCW_DEFAULT;
         fpu_sse->mxcsr = MXCSR_DEFAULT;
     }
-    if ( cpu_has_xsaves )
-    {
-        ASSERT(v->arch.xsave_area);
-        v->arch.xsave_area->xsave_hdr.xcomp_bv = XSTATE_COMPACTION_ENABLED |
-            v->arch.xsave_area->xsave_hdr.xstate_bv;
-    }
-    else if ( v->arch.xsave_area )
-        v->arch.xsave_area->xsave_hdr.xcomp_bv = 0;
 
     if ( !compat )
     {
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index b34a295..1a36a36 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -922,7 +922,7 @@ long arch_do_domctl(
                 ret = -EFAULT;
 
             offset += sizeof(v->arch.xcr0_accum);
-            if ( !ret && (cpu_has_xsaves || cpu_has_xsavec) )
+            if ( !ret )
             {
                 void *xsave_area;
@@ -942,10 +942,6 @@ long arch_do_domctl(
                     ret = -EFAULT;
                 xfree(xsave_area);
             }
-            else if ( !ret && copy_to_guest_offset(evc->buffer, offset,
-                                                   (void *)v->arch.xsave_area,
-                                                   size - 2 * sizeof(uint64_t)) )
-                ret = -EFAULT;
 
             vcpu_unpause(v);
         }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 255a1d6..35e2c52 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2202,9 +2202,6 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
         xsave_area->xsave_hdr.xstate_bv = 0;
         xsave_area->fpu_sse.mxcsr = MXCSR_DEFAULT;
     }
-    if ( cpu_has_xsaves && xsave_area )
-        xsave_area->xsave_hdr.xcomp_bv = XSTATE_COMPACTION_ENABLED |
-            xsave_area->xsave_hdr.xstate_bv;
 
     v->arch.user_regs.eax = ctxt.rax;
     v->arch.user_regs.ebx = ctxt.rbx;
@@ -5589,11 +5586,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
     fpu_ctxt->fcw = FCW_RESET;
     fpu_ctxt->mxcsr = MXCSR_DEFAULT;
     if ( v->arch.xsave_area )
-    {
         v->arch.xsave_area->xsave_hdr.xstate_bv = XSTATE_FP;
-        v->arch.xsave_area->xsave_hdr.xcomp_bv = cpu_has_xsaves
-            ? XSTATE_COMPACTION_ENABLED | XSTATE_FP : 0;
-    }
 
     v->arch.vgc_flags = VGCF_online;
     memset(&v->arch.user_regs, 0, sizeof(v->arch.user_regs));
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index c29d0fa..2afa762 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -118,7 +118,24 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
     if ( v->fpu_dirtied )
         return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
 
-    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
+    ASSERT(v->arch.nonlazy_xstate_used);
+
+    /*
+     * The offsets of components which live in the extended region of
+     * compact xsave area are not fixed. Xsave area may be overwritten
+     * when a xsave with v->fpu_dirtied set is followed by one with
+     * v->fpu_dirtied clear.
+     * In such case, if hypervisor uses compact xsave area and guest
+     * has ever used lazy states (checking xcr0_accum excluding
+     * XSTATE_FP_SSE), vcpu_xsave_mask will return XSTATE_ALL. Otherwise
+     * return XSTATE_NONLAZY.
+     * XSTATE_FP_SSE may be excluded, because the offsets of XSTATE_FP_SSE
+     * (in the legacy region of xsave area) are fixed, so saving
+     * XSTATE_FP_SSE will not cause overwriting problem.
+     */
+    return (v->arch.xsave_area->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED)
+           && (v->arch.xcr0_accum & XSTATE_LAZY & ~XSTATE_FP_SSE)
+           ? XSTATE_ALL : XSTATE_NONLAZY;
 }
 
 /* Save x87 extended state */
@@ -275,11 +292,7 @@ int vcpu_init_fpu(struct vcpu *v)
         return rc;
 
     if ( v->arch.xsave_area )
-    {
         v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
-        if ( cpu_has_xsaves )
-            v->arch.xsave_area->xsave_hdr.xcomp_bv = XSTATE_COMPACTION_ENABLED;
-    }
     else
     {
         BUILD_BUG_ON(__alignof(v->arch.xsave_area->fpu_sse) < 16);
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index ef2c54d..61e5828 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -178,7 +178,7 @@ void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size)
     u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
     u64 valid;
 
-    if ( !cpu_has_xsaves && !cpu_has_xsavec )
+    if ( !(xsave->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED) )
     {
         memcpy(dest, xsave, size);
         return;
@@ -222,22 +222,21 @@ void compress_xsave_states(struct vcpu *v, const void *src, unsigned int size)
     u64 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
     u64 valid;
 
-    if ( !cpu_has_xsaves && !cpu_has_xsavec )
+    if ( !(v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) &&
+         !xsave_area_compressed(src) )
     {
         memcpy(xsave, src, size);
         return;
     }
 
-    ASSERT(!xsave_area_compressed(src));
     /*
      * Copy legacy XSAVE area, to avoid complications with CPUID
      * leaves 0 and 1 in the loop below.
      */
     memcpy(xsave, src, FXSAVE_SIZE);
 
-    /* Set XSTATE_BV and XCOMP_BV. */
+    /* Set XSTATE_BV. */
     xsave->xsave_hdr.xstate_bv = xstate_bv;
-    xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | XSTATE_COMPACTION_ENABLED;
     setup_xstate_comp(xstate_comp_offsets, xstate_bv);
 
     /*
@@ -267,31 +266,35 @@ void xsave(struct vcpu *v, uint64_t mask)
     uint32_t hmask = mask >> 32;
     uint32_t lmask = mask;
     unsigned int fip_width = v->domain->arch.x87_fip_width;
-#define XSAVE(pfx) \
-    alternative_io_3(".byte " pfx "0x0f,0xae,0x27\n", /* xsave */ \
-                     ".byte " pfx "0x0f,0xae,0x37\n", /* xsaveopt */ \
-                     X86_FEATURE_XSAVEOPT, \
-                     ".byte " pfx "0x0f,0xc7,0x27\n", /* xsavec */ \
-                     X86_FEATURE_XSAVEC, \
-                     ".byte " pfx "0x0f,0xc7,0x2f\n", /* xsaves */ \
-                     X86_FEATURE_XSAVES, \
-                     "=m" (*ptr), \
-                     "a" (lmask), "d" (hmask), "D" (ptr))
+#define XSAVE(pfx, xsave_ins) \
+    asm volatile ( ".byte " pfx xsave_ins \
+                   : "=m" (*ptr) \
+                   : "a" (lmask), "d" (hmask), "D" (ptr) )
 
     if ( fip_width == 8 || !(mask & XSTATE_FP) )
     {
-        XSAVE("0x48,");
+        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
+            XSAVE("0x48,", "0x0f,0xc7,0x2f"); /* xsaves */
+        else if ( cpu_has_xsaveopt )
+            XSAVE("0x48,", "0x0f,0xae,0x37"); /* xsaveopt */
+        else
+            XSAVE("0x48,", "0x0f,0xae,0x27"); /* xsave */
     }
     else if ( fip_width == 4 )
    {
-        XSAVE("");
+        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
+            XSAVE("", "0x0f,0xc7,0x2f");
+        else if ( cpu_has_xsaveopt )
+            XSAVE("", "0x0f,0xae,0x37");
+        else
+            XSAVE("", "0x0f,0xae,0x27");
     }
     else
     {
         typeof(ptr->fpu_sse.fip.sel) fcs = ptr->fpu_sse.fip.sel;
         typeof(ptr->fpu_sse.fdp.sel) fds = ptr->fpu_sse.fdp.sel;
 
-        if ( cpu_has_xsaveopt || cpu_has_xsaves )
+        if ( cpu_has_xsaveopt || (v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) )
         {
             /*
              * XSAVEOPT/XSAVES may not write the FPU portion even when the
@@ -307,7 +310,12 @@ void xsave(struct vcpu *v, uint64_t mask)
             }
         }
 
-        XSAVE("0x48,");
+        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
+            XSAVE("0x48,", "0x0f,0xc7,0x2f");
+        else if ( cpu_has_xsaveopt )
+            XSAVE("0x48,", "0x0f,0xae,0x37");
+        else
+            XSAVE("0x48,", "0x0f,0xae,0x27");
 
         if ( !(mask & ptr->xsave_hdr.xstate_bv & XSTATE_FP) ||
              /*
@@ -317,7 +325,8 @@ void xsave(struct vcpu *v, uint64_t mask)
             (!(ptr->fpu_sse.fsw & 0x0080) &&
              boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
         {
-            if ( cpu_has_xsaveopt || cpu_has_xsaves )
+            if ( cpu_has_xsaveopt ||
+                 (v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) )
             {
                 ptr->fpu_sse.fip.sel = fcs;
                 ptr->fpu_sse.fdp.sel = fds;
@@ -378,25 +387,42 @@ void xrstor(struct vcpu *v, uint64_t mask)
     switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
     {
         BUILD_BUG_ON(sizeof(faults) != 4); /* Clang doesn't support %z in asm. */
-#define XRSTOR(pfx) \
-    alternative_io("1: .byte " pfx "0x0f,0xae,0x2f\n" \
+#define XRSTOR(pfx, xrstor_ins) \
+    asm volatile ( "1: .byte " pfx xrstor_ins"\n" \
                    "3:\n" \
                    "   .section .fixup,\"ax\"\n" \
                    "2: incl %[faults]\n" \
                    "   jmp 3b\n" \
                    "   .previous\n" \
-                   _ASM_EXTABLE(1b, 2b), \
-                   ".byte " pfx "0x0f,0xc7,0x1f\n", \
-                   X86_FEATURE_XSAVES, \
-                   ASM_OUTPUT2([mem] "+m" (*ptr), [faults] "+g" (faults)), \
-                   [lmask] "a" (lmask), [hmask] "d" (hmask), \
-                   [ptr] "D" (ptr))
+                   _ASM_EXTABLE(1b, 2b) \
+                   : [mem] "+m" (*ptr), [faults] "+g" (faults) \
+                   : [lmask] "a" (lmask), [hmask] "d" (hmask), \
+                     [ptr] "D" (ptr) )
 
     default:
-        XRSTOR("0x48,");
+        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
+        {
+            if ( unlikely(!(ptr->xsave_hdr.xcomp_bv
+                            & XSTATE_COMPACTION_ENABLED)) )
+                ptr->xsave_hdr.xcomp_bv = ptr->xsave_hdr.xstate_bv
+                                          | XSTATE_COMPACTION_ENABLED;
+
+            XRSTOR("0x48,","0x0f,0xc7,0x1f"); /* xrstors */
+        }
+        else
+            XRSTOR("0x48,","0x0f,0xae,0x2f"); /* xrstor */
         break;
     case 4: case 2:
-        XRSTOR("");
+        if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
+        {
+            if ( unlikely(!(ptr->xsave_hdr.xcomp_bv
+                            & XSTATE_COMPACTION_ENABLED)) )
+                ptr->xsave_hdr.xcomp_bv = ptr->xsave_hdr.xstate_bv
+                                          | XSTATE_COMPACTION_ENABLED;
+            XRSTOR("","0x0f,0xc7,0x1f");
+        }
+        else
+            XRSTOR("","0x0f,0xae,0x2f");
         break;
 #undef XRSTOR
     }
@@ -426,7 +452,7 @@ void xrstor(struct vcpu *v, uint64_t mask)
               ((mask & XSTATE_YMM) &&
                !(ptr->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED))) )
             ptr->fpu_sse.mxcsr &= mxcsr_mask;
 
-    if ( cpu_has_xsaves || cpu_has_xsavec )
+    if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
     {
         ptr->xsave_hdr.xcomp_bv &= this_cpu(xcr0) | this_cpu(xss);
         ptr->xsave_hdr.xstate_bv &= ptr->xsave_hdr.xcomp_bv;
@@ -443,7 +469,7 @@ void xrstor(struct vcpu *v, uint64_t mask)
     case 2: /* Stage 2: Reset all state. */
         ptr->fpu_sse.mxcsr = MXCSR_DEFAULT;
         ptr->xsave_hdr.xstate_bv = 0;
-        ptr->xsave_hdr.xcomp_bv = cpu_has_xsaves
+        ptr->xsave_hdr.xcomp_bv = v->arch.xcr0_accum & XSTATE_XSAVES_ONLY
                                   ? XSTATE_COMPACTION_ENABLED : 0;
         continue;
     }
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index a488688..91d1c39 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -44,6 +44,7 @@
 #define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR | \
                         XSTATE_PKRU)
 #define XSTATE_LAZY    (XSTATE_ALL & ~XSTATE_NONLAZY)
+#define XSTATE_XSAVES_ONLY         0
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
 #define XSTATE_ALIGN64 (1U << 1)
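A note on the XSTATE_XSAVES_ONLY definition in the last hunk (my reading
of the patch, offered as illustration rather than anything the patch
itself spells out): since the mask is 0 until supervised states are known
to Xen, every xcr0_accum & XSTATE_XSAVES_ONLY test is constant false, so
the xsaves/xrstors branches are dead code the compiler can drop:

/* Hypothetical illustration only - not code from the patch. */
#define XSTATE_XSAVES_ONLY 0                    /* no supervised states yet */

if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )  /* x & 0 == 0: never taken */
    /* xsaves/xrstors path, enabled once supervised bits are added */;
else
    /* xsave/xsaveopt/xrstor path, always taken for now */;

This matches the commit message below: xsaves will not be used until
supervised state support is introduced.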
The offsets at which xsave[sc] save the components are not fixed. So
when a save with v->fpu_dirtied set is followed by one with
v->fpu_dirtied clear, the non-lazy xsave[sc] may overwrite data written
by the lazy one.

The solution is that, when the compact xsave area is in use and taking
xcr0_accum into consideration, if the guest has ever used
XSTATE_LAZY & ~XSTATE_FP_SSE (XSTATE_FP_SSE is excluded because xsave
writes the XSTATE_FP_SSE part into the legacy region of the xsave area,
whose offsets are fixed, so saving XSTATE_FP_SSE will not cause the
overwriting problem), vcpu_xsave_mask will return XSTATE_ALL. Otherwise
vcpu_xsave_mask will return XSTATE_NONLAZY. This may save lazy states
unnecessarily, which has some performance impact.

After doing some performance tests on xsavec and xsaveopt (suggested by
Jan), the results show that xsaveopt performs better than xsavec. So the
hypervisor will not use xsavec anymore, and xsaves will not be used until
supervised state is introduced in the hypervisor. To that end
XSTATE_XSAVES_ONLY (indicating the supervised states understood by Xen)
is introduced; the use of xsaves depends on whether these bits are set
in xcr0_accum.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
---
v5: Address comments from Jan
1. Add XSTATE_XSAVES_ONLY and make the use of xsaves depend on whether
   these bits are set in xcr0_accum.
2. Change the compress logic in compress_xsave_states() to depend on
   !(v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) && !xsave_area_compressed(src).
3. XSTATE_COMPACTION_ENABLED is only set in xrstor().
4. Rebase the code on
   [V4] x86/xsaves: calculate the xstate_comp_offsets base on xstate_bv
   (already sent out), as they both change the same code.
   (I am not sure whether this rebase is ok or not.)

v4: Address comments from Jan
1. Add synthetic CPU feature X86_FEATURE_XSAVE_COMPACT and use this
   feature to indicate whether the hypervisor uses a compact xsave area
   or not.
2. Fix typo/grammar errors in the comment in vcpu_xsave_mask.

v3: Address comments from Jan
1. Add xsavec clean-up code and disable xsaves.
2. Add a comment on why a certain mask should be returned in
   vcpu_xsave_mask.

v2: Address comments from Jan
Add the performance impact and next steps to the description.

 xen/arch/x86/domain.c        |  8 ----
 xen/arch/x86/domctl.c        |  6 +--
 xen/arch/x86/hvm/hvm.c       |  7 ----
 xen/arch/x86/i387.c          | 23 ++++++++---
 xen/arch/x86/xstate.c        | 90 ++++++++++++++++++++++++++----------------
 xen/include/asm-x86/xstate.h |  1 +
 6 files changed, 78 insertions(+), 57 deletions(-)