Message ID | 4DF33413.9070605@web.de (mailing list archive)
---|---
State | New, archived
On 06/11/2011 12:23 PM, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> These FPU states are properly maintained by KVM but not yet by TCG. So
> far we unconditionally set them to 0 in the guest which may cause
> state corruptions - not only during migration.
>
> -#define CPU_SAVE_VERSION 12
> +#define CPU_SAVE_VERSION 13

Incrementing the version number seems excessive - I can't imagine a
real-life guest will break due to fp pointer corruption.

However, I don't think we have a mechanism for optional state. We
discussed this during the 18th VMState Subsection Symposium and IIRC
agreed to re-raise the issue when we encountered it, which appears to
be now.
On 2011-06-13 10:45, Avi Kivity wrote:
> On 06/11/2011 12:23 PM, Jan Kiszka wrote:
>> [...]
>> -#define CPU_SAVE_VERSION 12
>> +#define CPU_SAVE_VERSION 13
>
> Incrementing the version number seems excessive - I can't imagine a
> real-life guest will break due to fp pointer corruption.
>
> However, I don't think we have a mechanism for optional state. We
> discussed this during the 18th VMState Subsection Symposium and IIRC
> agreed to re-raise the issue when we encountered it, which appears to
> be now.

Whatever we invent, it has to be backported as well to allow that
infamous traveling back in time, migrating VMs from newer to older
versions.

Would that backporting be simpler if we used an unconditional
subsection for the additional states?

Jan
On 06/14/2011 09:10 AM, Jan Kiszka wrote:
> On 2011-06-13 10:45, Avi Kivity wrote:
>> [...]
>> However, I don't think we have a mechanism for optional state.
>
> Whatever we invent, it has to be backported as well to allow that
> infamous traveling back in time, migrating VMs from newer to older
> versions.
>
> Would that backporting be simpler if we used an unconditional
> subsection for the additional states?

Most likely. It depends on what mechanism we use.

Let's spend some time to think about what it would be like. This patch
is not urgent, is it? (i.e. it was discovered by code inspection, not
live migration that caught the cpu between an instruction that caused
a math exception and the exception handler).
On 2011-06-14 10:23, Avi Kivity wrote:
> [...]
> Let's spend some time to think about what it would be like. This patch
> is not urgent, is it? (i.e. it was discovered by code inspection, not
> live migration that caught the cpu between an instruction that caused
> a math exception and the exception handler).

Right, not urgent, should just make it into 0.15 in the end.

Jan
On 06/14/2011 09:10 AM, Jan Kiszka wrote:
> [...]
> Would that backporting be simpler if we used an unconditional
> subsection for the additional states?

Thinking about it, a conditional subsection would work fine. Most
threads will never see an fpu error, and are all initialized to a
clean slate.

SDM vol. 1, section 8.1.9.1 says:

> 8.1.9.1 Fopcode Compatibility Sub-mode
> Beginning with the Pentium 4 and Intel Xeon processors, the IA-32
> architecture provides program control over the storing of the last
> instruction opcode (sometimes referred to as the fopcode). Here,
> bit 2 of the IA32_MISC_ENABLE MSR enables (set) or disables (clear)
> the fopcode compatibility mode. If FOP code compatibility mode is
> enabled, the FOP is defined as it has always been in previous IA32
> implementations (always defined as the FOP of the last
> non-transparent FP instruction executed before a
> FSAVE/FSTENV/FXSAVE). If FOP code compatibility mode is disabled
> (default), FOP is only valid if the last non-transparent FP
> instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked
> exception.

So fopcode will usually be clear.
On 2011-06-15 11:10, Avi Kivity wrote:
> [...]
> Thinking about it, a conditional subsection would work fine. Most
> threads will never see an fpu error, and are all initialized to a
> clean slate.
> [...]
> So fopcode will usually be clear.

OK. So if bit 2 of the IA32_MISC_ENABLE MSR is set, we must save those
fields. But if it's off, how do we test for that other condition "last
non-transparent FP instruction ... had an unmasked exception" from the
host?

Jan
On 2011-06-15 12:20, Jan Kiszka wrote:
> [...]
> OK. So if bit 2 of the IA32_MISC_ENABLE MSR is set, we must save
> those fields. But if it's off, how do we test for that other
> condition "last non-transparent FP instruction ... had an unmasked
> exception" from the host?

I briefly thought about status.ES == 1. But the guest may clear the
flag in its exception handler before reading opcode etc.

Jan
On 06/15/2011 01:20 PM, Jan Kiszka wrote:
> [...]
> OK. So if bit 2 of the IA32_MISC_ENABLE MSR is set, we must save
> those fields. But if it's off, how do we test for that other
> condition "last non-transparent FP instruction ... had an unmasked
> exception" from the host?

We save fopcode unconditionally. But if IA32_MISC_ENABLE_MSR[2] = 0,
then fopcode will be zero, and we can skip the subsection (if the data
and instruction pointers are also zero, which they will be).

If it isn't zero, there's still a good chance fopcode will be zero
(64-bit userspace, thread that hasn't used the fpu since the last
context switch, last opcode happened to be zero).
On 2011-06-15 13:26, Avi Kivity wrote:
> [...]
> If it isn't zero, there's still a good chance fopcode will be zero
> (64-bit userspace, thread that hasn't used the fpu since the last
> context switch, last opcode happened to be zero).

I can't yet find "if fopcode is invalid, it is zero, just as IP and
DP" in the spec. What clears them reliably?

Jan
On 06/15/2011 02:32 PM, Jan Kiszka wrote:
> [...]
> I can't yet find "if fopcode is invalid, it is zero, just as IP and
> DP" in the spec. What clears them reliably?

FNINIT
On 2011-06-15 13:33, Avi Kivity wrote:
> [...]
>> What clears them reliably?
>
> FNINIT

OK, I see. So we simply check for all fields being zero and skip the
section in that case. The MSR doesn't actually matter to us here.

Will write v2.

Jan
Hi Jan,

On Sat, Jun 11, 2011 at 11:23:31AM +0200, Jan Kiszka wrote:
> These FPU states are properly maintained by KVM but not yet by TCG. So
> far we unconditionally set them to 0 in the guest which may cause
> state corruptions - not only during migration.

I can't judge whether the patch is correct or not, but I can confirm
it fixes my compilation problem. Feel free to add an Acked-by from me
if that makes sense.

Christophe
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 9c3340d..3c2dab9 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -641,6 +641,10 @@ typedef struct CPUX86State {
     uint16_t fpuc;
     uint8_t fptags[8];   /* 0 = valid, 1 = empty */
     FPReg fpregs[8];
+    /* KVM-only so far */
+    uint16_t fpop;
+    uint64_t fpip;
+    uint64_t fpdp;

     /* emulator internal variables */
     float_status fp_status;
@@ -942,7 +946,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list_id x86_cpu_list
 #define cpudef_setup x86_cpudef_setup

-#define CPU_SAVE_VERSION 12
+#define CPU_SAVE_VERSION 13

 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5ebb054..938e0a3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -718,6 +718,9 @@ static int kvm_put_fpu(CPUState *env)
     fpu.fsw = env->fpus & ~(7 << 11);
     fpu.fsw |= (env->fpstt & 7) << 11;
     fpu.fcw = env->fpuc;
+    fpu.last_opcode = env->fpop;
+    fpu.last_ip = env->fpip;
+    fpu.last_dp = env->fpdp;
     for (i = 0; i < 8; ++i) {
         fpu.ftwx |= (!env->fptags[i]) << i;
     }
@@ -740,7 +743,7 @@ static int kvm_put_xsave(CPUState *env)
 {
     int i, r;
     struct kvm_xsave* xsave;
-    uint16_t cwd, swd, twd, fop;
+    uint16_t cwd, swd, twd;

     if (!kvm_has_xsave()) {
         return kvm_put_fpu(env);
@@ -748,7 +751,7 @@ static int kvm_put_xsave(CPUState *env)

     xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
     memset(xsave, 0, sizeof(struct kvm_xsave));
-    cwd = swd = twd = fop = 0;
+    cwd = swd = twd = 0;
     swd = env->fpus & ~(7 << 11);
     swd |= (env->fpstt & 7) << 11;
     cwd = env->fpuc;
@@ -756,7 +759,9 @@ static int kvm_put_xsave(CPUState *env)
         twd |= (!env->fptags[i]) << i;
     }
     xsave->region[0] = (uint32_t)(swd << 16) + cwd;
-    xsave->region[1] = (uint32_t)(fop << 16) + twd;
+    xsave->region[1] = (uint32_t)(env->fpop << 16) + twd;
+    memcpy(&xsave->region[XSAVE_CWD_RIP], &env->fpip, sizeof(env->fpip));
+    memcpy(&xsave->region[XSAVE_CWD_RDP], &env->fpdp, sizeof(env->fpdp));
     memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
            sizeof env->fpregs);
     memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
@@ -921,6 +926,9 @@ static int kvm_get_fpu(CPUState *env)
     env->fpstt = (fpu.fsw >> 11) & 7;
     env->fpus = fpu.fsw;
     env->fpuc = fpu.fcw;
+    env->fpop = fpu.last_opcode;
+    env->fpip = fpu.last_ip;
+    env->fpdp = fpu.last_dp;
     for (i = 0; i < 8; ++i) {
         env->fptags[i] = !((fpu.ftwx >> i) & 1);
     }
@@ -935,7 +943,7 @@ static int kvm_get_xsave(CPUState *env)
 {
     struct kvm_xsave* xsave;
     int ret, i;
-    uint16_t cwd, swd, twd, fop;
+    uint16_t cwd, swd, twd;

     if (!kvm_has_xsave()) {
         return kvm_get_fpu(env);
@@ -951,13 +959,15 @@ static int kvm_get_xsave(CPUState *env)
     cwd = (uint16_t)xsave->region[0];
     swd = (uint16_t)(xsave->region[0] >> 16);
     twd = (uint16_t)xsave->region[1];
-    fop = (uint16_t)(xsave->region[1] >> 16);
+    env->fpop = (uint16_t)(xsave->region[1] >> 16);
     env->fpstt = (swd >> 11) & 7;
     env->fpus = swd;
     env->fpuc = cwd;
     for (i = 0; i < 8; ++i) {
         env->fptags[i] = !((twd >> i) & 1);
     }
+    memcpy(&env->fpip, &xsave->region[XSAVE_CWD_RIP], sizeof(env->fpip));
+    memcpy(&env->fpdp, &xsave->region[XSAVE_CWD_RDP], sizeof(env->fpdp));
     env->mxcsr = xsave->region[XSAVE_MXCSR];
     memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
            sizeof env->fpregs);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index bbeae88..e02c2a3 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -390,6 +390,10 @@ static const VMStateDescription vmstate_cpu = {
         VMSTATE_UINT64_V(xcr0, CPUState, 12),
         VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
         VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
+        /* Further FPU states */
+        VMSTATE_UINT16_V(fpop, CPUState, 13),
+        VMSTATE_UINT64_V(fpip, CPUState, 13),
+        VMSTATE_UINT64_V(fpdp, CPUState, 13),
         VMSTATE_END_OF_LIST()
         /* The above list is not sorted /wrt version numbers, watch out! */
     },