Message ID | 20220129094644.385841-1-leobras@redhat.com |
---|---|
State | New, archived |
Series | [v1,1/1] target/i386: Mask xstate_bv based on the cpu enabled features |
On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:

> The following steps describe a migration bug:
> 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> 2 - Migrate to a host with EPYC-Naples cpu
>
> The guest kernel crashes shortly after the migration.
>
> The crash happens due to a fault caused by XRSTOR:
> A set bit in XSTATE_BV is not set in XCR0.
> The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)

I'm trying to understand how this happens.

If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
be exposed to the VM (it is not available in the EPYC CPU).

Given this, how would bit 0x200 (representing PKRU) end up set in
xstate_bv?

dme.
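As a cross-check of the 0x200 value in the question above, the following C sketch decodes an XSTATE_BV value into XSAVE state-component names. It assumes the component numbering from the Intel SDM (component 9 is PKRU, so its mask is 1 << 9 = 0x200); the table and both functions are hypothetical helpers written for illustration, not QEMU or KVM code.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* XSAVE state components 0..9 as numbered in the Intel SDM (vol. 1, ch. 13).
 * Higher-numbered components are not listed in this sketch. */
static const char *const xstate_names[] = {
    "x87", "SSE", "AVX", "BNDREGS", "BNDCSR",
    "opmask", "ZMM_Hi256", "Hi16_ZMM", "PT", "PKRU",
};

#define XSTATE_NAMES_COUNT (sizeof(xstate_names) / sizeof(xstate_names[0]))

/* Return the component number for a name in the table above, or -1. */
static int xstate_bit_for_name(const char *name)
{
    for (size_t i = 0; i < XSTATE_NAMES_COUNT; i++) {
        if (strcmp(xstate_names[i], name) == 0)
            return (int)i;
    }
    return -1;
}

/* Print each component whose bit is set in xstate_bv and return the number
 * of set bits. Bits beyond the table are reported but not named. */
static unsigned decode_xstate_bv(uint64_t xstate_bv)
{
    unsigned count = 0;

    for (unsigned i = 0; i < 64; i++) {
        uint64_t mask = UINT64_C(1) << i;

        if (!(xstate_bv & mask))
            continue;
        count++;
        if (i < XSTATE_NAMES_COUNT)
            printf("bit %u (0x%llx): %s\n", i, (unsigned long long)mask,
                   xstate_names[i]);
        else
            printf("bit %u (0x%llx): not decoded here\n", i,
                   (unsigned long long)mask);
    }
    return count;
}
```

For example, decode_xstate_bv(0x207) reports x87, SSE, AVX and PKRU, i.e. the PKRU bit on top of the three low components.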
On Mon, 31 Jan 2022 12:53:31 +0000
David Edmondson <david.edmondson@oracle.com> wrote:

> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
>
> > The following steps describe a migration bug:
> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> > 2 - Migrate to a host with EPYC-Naples cpu
> >
> > The guest kernel crashes shortly after the migration.
> >
> > The crash happens due to a fault caused by XRSTOR:
> > A set bit in XSTATE_BV is not set in XCR0.
> > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)
>
> I'm trying to understand how this happens.
>
> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
> be exposed to the VM (it is not available in the EPYC CPU).
>
> Given this, how would bit 0x200 (representing PKRU) end up set in
> xstate_bv?
>
> > To avoid this kind of bug:
> > In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
> > current vcpu's features.

In addition to the above:

It's not a good idea to silently mask something out.
If we can't ensure the same feature set for a CPU model
and can't verify it by asking QEMU on the source and
target hosts, the next best thing would be to explicitly
fail migration (e.g. adding a check to the .post_load hook or
doing some other migration magic, CCing David)
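A minimal sketch of the alternative suggested above: reject incoming state whose XSTATE_BV names components the CPU model does not enable, instead of silently masking them. The struct and function names are hypothetical; only the hook shape (int (*)(void *opaque, int version_id), returning a negative value to fail the load) is modeled on QEMU's VMStateDescription post_load callback. The field names merely stand in for env->xstate_bv and env->features[FEAT_XSAVE_COMP_LO].

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the migrated vCPU XSAVE state. */
struct vcpu_xsave_state {
    uint64_t xstate_bv;         /* XSTATE_BV from the incoming XSAVE header */
    uint64_t enabled_xfeatures; /* components the CPU model enables */
};

/* Post-load style check: fail loudly rather than masking.
 * Returns 0 if the state is acceptable, -EINVAL otherwise. */
static int vcpu_xsave_post_load(void *opaque, int version_id)
{
    const struct vcpu_xsave_state *s = opaque;
    uint64_t unexpected = s->xstate_bv & ~s->enabled_xfeatures;

    (void)version_id; /* unused in this sketch */
    if (unexpected) {
        fprintf(stderr,
                "xstate_bv has components 0x%llx not enabled for this CPU model\n",
                (unsigned long long)unexpected);
        return -EINVAL;
    }
    return 0;
}
```

With example values, state carrying xstate_bv = 0x207 against enabled components 0x7 is rejected, while 0x7 is accepted. Failing here surfaces the feature mismatch at migration time, whereas the patch's masking lets migration proceed with the extra components cleared from xstate_bv.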
Hello David Edmondson and Igor Mammedov,

Thank you for the feedback!

For some reason I did not get your comments in my email; I only noticed
them when I opened Patchwork to get the link. Sorry for the delay.

I will do my best to address them in a few minutes.

Best regards,
Leo
Hello David, thanks for this feedback!

On Mon, 2022-01-31 at 12:53 +0000, David Edmondson wrote:
> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
>
> > The following steps describe a migration bug:
> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> > 2 - Migrate to a host with EPYC-Naples cpu
> >
> > The guest kernel crashes shortly after the migration.
> >
> > The crash happens due to a fault caused by XRSTOR:
> > A set bit in XSTATE_BV is not set in XCR0.
> > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in
> > Naples)
>
> I'm trying to understand how this happens.
>
> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should
> not be exposed to the VM (it is not available in the EPYC CPU).
>
> Given this, how would bit 0x200 (representing PKRU) end up set in
> xstate_bv?

During my debugging, I noticed this bit gets set before the kernel even
starts.

It's possible SeaBIOS and/or iPXE are somehow setting 0x200 using the
XRSTOR instruction. I am not sure if QEMU is able to stop this in KVM mode.

If you have any info on this, please let me know.

Best regards,
Leo
Hello Igor,

On Tue, 2022-02-01 at 09:29 +0100, Igor Mammedov wrote:
> On Mon, 31 Jan 2022 12:53:31 +0000
> David Edmondson <david.edmondson@oracle.com> wrote:
>
> > On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
> >
> > > The following steps describe a migration bug:
> > > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> > > 2 - Migrate to a host with EPYC-Naples cpu
> > >
> > > The guest kernel crashes shortly after the migration.
> > >
> > > The crash happens due to a fault caused by XRSTOR:
> > > A set bit in XSTATE_BV is not set in XCR0.
> > > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in
> > > Naples)
> >
> > I'm trying to understand how this happens.
> >
> > If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should
> > not be exposed to the VM (it is not available in the EPYC CPU).
> >
> > Given this, how would bit 0x200 (representing PKRU) end up set in
> > xstate_bv?
> >
> > > To avoid this kind of bug:
> > > In kvm_get_xsave, mask-out from xstate_bv any bits that are not
> > > set in current vcpu's features.
>
> In addition to the above:
>
> It's not a good idea to silently mask something out.
> If we can't ensure the same feature set for a CPU model
> and can't verify it by asking QEMU on the source and
> target hosts, the next best thing would be to explicitly
> fail migration (e.g. adding a check to the .post_load hook or
> doing some other migration magic, CCing David)

Maybe this has something to do with the host kernel (KVM) doing
something strange. IIRC, QEMU ended up getting a masked version to use
for migration, since it was not failing as expected.

I will try to investigate further.
Please let me know if you have any information on that.

Best regards,
Leo
On Tuesday, 2022-02-01 at 16:09:57 -03, Leonardo Brás wrote:

> Hello David, thanks for this feedback!
>
> On Mon, 2022-01-31 at 12:53 +0000, David Edmondson wrote:
>> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
>>
>> > The following steps describe a migration bug:
>> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
>> > 2 - Migrate to a host with EPYC-Naples cpu
>> >
>> > The guest kernel crashes shortly after the migration.
>> >
>> > The crash happens due to a fault caused by XRSTOR:
>> > A set bit in XSTATE_BV is not set in XCR0.
>> > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in
>> > Naples)
>>
>> I'm trying to understand how this happens.
>>
>> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should
>> not be exposed to the VM (it is not available in the EPYC CPU).
>>
>> Given this, how would bit 0x200 (representing PKRU) end up set in
>> xstate_bv?
>
> During my debugging, I noticed this bit gets set before the kernel even
> starts.
>
> It's possible SeaBIOS and/or iPXE are somehow setting 0x200 using the
> XRSTOR instruction. I am not sure if QEMU is able to stop this in KVM mode.

I don't believe that this should be possible.

If the CPU is set to EPYC in QEMU then .features[FEAT_7_0_ECX] does not
include CPUID_7_0_ECX_PKU, which in turn means that when
x86_cpu_enable_xsave_components() generates FEAT_XSAVE_COMP_LO it should
not set XSTATE_PKRU_BIT.

Given that, KVM's vcpu->arch.guest_supported_xcr0 will not include
XSTATE_PKRU_BIT, and __kvm_set_xcr() should not allow that bit to be
set when it intercepts the guest xsetbv instruction.

dme.
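The chain of reasoning above can be modeled in a few lines of C. This is a simplified sketch: model_xsave_comp_lo() and model_kvm_set_xcr0() are hypothetical stand-ins for QEMU's x86_cpu_enable_xsave_components() and KVM's __kvm_set_xcr(), and only the checks relevant to PKRU are modeled. It assumes PKU is reported in CPUID.(EAX=7,ECX=0):ECX bit 3 and that PKRU is XSAVE component 9 (mask 0x200).

```c
#include <stdint.h>

/* Bit positions assumed by this sketch. */
#define CPUID_7_0_ECX_PKU (UINT32_C(1) << 3) /* CPUID.(7,0):ECX.PKU */
#define XSTATE_FP_MASK    (UINT64_C(1) << 0) /* x87 state, always required */
#define XSTATE_PKRU_MASK  (UINT64_C(1) << 9) /* PKRU state component */

/* Hypothetical model of how FEAT_XSAVE_COMP_LO is derived: the PKRU
 * component is only added when the CPU model exposes PKU. The real QEMU
 * code walks a table of save areas instead. */
static uint64_t model_xsave_comp_lo(uint32_t feat_7_0_ecx,
                                    uint64_t other_components)
{
    uint64_t comp = other_components;

    if (feat_7_0_ecx & CPUID_7_0_ECX_PKU)
        comp |= XSTATE_PKRU_MASK;
    return comp;
}

/* Hypothetical model of the XCR0 validation KVM performs when it intercepts
 * XSETBV: x87 state must stay enabled and no bit outside
 * guest_supported_xcr0 may be set. Returns 0 if accepted, 1 if rejected.
 * The real __kvm_set_xcr() has further checks not modeled here. */
static int model_kvm_set_xcr0(uint64_t guest_supported_xcr0, uint64_t xcr0)
{
    uint64_t valid_bits = guest_supported_xcr0 | XSTATE_FP_MASK;

    if (!(xcr0 & XSTATE_FP_MASK))
        return 1;
    if (xcr0 & ~valid_bits)
        return 1;
    return 0;
}
```

With a CPU model lacking PKU (feat_7_0_ecx = 0) and example base components 0x7, the supported mask stays 0x7, so an attempt to set XCR0 to 0x207 is rejected while 0x7 is accepted.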
Hello David, thank you for the feedback.

On Wed, Feb 2, 2022 at 12:47 PM David Edmondson
<david.edmondson@oracle.com> wrote:
>
> I don't believe that this should be possible.
>
> If the CPU is set to EPYC in QEMU then .features[FEAT_7_0_ECX] does not
> include CPUID_7_0_ECX_PKU, which in turn means that when
> x86_cpu_enable_xsave_components() generates FEAT_XSAVE_COMP_LO it should
> not set XSTATE_PKRU_BIT.
>
> Given that, KVM's vcpu->arch.guest_supported_xcr0 will not include
> XSTATE_PKRU_BIT, and __kvm_set_xcr() should not allow that bit to be
> set when it intercepts the guest xsetbv instruction.

Thanks for sharing those details; they helped me with the kernel side of
this bug.

FWIW, I sent a patchset fixing this bug to the kernel list:
https://patchwork.kernel.org/project/kvm/list/?series=611524&state=%2A&archive=both

Best regards,
Leo
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index ac61a96344..0628226234 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -167,7 +167,7 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
         env->xmm_regs[i].ZMM_Q(1) = ldq_p(xmm + 8);
     }
 
-    env->xstate_bv = header->xstate_bv;
+    env->xstate_bv = header->xstate_bv & env->features[FEAT_XSAVE_COMP_LO];
 
     e = &x86_ext_save_areas[XSTATE_YMM_BIT];
     if (e->size && e->offset) {
The following steps describe a migration bug:
1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
2 - Migrate to a host with EPYC-Naples cpu

The guest kernel crashes shortly after the migration.

The crash happens due to a fault caused by XRSTOR:
A set bit in XSTATE_BV is not set in XCR0.
The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)

To avoid this kind of bug:
In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
current vcpu's features.

This keeps cpu->env->xstate_bv with feature bits compatible with any
host machine capable of running the vcpu model.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 target/i386/xsave_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
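The fault described in the commit message, and the effect of the one-line change, can be illustrated with a small C sketch. It models only one standard-form XRSTOR #GP condition (a bit set in XSTATE_BV that is clear in XCR0); the helper names are hypothetical, and model_components stands in for env->features[FEAT_XSAVE_COMP_LO]. The values used with it are examples, not values captured from the reported hosts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one standard-form XRSTOR #GP condition: a component
 * marked present in the header's XSTATE_BV but not enabled in XCR0.
 * Other fault conditions (reserved header bytes, compacted form,
 * alignment) are not modeled. */
static bool xrstor_would_fault(uint64_t xstate_bv, uint64_t xcr0)
{
    return (xstate_bv & ~xcr0) != 0;
}

/* The patch's approach: keep only the components enabled for the vCPU
 * model before the state is saved for migration. */
static uint64_t mask_xstate_bv(uint64_t xstate_bv, uint64_t model_components)
{
    return xstate_bv & model_components;
}
```

With example values, a header carrying PKRU (xstate_bv = 0x207) faults against an XCR0 of 0x7, while the masked value 0x207 & 0x7 = 0x7 does not.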