Message ID | 20231026054201.87845-1-eiichi.tsukata@nutanix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/i386/kvm: call kvm_put_vcpu_events() before kvm_put_nested_state() | expand |
Cc'ing Max :-) At first glance the condition in vmx_set_nested_state()
is correct so I guess we either have a stale
KVM_STATE_NESTED_RUN_PENDING when in SMM or stale smm.flags when outside
of it...

Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> Cc'ing Vitaly.
>
> On 26/10/23 07:49, Eiichi Tsukata wrote:
>> Hi all,
>>
>> Here are additional details on the issue.
>>
>> We've found this issue when testing Windows Virtual Secure Mode (VSM) VMs.
>> We sometimes saw live migration failures of VSM-enabled VMs. It turned
>> out that the issue happens during live migration when VMs change boot-related
>> EFI variables (ex: BootOrder, Boot0001).
>> After some debugging, I've found the race I mentioned in the commit message.
>>
>> Symptom
>> =======
>>
>> When it happens with the latest QEMU, which has commit
>> https://github.com/qemu/qemu/commit/7191f24c7fcfbc1216d09,
>> QEMU shows the following error message on the destination:
>>
>> qemu-system-x86_64: Failed to put registers after init: Invalid argument
>>
>> If it happens with older QEMU which doesn't have the commit, then we see a CPU dump something like this:
>>
>> KVM internal error. Suberror: 3
>> extra data[0]: 0x0000000080000b0e
>> extra data[1]: 0x0000000000000031
>> extra data[2]: 0x0000000000000683
>> extra data[3]: 0x000000007f809000
>> extra data[4]: 0x0000000000000026
>> RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000f61
>> RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000000000000000
>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>> RIP=000000000000fff0 RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>> SS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> GS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> LDT=0000 0000000000000000 ffffffff 00c00000
>> TR =0040 000000007f7df050 00068fff 00808b00 DPL=0 TSS64-busy
>> GDT=     000000007f7df000 0000004f
>> IDT=     000000007f836000 000001ff
>> CR0=80010033 CR2=000000000000fff0 CR3=000000007f809000 CR4=00000668
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000d00
>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>
>> In the above dump, CR3 is pointing to the SMRAM region though SMM=0.
>>
>> Repro
>> =====
>>
>> Repro steps are pretty simple:
>>
>> * Run an SMM-enabled Linux guest with secure-boot-enabled OVMF.
>> * Run the following script in the guest:
>>
>> /usr/libexec/qemu-kvm &
>> while true
>> do
>>     efibootmgr -n 1
>> done
>>
>> * Do live migration.
>>
>> On my environment, live migration fails about 20% of the time.
>>
>> VMX specific
>> ============
>>
>> This issue is VMX specific; SVM is not affected, as the validation
>> in svm_set_nested_state() is a bit different from the VMX one.
>>
>> VMX:
>>
>> static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>>                                 struct kvm_nested_state __user *user_kvm_nested_state,
>>                                 struct kvm_nested_state *kvm_state)
>> {
>>     ..
>>     /*
>>      * SMM temporarily disables VMX, so we cannot be in guest mode,
>>      * nor can VMLAUNCH/VMRESUME be pending.  Outside SMM, SMM flags
>>      * must be zero.
>>      */
>>     if (is_smm(vcpu) ?
>>         (kvm_state->flags &
>>          (KVM_STATE_NESTED_GUEST_MODE | KVM_STATE_NESTED_RUN_PENDING))
>>         : kvm_state->hdr.vmx.smm.flags)
>>         return -EINVAL;
>>     ..
>>
>> SVM:
>>
>> static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>>                                 struct kvm_nested_state __user *user_kvm_nested_state,
>>                                 struct kvm_nested_state *kvm_state)
>> {
>>     ..
>>     /* SMM temporarily disables SVM, so we cannot be in guest mode. */
>>     if (is_smm(vcpu) && (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
>>         return -EINVAL;
>>     ..
>>
>> Thanks,
>>
>> Eiichi
>>
>>> On Oct 26, 2023, at 14:42, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>>>
>>> kvm_put_vcpu_events() needs to be called before kvm_put_nested_state()
>>> because the vCPU's hflags are referred to in KVM's vmx_set_nested_state()
>>> validation. Otherwise kvm_put_nested_state() can fail with -EINVAL when
>>> a vCPU is in VMX operation and enters SMM mode. This leads to live
>>> migration failure.
>>>
>>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>>> ---
>>> target/i386/kvm/kvm.c | 13 +++++++++----
>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>> index e7c054cc16..cd635c9142 100644
>>> --- a/target/i386/kvm/kvm.c
>>> +++ b/target/i386/kvm/kvm.c
>>> @@ -4741,6 +4741,15 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
>>>          return ret;
>>>      }
>>>
>>> +    /*
>>> +     * must be before kvm_put_nested_state so that HF_SMM_MASK is set during
>>> +     * SMM.
>>> +     */
>>> +    ret = kvm_put_vcpu_events(x86_cpu, level);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>>      if (level >= KVM_PUT_RESET_STATE) {
>>>          ret = kvm_put_nested_state(x86_cpu);
>>>          if (ret < 0) {
>>> @@ -4787,10 +4796,6 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
>>>          if (ret < 0) {
>>>              return ret;
>>>          }
>>> -    ret = kvm_put_vcpu_events(x86_cpu, level);
>>> -    if (ret < 0) {
>>> -        return ret;
>>> -    }
>>>      if (level >= KVM_PUT_RESET_STATE) {
>>>          ret = kvm_put_mp_state(x86_cpu);
>>>          if (ret < 0) {
>>> --
>>> 2.41.0
FYI: The EINVAL in vmx_set_nested_state() is caused by the following condition:

* vcpu->arch.hflags == 0
* kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON

Please feel free to ask me for any more data points you need.

Thanks,
Eiichi

> On Oct 26, 2023, at 17:52, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Cc'ing Max :-) At first glance the condition in vmx_set_nested_state()
> is correct so I guess we either have a stale
> KVM_STATE_NESTED_RUN_PENDING when in SMM or stale smm.flags when outside
> of it...
Eiichi Tsukata <eiichi.tsukata@nutanix.com> writes:

> FYI: The EINVAL in vmx_set_nested_state() is caused by the following condition:
> * vcpu->arch.hflags == 0
> * kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON

This is a weird state indeed.

'vcpu->arch.hflags == 0' means we're not in SMM and not in guest mode,
but kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON is a
reflection of vmx->nested.smm.vmxon (see
vmx_get_nested_state()). vmx->nested.smm.vmxon gets set (conditionally)
in vmx_enter_smm() and gets cleared in vmx_leave_smm(), which means the
vCPU must be in SMM to have it set.

In case the vCPU is in SMM upon migration, HF_SMM_MASK must be set from
kvm_vcpu_ioctl_x86_set_vcpu_events() -> kvm_smm_changed(), but QEMU's
kvm_arch_put_registers() calls kvm_put_nested_state() _before_
kvm_put_vcpu_events(). This can explain "vcpu->arch.hflags == 0".

Paolo, Max, any idea how this is supposed to work?
Hi all, appreciate any comments or feedback on the patch.

Thanks,
Eiichi

> On Nov 1, 2023, at 23:04, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Paolo, Max, any idea how this is supposed to work?
>
> --
> Vitaly
Ping.

> On Nov 8, 2023, at 10:12, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>
> Hi all, appreciate any comments or feedback on the patch.
>
> Thanks,
> Eiichi
As I'm the addressee of the ping for some reason ... :-) the fix looks
good to me, but I'm not sure about all the consequences of moving
kvm_put_vcpu_events() to an earlier stage. Max, Paolo, please take a look!

Eiichi Tsukata <eiichi.tsukata@nutanix.com> writes:

> Ping.
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e7c054cc16..cd635c9142 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4741,6 +4741,15 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
         return ret;
     }
 
+    /*
+     * must be before kvm_put_nested_state so that HF_SMM_MASK is set during
+     * SMM.
+     */
+    ret = kvm_put_vcpu_events(x86_cpu, level);
+    if (ret < 0) {
+        return ret;
+    }
+
     if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_nested_state(x86_cpu);
         if (ret < 0) {
@@ -4787,10 +4796,6 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
     if (ret < 0) {
         return ret;
     }
-    ret = kvm_put_vcpu_events(x86_cpu, level);
-    if (ret < 0) {
-        return ret;
-    }
     if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_mp_state(x86_cpu);
         if (ret < 0) {
kvm_put_vcpu_events() needs to be called before kvm_put_nested_state()
because the vCPU's hflags are referred to in KVM's vmx_set_nested_state()
validation. Otherwise kvm_put_nested_state() can fail with -EINVAL when
a vCPU is in VMX operation and enters SMM mode. This leads to live
migration failure.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 target/i386/kvm/kvm.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)