
target/i386/kvm: call kvm_put_vcpu_events() before kvm_put_nested_state()

Message ID 20231026054201.87845-1-eiichi.tsukata@nutanix.com (mailing list archive)
State New, archived
Series target/i386/kvm: call kvm_put_vcpu_events() before kvm_put_nested_state()

Commit Message

Eiichi Tsukata Oct. 26, 2023, 5:42 a.m. UTC
kvm_put_vcpu_events() needs to be called before kvm_put_nested_state()
because the vCPU's hflags are consulted by the validation in KVM's
vmx_set_nested_state(). Otherwise kvm_put_nested_state() can fail with
-EINVAL when a vCPU is in VMX operation and enters SMM mode. This leads
to live migration failure.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 target/i386/kvm/kvm.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Eiichi Tsukata Oct. 26, 2023, 5:49 a.m. UTC | #1
Hi all,

Here are additional details on the issue.

We've found this issue when testing Windows Virtual Secure Mode (VSM) VMs.
We sometimes saw live migration failures of VSM-enabled VMs. It turned
out that the issue happens during live migration when VMs change boot related
EFI variables (ex: BootOrder, Boot0001).
After some debugging, I've found the race I mentioned in the commit message.

Symptom
=======

When it happens with the latest QEMU, which has commit https://github.com/qemu/qemu/commit/7191f24c7fcfbc1216d09
QEMU shows the following error message on the destination:

  qemu-system-x86_64: Failed to put registers after init: Invalid argument

If it happens with an older QEMU which doesn't have the commit, then we see a CPU dump like this:

  KVM internal error. Suberror: 3
  extra data[0]: 0x0000000080000b0e
  extra data[1]: 0x0000000000000031
  extra data[2]: 0x0000000000000683
  extra data[3]: 0x000000007f809000
  extra data[4]: 0x0000000000000026
  RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000f61
  RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000000000000000
  R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
  R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
  RIP=000000000000fff0 RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
  SS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  DS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  FS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  GS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  LDT=0000 0000000000000000 ffffffff 00c00000
  TR =0040 000000007f7df050 00068fff 00808b00 DPL=0 TSS64-busy
  GDT=     000000007f7df000 0000004f
  IDT=     000000007f836000 000001ff
  CR0=80010033 CR2=000000000000fff0 CR3=000000007f809000 CR4=00000668
  DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000    DR6=00000000ffff0ff0 DR7=0000000000000400
  EFER=0000000000000d00
  Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

In the above dump, CR3 is pointing to a SMRAM region even though SMM=0.

Repro
=====

The repro steps are pretty simple.

* Run an SMM-enabled Linux guest with Secure Boot-enabled OVMF.
* Run the following script in the guest.

  /usr/libexec/qemu-kvm &
  while true
  do
    efibootmgr -n 1
  done

* Do a live migration.

In my environment, live migration fails about 20% of the time. (Running
qemu-kvm inside the guest puts vCPUs in VMX operation, and efibootmgr's
EFI variable updates trigger SMM in OVMF, so migration can catch a vCPU
that is in both states at once.)

VMX specific
============

This issue is VMX-specific; SVM is not affected because the validation
in svm_set_nested_state() differs from the VMX one.

VMX:

  static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
                                  struct kvm_nested_state __user *user_kvm_nested_state,
                                  struct kvm_nested_state *kvm_state)
  {
  ...
          /*
           * SMM temporarily disables VMX, so we cannot be in guest mode,
           * nor can VMLAUNCH/VMRESUME be pending.  Outside SMM, SMM flags
           * must be zero.
           */
          if (is_smm(vcpu) ?
                  (kvm_state->flags &
                   (KVM_STATE_NESTED_GUEST_MODE | KVM_STATE_NESTED_RUN_PENDING))
                  : kvm_state->hdr.vmx.smm.flags)
                  return -EINVAL;
  ...

SVM:

  static int svm_set_nested_state(struct kvm_vcpu *vcpu,
                                  struct kvm_nested_state __user *user_kvm_nested_state,
                                  struct kvm_nested_state *kvm_state)
  {
  ...
          /* SMM temporarily disables SVM, so we cannot be in guest mode.  */
          if (is_smm(vcpu) && (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
                  return -EINVAL;
  ...

Thanks,

Eiichi

> On Oct 26, 2023, at 14:42, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> [...]
Philippe Mathieu-Daudé Oct. 26, 2023, 5:52 a.m. UTC | #2
Cc'ing Vitaly.

On 26/10/23 07:49, Eiichi Tsukata wrote:
> [...]
Vitaly Kuznetsov Oct. 26, 2023, 8:52 a.m. UTC | #3
Cc'ing Max :-) At first glance the condition in vmx_set_nested_state()
is correct so I guess we either have a stale
KVM_STATE_NESTED_RUN_PENDING when in SMM or stale smm.flags when outside
of it...

Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> Cc'ing Vitaly.
>
> [...]
Eiichi Tsukata Nov. 1, 2023, 2:09 a.m. UTC | #4
FYI: The EINVAL in vmx_set_nested_state() is caused by the following combination of conditions:
* vcpu->arch.hflags == 0
* kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON

Please feel free to ask me any more data points you need.

Thanks,
Eiichi

> On Oct 26, 2023, at 17:52, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> Cc'ing Max :-) At first glance the condition in vmx_set_nested_state()
> is correct so I guess we either have a stale
> KVM_STATE_NESTED_RUN_PENDING when in SMM or stale smm.flags when outside
> of it...
> 
> [...]
Vitaly Kuznetsov Nov. 1, 2023, 2:04 p.m. UTC | #5
Eiichi Tsukata <eiichi.tsukata@nutanix.com> writes:

> FYI: The EINVAL in vmx_set_nested_state() is caused by the following condition:
> * vcpu->arch.hflags == 0
> * kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON

This is a weird state indeed,

'vcpu->arch.hflags == 0' means we're not in SMM and not in guest mode
but kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON is a
reflection of vmx->nested.smm.vmxon (see
vmx_get_nested_state()). vmx->nested.smm.vmxon gets set (conditionally)
in vmx_enter_smm() and gets cleared in vmx_leave_smm(), which means the
vCPU must be in SMM to have it set.

In case the vCPU is in SMM upon migration, HF_SMM_MASK must be set from
kvm_vcpu_ioctl_x86_set_vcpu_events() -> kvm_smm_changed(), but QEMU's
kvm_arch_put_registers() calls kvm_put_nested_state() _before_
kvm_put_vcpu_events(). This can explain "vcpu->arch.hflags == 0".

Paolo, Max, any idea how this is supposed to work?
Eiichi Tsukata Nov. 8, 2023, 1:12 a.m. UTC | #6
Hi all, I'd appreciate any comments or feedback on the patch.

Thanks,
Eiichi

> On Nov 1, 2023, at 23:04, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> [...]
Eiichi Tsukata Jan. 16, 2024, 12:13 a.m. UTC | #7
Ping.

> On Nov 8, 2023, at 10:12, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> Hi all, appreciate any comments or feedbacks on the patch.
> 
> [...]
Vitaly Kuznetsov Jan. 16, 2024, 9:31 a.m. UTC | #8
As I'm the addressee of the ping for some reason ... :-)

The fix looks good to me, but I'm not sure about all the consequences of
moving kvm_put_vcpu_events() to an earlier stage. Max, Paolo, please
take a look!

Eiichi Tsukata <eiichi.tsukata@nutanix.com> writes:

> Ping.
>
> [...]

Patch

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e7c054cc16..cd635c9142 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4741,6 +4741,15 @@  int kvm_arch_put_registers(CPUState *cpu, int level)
         return ret;
     }
 
+    /*
+     * must be before kvm_put_nested_state so that HF_SMM_MASK is set during
+     * SMM.
+     */
+    ret = kvm_put_vcpu_events(x86_cpu, level);
+    if (ret < 0) {
+        return ret;
+    }
+
     if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_nested_state(x86_cpu);
         if (ret < 0) {
@@ -4787,10 +4796,6 @@  int kvm_arch_put_registers(CPUState *cpu, int level)
     if (ret < 0) {
         return ret;
     }
-    ret = kvm_put_vcpu_events(x86_cpu, level);
-    if (ret < 0) {
-        return ret;
-    }
     if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_mp_state(x86_cpu);
         if (ret < 0) {