[v6,00/17] Introducing AMD x2AVIC and hybrid-AVIC modes

Message ID 20220519102709.24125-1-suravee.suthikulpanit@amd.com (mailing list archive)

Suravee Suthikulpanit May 19, 2022, 10:26 a.m. UTC
Introducing support for AMD x2APIC virtualization. This feature is
indicated by the CPUID Fn8000_000A EDX[14], and it can be activated
by setting bit 31 (enable AVIC) and bit 30 (x2APIC mode) of VMCB
offset 60h.

With x2AVIC support, the guest local APIC can be fully virtualized in
both xAPIC and x2APIC modes, and the mode can be changed during runtime.
For example, when AVIC is enabled, the hypervisor sets VMCB bit 31
to activate AVIC for each vCPU. Then, it keeps track of each vCPU's
APIC mode, and updates VMCB bit 30 to enable/disable x2APIC
virtualization mode accordingly.
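
The bit handling described above can be sketched as follows. This is a minimal user-space illustration, not code from the series: the bit positions and the CPUID bit come from the text above, while every identifier prefixed with sketch_/SKETCH_ is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions within the 32-bit field at VMCB offset 60h (from the text). */
#define SKETCH_AVIC_ENABLE	(UINT32_C(1) << 31)	/* enable AVIC */
#define SKETCH_X2APIC_MODE	(UINT32_C(1) << 30)	/* x2APIC virtualization */

/* x2AVIC is advertised by CPUID Fn8000_000A EDX[14] (from the text). */
static bool sketch_cpu_has_x2avic(uint32_t cpuid_8000000a_edx)
{
	return (cpuid_8000000a_edx >> 14) & 1;
}

/*
 * Hypothetical helper: compute the new value of the field at VMCB
 * offset 60h from the current AVIC state and the guest's APIC mode.
 * It assumes the host supports x2AVIC; all other bits are preserved.
 */
static uint32_t sketch_update_int_ctl(uint32_t int_ctl, bool avic_on,
				      bool guest_in_x2apic)
{
	int_ctl &= ~(SKETCH_AVIC_ENABLE | SKETCH_X2APIC_MODE);
	if (!avic_on)
		return int_ctl;

	int_ctl |= SKETCH_AVIC_ENABLE;
	if (guest_in_x2apic)
		int_ctl |= SKETCH_X2APIC_MODE;
	return int_ctl;
}
```

Calling sketch_update_int_ctl() again whenever the guest switches APIC mode models the runtime mode change: only bit 30 differs between the xAPIC and x2APIC results.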

Besides setting VMCB bits 30 and 31, for x2AVIC, the kvm_amd driver needs
to disable interception for the x2APIC MSR range to allow AVIC hardware
to virtualize register accesses.
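
As a rough model of what disabling interception means, the sketch below clears the read and write intercept bits for MSRs 0x800-0x8ff in an SVM MSR permission map, which uses two bits per MSR (read, then write) and covers MSRs 0x0-0x1fff in its first 2 KB. All sketch_ names are hypothetical, and passing through the whole range is a simplification: which registers the series actually leaves un-intercepted is not stated in this cover letter.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_X2APIC_MSR_BASE	0x800u
#define SKETCH_X2APIC_MSR_LAST	0x8ffu

/* An xAPIC MMIO register at offset 'off' maps to x2APIC MSR 0x800 + off/16. */
static uint32_t sketch_x2apic_msr(uint32_t xapic_mmio_offset)
{
	return SKETCH_X2APIC_MSR_BASE + (xapic_mmio_offset >> 4);
}

/*
 * Hypothetical helper for MSRs below 0x2000: bit 2n of the map controls
 * read interception of MSR n, bit 2n+1 controls write interception.
 */
static void sketch_set_msr_intercept(uint8_t *msrpm, uint32_t msr,
				     int intercept_read, int intercept_write)
{
	size_t bit = (size_t)msr * 2;		/* always even */
	uint8_t rd = (uint8_t)(1u << (bit % 8));
	uint8_t wr = (uint8_t)(1u << (bit % 8 + 1));	/* same byte as rd */
	uint8_t *byte = &msrpm[bit / 8];

	*byte = (uint8_t)(*byte & ~(rd | wr));
	if (intercept_read)
		*byte |= rd;
	if (intercept_write)
		*byte |= wr;
}

/* Stop intercepting reads and writes for the whole x2APIC MSR range. */
static void sketch_passthrough_x2apic(uint8_t *msrpm)
{
	for (uint32_t msr = SKETCH_X2APIC_MSR_BASE;
	     msr <= SKETCH_X2APIC_MSR_LAST; msr++)
		sketch_set_msr_intercept(msrpm, msr, 0, 0);
}
```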

This series also introduces a partial APIC virtualization (hybrid-AVIC)
mode, where APIC register accesses are trapped (i.e. not virtualized
by hardware), but the AVIC doorbell is still leveraged for interrupt
injection. This eliminates the need to disable x2APIC in the guest on
systems without x2AVIC support. (Note: suggested by Maxim)
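
One way to picture how the three modes relate is the decision function below. It is only a sketch of the behaviour described in this cover letter; the enum and function names are hypothetical and do not correspond to the definitions in svm.h.

```c
#include <stdbool.h>

enum sketch_apic_mode { SKETCH_APIC_DISABLED, SKETCH_XAPIC, SKETCH_X2APIC };

enum sketch_avic_mode {
	SKETCH_AVIC_OFF,	/* no hardware APIC virtualization */
	SKETCH_XAVIC,		/* guest in xAPIC mode, fully virtualized */
	SKETCH_HYBRID_AVIC,	/* guest in x2APIC mode, host lacks x2AVIC */
	SKETCH_X2AVIC,		/* guest in x2APIC mode, fully virtualized */
};

/* Pick a mode from host capabilities and the guest's current APIC mode. */
static enum sketch_avic_mode sketch_pick_mode(bool host_has_avic,
					      bool host_has_x2avic,
					      enum sketch_apic_mode guest)
{
	if (!host_has_avic || guest == SKETCH_APIC_DISABLED)
		return SKETCH_AVIC_OFF;
	if (guest == SKETCH_XAPIC)
		return SKETCH_XAVIC;
	return host_has_x2avic ? SKETCH_X2AVIC : SKETCH_HYBRID_AVIC;
}

/* In hybrid mode register accesses trap; only the doorbell path is used. */
static bool sketch_apic_regs_trapped(enum sketch_avic_mode mode)
{
	return mode == SKETCH_AVIC_OFF || mode == SKETCH_HYBRID_AVIC;
}
```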

Testing for v5:
  * Tested partial AVIC mode by launching a VM with x2APIC mode
  * Tested booting a Linux VM with x2APIC physical and logical modes with up to 512 vCPUs.
  * Tested the following nested SVM use cases:

             L0     |    L1   |   L2
       ----------------------------------
               AVIC |    APIC |    APIC
               AVIC |    APIC |  x2APIC
        hybrid-AVIC |  x2APIC |    APIC
        hybrid-AVIC |  x2APIC |  x2APIC
             x2AVIC |    APIC |    APIC
             x2AVIC |    APIC |  x2APIC
             x2AVIC |  x2APIC |    APIC
             x2AVIC |  x2APIC |  x2APIC

Changes from v5:
(https://lore.kernel.org/lkml/20220518162652.100493-1-suravee.suthikulpanit@amd.com/T/#t)
  * Re-order patch 16 to 10
  * Patch 11: Update commit message

Changes from v4:
(https://lore.kernel.org/lkml/20220508023930.12881-5-suravee.suthikulpanit@amd.com/T/)
  * Patch  3: Move enum_avic_modes definition to svm.h
  * Patch 10: Rename avic_set_x2apic_msr_interception to
              svm_set_x2apic_msr_interception and move it to svm.c
              to simplify the struct svm_direct_access_msrs declaration.
  * Patch 16: New from Maxim 
  * Patch 17: New from Maxim 

Best Regards,
Suravee

Maxim Levitsky (2):
  KVM: x86: nSVM: always intercept x2apic msrs
  KVM: x86: nSVM: optimize svm_set_x2apic_msr_interception

Suravee Suthikulpanit (15):
  x86/cpufeatures: Introduce x2AVIC CPUID bit
  KVM: x86: lapic: Rename [GET/SET]_APIC_DEST_FIELD to
    [GET/SET]_XAPIC_DEST_FIELD
  KVM: SVM: Detect X2APIC virtualization (x2AVIC) support
  KVM: SVM: Update max number of vCPUs supported for x2AVIC mode
  KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID
  KVM: SVM: Do not support updating APIC ID when in x2APIC mode
  KVM: SVM: Adding support for configuring x2APIC MSRs interception
  KVM: x86: Deactivate APICv on vCPU with APIC disabled
  KVM: SVM: Refresh AVIC configuration when changing APIC mode
  KVM: SVM: Introduce logic to (de)activate x2AVIC mode
  KVM: SVM: Do not throw warning when calling avic_vcpu_load on a
    running vcpu
  KVM: SVM: Introduce hybrid-AVIC mode
  KVM: x86: Warning APICv inconsistency only when vcpu APIC mode is
    valid
  KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible
  KVM: SVM: Add AVIC doorbell tracepoint

 arch/x86/hyperv/hv_apic.c          |   2 +-
 arch/x86/include/asm/apicdef.h     |   4 +-
 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   1 -
 arch/x86/include/asm/svm.h         |  16 ++-
 arch/x86/kernel/apic/apic.c        |   2 +-
 arch/x86/kernel/apic/ipi.c         |   2 +-
 arch/x86/kvm/lapic.c               |   6 +-
 arch/x86/kvm/svm/avic.c            | 178 ++++++++++++++++++++++++++---
 arch/x86/kvm/svm/nested.c          |   5 +
 arch/x86/kvm/svm/svm.c             |  75 ++++++++----
 arch/x86/kvm/svm/svm.h             |  25 +++-
 arch/x86/kvm/trace.h               |  18 +++
 arch/x86/kvm/x86.c                 |   8 +-
 14 files changed, 291 insertions(+), 52 deletions(-)

Comments

Jim Mattson June 6, 2022, 11:05 p.m. UTC | #1
On Thu, May 19, 2022 at 3:32 AM Suravee Suthikulpanit
<suravee.suthikulpanit@amd.com> wrote:
>
> Introducing support for AMD x2APIC virtualization. This feature is
> indicated by the CPUID Fn8000_000A EDX[14], and it can be activated
> by setting bit 31 (enable AVIC) and bit 30 (x2APIC mode) of VMCB
> offset 60h.

When will we see this feature in silicon?

Where is the official documentation?
Paolo Bonzini June 24, 2022, 5:04 p.m. UTC | #2
On 5/19/22 12:26, Suravee Suthikulpanit wrote:
> Introducing support for AMD x2APIC virtualization. This feature is
> indicated by the CPUID Fn8000_000A EDX[14], and it can be activated
> by setting bit 31 (enable AVIC) and bit 30 (x2APIC mode) of VMCB
> offset 60h.

I haven't quite finished reviewing this, but it passes both 
kvm-unit-tests and selftests so I pushed it to kvm/queue.

Paolo
Suravee Suthikulpanit June 28, 2022, 1:20 p.m. UTC | #3
Maxim,

With commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base"),
APICv/AVIC is now inhibited when the guest kernel boots w/ the option "nox2apic" or "x2apic_phys"
due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED.

These cases used to work. In theory, we should be able to allow AVIC to work in these cases.
Is there a way to modify the logic in kvm_lapic_xapic_id_updated() to allow these use cases
to work w/ APICv/AVIC?

Best Regards,
Suravee
Maxim Levitsky June 28, 2022, 1:43 p.m. UTC | #4
On Tue, 2022-06-28 at 20:20 +0700, Suthikulpanit, Suravee wrote:
> Maxim,
> 
> With commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base"),
> APICv/AVIC is now inhibited when the guest kernel boots w/ the option "nox2apic" or "x2apic_phys"
> due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED.
> 
> These cases used to work. In theory, we should be able to allow AVIC to work in these cases.
> Is there a way to modify the logic in kvm_lapic_xapic_id_updated() to allow these use cases
> to work w/ APICv/AVIC?
> 
> Best Regards,
> Suravee
> 

This seems very strange. I assume you tested today's kvm/queue,
which contains a fix for a typo I had in the list of inhibit reasons
(commit 5bdae49fc2f689b5f896b54bd9230425d3643dab - KVM: SEV: fix misplaced closing parenthesis).

Could you share more details on the test? How many vCPUs are in the guest, and is x2apic exposed to the guest?

Looking through the code, __x2apic_disable() touches MSR_IA32_APICBASE, so I would expect
the APICV_INHIBIT_REASON_APIC_BASE_MODIFIED inhibit to be triggered and not APICV_INHIBIT_REASON_APIC_ID_MODIFIED.

I don't yet see how x2apic_phys can trigger these inhibits.

Best regards,
	Maxim Levitsky
Suravee Suthikulpanit June 28, 2022, 4:34 p.m. UTC | #5
Maxim,

On 6/28/2022 8:43 PM, Maxim Levitsky wrote:
>> With commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base"),
>> APICv/AVIC is now inhibited when the guest kernel boots w/ the option "nox2apic" or "x2apic_phys"
>> due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED.
>>
>> These cases used to work. In theory, we should be able to allow AVIC to work in these cases.
>> Is there a way to modify the logic in kvm_lapic_xapic_id_updated() to allow these use cases
>> to work w/ APICv/AVIC?
>>
>> Best Regards,
>> Suravee
>>
> This seems very strange, I assume you test the kvm/queue of today,

Yes

> which contains a fix for a typo I had in the list of inhibit reasons
> (commit 5bdae49fc2f689b5f896b54bd9230425d3643dab - KVM: SEV: fix misplaced closing parenthesis)

Yes

> Could you share more details on the test? How many vCPUs in the guest, is x2apic exposed to the guest?

The problem happens w/ 257 vCPUs or more (i.e. starting at vcpu ID 0x100).

> Looking through the code, __x2apic_disable() touches MSR_IA32_APICBASE, so I would expect
> the APICV_INHIBIT_REASON_APIC_BASE_MODIFIED inhibit to be triggered and not APICV_INHIBIT_REASON_APIC_ID_MODIFIED
> 
> 
> I don't see yet how the x2apic_phys can trigger these inhibits.

When I add WARN_ON_ONCE at the point when we set the APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
I get this call stack.

[  105.470685] ------------[ cut here ]------------
[  105.470686] WARNING: CPU: 279 PID: 11511 at arch/x86/kvm/lapic.c:2057 kvm_lapic_xapic_id_updated.cold+0x13/0x2f [kvm]
[  105.470769] Modules linked in: kvm_amd kvm irqbypass nf_tables nfnetlink bridge stp llc squashfs loop vfat fat dm_multipath intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd wmi_bmof sg ipmi_ssif dm_mod acpi_ipmi ccp k10temp ipmi_si acpi_cpufreq sch_fq_codel ipmi_devintf ipmi_msghandler fuse ip_tables ext4 mbcache jbd2 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul ses crc32c_intel drm_kms_helper enclosure ghash_clmulni_intel nvme sd_mod scsi_transport_sas syscopyarea aesni_intel sysfillrect crypto_simd nvme_core sysimgblt cryptd t10_pi fb_sys_fops tg3 uas crc64_rocksoft_generic i2c_designware_platform ptp crc64_rocksoft drm i2c_piix4 i2c_designware_core usb_storage pps_core crc64 wmi pinctrl_amd i2c_core
[  105.470851] CPU: 279 PID: 11511 Comm: qemu-system-x86 Kdump: loaded Not tainted 5.19.0-rc1-kvm-queue-x2avic+ #38
[  105.470856] Hardware name: AMD Corporation QUARTZ/QUARTZ, BIOS TQZ0080D 05/11/2022
[  105.470858] RIP: 0010:kvm_lapic_xapic_id_updated.cold+0x13/0x2f [kvm]
[  105.470906] Code: db 8f fd ff 48 c7 c7 8d e8 ca c0 e8 43 27 88 ce 31 c0 e9 f8 90 fd ff 48 c7 c6 00 6a ca c0 48 c7 c7 e5 e8 ca c0 e8 29 27 88 ce <0f> 0b 48 8b 83 90 00 00 00 ba 01 00 00 00 be 04 00 00 00 5b 48 8b
[  105.470909] RSP: 0018:ffffb13a436d7d40 EFLAGS: 00010246
[  105.470913] RAX: 0000000000000030 RBX: ffff9f0372c98400 RCX: 0000000000000000
[  105.470916] RDX: 0000000000000000 RSI: ffffffff8fd59e05 RDI: 00000000ffffffff
[  105.470918] RBP: ffffb13a436d7e40 R08: 0000000000000030 R09: 0000000000000002
[  105.470920] R10: 000000000000000f R11: ffff9f21c5c2fc80 R12: ffff9f0372c64250
[  105.470921] R13: ffff9f0372c64250 R14: 00007f9ac9ffa2f0 R15: ffff9f0344da7000
[  105.470930] FS:  00007f9ac9ffb640(0000) GS:ffff9f118edc0000(0000) knlGS:0000000000000000
[  105.470932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  105.470934] CR2: 00007fa34c73c001 CR3: 00000001b71a2003 CR4: 0000000000770ee0
[  105.470936] PKRU: 55555554
[  105.470938] Call Trace:
[  105.470942]  <TASK>
[  105.470945]  kvm_apic_state_fixup+0x85/0xb0 [kvm]
[  105.471002]  kvm_arch_vcpu_ioctl+0xa01/0x14b0 [kvm]
[  105.471080]  ? __local_bh_enable_ip+0x37/0x70
[  105.471088]  ? copy_fpstate_to_sigframe+0x2f6/0x360
[  105.471099]  ? mod_objcg_state+0xd2/0x360
[  105.471109]  ? refill_obj_stock+0xb0/0x160
[  105.471116]  ? kvm_vcpu_ioctl+0x4bc/0x680 [kvm]
[  105.471156]  kvm_vcpu_ioctl+0x4bc/0x680 [kvm]
[  105.471197]  __x64_sys_ioctl+0x83/0xb0
[  105.471206]  do_syscall_64+0x3b/0x90
[  105.471218]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  105.471228] RIP: 0033:0x7fa356d19a2b
[  105.471232] Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 f3 0f 00 f7 d8 64 89 01 48
[  105.471235] RSP: 002b:00007f9ac9ffa248 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  105.471240] RAX: ffffffffffffffda RBX: 000000008400ae8e RCX: 00007fa356d19a2b
[  105.471243] RDX: 00007f9ac9ffa2f0 RSI: ffffffff8400ae8e RDI: 000000000000010c
[  105.471245] RBP: 0000561ce47ee560 R08: 0000561ce2351954 R09: 0000561ce2351c5c
[  105.471248] R10: 00007f9ab80037b0 R11: 0000000000000246 R12: 00007f9ac9ffa2f0
[  105.471266] R13: 00007f9ab80037b0 R14: fff0000000000000 R15: 00007f9ac97fb000
[  105.471270]  </TASK>
[  105.471272] ---[ end trace 0000000000000000 ]---

Best Regards,
Suravee
Maxim Levitsky June 29, 2022, 7:10 a.m. UTC | #6
On Tue, 2022-06-28 at 23:34 +0700, Suthikulpanit, Suravee wrote:
> Maxim,
> 
> On 6/28/2022 8:43 PM, Maxim Levitsky wrote:
> > > With commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base"),
> > > APICv/AVIC is now inhibited when the guest kernel boots w/ the option "nox2apic" or "x2apic_phys"
> > > due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED.
> > > 
> > > These cases used to work. In theory, we should be able to allow AVIC to work in these cases.
> > > Is there a way to modify the logic in kvm_lapic_xapic_id_updated() to allow these use cases
> > > to work w/ APICv/AVIC?
> > > 
> > > Best Regards,
> > > Suravee
> > > 
> > This seems very strange, I assume you test the kvm/queue of today,
> 
> Yes
> 
> > which contains a fix for a typo I had in the list of inhibit reasons
> > (commit 5bdae49fc2f689b5f896b54bd9230425d3643dab - KVM: SEV: fix misplaced closing parenthesis)
> 
> Yes
> 
> > Could you share more details on the test? How many vCPUs in the guest, is x2apic exposed to the guest?
> 
> The problem happens w/ 257 vCPUs or more (i.e. starting at vcpu ID 0x100).
> 
> > Looking through the code, __x2apic_disable() touches MSR_IA32_APICBASE, so I would expect
> > the APICV_INHIBIT_REASON_APIC_BASE_MODIFIED inhibit to be triggered and not APICV_INHIBIT_REASON_APIC_ID_MODIFIED
> > 
> > 
> > I don't see yet how the x2apic_phys can trigger these inhibits.
> 
> When I add WARN_ON_ONCE at the point when we set the APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
> I get this call stack.

Great, thanks for the info, now it is all clear.

For > 255 vCPUs, it is not possible to have APIC_ID == vcpu_id, thus the check in
kvm_lapic_xapic_id_updated() should truncate the vcpu_id to 8 bits.
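
To illustrate the idea (this is only a sketch, not the patch itself; all sketch_ names are hypothetical): in xAPIC mode the APIC ID is the 8-bit field in bits 31:24 of the APIC ID register, so a comparison against the full vcpu_id can never match once vcpu_id reaches 0x100, while a comparison against the low 8 bits can.

```c
#include <stdbool.h>
#include <stdint.h>

/* In xAPIC mode the APIC ID occupies bits 31:24 of the APIC ID register. */
static uint8_t sketch_xapic_id(uint32_t apic_id_reg)
{
	return (uint8_t)(apic_id_reg >> 24);
}

/* Comparison against the full vcpu_id: always "modified" for vcpu_id > 255. */
static bool sketch_id_modified_untruncated(uint32_t apic_id_reg,
					   uint32_t vcpu_id)
{
	return sketch_xapic_id(apic_id_reg) != vcpu_id;
}

/* Comparison with vcpu_id truncated to 8 bits, as suggested above. */
static bool sketch_id_modified_truncated(uint32_t apic_id_reg,
					 uint32_t vcpu_id)
{
	return sketch_xapic_id(apic_id_reg) != (uint8_t)vcpu_id;
}
```

For vcpu_id 0x100 with an APIC ID register holding 0 in bits 31:24, the untruncated check reports a modification and the truncated one does not.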

I'll send a patch to fix this, very soon.

In addition to that, we should later check that both AVIC (I think it doesn't crash) and APICv don't crash in this case
(when a guest still attempts to enable the APIC on vCPU > 254; 255 also can't be used for a regular APIC).

Thanks,
Best regards,
	Maxim Levitsky
