[00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation

Message ID 20240609154945.55332-1-nsaenz@amazon.com

Nicolas Saenz Julienne June 9, 2024, 3:49 p.m. UTC
This series introduces core KVM functionality necessary to emulate Hyper-V's
Virtual Secure Mode in a Virtual Machine Monitor (VMM).

Hyper-V's Virtual Secure Mode (VSM) is a virtualization security feature that
leverages the hypervisor to create secure execution environments within a
guest. VSM is documented as part of Microsoft's Hypervisor Top Level Functional
Specification [1]. Security features that build upon VSM, like Windows
Credential Guard, are enabled by default on Windows 11 and are becoming a
prerequisite in some industries.

VSM introduces the concept of Virtual Trust Levels (VTLs). These are
independent execution contexts, each with its own CPU architectural state,
local APIC state, and a different view of memory. They are hierarchical, with
more privileged VTLs having priority over the execution of lower VTLs and
control over lower VTLs' state. Windows leverages these low-level
paravirtualized primitives, as well as the hypervisor's higher trust base, to
prevent guest data exfiltration even when the operating system itself has been
compromised.

As discussed at LPC2023 and in our previous RFC [2], we decided to model each
VTL as a distinct KVM VM. With this approach, and the RWX memory attributes
introduced in this series, we have been able to implement VTL memory
protections in a non-intrusive way, using generic KVM APIs. Additionally, each
CPU's VTL is modeled as a distinct KVM vCPU, owned by the KVM VM tracking that
VTL's state. VTL awareness is fully removed from KVM, and the responsibility
for VTL-aware hypercalls, VTL scheduling, and state transfer is delegated to
userspace.

Series overview:
- 1-8: Introduce a number of Hyper-V hypercalls, all of which are VTL-aware and
       expected to be handled in userspace. Additionally, a new VTL-specific MP
       state is introduced.
- 9-10: Pass the instruction length as part of the userspace fault exit data
        in order to simplify VSM's secure intercept generation.
- 11-17: Introduce RWX memory attributes as well as extend userspace faults.
- 18: Introduce the main VSM CPUID bit, which gates all VTL configuration and
      runtime hypercalls.

The series is accompanied by two repositories:
 - A PoC QEMU implementation of VSM [3]: This PoC VSM implementation is capable
   of booting Windows Server 2016 and 2019 with Credential Guard (CG) enabled
   on VMs of any size or number of vCPUs. It's generally stable, but still sees
   its share of crashes. The PoC itself implements VSM interfaces to
   accommodate CG's needs, and it's by no means comprehensive. All in all,
   don't expect anything usable in production.

 - VSM kvm-unit-tests [4]: They cover all VSM hypercalls, as well as the KVM
   APIs introduced by this series. Unfortunately, they depend on the QEMU
   implementation.

We mostly tested on an Intel machine, both with and without TDP. Basic tests
were also run on AMD (build and kvm-unit-tests). Please note that v2 will
include KVM self-tests to close the testing gap and allow merging this while
we work on the userspace bits.

The series is based on 'kvm/master', that is, commit db574f2f96d0, and is also
available on GitHub [5].

This series also serves as a call-out to anyone interested in collaborating. We
have a proven design, a working PoC, and hopefully a path forward to merge
these KVM APIs. There is plenty to do in both QEMU and KVM still; I'll post a
list of ideas in the future. Feel free to get in touch!

Thanks,
Nicolas

[1] https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf
[2] https://lore.kernel.org/lkml/20231108111806.92604-1-nsaenz@amazon.com/
[3] https://github.com/vianpl/qemu/tree/vsm-v1
[4] https://github.com/vianpl/kvm-unit-tests/tree/vsm-v1
[5] https://github.com/vianpl/linux/tree/vsm-v1

---

Anish Moorthy (1):
  KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to
    userspace

Nicolas Saenz Julienne (17):
  KVM: x86: hyper-v: Introduce XMM output support
  KVM: x86: hyper-v: Introduce helpers to check if VSM is exposed to
    guest
  hyperv-tlfs: Update struct hv_send_ipi{_ex}'s declarations
  KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs
  KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL
  KVM: x86: hyper-v: Exit on Get/SetVpRegisters hcall
  KVM: x86: hyper-v: Exit on TranslateVirtualAddress hcall
  KVM: x86: hyper-v: Exit on StartVirtualProcessor and
    GetVpIndexFromApicId hcalls
  KVM: x86: Keep track of instruction length during faults
  KVM: x86: Pass the instruction length on memory fault user-space exits
  KVM: x86/mmu: Introduce infrastructure to handle non-executable
    mappings
  KVM: x86/mmu: Avoid warning when installing non-private memory
    attributes
  KVM: x86/mmu: Init memslot if memory attributes available
  KVM: Introduce RWX memory attributes
  KVM: x86: Take mem attributes into account when faulting memory
  KVM: Introduce traces to track memory attributes modification.
  KVM: x86: hyper-v: Handle VSM hcalls in user-space

 Documentation/virt/kvm/api.rst     | 107 +++++++++++++++++++++++-
 arch/x86/hyperv/hv_apic.c          |   3 +-
 arch/x86/include/asm/hyperv-tlfs.h |   2 +-
 arch/x86/kvm/Kconfig               |   1 +
 arch/x86/kvm/hyperv.c              | 127 +++++++++++++++++++++++++++--
 arch/x86/kvm/hyperv.h              |  18 ++++
 arch/x86/kvm/mmu/mmu.c             |  91 +++++++++++++++++----
 arch/x86/kvm/mmu/mmu_internal.h    |   9 +-
 arch/x86/kvm/mmu/mmutrace.h        |  29 +++++++
 arch/x86/kvm/mmu/paging_tmpl.h     |   2 +-
 arch/x86/kvm/mmu/tdp_mmu.c         |   8 +-
 arch/x86/kvm/svm/svm.c             |   7 +-
 arch/x86/kvm/vmx/vmx.c             |  23 +++++-
 arch/x86/kvm/x86.c                 |  17 +++-
 include/asm-generic/hyperv-tlfs.h  |  16 +++-
 include/linux/kvm_host.h           |  45 +++++++++-
 include/trace/events/kvm.h         |  20 +++++
 include/uapi/linux/kvm.h           |  15 ++++
 virt/kvm/kvm_main.c                |  35 +++++++-
 19 files changed, 527 insertions(+), 48 deletions(-)

Comments

Nicolas Saenz Julienne July 3, 2024, 9:55 a.m. UTC | #1
Hi Sean,

On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> This series introduces core KVM functionality necessary to emulate Hyper-V's
> Virtual Secure Mode in a Virtual Machine Monitor (VMM).

Just wanted to make sure the series is on your radar.

Thanks,
Nicolas
Vitaly Kuznetsov July 3, 2024, 12:48 p.m. UTC | #2
Nicolas Saenz Julienne <nsaenz@amazon.com> writes:

> Hi Sean,
>
> On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
>> This series introduces core KVM functionality necessary to emulate Hyper-V's
>> Virtual Secure Mode in a Virtual Machine Monitor (VMM).
>
> Just wanted to make sure the series is on your radar.
>

Not Sean here but I was planning to take a look at least at Hyper-V
parts of it next week.
Nicolas Saenz Julienne July 3, 2024, 1:18 p.m. UTC | #3
Hi Vitaly,

On Wed Jul 3, 2024 at 12:48 PM UTC, Vitaly Kuznetsov wrote:
> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>
> > Hi Sean,
> >
> > On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> >> This series introduces core KVM functionality necessary to emulate Hyper-V's
> >> Virtual Secure Mode in a Virtual Machine Monitor (VMM).
> >
> > Just wanted to make sure the series is on your radar.
> >
>
> Not Sean here but I was planning to take a look at least at Hyper-V
> parts of it next week.

Thanks for the update.

Nicolas
Sean Christopherson Sept. 13, 2024, 7:19 p.m. UTC | #4
On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> This series introduces core KVM functionality necessary to emulate Hyper-V's
> Virtual Secure Mode in a Virtual Machine Monitor (VMM).

...

> As discussed at LPC2023 and in our previous RFC [2], we decided to model each
> VTL as a distinct KVM VM. With this approach, and the RWX memory attributes
> introduced in this series, we have been able to implement VTL memory
> protections in a non-intrusive way, using generic KVM APIs. Additionally, each
> CPU's VTL is modeled as a distinct KVM vCPU, owned by the KVM VM tracking that
> VTL's state. VTL awareness is fully removed from KVM, and the responsibility
> for VTL-aware hypercalls, VTL scheduling, and state transfer is delegated to
> userspace.
> 
> Series overview:
> - 1-8: Introduce a number of Hyper-V hypercalls, all of which are VTL-aware and
>        expected to be handled in userspace. Additionally, a new VTL-specific MP
>        state is introduced.
> - 9-10: Pass the instruction length as part of the userspace fault exit data
>         in order to simplify VSM's secure intercept generation.
> - 11-17: Introduce RWX memory attributes as well as extend userspace faults.
> - 18: Introduces the main VSM CPUID bit which gates all VTL configuration and
>       runtime hypercalls.

Aside from the RWX attributes, which to no one's surprise will need a lot of work
to get them performant and functional, are there any "big" TODO items that you see
in KVM?

If this series is more or less code complete, IMO modeling VTLs as distinct VM
structures is a clear win.  Except for the "idle VTL" stuff, which I think we can
simplify, this series is quite boring, and I mean that in the best possible way :-)
Nicolas Saenz Julienne Sept. 16, 2024, 4:32 p.m. UTC | #5
On Fri Sep 13, 2024 at 7:19 PM UTC, Sean Christopherson wrote:
> On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> > This series introduces core KVM functionality necessary to emulate Hyper-V's
> > Virtual Secure Mode in a Virtual Machine Monitor (VMM).
>
> ...
>
> > As discussed at LPC2023 and in our previous RFC [2], we decided to model each
> > VTL as a distinct KVM VM. With this approach, and the RWX memory attributes
> > introduced in this series, we have been able to implement VTL memory
> > protections in a non-intrusive way, using generic KVM APIs. Additionally, each
> > CPU's VTL is modeled as a distinct KVM vCPU, owned by the KVM VM tracking that
> > VTL's state. VTL awareness is fully removed from KVM, and the responsibility
> > for VTL-aware hypercalls, VTL scheduling, and state transfer is delegated to
> > userspace.
> >
> > Series overview:
> > - 1-8: Introduce a number of Hyper-V hypercalls, all of which are VTL-aware and
> >        expected to be handled in userspace. Additionally, a new VTL-specific MP
> >        state is introduced.
> > - 9-10: Pass the instruction length as part of the userspace fault exit data
> >         in order to simplify VSM's secure intercept generation.
> > - 11-17: Introduce RWX memory attributes as well as extend userspace faults.
> > - 18: Introduces the main VSM CPUID bit which gates all VTL configuration and
> >       runtime hypercalls.
>
> Aside from the RWX attributes, which to no one's surprise will need a lot of work
> to get them performant and functional, are there any "big" TODO items that you see
> in KVM?

Aside from VTLs and VTL switching, there is a bunch of KVM features we
still need to be fully compliant with the VSM spec:
- KVM_TRANSLATE2, which Nikolas Wipper posted a week ago [1].
  Technically we could do this in userspace, but it's way simpler to
  re-use KVM's page-walker.

- Hyper-V's TlbFlushInhibit: it allows VTL1 to block VTL0 vCPUs from
  issuing TLB flushes, keeping them blocked until uninhibited. Note this
  only applies to para-virtualized TLB flushes:
  HvFlushVirtualAddress{Space,SpaceEx,List,ListEx}, so it's 100% Hyper-V
  specific.

- CPU register pinning/intercepting: we plan on reusing what HEKI
  proposed some time ago and exposing it through an IOCTL, using ONE_REG
  to represent registers.

- MBEC-aware memory attributes: we don't plan on enabling support for
  these with the first RWX memattrs submission. We'll do it as a
  follow-up, especially as not every Windows VBS feature requires it
  (Credential Guard doesn't need it; HVCI does).

> If this series is more or less code complete, IMO modeling VTLs as distinct VM
> structures is a clear win.

I agree.

> Except for the "idle VTL" stuff, which I think we can simplify, this
> series is quite boring, and I mean that in the best possible way :-)

:)

Thanks,
Nicolas

[1] https://lore.kernel.org/kvm/20240910152207.38974-1-nikwip@amazon.de