
[v2,0/3] KVM: arm64: support MTE in protected VMs

Message ID: 20220708212106.325260-1-pcc@google.com
Series: KVM: arm64: support MTE in protected VMs

Message

Peter Collingbourne July 8, 2022, 9:21 p.m. UTC
Hi,

This patch series contains a proposed extension to pKVM that allows MTE
to be exposed to the protected guests. It is based on the base pKVM
series previously sent to the list [1] and later rebased to 5.19-rc3
and uploaded to [2].

This series takes precautions against host compromise of the guests
via direct access to their tag storage, by preventing the host from
accessing the tag storage via stage 2 page tables. The device tree
must describe the physical memory address of the tag storage, if any,
and the memory nodes must declare that the tag storage location is
described. Otherwise, the MTE feature is disabled in protected guests.
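The all-or-nothing policy in the paragraph above can be illustrated with a short sketch. It is not the code from the series. It assumes a simplified, hypothetical view of the parsed memory nodes (`struct mem_node`, `struct phys_range` and `collect_tag_storage()` are illustrative names only). The real device tree properties are whatever the proposed specification extension [3] defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one parsed device tree memory node. */
struct mem_node {
	uint64_t base;
	uint64_t size;
	/* The node declares that its tag storage location is described. */
	bool tag_storage_described;
	/* Physical location of the tag storage; tag_size == 0 means the tag
	 * storage is not reachable through physical memory at all. */
	uint64_t tag_base;
	uint64_t tag_size;
};

/* A physical range that must be removed from the host's stage 2. */
struct phys_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Returns -1 if MTE must stay disabled for protected guests because some
 * memory node does not declare its tag storage location. Otherwise copies
 * every described tag storage range into out (capacity n) and returns how
 * many were copied; a caller would then unmap them from the host.
 */
static int collect_tag_storage(const struct mem_node *nodes, size_t n,
			       struct phys_range *out)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (!nodes[i].tag_storage_described)
			return -1;

	for (size_t i = 0; i < n; i++) {
		if (nodes[i].tag_size == 0)
			continue; /* nothing the host could reach */
		out[count].base = nodes[i].tag_base;
		out[count].size = nodes[i].tag_size;
		count++;
	}
	return count;
}
```

Checking every node before collecting any range keeps the behaviour described above: a single node without a described tag storage location disables MTE for protected guests entirely.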

Now that we can easily do so, we also prevent the host from accessing
any unmapped reserved-memory regions without a driver, as the host
has no business accessing that memory.
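A minimal sketch of the reserved-memory filtering follows. It assumes that "unmapped" corresponds to the standard `no-map` reserved-memory property and that "without a driver" means no driver has claimed the region. `struct reserved_region`, `hyp_disown_range()` and `disown_unused_reserved()` are hypothetical names. The stub only tallies bytes and does not issue the disown hypercall that the first patch adds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a /reserved-memory child node. */
struct reserved_region {
	uint64_t base;
	uint64_t size;
	bool no_map;     /* assumed: region carries the no-map property */
	bool has_driver; /* assumed: some driver has claimed the region */
};

/* Hypothetical stand-in for the disown hypercall: just counts bytes. */
static uint64_t disowned_bytes;

static void hyp_disown_range(uint64_t base, uint64_t size)
{
	(void)base;
	disowned_bytes += size;
}

/* Disown every unmapped reserved-memory region that no driver uses and
 * return the number of regions disowned. */
static size_t disown_unused_reserved(const struct reserved_region *r,
				     size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].no_map && !r[i].has_driver) {
			hyp_disown_range(r[i].base, r[i].size);
			count++;
		}
	}
	return count;
}
```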

A proposed extension to the devicetree specification is available at
[3], a patched version of QEMU that produces the required device tree
nodes is available at [4] and a patched version of the crosvm hypervisor
that enables MTE is available at [5].

v2:
- refcount the PTEs owned by NOBODY

[1] https://lore.kernel.org/all/20220519134204.5379-1-will@kernel.org/
[2] https://android-kvm.googlesource.com/linux/ for-upstream/pkvm-base-v2
[3] https://github.com/pcc/devicetree-specification mte-alloc
[4] https://github.com/pcc/qemu mte-shared-alloc
[5] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3719324

Peter Collingbourne (3):
  KVM: arm64: add a hypercall for disowning pages
  KVM: arm64: disown unused reserved-memory regions
  KVM: arm64: allow MTE in protected VMs if the tag storage is known

 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/include/asm/kvm_host.h             |  6 ++
 arch/arm64/include/asm/kvm_pkvm.h             |  4 +-
 arch/arm64/kernel/image-vars.h                |  3 +
 arch/arm64/kvm/arm.c                          | 83 ++++++++++++++++++-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  9 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 11 +++
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |  8 +-
 arch/arm64/kvm/mmu.c                          |  4 +-
 11 files changed, 123 insertions(+), 8 deletions(-)

Comments

Cornelia Huck July 19, 2022, 2:50 p.m. UTC | #1
On Fri, Jul 08 2022, Peter Collingbourne <pcc@google.com> wrote:

> A proposed extension to the devicetree specification is available at
> [3], a patched version of QEMU that produces the required device tree
> nodes is available at [4] and a patched version of the crosvm hypervisor
> that enables MTE is available at [5].

I'm unsure how this is supposed to work with QEMU + KVM, as your QEMU
patch adds mte-alloc properties to regions that are exposed as a
separate address space (which will not work with KVM). Is the magic in
that new shared section?

Peter Collingbourne July 20, 2022, 1:06 a.m. UTC | #2
On Tue, Jul 19, 2022 at 7:50 AM Cornelia Huck <cohuck@redhat.com> wrote:
>
> I'm unsure how this is supposed to work with QEMU + KVM, as your QEMU
> patch adds mte-alloc properties to regions that are exposed as a
> separate address space (which will not work with KVM). Is the magic in
> that new shared section?

Hi Cornelia,

The intent is that the mte-alloc property may be set on memory whose
allocation tag storage is not directly accessible via physical memory,
since in this case there is no need for the hypervisor to do anything
to protect allocation tag storage before exposing MTE to guests. In
the case of QEMU + KVM, I would expect the emulated system to not
expose the allocation tag storage directly, in which case it would be
able to set mte-alloc on all memory nodes without further action,
exactly as my patch implements for TCG. With the interface as
proposed, QEMU would need to reject the mte-shared-alloc option when
KVM is enabled, as there is currently no mechanism for KVM-accelerated
virtualized tag storage.
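The option handling described above can be summarized as a small decision function. This is a sketch only. `struct machine_opts`, `enum mte_config` and `choose_mte_config()` are hypothetical and do not reflect QEMU's actual option code. The sketch also assumes that when the tag storage is exposed (mte-shared-alloc), the memory nodes describe its location instead of carrying mte-alloc.

```c
#include <stdbool.h>

/* Hypothetical summary of the relevant machine options. */
struct machine_opts {
	bool mte;              /* guest MTE requested */
	bool mte_shared_alloc; /* expose tag storage as guest physical memory */
	bool kvm;              /* KVM acceleration instead of TCG */
};

enum mte_config {
	MTE_OFF,
	MTE_ALLOC_ALL_NODES,    /* set mte-alloc on every memory node */
	MTE_SHARED_TAG_STORAGE, /* assumed: describe the tag storage location */
	MTE_INVALID,            /* reject the configuration */
};

static enum mte_config choose_mte_config(const struct machine_opts *o)
{
	if (!o->mte)
		return MTE_OFF;
	if (o->mte_shared_alloc)
		/* No mechanism for KVM-accelerated virtualized tag storage. */
		return o->kvm ? MTE_INVALID : MTE_SHARED_TAG_STORAGE;
	/* Tag storage is not reachable via guest physical memory, so the
	 * hypervisor needs no extra protection before exposing MTE. */
	return MTE_ALLOC_ALL_NODES;
}
```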

Note that these properties are only relevant for guest kernels running
under an emulated EL2 in which pKVM could conceivably run, which means
that the host would need to implement FEAT_NV2. As far as I know there
is currently no support for NV2 in either QEMU TCG or the Linux
kernel, and I'm unaware of any available hardware that supports both
NV2 and MTE, so it'll be a while before any of this becomes relevant.

Peter
Cornelia Huck July 20, 2022, 4:21 p.m. UTC | #3
On Tue, Jul 19 2022, Peter Collingbourne <pcc@google.com> wrote:

> The intent is that the mte-alloc property may be set on memory whose
> allocation tag storage is not directly accessible via physical memory,
> since in this case there is no need for the hypervisor to do anything
> to protect allocation tag storage before exposing MTE to guests. In
> the case of QEMU + KVM, I would expect the emulated system to not
> expose the allocation tag storage directly, in which case it would be
> able to set mte-alloc on all memory nodes without further action,
> exactly as my patch implements for TCG. With the interface as
> proposed, QEMU would need to reject the mte-shared-alloc option when
> KVM is enabled, as there is currently no mechanism for KVM-accelerated
> virtualized tag storage.

Ok, that makes sense.

>
> Note that these properties are only relevant for guest kernels running
> under an emulated EL2 in which pKVM could conceivably run, which means
> that the host would need to implement FEAT_NV2. As far as I know there
> is currently no support for NV2 in either QEMU TCG or the Linux
> kernel, and I'm unaware of any available hardware that supports both
> NV2 and MTE, so it'll be a while before any of this becomes relevant.

Nod.

I'm mostly interested because I wanted to figure out how this feature
might interact with enabling MTE for QEMU+KVM. I'll keep it in mind.

Thanks!