Message ID | 20220708212106.325260-1-pcc@google.com (mailing list archive) |
---|---|
Series | KVM: arm64: support MTE in protected VMs |
On Fri, Jul 08 2022, Peter Collingbourne <pcc@google.com> wrote:
> Hi,
>
> This patch series contains a proposed extension to pKVM that allows MTE
> to be exposed to the protected guests. It is based on the base pKVM
> series previously sent to the list [1] and later rebased to 5.19-rc3
> and uploaded to [2].
>
> This series takes precautions against host compromise of the guests
> via direct access to their tag storage, by preventing the host from
> accessing the tag storage via stage 2 page tables. The device tree
> must describe the physical memory address of the tag storage, if any,
> and the memory nodes must declare that the tag storage location is
> described. Otherwise, the MTE feature is disabled in protected guests.
>
> Now that we can easily do so, we also prevent the host from accessing
> any unmapped reserved-memory regions without a driver, as the host
> has no business accessing that memory.
>
> A proposed extension to the devicetree specification is available at
> [3], a patched version of QEMU that produces the required device tree
> nodes is available at [4] and a patched version of the crosvm hypervisor
> that enables MTE is available at [5].

I'm unsure how this is supposed to work with QEMU + KVM, as your QEMU
patch adds mte-alloc properties to regions that are exposed as a
separate address space (which will not work with KVM). Is the magic in
that new shared section?
>
> v2:
> - refcount the PTEs owned by NOBODY
>
> [1] https://lore.kernel.org/all/20220519134204.5379-1-will@kernel.org/
> [2] https://android-kvm.googlesource.com/linux/ for-upstream/pkvm-base-v2
> [3] https://github.com/pcc/devicetree-specification mte-alloc
> [4] https://github.com/pcc/qemu mte-shared-alloc
> [5] https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3719324
>
> Peter Collingbourne (3):
>   KVM: arm64: add a hypercall for disowning pages
>   KVM: arm64: disown unused reserved-memory regions
>   KVM: arm64: allow MTE in protected VMs if the tag storage is known
>
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/include/asm/kvm_host.h             |  6 ++
>  arch/arm64/include/asm/kvm_pkvm.h             |  4 +-
>  arch/arm64/kernel/image-vars.h                |  3 +
>  arch/arm64/kvm/arm.c                          | 83 ++++++++++++++++++-
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  9 ++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 11 +++
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                |  8 +-
>  arch/arm64/kvm/mmu.c                          |  4 +-
>  11 files changed, 123 insertions(+), 8 deletions(-)
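The cover letter's gating rule (MTE is offered to protected guests only when the device tree tells the hypervisor where every memory node's tag storage lives, so that storage can be kept out of the host's stage 2 mapping) can be sketched in C as below. This is a minimal illustration under stated assumptions, not code from the series: `struct mem_node_info`, its `tag_storage_described` field and `pkvm_mte_allowed()` are hypothetical names, and the devicetree parsing that would populate them is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical view of one devicetree memory node after parsing.
 * The field name is illustrative only; the real property names come
 * from the proposed devicetree extension, not from this sketch.
 */
struct mem_node_info {
    /* The node declares where its tag storage lives (or that the tag
     * storage is not reachable through physical memory at all). */
    bool tag_storage_described;
};

/*
 * Hypothetical check: MTE may be exposed to protected guests only if the
 * CPU supports MTE and every memory node describes its tag storage, so
 * the hypervisor knows which ranges to withhold from the host's stage 2.
 */
static bool pkvm_mte_allowed(bool cpu_has_mte,
                             const struct mem_node_info *nodes, size_t n)
{
    if (!cpu_has_mte)
        return false;

    for (size_t i = 0; i < n; i++) {
        /* A single undescribed node disables MTE for protected guests. */
        if (!nodes[i].tag_storage_described)
            return false;
    }
    return true;
}
```

The all-or-nothing loop mirrors the "Otherwise, the MTE feature is disabled" wording: a single node without a description means the host might still reach some tag storage, so the sketch fails closed.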
On Tue, Jul 19, 2022 at 7:50 AM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Fri, Jul 08 2022, Peter Collingbourne <pcc@google.com> wrote:
>
> > [...]
> >
> > A proposed extension to the devicetree specification is available at
> > [3], a patched version of QEMU that produces the required device tree
> > nodes is available at [4] and a patched version of the crosvm hypervisor
> > that enables MTE is available at [5].
>
> I'm unsure how this is supposed to work with QEMU + KVM, as your QEMU
> patch adds mte-alloc properties to regions that are exposed as a
> separate address space (which will not work with KVM). Is the magic in
> that new shared section?

Hi Cornelia,

The intent is that the mte-alloc property may be set on memory whose allocation tag storage is not directly accessible via physical memory, since in this case there is no need for the hypervisor to do anything to protect allocation tag storage before exposing MTE to guests.
In the case of QEMU + KVM, I would expect the emulated system to not expose the allocation tag storage directly, in which case it would be able to set mte-alloc on all memory nodes without further action, exactly as my patch implements for TCG. With the interface as proposed, QEMU would need to reject the mte-shared-alloc option when KVM is enabled, as there is currently no mechanism for KVM-accelerated virtualized tag storage.

Note that these properties are only relevant for guest kernels running under an emulated EL2 in which pKVM could conceivably run, which means that the host would need to implement FEAT_NV2. As far as I know there is currently no support for NV2 in either QEMU TCG or the Linux kernel, and I'm unaware of any available hardware that supports both NV2 and MTE, so it'll be a while before any of this becomes relevant.

Peter
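The QEMU behaviour Peter describes can be modelled as two small checks. This is a hypothetical sketch, not QEMU code: `struct machine_cfg`, `check_mte_cfg()` and `memory_node_gets_mte_alloc()` are invented names, and the boolean `mte_shared_alloc` merely stands in for the proposed mte-shared-alloc option. It assumes, as described above, that plain mte-alloc is appropriate only when the tag storage is not exposed in guest physical memory.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical machine configuration; field names are illustrative only. */
struct machine_cfg {
    bool kvm_enabled;      /* accelerator is KVM rather than TCG */
    bool mte_shared_alloc; /* stand-in for the proposed mte-shared-alloc option */
};

/*
 * Reject the combination the thread rules out: there is no mechanism for
 * KVM-accelerated virtualized tag storage, so the shared-allocation option
 * cannot be honoured under KVM. Returns 0 if acceptable, -1 otherwise.
 */
static int check_mte_cfg(const struct machine_cfg *cfg)
{
    if (cfg->kvm_enabled && cfg->mte_shared_alloc) {
        fprintf(stderr, "mte-shared-alloc is not supported with KVM\n");
        return -1;
    }
    return 0;
}

/*
 * When the tag storage is not directly visible to the guest, every memory
 * node can carry mte-alloc with no further description. When it is
 * exposed, the node would also need to describe where its tag storage
 * lives (not modelled here).
 */
static bool memory_node_gets_mte_alloc(const struct machine_cfg *cfg)
{
    return !cfg->mte_shared_alloc;
}
```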
On Tue, Jul 19 2022, Peter Collingbourne <pcc@google.com> wrote:

> On Tue, Jul 19, 2022 at 7:50 AM Cornelia Huck <cohuck@redhat.com> wrote:
>>
>> On Fri, Jul 08 2022, Peter Collingbourne <pcc@google.com> wrote:
>>
>> > [...]
>>
>> I'm unsure how this is supposed to work with QEMU + KVM, as your QEMU
>> patch adds mte-alloc properties to regions that are exposed as a
>> separate address space (which will not work with KVM). Is the magic in
>> that new shared section?
>
> Hi Cornelia,
>
> The intent is that the mte-alloc property may be set on memory whose
> allocation tag storage is not directly accessible via physical memory,
> since in this case there is no need for the hypervisor to do anything
> to protect allocation tag storage before exposing MTE to guests.
> In the case of QEMU + KVM, I would expect the emulated system to not
> expose the allocation tag storage directly, in which case it would be
> able to set mte-alloc on all memory nodes without further action,
> exactly as my patch implements for TCG. With the interface as
> proposed, QEMU would need to reject the mte-shared-alloc option when
> KVM is enabled, as there is currently no mechanism for KVM-accelerated
> virtualized tag storage.

Ok, that makes sense.

>
> Note that these properties are only relevant for guest kernels running
> under an emulated EL2 in which pKVM could conceivably run, which means
> that the host would need to implement FEAT_NV2. As far as I know there
> is currently no support for NV2 in either QEMU TCG or the Linux
> kernel, and I'm unaware of any available hardware that supports both
> NV2 and MTE, so it'll be a while before any of this becomes relevant.

Nod. I'm mostly interested because I wanted to figure out how this feature might interact with enabling MTE for QEMU+KVM. I'll keep it in mind.

Thanks!