mbox series

[RFC,v2,00/26] KVM/arm64: A stage 2 for the host

Message ID 20210108121524.656872-1-qperret@google.com (mailing list archive)
Headers show
Series KVM/arm64: A stage 2 for the host | expand

Message

Quentin Perret Jan. 8, 2021, 12:14 p.m. UTC
Hi all,

This is the v2 of the series previously posted here:

  https://lore.kernel.org/kvmarm/20201117181607.1761516-1-qperret@google.com/

This basically allows us to wrap the host with a stage 2 when running in
nVHE, hence paving the way for protecting guest memory from the host in
the future (among other use-cases). For more details about the
motivation and the design angle taken here, I would recommend to have a
look at the cover letter of v1, and/or to watch these presentations at
LPC [1] and KVM forum 2020 [2].

In short, the changes since v1 include:

 - Renamed most pkvm-specific pgtable functions as pkvm_* to avoid
   confusion with the host's (Fuad)

 - Added an IC flush when switching pgtables (Fuad, Mark)

 - Cleaned-up the PI aliasing in image-vars.h (David)

 - Added a TLB flush when enabling the host stage 2 to avoid stale TLBs
   from bootloader

 - Fixed the early memory reservation by using NR_CPUS instead of
   num_possible_cpus() (which is always 1 that early)

 - Added missing preempt_{dis,en}able() guards in
   kvm_hyp_enable_protection()

 - Rebased on latest kvmarm/next

And if you'd like a branch that has all the goodies, there it is:

    https://android-kvm.googlesource.com/linux qperret/host-stage2-v2

Thanks!
Quentin

[1] https://youtu.be/54q6RzS9BpQ?t=10859
[2] https://kvmforum2020.sched.com/event/eE24/virtualization-for-the-masses-exposing-kvm-on-android-will-deacon-google

Quentin Perret (23):
  KVM: arm64: Initialize kvm_nvhe_init_params early
  KVM: arm64: Avoid free_page() in page-table allocator
  KVM: arm64: Factor memory allocation out of pgtable.c
  KVM: arm64: Introduce a BSS section for use at Hyp
  KVM: arm64: Make kvm_call_hyp() a function call at Hyp
  KVM: arm64: Allow using kvm_nvhe_sym() in hyp code
  KVM: arm64: Introduce an early Hyp page allocator
  KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp
  KVM: arm64: Introduce a Hyp buddy page allocator
  KVM: arm64: Enable access to sanitized CPU features at EL2
  KVM: arm64: Factor out vector address calculation
  of/fdt: Introduce early_init_dt_add_memory_hyp()
  KVM: arm64: Prepare Hyp memory protection
  KVM: arm64: Elevate Hyp mappings creation at EL2
  KVM: arm64: Use kvm_arch for stage 2 pgtable
  KVM: arm64: Use kvm_arch in kvm_s2_mmu
  KVM: arm64: Set host stage 2 using kvm_nvhe_init_params
  KVM: arm64: Refactor kvm_arm_setup_stage2()
  KVM: arm64: Refactor __load_guest_stage2()
  KVM: arm64: Refactor __populate_fault_info()
  KVM: arm64: Make memcache anonymous in pgtable allocator
  KVM: arm64: Reserve memory for host stage 2
  KVM: arm64: Wrap the host with a stage 2

Will Deacon (3):
  arm64: lib: Annotate {clear,copy}_page() as position-independent
  KVM: arm64: Link position-independent string routines into .hyp.text
  arm64: kvm: Add standalone ticket spinlock implementation for use at
    hyp

 arch/arm64/include/asm/cpufeature.h           |   1 +
 arch/arm64/include/asm/hyp_image.h            |   7 +
 arch/arm64/include/asm/kvm_asm.h              |   7 +
 arch/arm64/include/asm/kvm_cpufeature.h       |  19 ++
 arch/arm64/include/asm/kvm_host.h             |  16 +-
 arch/arm64/include/asm/kvm_hyp.h              |   8 +
 arch/arm64/include/asm/kvm_mmu.h              |  69 +++++-
 arch/arm64/include/asm/kvm_pgtable.h          |  41 +++-
 arch/arm64/include/asm/sections.h             |   1 +
 arch/arm64/kernel/asm-offsets.c               |   3 +
 arch/arm64/kernel/cpufeature.c                |  12 +
 arch/arm64/kernel/image-vars.h                |  33 +++
 arch/arm64/kernel/vmlinux.lds.S               |   7 +
 arch/arm64/kvm/arm.c                          | 144 ++++++++++--
 arch/arm64/kvm/hyp/Makefile                   |   2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |  36 +--
 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h |  14 ++
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |  32 +++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  33 +++
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  55 +++++
 arch/arm64/kvm/hyp/include/nvhe/mm.h          | 107 +++++++++
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  92 ++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile              |   9 +-
 arch/arm64/kvm/hyp/nvhe/cache.S               |  13 ++
 arch/arm64/kvm/hyp/nvhe/cpufeature.c          |   8 +
 arch/arm64/kvm/hyp/nvhe/early_alloc.c         |  60 +++++
 arch/arm64/kvm/hyp/nvhe/hyp-init.S            |  41 ++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  48 ++++
 arch/arm64/kvm/hyp/nvhe/hyp.lds.S             |   1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 191 ++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 174 ++++++++++++++
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          | 185 +++++++++++++++
 arch/arm64/kvm/hyp/nvhe/psci-relay.c          |   4 +-
 arch/arm64/kvm/hyp/nvhe/setup.c               | 214 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/stub.c                |  22 ++
 arch/arm64/kvm/hyp/nvhe/switch.c              |  12 +-
 arch/arm64/kvm/hyp/nvhe/tlb.c                 |   4 +-
 arch/arm64/kvm/hyp/pgtable.c                  |  98 ++++----
 arch/arm64/kvm/hyp/reserved_mem.c             | 104 +++++++++
 arch/arm64/kvm/mmu.c                          | 114 +++++++++-
 arch/arm64/kvm/reset.c                        |  42 +---
 arch/arm64/lib/clear_page.S                   |   4 +-
 arch/arm64/lib/copy_page.S                    |   4 +-
 arch/arm64/mm/init.c                          |   3 +
 drivers/of/fdt.c                              |   5 +
 45 files changed, 1954 insertions(+), 145 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/stub.c
 create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c

Comments

Mate Toth-Pal Feb. 17, 2021, 4:27 p.m. UTC | #1
Hi Quentin,


On 2021-01-08 13:14, Quentin Perret wrote:
> Hi all,
>
> This is the v2 of the series previously posted here:
>
>    https://lore.kernel.org/kvmarm/20201117181607.1761516-1-qperret@google.com/
>
> This basically allows us to wrap the host with a stage 2 when running in
> nVHE, hence paving the way for protecting guest memory from the host in
> the future (among other use-cases). For more details about the
> motivation and the design angle taken here, I would recommend to have a
> look at the cover letter of v1, and/or to watch these presentations at
> LPC [1] and KVM forum 2020 [2].


We tested the pKVM changes pulled from here:


>      https://android-kvm.googlesource.com/linux qperret/host-stage2-v2


We were using a target with Arm architecture with FEAT_S2FWB, and found 
that there is a bug in the patch.


It turned out that the Kernel checks for the extension, and sets up the 
stage 2 translation so that it forces the host memory type to 
write-through. However it seems that the code doesn't turn on the 
feature in the HCR_EL2 register.


We were able to fix the issue by applying the following patch:


diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0cd3eb178f3b..e8521a072ea6 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -105,6 +105,8 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool, void 
*dev_pgt_pool)
                 params->vttbr = kvm_get_vttbr(mmu);
                 params->vtcr = host_kvm.arch.vtcr;
                 params->hcr_el2 |= HCR_VM;
+               if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+                       params->hcr_el2 |= HCR_FWB;
                 __flush_dcache_area(params, sizeof(*params));
         }


Best regards,
Mate Toth-Pal
Quentin Perret Feb. 17, 2021, 5:24 p.m. UTC | #2
Hi Mate,

On Wednesday 17 Feb 2021 at 17:27:07 (+0100), Mate Toth-Pal wrote:
> We tested the pKVM changes pulled from here:
> 
> 
> >      https://android-kvm.googlesource.com/linux qperret/host-stage2-v2
> 
> 
> We were using a target with Arm architecture with FEAT_S2FWB, and found that
> there is a bug in the patch.
> 
> 
> It turned out that the Kernel checks for the extension, and sets up the
> stage 2 translation so that it forces the host memory type to write-through.
> However it seems that the code doesn't turn on the feature in the HCR_EL2
> register.
> 
> 
> We were able to fix the issue by applying the following patch:
> 
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 0cd3eb178f3b..e8521a072ea6 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -105,6 +105,8 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool, void
> *dev_pgt_pool)
>                 params->vttbr = kvm_get_vttbr(mmu);
>                 params->vtcr = host_kvm.arch.vtcr;
>                 params->hcr_el2 |= HCR_VM;
> +               if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +                       params->hcr_el2 |= HCR_FWB;
>                 __flush_dcache_area(params, sizeof(*params));
>         }

Aha, indeed, this looks right. I'll double check HCR_EL2 to see if I'm
missing any other, and I'll add this to v3.

Thanks for testing, and the for the report.
Quentin
Sean Christopherson Feb. 19, 2021, 5:54 p.m. UTC | #3
On Fri, Jan 08, 2021, Quentin Perret wrote:
> [2] https://kvmforum2020.sched.com/event/eE24/virtualization-for-the-masses-exposing-kvm-on-android-will-deacon-google

I couldn't find any slides on the official KVM forum site linked above.  I was
able to track down a mirror[1] and the recorded presentation[2].

[1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/slides/kvmforum-2020-edited.pdf
[2] https://youtu.be/wY-u6n75iXc
Quentin Perret Feb. 19, 2021, 5:57 p.m. UTC | #4
On Friday 19 Feb 2021 at 09:54:38 (-0800), Sean Christopherson wrote:
> On Fri, Jan 08, 2021, Quentin Perret wrote:
> > [2] https://kvmforum2020.sched.com/event/eE24/virtualization-for-the-masses-exposing-kvm-on-android-will-deacon-google
> 
> I couldn't find any slides on the official KVM forum site linked above.  I was
> able to track down a mirror[1] and the recorded presentation[2].
> 
> [1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/slides/kvmforum-2020-edited.pdf
> [2] https://youtu.be/wY-u6n75iXc

Much nicer, I'll make sure to link those in the next cover letter.

Thanks Sean!
Quentin