[v2,39/94] KVM: arm64: nv: Handle shadow stage 2 page faults

Message ID	20200211174938.27809-40-maz@kernel.org (mailing list archive)
State	New, archived
Headers
Return-Path: <SRS0=UyAS=37=vger.kernel.org=kvm-owner@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org
Cc: Andre Przywara <andre.przywara@arm.com>, Christoffer Dall <christoffer.dall@arm.com>, Dave Martin <Dave.Martin@arm.com>, Jintack Lim <jintack@cs.columbia.edu>, Alexandru Elisei <alexandru.elisei@arm.com>, James Morse <james.morse@arm.com>, Julien Thierry <julien.thierry.kdev@gmail.com>, Suzuki K Poulose <suzuki.poulose@arm.com>
Subject: [PATCH v2 39/94] KVM: arm64: nv: Handle shadow stage 2 page faults
Date: Tue, 11 Feb 2020 17:48:43 +0000
Message-Id: <20200211174938.27809-40-maz@kernel.org>
In-Reply-To: <20200211174938.27809-1-maz@kernel.org>
References: <20200211174938.27809-1-maz@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
Series	KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
[v2,00/94] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
[v2,01/94] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
[v2,02/94] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
[v2,03/94] KVM: arm64: nv: Introduce nested virtualization VCPU feature
[v2,04/94] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
[v2,05/94] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
[v2,06/94] KVM: arm64: nv: Add EL2 system registers to vcpu context
[v2,07/94] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
[v2,08/94] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
[v2,09/94] KVM: arm64: nv: Support virtual EL2 exceptions
[v2,10/94] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
[v2,11/94] KVM: arm64: nv: Handle trapped ERET from virtual EL2
[v2,12/94] KVM: arm64: nv: Add EL2->EL1 translation helpers
[v2,13/94] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
[v2,14/94] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
[v2,15/94] KVM: arm64: nv: Handle SPSR_EL2 specially
[v2,16/94] KVM: arm64: nv: Handle HCR_EL2.E2H specially
[v2,17/94] KVM: arm64: nv: Save/Restore vEL2 sysregs
[v2,18/94] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
[v2,19/94] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
[v2,20/94] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
[v2,21/94] KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
[v2,22/94] KVM: arm64: nv: Handle PSCI call via smc from the guest
[v2,23/94] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
[v2,24/94] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
[v2,25/94] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
[v2,26/94] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
[v2,27/94] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
[v2,28/94] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
[v2,29/94] KVM: arm64: nv: Forward debug traps to the nested guest
[v2,30/94] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
[v2,31/94] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
[v2,32/94] KVM: arm64: nv: Filter out unsupported features from ID regs
[v2,33/94] KVM: arm64: nv: Hide RAS from nested guests
[v2,34/94] KVM: arm64: nv: Use ARMv8.5-GTG to advertise supported Stage-2 page sizes
[v2,35/94] KVM: arm64: Check advertised Stage-2 page size capability
[v2,36/94] KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm
[v2,37/94] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
[v2,38/94] KVM: arm64: nv: Implement nested Stage-2 page table walk logic
[v2,39/94] KVM: arm64: nv: Handle shadow stage 2 page faults
[v2,40/94] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
[v2,41/94] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu
[v2,42/94] KVM: arm64: nv: Introduce sys_reg_desc.forward_trap
[v2,43/94] KVM: arm64: nv: Set a handler for the system instruction traps
[v2,44/94] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
[v2,45/94] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
[v2,46/94] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
[v2,47/94] KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs accessors
[v2,48/94] KVM: arm64: nv: arch_timer: Support hyp timer emulation
[v2,49/94] KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer
[v2,50/94] KVM: arm64: nv: Load timer before the GIC
[v2,51/94] KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu
[v2,52/94] KVM: arm64: nv: Nested GICv3 Support
[v2,53/94] KVM: arm64: nv: vgic: Emulate the HW bit in software
[v2,54/94] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
[v2,55/94] KVM: arm64: nv: Implement maintenance interrupt forwarding
[v2,56/94] KVM: arm64: nv: Add nested GICv3 tracepoints
[v2,57/94] arm64: KVM: nv: Add handling of EL2-specific timer registers
[v2,58/94] arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2
[v2,59/94] arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits
[v2,60/94] arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's
[v2,61/94] arm64: KVM: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
[v2,62/94] arm64: Detect the ARMv8.4 TTL feature
[v2,63/94] arm64: KVM: nv: Add handling of ARMv8.4-TTL TLB invalidation
[v2,64/94] arm64: KVM: nv: Invalidate TLBs based on shadow S2 TTL-like information
[v2,65/94] arm64: KVM: nv: Tag shadow S2 entries with nested level
[v2,66/94] arm64: Add SW reserved PTE/PMD bits
[v2,67/94] arm64: Add level-hinted TLB invalidation helper
[v2,68/94] arm64: KVM: Add a level hint to __kvm_tlb_flush_vmid_ipa
[v2,69/94] arm64: KVM: Use TTL hint in when invalidating stage-2 translations
[v2,70/94] arm64: KVM: nv: Add include containing the VNCR_EL2 offsets
[v2,71/94] KVM: arm64: Introduce accessor for ctxt->sys_reg
[v2,72/94] KVM: arm64: sysreg: Use ctxt_sys_reg() instead of raw sys_regs access
[v2,73/94] KVM: arm64: sve: Use __vcpu_sys_reg() instead of raw sys_regs access
[v2,74/94] KVM: arm64: pauth: Use ctxt_sys_reg() instead of raw sys_regs access
[v2,75/94] KVM: arm64: debug: Use ctxt_sys_reg() instead of raw sys_regs access
[v2,76/94] KVM: arm64: Add missing reset handlers for PMU emulation
[v2,77/94] KVM: arm64: nv: Move sysreg reset check to boot time
[v2,78/94] KVM: arm64: Map VNCR-capable registers to a separate page
[v2,79/94] KVM: arm64: nv: Move nested vgic state into the sysreg file
[v2,80/94] KVM: arm64: Use accessors for timer ctl/cval/offset
[v2,81/94] KVM: arm64: Add VNCR-capable timer accessors for arm64
[v2,82/94] KVM: arm64: Make struct kvm_regs userspace-only
[v2,83/94] KVM: arm64: VNCR-ize ELR_EL1
[v2,84/94] KVM: arm64: VNCR-ize SP_EL1
[v2,85/94] KVM: arm64: Disintegrate SPSR array
[v2,86/94] KVM: arm64: aarch32: Use __vcpu_sys_reg() instead of raw sys_regs access
[v2,87/94] KVM: arm64: VNCR-ize SPSR_EL1
[v2,88/94] KVM: arm64: Add ARMv8.4 Enhanced Nested Virt cpufeature
[v2,89/94] KVM: arm64: nv: Synchronize PSTATE early on exit
[v2,90/94] KVM: arm64: nv: Sync nested timer state with ARMv8.4
[v2,91/94] KVM: arm64: nv: Allocate VNCR page when required
[v2,92/94] KVM: arm64: nv: Enable ARMv8.4-NV support
[v2,93/94] KVM: arm64: nv: Fast-track 'InHost' exception returns
[v2,94/94] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8727fde21b8f..7f1fb496f435 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -423,6 +423,58 @@ static inline int kvm_set_ipa_limit(void) { return 0; }
 static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
 static inline void kvm_init_nested(struct kvm *kvm) {}
 
+struct kvm_s2_trans {};
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t ipa,
+				     struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+					   struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline void kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u32 esr)
+{
+	BUG();
+}
+
+static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline void kvm_nested_s2_flush(struct kvm *kvm) {}
+static inline void kvm_nested_s2_wp(struct kvm *kvm) {}
+static inline void kvm_nested_s2_clear(struct kvm *kvm) {}
+
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 {
 	struct kvm_vmid *vmid = &mmu->vmid;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 0e5f88060ecc..1ccd98b5fead 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -684,4 +684,10 @@ static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
 	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
 }
 
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+		vcpu->arch.hw_mmu->nested_stage2_enabled);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 3881e51d5a2d..b2bf82461fa6 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -27,9 +27,27 @@ struct kvm_s2_trans {
 	u64 upper_attr;
 };
 
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	return trans->esr;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+				    struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 573bcfcfe53f..361c3eb2032c 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -471,6 +471,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Returns non-zero if permission fault is handled by injecting it to the next
+ * level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (fault_status != FSC_PERM)
+		return 0;
+
+	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+		forward_fault = (trans->upper_attr & PTE_S2_XN);
+	} else {
+		bool write_fault = kvm_is_write_fault(vcpu);
+
+		forward_fault = ((write_fault && !trans->writable) ||
+				 (!write_fault && !trans->readable));
+	}
+
+	if (forward_fault) {
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+		return 1;
+	}
+
+	return 0;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+	return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
index aedfcff99ac5..7ac7b1eac1ac 100644
--- a/virt/kvm/arm/mmio.c
+++ b/virt/kvm/arm/mmio.c
@@ -120,7 +120,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 }
 
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		 phys_addr_t fault_ipa)
+		 phys_addr_t ipa)
 {
 	unsigned long data;
 	unsigned long rt;
@@ -137,7 +137,7 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
 			run->exit_reason = KVM_EXIT_ARM_NISV;
 			run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
-			run->arm_nisv.fault_ipa = fault_ipa;
+			run->arm_nisv.fault_ipa = ipa;
 			return 0;
 		}
 
@@ -164,22 +164,22 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
 					       len);
 
-		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, &data);
+		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, ipa, &data);
 		kvm_mmio_write_buf(data_buf, len, data);
 
-		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, fault_ipa, len,
+		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, ipa, len,
 				       data_buf);
 	} else {
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, len,
-			       fault_ipa, NULL);
+			       ipa, NULL);
 
-		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, fault_ipa, len,
+		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, ipa, len,
 				      data_buf);
 	}
 
 	/* Now prepare kvm_run for the potential return to userland. */
 	run->mmio.is_write = is_write;
-	run->mmio.phys_addr = fault_ipa;
+	run->mmio.phys_addr = ipa;
 	run->mmio.len = len;
 
 	vcpu->mmio_needed = 1;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index bead2ad59f7d..c6db597b925c 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1388,7 +1388,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	return ret;
 }
 
-static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
+static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap,
+					phys_addr_t *fault_ipap)
 {
 	kvm_pfn_t pfn = *pfnp;
 	gfn_t gfn = *ipap >> PAGE_SHIFT;
@@ -1416,6 +1417,7 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
 		mask = PTRS_PER_PMD - 1;
 		VM_BUG_ON((gfn & mask) != (pfn & mask));
 		if (pfn & mask) {
+			*fault_ipap &= PMD_MASK;
 			*ipap &= PMD_MASK;
 			kvm_release_pfn_clean(pfn);
 			pfn &= ~mask;
@@ -1671,14 +1673,16 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault, needs_exec;
 	unsigned long mmu_seq;
-	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
+	phys_addr_t ipa = fault_ipa;
+	gfn_t gfn;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1688,6 +1692,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long vma_pagesize, flags = 0;
 	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
+	unsigned long max_map_size = PUD_SIZE;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1715,10 +1720,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	vma_pagesize = 1ULL << vma_shift;
 	if (logging_active ||
 	    (vma->vm_flags & VM_PFNMAP) ||
-	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
-		force_pte = true;
-		vma_pagesize = PAGE_SIZE;
+	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize))
+		max_map_size = PAGE_SIZE;
+
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		ipa = kvm_s2_trans_output(nested);
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create a block mapping if the guest stage 2 page
+		 * table uses at least as big a mapping.
+		 */
+		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
 	}
+	gfn = ipa >> PAGE_SHIFT;
+
+	vma_pagesize = min(vma_pagesize, max_map_size);
 
 	/*
 	 * The stage2 has a minimum of 2 level table (For arm64 see
@@ -1728,8 +1745,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * 3 levels, i.e, PMD is not folded.
 	 */
 	if (vma_pagesize == PMD_SIZE ||
-	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
-		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) {
+		gfn = (ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+	}
 	up_read(&current->mm->mmap_sem);
 
 	/* We need minimum second+third level pages */
@@ -1784,7 +1802,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
 
-	if (vma_pagesize == PAGE_SIZE && !force_pte) {
+	if (vma_pagesize == PAGE_SIZE && max_map_size >= PMD_SIZE) {
 		/*
 		 * Only PMD_SIZE transparent hugepages(THP) are
 		 * currently supported. This code will need to be
@@ -1794,7 +1812,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 * aligned and that the block is contained within the memslot.
 		 */
 		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
-		    transparent_hugepage_adjust(&pfn, &fault_ipa))
+		    transparent_hugepage_adjust(&pfn, &ipa, &fault_ipa))
 			vma_pagesize = PMD_SIZE;
 	}
 
@@ -1919,8 +1937,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
+	struct kvm_s2_trans nested_trans;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
 	gfn_t gfn;
@@ -1928,7 +1948,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
 	/* Synchronous External Abort? */
@@ -1952,6 +1972,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	/* Check the stage-2 fault is trans. fault or write fault */
 	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
 	    fault_status != FSC_ACCESS) {
+		/*
+		 * We must never see an address size fault on shadow stage 2
+		 * page table walk, because we would have injected an addr
+		 * size fault when we walked the nested s2 page and not
+		 * create the shadow entry.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1961,7 +1987,36 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest. In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		u32 esr;
+
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ipa = kvm_s2_trans_output(&nested_trans);
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1994,13 +2049,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, run, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, run, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
+	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
 
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -2008,7 +2063,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out:
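
Two decisions in this patch can be illustrated outside the kernel: whether a permission fault on a shadow stage 2 entry is forwarded to the guest hypervisor, and how the shadow mapping size is capped by the guest's own stage 2 block size. The following is a minimal userspace sketch only. The names s2_trans_sketch, access_kind, should_forward_perm_fault() and clamp_map_size() are hypothetical and are not part of the patch; the exec_never field stands in for the (upper_attr & PTE_S2_XN) test.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct kvm_s2_trans. */
struct s2_trans_sketch {
	bool readable;
	bool writable;
	bool exec_never;		/* models upper_attr & PTE_S2_XN */
	unsigned long block_size;	/* size of the guest stage 2 mapping */
};

/* Hypothetical classification of the faulting access. */
enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

/*
 * Mirrors the decision in kvm_s2_handle_perm_fault(): the fault goes to
 * the guest hypervisor only if the guest's own stage 2 entry denies the
 * access. Otherwise the host handles it.
 */
static bool should_forward_perm_fault(const struct s2_trans_sketch *t,
				      enum access_kind kind)
{
	switch (kind) {
	case ACCESS_EXEC:
		return t->exec_never;
	case ACCESS_WRITE:
		return !t->writable;
	default:
		return !t->readable;
	}
}

/*
 * Mirrors the min() clamp in user_mem_abort(): a shadow entry must not
 * map more than the guest stage 2 entry it is derived from.
 */
static unsigned long clamp_map_size(unsigned long host_size,
				    unsigned long guest_block_size)
{
	return guest_block_size < host_size ? guest_block_size : host_size;
}
```

With a read-only, execute-never guest entry, a write or instruction fetch would be forwarded while a read would not, and a 2MiB host mapping backed by a 4KiB guest entry would be clamped to 4KiB.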
