Message ID | 20181004045351.GC16300@fergus (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault |
On Thu, 4 Oct 2018 14:53:51 +1000 Paul Mackerras <paulus@ozlabs.org> wrote:

> Commit 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to
> determine host mapping size", 2018-09-11) added a call to
> __find_linux_pte() and a dereference of the returned PTE pointer to the
> radix page fault path in the common case where the page is normal
> system memory. Previously, __find_linux_pte() was only called for
> mappings to physical addresses which don't have a page struct (e.g.
> memory-mapped I/O) or where the page struct is marked as reserved
> memory.
>
> This exposes us to the possibility that the returned PTE pointer
> could be NULL, for example in the case of a concurrent THP collapse
> operation. Dereferencing the returned NULL pointer causes a host
> crash.
>
> To fix this, we check for NULL, and if it is NULL, we retry the
> operation by returning to the guest, with the expectation that it
> will generate the same page fault again (unless of course it has
> been fixed up by another CPU in the meantime).
>
> Fixes: 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size")
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

This seems like a reasonable fix.

Thanks,
Nick

> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 933c574e1cf7..998f8d089ac7 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -646,6 +646,16 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	 */
>  	local_irq_disable();
>  	ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
> +	/*
> +	 * If the PTE disappeared temporarily due to a THP
> +	 * collapse, just return and let the guest try again.
> +	 */
> +	if (!ptep) {
> +		local_irq_enable();
> +		if (page)
> +			put_page(page);
> +		return RESUME_GUEST;
> +	}
>  	pte = *ptep;
>  	local_irq_enable();
>
> --
> 2.11.0
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 933c574e1cf7..998f8d089ac7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -646,6 +646,16 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 */
 	local_irq_disable();
 	ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
+	/*
+	 * If the PTE disappeared temporarily due to a THP
+	 * collapse, just return and let the guest try again.
+	 */
+	if (!ptep) {
+		local_irq_enable();
+		if (page)
+			put_page(page);
+		return RESUME_GUEST;
+	}
 	pte = *ptep;
 	local_irq_enable();
Commit 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to
determine host mapping size", 2018-09-11) added a call to
__find_linux_pte() and a dereference of the returned PTE pointer to the
radix page fault path in the common case where the page is normal
system memory. Previously, __find_linux_pte() was only called for
mappings to physical addresses which don't have a page struct (e.g.
memory-mapped I/O) or where the page struct is marked as reserved
memory.

This exposes us to the possibility that the returned PTE pointer
could be NULL, for example in the case of a concurrent THP collapse
operation. Dereferencing the returned NULL pointer causes a host
crash.

To fix this, we check for NULL, and if it is NULL, we retry the
operation by returning to the guest, with the expectation that it
will generate the same page fault again (unless of course it has
been fixed up by another CPU in the meantime).

Fixes: 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
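
For readers who want to see the retry-on-NULL pattern from the commit message in isolation, here is a rough user-space sketch. It is not kernel code: every name prefixed with sketch_ or SKETCH_ is hypothetical, the "collapsing" flag is only a stand-in for a concurrent THP collapse, and the interrupt disable/enable around the lookup is left out. It only tries to show the two things the patch does when the lookup comes back NULL: drop the page reference if one is held, and report "resume the guest" so that the fault is simply taken again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; the values are placeholders, not the kernel's. */
enum sketch_result { SKETCH_RESUME_GUEST, SKETCH_MAPPED };

typedef unsigned long sketch_pte_t;

/* Hypothetical stand-in for a page with a reference count. */
struct sketch_page {
	int refcount;
};

/* Toy "page table" with one entry that can be temporarily absent. */
struct sketch_table {
	sketch_pte_t slot;
	int collapsing;		/* nonzero: lookup returns NULL */
};

/* Toy lookup: returns NULL while the simulated collapse is in progress. */
static sketch_pte_t *sketch_find_pte(struct sketch_table *t)
{
	return t->collapsing ? NULL : &t->slot;
}

/* Drop one reference on the page. */
static void sketch_put_page(struct sketch_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Fault-handler sketch: check the lookup result before dereferencing it.
 * On NULL, release the page reference (if any) and ask the caller to
 * resume the guest, which is expected to fault again and retry.
 */
static int sketch_page_fault(struct sketch_table *t, struct sketch_page *page,
			     sketch_pte_t *out)
{
	sketch_pte_t *ptep = sketch_find_pte(t);

	if (!ptep) {
		if (page)
			sketch_put_page(page);
		return SKETCH_RESUME_GUEST;
	}
	*out = *ptep;
	return SKETCH_MAPPED;
}
```

Calling sketch_page_fault() with collapsing set returns SKETCH_RESUME_GUEST and leaves *out untouched; clearing the flag and calling it again returns SKETCH_MAPPED with the entry copied out, which mirrors the "guest faults again and succeeds" expectation in the commit message.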