KVM: arm64: Fix memory leaks from stage2 pagetable

Message ID 20220526203956.143873-1-quic_qiancai@quicinc.com (mailing list archive)
State New, archived
Series KVM: arm64: Fix memory leaks from stage2 pagetable

Commit Message

Qian Cai May 26, 2022, 8:39 p.m. UTC
Running some SR-IOV workloads could trigger some leak reports from
kmemleak.

unreferenced object 0xffff080243cef500 (size 128):
  comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
  hex dump (first 32 bytes):
    28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
    e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
  backtrace:
     kmem_cache_alloc_trace
     kvm_init_stage2_mmu
     kvm_arch_init_vm
     kvm_create_vm
     kvm_dev_ioctl
     __arm64_sys_ioctl
     invoke_syscall
     el0_svc_common.constprop.0
     do_el0_svc
     el0_svc
     el0t_64_sync_handler
     el0t_64_sync

Since I have yet to find a way to reproduce this at will, I did a code
inspection and found one spot where a leak could happen. It is unlikely
that this will fix my issue, because I don't see my runs going into the
error paths. But we should fix it regardless.

If hardware_enable_all() or kvm_init_mmu_notifier() fails in
kvm_create_vm(), we end up leaking the stage2 pagetable memory allocated
by kvm_init_stage2_mmu(), because kvm_arch_flush_shadow_all() is never
called on those error paths.

It seems impossible to simply move kvm_free_stage2_pgd() from
kvm_arch_flush_shadow_all() into kvm_arch_destroy_vm() due to the issue
mentioned in the "Fixes" commit below. Thus, fix it by freeing the
memory from kvm_arch_destroy_vm() only if the MMU notifier was never
initialized.

Fixes: 293f293637b5 ("kvm-arm: Unmap shadow pagetables properly")
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
---
 arch/arm64/kvm/arm.c | 3 +++
 arch/arm64/kvm/mmu.c | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)
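
For readers following the reasoning in the commit message, the error path
and the effect of the patch can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not kernel code:
every model_* name is hypothetical, malloc() stands in for the stage2
pagetable allocation, and the notifier_registered flag stands in for the
kvm->mmu_notifier.ops check used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only; none of these are the kernel's types. */
struct model_vm {
	void *pgt;                /* stands in for the stage2 pagetable */
	bool notifier_registered; /* stands in for kvm->mmu_notifier.ops */
};

static int live_allocs; /* outstanding model allocations */

static void model_init_stage2(struct model_vm *vm)
{
	vm->pgt = malloc(128);
	assert(vm->pgt != NULL);
	live_allocs++;
}

static void model_free_stage2(struct model_vm *vm)
{
	if (vm->pgt) {
		free(vm->pgt);
		vm->pgt = NULL;
		live_allocs--;
	}
}

/* Models kvm_arch_flush_shadow_all(): patched code checks the notifier. */
static void model_flush_shadow_all(struct model_vm *vm, bool patched)
{
	if (!patched || vm->notifier_registered)
		model_free_stage2(vm);
}

/* Models kvm_arch_destroy_vm(): only the patched code frees here. */
static void model_destroy_vm(struct model_vm *vm, bool patched)
{
	if (patched && !vm->notifier_registered)
		model_free_stage2(vm);
}

/*
 * Models one VM lifetime and returns the number of leaked allocations.
 * fail_before_notifier simulates hardware_enable_all() or
 * kvm_init_mmu_notifier() failing, so the flush hook never runs.
 */
static int model_create_vm(bool fail_before_notifier, bool patched)
{
	struct model_vm vm = { 0 };

	live_allocs = 0;
	model_init_stage2(&vm);
	if (!fail_before_notifier) {
		vm.notifier_registered = true;
		/* Normal teardown: notifier release runs the flush hook. */
		model_flush_shadow_all(&vm, patched);
	}
	model_destroy_vm(&vm, patched);
	return live_allocs;
}
```

In this model, model_create_vm(true, false) reports one leaked allocation,
while the other three combinations of arguments report none. The model
deliberately ignores the ordering problem described in the "Fixes" commit,
which is the reason the free cannot simply move into the destroy path.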

Comments

Will Deacon May 31, 2022, 4:57 p.m. UTC | #1
On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> Running some SR-IOV workloads could trigger some leak reports from
> kmemleak.
> 
> unreferenced object 0xffff080243cef500 (size 128):
>   comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
>   hex dump (first 32 bytes):
>     28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
>     e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
>   backtrace:
>      kmem_cache_alloc_trace
>      kvm_init_stage2_mmu

Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
I don't really like the idea of papering over the report; we'd be better off
trying to reproduce it.

Will

Will Deacon May 31, 2022, 5:01 p.m. UTC | #2
On Tue, May 31, 2022 at 05:57:11PM +0100, Will Deacon wrote:
> On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> > Running some SR-IOV workloads could trigger some leak reports from
> > kmemleak.
> > 
> > unreferenced object 0xffff080243cef500 (size 128):
> >   comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
> >   hex dump (first 32 bytes):
> >     28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
> >     e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
> >   backtrace:
> >      kmem_cache_alloc_trace
> >      kvm_init_stage2_mmu
> 
> Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
> I don't really like the idea of papering over the report; we'd be better off
> trying to reproduce it.

... although the hexdump does look like {u32; u32; ptr; ptr; ptr}, which
would match 'struct kvm_pgtable'. I guess the allocation is aligned to
ARCH_DMA_MINALIGN, which could explain the size?
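
The {u32; u32; ptr; ptr; ptr} reading of the hex dump can be checked with a
short decoder. This is a sketch only: the struct and field names below are
hypothetical placeholders (not the real 'struct kvm_pgtable' definition),
the bytes are assumed to be little-endian, and the 128-byte alignment is an
assumed value for ARCH_DMA_MINALIGN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 32 bytes of the leaked object, copied from the kmemleak report. */
static const uint8_t dump[32] = {
	0x28, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
	0x00, 0xe0, 0x4c, 0x52, 0x03, 0x08, 0xff, 0xff,
	0xe0, 0xaf, 0xa4, 0x7f, 0x7c, 0xd1, 0xff, 0xff,
	0xa8, 0x3c, 0xb3, 0x08, 0x00, 0x80, 0xff, 0xff,
};

/* Hypothetical layout: two 32-bit words followed by three pointers. */
struct guessed_layout {
	uint32_t word0;
	uint32_t word1;
	uint64_t ptr0;
	uint64_t ptr1;
	uint64_t ptr2;
};

static_assert(sizeof(struct guessed_layout) == 32, "layout covers the dump");

/* Read a little-endian 32-bit value at byte offset off. */
static uint32_t le32_at(const uint8_t *p, size_t off)
{
	return (uint32_t)p[off] |
	       (uint32_t)p[off + 1] << 8 |
	       (uint32_t)p[off + 2] << 16 |
	       (uint32_t)p[off + 3] << 24;
}

/* Read a little-endian 64-bit value at byte offset off. */
static uint64_t le64_at(const uint8_t *p, size_t off)
{
	return (uint64_t)le32_at(p, off) |
	       (uint64_t)le32_at(p, off + 4) << 32;
}

/* Decode the dump according to the guessed layout. */
static struct guessed_layout decode_dump(void)
{
	struct guessed_layout g;

	g.word0 = le32_at(dump, 0);
	g.word1 = le32_at(dump, 4);
	g.ptr0 = le64_at(dump, 8);
	g.ptr1 = le64_at(dump, 16);
	g.ptr2 = le64_at(dump, 24);
	return g;
}

/* Round n up to the next multiple of align. */
static size_t round_up_to(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}
```

Under these assumptions, the two leading words decode to 0x28 and 1, and
all three remaining fields have 0xffff in their top 16 bits, as kernel
virtual addresses would. Rounding the 32-byte layout up to a 128-byte
alignment with round_up_to() gives 128, matching the reported object size.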

Have you spotted any pattern for when the leak occurs? How are you
terminating the guest?

Will

Qian Cai May 31, 2022, 5:23 p.m. UTC | #3
On Tue, May 31, 2022 at 05:57:11PM +0100, Will Deacon wrote:
> On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> > Running some SR-IOV workloads could trigger some leak reports from
> > kmemleak.
> > 
> > unreferenced object 0xffff080243cef500 (size 128):
> >   comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
> >   hex dump (first 32 bytes):
> >     28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
> >     e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
> >   backtrace:
> >      kmem_cache_alloc_trace
> >      kvm_init_stage2_mmu
> 
> Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
> I don't really like the idea of papering over the report; we'd be better off
> trying to reproduce it.

As much as I would like to reproduce it, I have been trying for the last
few weeks without luck. It still happens from time to time in our daily
CI though, so I was thinking of plugging the known leaks first.

Qian Cai May 31, 2022, 5:41 p.m. UTC | #4
On Tue, May 31, 2022 at 06:01:58PM +0100, Will Deacon wrote:
> Have you spotted any pattern for when the leak occurs? How are you
> terminating the guest?

We just send a SIGTERM to the qemu-system-aarch64 process. Originally,
right after sending the signal, the test would remove_id/unbind the device
from vfio-pci and then bind it to the original (ixgbe) driver. However,
since the process might take a while to clean up after itself, the bind
might fail with -EBUSY. I could reproduce the leak a few times on one day
while being unable to do so on other days.

Later, we changed the code to make sure the process has disappeared first
and only then do the remove_id/bind/unbind. Apparently, that makes the leak
harder to reproduce, if it does not eliminate it entirely.

Patch

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 400bb0fe2745..7d12824f2034 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -180,6 +180,9 @@  vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
  */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	if (!kvm->mmu_notifier.ops)
+		kvm_free_stage2_pgd(&kvm->arch.mmu);
+
 	bitmap_free(kvm->arch.pmu_filter);
 	free_cpumask_var(kvm->arch.supported_cpus);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..13a527656ba7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1739,7 +1739,8 @@  void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
+	if (kvm->mmu_notifier.ops)
+		kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,