Message ID | 20220526203956.143873-1-quic_qiancai@quicinc.com (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: Fix memory leaks from stage2 pagetable
On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> Running some SR-IOV workloads could trigger some leak reports from
> kmemleak.
>
> unreferenced object 0xffff080243cef500 (size 128):
>   comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
>   hex dump (first 32 bytes):
>     28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
>     e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
>   backtrace:
>     kmem_cache_alloc_trace
>     kvm_init_stage2_mmu

Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
I don't really like the idea of papering over the report; we'd be better
off trying to reproduce it.

Will
On Tue, May 31, 2022 at 05:57:11PM +0100, Will Deacon wrote:
> On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> > Running some SR-IOV workloads could trigger some leak reports from
> > kmemleak.
> >
> > unreferenced object 0xffff080243cef500 (size 128):
> >   comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
> >   hex dump (first 32 bytes):
> >     28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
> >     e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
> >   backtrace:
> >     kmem_cache_alloc_trace
> >     kvm_init_stage2_mmu
>
> Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
> I don't really like the idea of papering over the report; we'd be better
> off trying to reproduce it.

... although the hexdump does look like {u32; u32; ptr; ptr; ptr}, which
would match 'struct kvm_pgtable'. I guess the allocation is aligned to
ARCH_DMA_MINALIGN, which could explain the size?

Have you spotted any pattern for when the leak occurs? How are you
terminating the guest?

Will
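Will's reading can be checked mechanically. Below is a small, stand-alone
decode of those 32 dumped bytes as { u32; u32; ptr; ptr; ptr }. The field
names (ia_bits, start_level, pgd, mm_ops, mmu) are an assumption based on
the 'struct kvm_pgtable' layout of that era, and the decode assumes a
little-endian host:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode the first 32 dumped bytes as { u32; u32; ptr; ptr; ptr },
 * i.e. the start of struct kvm_pgtable if Will's reading is right.
 * Field names are assumptions based on the struct layout at the time. */
int main(void)
{
	const uint8_t dump[32] = {
		0x28, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
		0x00, 0xe0, 0x4c, 0x52, 0x03, 0x08, 0xff, 0xff,
		0xe0, 0xaf, 0xa4, 0x7f, 0x7c, 0xd1, 0xff, 0xff,
		0xa8, 0x3c, 0xb3, 0x08, 0x00, 0x80, 0xff, 0xff,
	};
	uint32_t ia_bits, start_level;
	uint64_t p[3];

	memcpy(&ia_bits, dump, 4);
	memcpy(&start_level, dump + 4, 4);
	memcpy(p, dump + 8, 24);

	/* Prints ia_bits=40 start_level=1, then three kernel pointers. */
	printf("ia_bits=%u start_level=%u\n", ia_bits, start_level);
	printf("pgd=%#llx mm_ops?=%#llx mmu?=%#llx\n",
	       (unsigned long long)p[0], (unsigned long long)p[1],
	       (unsigned long long)p[2]);
	return 0;
}

The first two values come out as 40 and 1 (a plausible 40-bit IPA space
with a level-1 start), and the remaining three look like kernel pointers,
which supports the kvm_pgtable reading. The "size 128" in the report would
then just be kmalloc rounding the small struct up to ARCH_DMA_MINALIGN
(128 bytes on arm64), as Will suggests.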
On Tue, May 31, 2022 at 05:57:11PM +0100, Will Deacon wrote:
> On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> > Running some SR-IOV workloads could trigger some leak reports from
> > kmemleak.
> >
> > unreferenced object 0xffff080243cef500 (size 128):
> >   comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
> >   hex dump (first 32 bytes):
> >     28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
> >     e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
> >   backtrace:
> >     kmem_cache_alloc_trace
> >     kvm_init_stage2_mmu
>
> Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
> I don't really like the idea of papering over the report; we'd be better
> off trying to reproduce it.

As much as I would like to reproduce it, I have been trying for the last
few weeks without luck. It still happens from time to time in our daily
CI, though, so I was thinking of plugging the known leaks first.
On Tue, May 31, 2022 at 06:01:58PM +0100, Will Deacon wrote:
> Have you spotted any pattern for when the leak occurs? How are you
> terminating the guest?

We just send a SIGTERM to the qemu-system-aarch64 process. Originally,
right after sending the signal, we would remove_id/unbind from vfio-pci
and then bind to the original (ixgbe) driver. However, since the process
might take a while to clean up after itself, the bind might fail with
-EBUSY. I could reproduce the leak a few times on one day but was unable
to on other days. Later, we changed the code to make sure the process has
disappeared first before doing the remove_id/unbind/bind. Apparently,
that makes it harder to reproduce, if it does not eliminate it entirely.
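For illustration only, the adjusted teardown order described above could
look roughly like the following stand-alone sketch: SIGTERM, poll until
the process is fully gone, and only then hand the VF back via sysfs. The
pid, PCI address, and vendor/device ID are made-up placeholders, not
values from this report:

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a sysfs attribute, e.g. a driver bind/unbind file. */
static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0) {
		fprintf(stderr, "%s: %s\n", path, strerror(errno));
		if (fd >= 0)
			close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	pid_t qemu = 179935;              /* placeholder qemu pid */
	const char *bdf = "0000:01:10.0"; /* placeholder VF address */
	char unbind[96];

	kill(qemu, SIGTERM);

	/* Binding back to ixgbe while qemu still held the VF used to fail
	 * with -EBUSY, so wait until the process is really gone. */
	while (kill(qemu, 0) == 0)
		usleep(100 * 1000);

	snprintf(unbind, sizeof(unbind),
		 "/sys/bus/pci/devices/%s/driver/unbind", bdf);
	write_str(unbind, bdf);
	write_str("/sys/bus/pci/drivers/vfio-pci/remove_id", "8086 10ed");
	write_str("/sys/bus/pci/drivers/ixgbe/bind", bdf);
	return 0;
}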
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 400bb0fe2745..7d12824f2034 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -180,6 +180,9 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
  */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	if (!kvm->mmu_notifier.ops)
+		kvm_free_stage2_pgd(&kvm->arch.mmu);
+
 	bitmap_free(kvm->arch.pmu_filter);
 	free_cpumask_var(kvm->arch.supported_cpus);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..13a527656ba7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1739,7 +1739,8 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
+	if (kvm->mmu_notifier.ops)
+		kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
Running some SR-IOV workloads could trigger some leak reports from
kmemleak.

unreferenced object 0xffff080243cef500 (size 128):
  comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
  hex dump (first 32 bytes):
    28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff  (.........LR....
    e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff  ....|....<......
  backtrace:
    kmem_cache_alloc_trace
    kvm_init_stage2_mmu
    kvm_arch_init_vm
    kvm_create_vm
    kvm_dev_ioctl
    __arm64_sys_ioctl
    invoke_syscall
    el0_svc_common.constprop.0
    do_el0_svc
    el0_svc
    el0t_64_sync_handler
    el0t_64_sync

Since I have yet to find a way to reproduce this at will, I did a code
inspection and found one spot where a leak could happen. It is unlikely
to fix my issue, because I don't see my case going through the error
paths, but we should fix it regardless.

If hardware_enable_all() or kvm_init_mmu_notifier() fails in
kvm_create_vm(), we end up leaking the stage2 pagetable memory allocated
by kvm_init_stage2_mmu(), because kvm_arch_flush_shadow_all() will no
longer be called. It seems impossible to simply move
kvm_free_stage2_pgd() from kvm_arch_flush_shadow_all() into
kvm_arch_destroy_vm() due to the issue mentioned in the "Fixes" commit
below. Thus, fix it by freeing the memory from kvm_arch_destroy_vm() only
if the MMU notifier was never initialized.

Fixes: 293f293637b5 ("kvm-arm: Unmap shadow pagetables properly")
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
---
 arch/arm64/kvm/arm.c | 3 +++
 arch/arm64/kvm/mmu.c | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)
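As a sanity check on the logic, here is a stand-alone userspace model of
the invariant the patch establishes. This is not kernel code: the names
mirror the kernel's, but the types and function bodies are simplified
stand-ins. The point is that exactly one of the two teardown paths frees
the stage2 pagetable, keyed on whether the MMU notifier was ever
registered:

#include <stdio.h>
#include <stdlib.h>

struct mmu_notifier { const void *ops; };
struct kvm { struct mmu_notifier mmu_notifier; void *stage2_pgd; };

static void kvm_free_stage2_pgd(struct kvm *kvm)
{
	free(kvm->stage2_pgd);
	kvm->stage2_pgd = NULL;
	printf("stage2 pgd freed\n");
}

/* Reached via the notifier's release callback, i.e. only when the
 * notifier was registered (ops != NULL); the check mirrors the patch. */
static void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
	if (kvm->mmu_notifier.ops)
		kvm_free_stage2_pgd(kvm);
}

/* Always runs on VM destruction; picks up the early-failure case. */
static void kvm_arch_destroy_vm(struct kvm *kvm)
{
	if (!kvm->mmu_notifier.ops)
		kvm_free_stage2_pgd(kvm);
}

static void teardown(struct kvm *kvm)
{
	if (kvm->mmu_notifier.ops)  /* mimics mmu_notifier_unregister() */
		kvm_arch_flush_shadow_all(kvm);
	kvm_arch_destroy_vm(kvm);
}

int main(void)
{
	struct kvm ok = { .mmu_notifier.ops = (void *)1,
			  .stage2_pgd = malloc(128) };
	struct kvm early_fail = { .mmu_notifier.ops = NULL,
				  .stage2_pgd = malloc(128) };

	teardown(&ok);		/* freed via kvm_arch_flush_shadow_all() */
	teardown(&early_fail);	/* freed via kvm_arch_destroy_vm(): the fix */
	return 0;
}

With the fix, a VM whose creation failed before notifier registration is
freed by kvm_arch_destroy_vm(), while a normally torn-down VM is still
freed exactly once via kvm_arch_flush_shadow_all().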