Message ID | 20230817063429.31454-1-qiang.zhang1211@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 1cc178e9a20997e3a36ae1ff40800b49525eb3ff |
Series | rcu: Drop vmalloc memory info dump when double call_rcu() |
On Thu, Aug 17, 2023 at 02:34:29PM +0800, Zqiang wrote:
> Currently, for double invoke call_rcu(), will dump rcu_head objects
> memory info, if the objects is not allocated from the slab allocator,
> the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> need to be held, since the call_rcu() can be invoked in interrupt context,
> therefore, there is a possibility of spinlock deadlock scenarios.
>
> And in Preempt-RT kernel, the rcutorture test also trigger the following
> lockdep warning:
>
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> RCU nest depth: 1, expected: 1
> 3 locks held by swapper/0/1:
> #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> irq event stamp: 565512
> hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> Preemption disabled at:
> [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x68/0xb0
> dump_stack+0x14/0x20
> __might_resched+0x1aa/0x280
> ? __pfx_rcu_torture_err_cb+0x10/0x10
> rt_spin_lock+0x53/0x130
> ? find_vmap_area+0x1f/0x70
> find_vmap_area+0x1f/0x70
> vmalloc_dump_obj+0x20/0x60
> mem_dump_obj+0x22/0x90
> __call_rcu_common+0x5bf/0x940
> ? debug_smp_processor_id+0x1b/0x30
> call_rcu_hurry+0x14/0x20
> rcu_torture_init+0x1f82/0x2370
> ? __pfx_rcu_torture_leak_cb+0x10/0x10
> ? __pfx_rcu_torture_leak_cb+0x10/0x10
> ? __pfx_rcu_torture_init+0x10/0x10
> do_one_initcall+0x6c/0x300
> ? debug_smp_processor_id+0x1b/0x30
> kernel_init_freeable+0x2b9/0x540
> ? __pfx_kernel_init+0x10/0x10
> kernel_init+0x1f/0x150
> ret_from_fork+0x40/0x50
> ? __pfx_kernel_init+0x10/0x10
> ret_from_fork_asm+0x1b/0x30
> </TASK>
>
> The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> mem_dump_obj() and only check whether is vmalloc address.
>
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>

Hearing no objections, I pulled this into -rcu, but only for testing.

It should either go up via mm or accumulate maintainer acks for me to
take it, as the case may be.

							Thanx, Paul

> ---
>  include/linux/vmalloc.h |  5 -----
>  mm/util.c               |  7 +++----
>  mm/vmalloc.c            | 14 --------------
>  3 files changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index c720be70c8dd..d7c4e112a4ad 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -289,10 +289,5 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  int register_vmap_purge_notifier(struct notifier_block *nb);
>  int unregister_vmap_purge_notifier(struct notifier_block *nb);
>
> -#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
> -bool vmalloc_dump_obj(void *object);
> -#else
> -static inline bool vmalloc_dump_obj(void *object) { return false; }
> -#endif
>
>  #endif /* _LINUX_VMALLOC_H */
> diff --git a/mm/util.c b/mm/util.c
> index ddfbb22dc187..fbc007adc108 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1066,10 +1066,9 @@ void mem_dump_obj(void *object)
>  	if (kmem_dump_obj(object))
>  		return;
>
> -	if (vmalloc_dump_obj(object))
> -		return;
> -
> -	if (virt_addr_valid(object))
> +	if (is_vmalloc_addr(object))
> +		type = "vmalloc memory";
> +	else if (virt_addr_valid(object))
>  		type = "non-slab/vmalloc memory";
>  	else if (object == NULL)
>  		type = "NULL pointer";
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..224af955bcb5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  }
>  #endif /* CONFIG_SMP */
>
> -#ifdef CONFIG_PRINTK
> -bool vmalloc_dump_obj(void *object)
> -{
> -	struct vm_struct *vm;
> -	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> -
> -	vm = find_vm_area(objp);
> -	if (!vm)
> -		return false;
> -	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> -	return true;
> -}
> -#endif
>
>  #ifdef CONFIG_PROC_FS
>  static void *s_start(struct seq_file *m, loff_t *pos)
> --
> 2.17.1
>
On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
>
> Currently, for double invoke call_rcu(), will dump rcu_head objects
> memory info, if the objects is not allocated from the slab allocator,
> the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> need to be held, since the call_rcu() can be invoked in interrupt context,
> therefore, there is a possibility of spinlock deadlock scenarios.
>
> And in Preempt-RT kernel, the rcutorture test also trigger the following
> lockdep warning:
>
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[...]
>
> The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> mem_dump_obj() and only check whether is vmalloc address.
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> ---
>  include/linux/vmalloc.h |  5 -----
>  mm/util.c               |  7 +++----
>  mm/vmalloc.c            | 14 --------------
>  3 files changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index c720be70c8dd..d7c4e112a4ad 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -289,10 +289,5 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  int register_vmap_purge_notifier(struct notifier_block *nb);
>  int unregister_vmap_purge_notifier(struct notifier_block *nb);
>
> -#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
> -bool vmalloc_dump_obj(void *object);
> -#else
> -static inline bool vmalloc_dump_obj(void *object) { return false; }
> -#endif
>
>  #endif /* _LINUX_VMALLOC_H */
> diff --git a/mm/util.c b/mm/util.c
> index ddfbb22dc187..fbc007adc108 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1066,10 +1066,9 @@ void mem_dump_obj(void *object)
>  	if (kmem_dump_obj(object))
>  		return;
>
> -	if (vmalloc_dump_obj(object))
> -		return;
> -
> -	if (virt_addr_valid(object))
> +	if (is_vmalloc_addr(object))
> +		type = "vmalloc memory";
> +	else if (virt_addr_valid(object))
>  		type = "non-slab/vmalloc memory";
>  	else if (object == NULL)
>  		type = "NULL pointer";
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..224af955bcb5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  }
>  #endif /* CONFIG_SMP */
>
> -#ifdef CONFIG_PRINTK
> -bool vmalloc_dump_obj(void *object)
> -{
> -	struct vm_struct *vm;
> -	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> -
> -	vm = find_vm_area(objp);

What about adding something like the following to vmalloc.c. Would it work?

+static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
+{
+	struct vmap_area *va;
+
+	if (!spin_trylock(&vmap_area_lock))
+		return NULL;
+	va = __find_vmap_area(addr);
+	spin_unlock(&vmap_area_lock);
+
+	return va;
+}

and call it from vmalloc_dump_obj():

	vm = find_vm_area_trylock(objp);

And then keep the rest of your patch as it is but add back the call to
vmalloc_dump_obj() in mem_dump_obj():

	if (vmalloc_dump_obj(object))
		return;

-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";

If that still throws "sleeping function called" type warnings, then
that is a preempt RT issue because spin_trylock() on conversion to a
sleeping lock should not be sleeping as it is uncontended.

But I could be missing something and I'll look more later. But for now
off to pick up a rental car here...

thanks,

 - Joel
On Tue, Aug 22, 2023 at 8:50 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Aug 17, 2023 at 02:34:29PM +0800, Zqiang wrote:
> > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > memory info, if the objects is not allocated from the slab allocator,
> > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > need to be held, since the call_rcu() can be invoked in interrupt context,
> > therefore, there is a possibility of spinlock deadlock scenarios.
> >
> > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > lockdep warning:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 1, expected: 0
> > RCU nest depth: 1, expected: 1
> > 3 locks held by swapper/0/1:
> > #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> > #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> > #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> > irq event stamp: 565512
> > hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> > softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> > Preemption disabled at:
> > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x68/0xb0
> > dump_stack+0x14/0x20
> > __might_resched+0x1aa/0x280
> > ? __pfx_rcu_torture_err_cb+0x10/0x10
> > rt_spin_lock+0x53/0x130
> > ? find_vmap_area+0x1f/0x70
> > find_vmap_area+0x1f/0x70
> > vmalloc_dump_obj+0x20/0x60
> > mem_dump_obj+0x22/0x90
> > __call_rcu_common+0x5bf/0x940
> > ? debug_smp_processor_id+0x1b/0x30
> > call_rcu_hurry+0x14/0x20
> > rcu_torture_init+0x1f82/0x2370
> > ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > ? __pfx_rcu_torture_init+0x10/0x10
> > do_one_initcall+0x6c/0x300
> > ? debug_smp_processor_id+0x1b/0x30
> > kernel_init_freeable+0x2b9/0x540
> > ? __pfx_kernel_init+0x10/0x10
> > kernel_init+0x1f/0x150
> > ret_from_fork+0x40/0x50
> > ? __pfx_kernel_init+0x10/0x10
> > ret_from_fork_asm+0x1b/0x30
> > </TASK>
> >
> > The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> > for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> > comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> > mem_dump_obj() and only check whether is vmalloc address.
> >
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
>
> Hearing no objections, I pulled this into -rcu, but only for testing.
>
> It should either go up via mm or accumulate maintainer acks for me to
> take it, as the case may be.

I was wondering if we can opportunistically acquire the vmap_area_lock
and keep the debug some of the times (I suggested this idea in another
thread).

Thanks.
>
> On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
> >
> > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > memory info, if the objects is not allocated from the slab allocator,
> > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > need to be held, since the call_rcu() can be invoked in interrupt context,
> > therefore, there is a possibility of spinlock deadlock scenarios.
> >
> > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > lockdep warning:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> [...]
> >
> > The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> > for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> > comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> > mem_dump_obj() and only check whether is vmalloc address.
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > ---
> >  include/linux/vmalloc.h |  5 -----
> >  mm/util.c               |  7 +++----
> >  mm/vmalloc.c            | 14 --------------
> >  3 files changed, 3 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index c720be70c8dd..d7c4e112a4ad 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -289,10 +289,5 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  int register_vmap_purge_notifier(struct notifier_block *nb);
> >  int unregister_vmap_purge_notifier(struct notifier_block *nb);
> >
> > -#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
> > -bool vmalloc_dump_obj(void *object);
> > -#else
> > -static inline bool vmalloc_dump_obj(void *object) { return false; }
> > -#endif
> >
> >  #endif /* _LINUX_VMALLOC_H */
> > diff --git a/mm/util.c b/mm/util.c
> > index ddfbb22dc187..fbc007adc108 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -1066,10 +1066,9 @@ void mem_dump_obj(void *object)
> >  	if (kmem_dump_obj(object))
> >  		return;
> >
> > -	if (vmalloc_dump_obj(object))
> > -		return;
> > -
> > -	if (virt_addr_valid(object))
> > +	if (is_vmalloc_addr(object))
> > +		type = "vmalloc memory";
> > +	else if (virt_addr_valid(object))
> >  		type = "non-slab/vmalloc memory";
> >  	else if (object == NULL)
> >  		type = "NULL pointer";
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 93cf99aba335..224af955bcb5 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  }
> >  #endif /* CONFIG_SMP */
> >
> > -#ifdef CONFIG_PRINTK
> > -bool vmalloc_dump_obj(void *object)
> > -{
> > -	struct vm_struct *vm;
> > -	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > -
> > -	vm = find_vm_area(objp);
>
> What about adding something like the following to vmalloc.c. Would it work?
>
> +static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
> +{
> +	struct vmap_area *va;
> +
> +	if (!spin_trylock(&vmap_area_lock))
> +		return NULL;
> +	va = __find_vmap_area(addr);
> +	spin_unlock(&vmap_area_lock);
> +
> +	return va;
> +}
>
> and call it from vmalloc_dump_obj():
>
> 	vm = find_vm_area_trylock(objp);
>
> And then keep the rest of your patch as it is but add back the call to
> vmalloc_dump_obj() in mem_dump_obj():
>
> 	if (vmalloc_dump_obj(object))
> 		return;
>
> -	if (virt_addr_valid(object))
> +	if (is_vmalloc_addr(object))
> +		type = "vmalloc memory";
> +	else if (virt_addr_valid(object))
>  		type = "non-slab/vmalloc memory";
>
> If that still throws "sleeping function called" type warnings, then
> that is a preempt RT issue because spin_trylock() on conversion to a
> sleeping lock should not be sleeping as it is uncontended.
>

Hello, Joel

I did a test in the RT kernel with the following modifications, and
didn't trigger the sleeping function called warning.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..e07b0e34249b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4272,12 +4272,24 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #endif /* CONFIG_SMP */

 #ifdef CONFIG_PRINTK
+static struct vm_struct *find_vmap_area_trylock(unsigned long addr)
+{
+	struct vmap_area *va;
+
+	if (!spin_trylock(&vmap_area_lock))
+		return NULL;
+	va = __find_vmap_area(addr, &vmap_area_root);
+	spin_unlock(&vmap_area_lock);
+
+	return va ? va->vm : NULL;
+}
+
 bool vmalloc_dump_obj(void *object)
 {
 	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);

-	vm = find_vm_area(objp);
+	vm = find_vmap_area_trylock((unsigned long)objp);
 	if (!vm)
 		return false;
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",

If double call_rcu() appears in different threads at the same time,
it is possible to dump only one of the vmalloc information.

Thanks
Zqiang

>
> But I could be missing something and I'll look more later. But for now
> off to pick up a rental car here...
>
> thanks,
>
> - Joel
On Tue, Aug 22, 2023 at 10:27 PM Z qiang <qiang.zhang1211@gmail.com> wrote:
>
> >
> > On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
> > >
> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > > memory info, if the objects is not allocated from the slab allocator,
> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > > need to be held, since the call_rcu() can be invoked in interrupt context,
> > > therefore, there is a possibility of spinlock deadlock scenarios.
[..]
> > > -
> > > -	if (virt_addr_valid(object))
> > > +	if (is_vmalloc_addr(object))
> > > +		type = "vmalloc memory";
> > > +	else if (virt_addr_valid(object))
> > >  		type = "non-slab/vmalloc memory";
> > >  	else if (object == NULL)
> > >  		type = "NULL pointer";
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 93cf99aba335..224af955bcb5 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > >  }
> > >  #endif /* CONFIG_SMP */
> > >
> > > -#ifdef CONFIG_PRINTK
> > > -bool vmalloc_dump_obj(void *object)
> > > -{
> > > -	struct vm_struct *vm;
> > > -	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > -
> > > -	vm = find_vm_area(objp);
> >
> > What about adding something like the following to vmalloc.c. Would it work?
> >
> > +static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
> > +{
> > +	struct vmap_area *va;
> > +
> > +	if (!spin_trylock(&vmap_area_lock))
> > +		return NULL;
> > +	va = __find_vmap_area(addr);
> > +	spin_unlock(&vmap_area_lock);
> > +
> > +	return va;
> > +}
> >
> > and call it from vmalloc_dump_obj():
> >
> > 	vm = find_vm_area_trylock(objp);
> >
> > And then keep the rest of your patch as it is but add back the call to
> > vmalloc_dump_obj() in mem_dump_obj():
> >
> > 	if (vmalloc_dump_obj(object))
> > 		return;
> >
> > -	if (virt_addr_valid(object))
> > +	if (is_vmalloc_addr(object))
> > +		type = "vmalloc memory";
> > +	else if (virt_addr_valid(object))
> >  		type = "non-slab/vmalloc memory";
> >
> > If that still throws "sleeping function called" type warnings, then
> > that is a preempt RT issue because spin_trylock() on conversion to a
> > sleeping lock should not be sleeping as it is uncontended.
> >
>
> Hello, Joel
>
> I did a test in the RT kernel with the following modifications, and
> didn't trigger
> sleeping function called warning.
>
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..e07b0e34249b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4272,12 +4272,24 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  #endif /* CONFIG_SMP */
>
>  #ifdef CONFIG_PRINTK
> +static struct vm_struct *find_vmap_area_trylock(unsigned long addr)
> +{
> +	struct vmap_area *va;
> +
> +	if (!spin_trylock(&vmap_area_lock))
> +		return NULL;
> +	va = __find_vmap_area(addr, &vmap_area_root);
> +	spin_unlock(&vmap_area_lock);
> +
> +	return va ? va->vm : NULL;
> +}
> +
>  bool vmalloc_dump_obj(void *object)
>  {
>  	struct vm_struct *vm;
>  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>
> -	vm = find_vm_area(objp);
> +	vm = find_vmap_area_trylock((unsigned long)objp);
>  	if (!vm)
>  		return false;
>  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
>
> If double call_rcu() appears in different threads at the same time,
> It is possible to dump only one of the vmalloc information.

Great to hear and thanks for checking! So what I'll do is create 2
patches -- one for the vmalloc change, and one for the patch you
submitted to use the vmalloc change. For the second patch, I'll keep
your authorship for attribution. I'll also queue it on my staging
trees for further testing. Let me know if that does not sound okay.

Today is a lighter work day for me, but allow me 1-2 days for sending
this out for review.

Thanks,

 - Joel
>
> On Tue, Aug 22, 2023 at 10:27 PM Z qiang <qiang.zhang1211@gmail.com> wrote:
> >
> > >
> > > On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
> > > >
> > > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > > > memory info, if the objects is not allocated from the slab allocator,
> > > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > > > need to be held, since the call_rcu() can be invoked in interrupt context,
> > > > therefore, there is a possibility of spinlock deadlock scenarios.
> [..]
> > > > -
> > > > -	if (virt_addr_valid(object))
> > > > +	if (is_vmalloc_addr(object))
> > > > +		type = "vmalloc memory";
> > > > +	else if (virt_addr_valid(object))
> > > >  		type = "non-slab/vmalloc memory";
> > > >  	else if (object == NULL)
> > > >  		type = "NULL pointer";
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 93cf99aba335..224af955bcb5 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > >  }
> > > >  #endif /* CONFIG_SMP */
> > > >
> > > > -#ifdef CONFIG_PRINTK
> > > > -bool vmalloc_dump_obj(void *object)
> > > > -{
> > > > -	struct vm_struct *vm;
> > > > -	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > -
> > > > -	vm = find_vm_area(objp);
> > >
> > > What about adding something like the following to vmalloc.c. Would it work?
> > >
> > > +static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
> > > +{
> > > +	struct vmap_area *va;
> > > +
> > > +	if (!spin_trylock(&vmap_area_lock))
> > > +		return NULL;
> > > +	va = __find_vmap_area(addr);
> > > +	spin_unlock(&vmap_area_lock);
> > > +
> > > +	return va;
> > > +}
> > >
> > > and call it from vmalloc_dump_obj():
> > >
> > > 	vm = find_vm_area_trylock(objp);
> > >
> > > And then keep the rest of your patch as it is but add back the call to
> > > vmalloc_dump_obj() in mem_dump_obj():
> > >
> > > 	if (vmalloc_dump_obj(object))
> > > 		return;
> > >
> > > -	if (virt_addr_valid(object))
> > > +	if (is_vmalloc_addr(object))
> > > +		type = "vmalloc memory";
> > > +	else if (virt_addr_valid(object))
> > >  		type = "non-slab/vmalloc memory";
> > >
> > > If that still throws "sleeping function called" type warnings, then
> > > that is a preempt RT issue because spin_trylock() on conversion to a
> > > sleeping lock should not be sleeping as it is uncontended.
> > >
> >
> > Hello, Joel
> >
> > I did a test in the RT kernel with the following modifications, and
> > didn't trigger
> > sleeping function called warning.
> >
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 93cf99aba335..e07b0e34249b 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4272,12 +4272,24 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  #endif /* CONFIG_SMP */
> >
> >  #ifdef CONFIG_PRINTK
> > +static struct vm_struct *find_vmap_area_trylock(unsigned long addr)
> > +{
> > +	struct vmap_area *va;
> > +
> > +	if (!spin_trylock(&vmap_area_lock))
> > +		return NULL;
> > +	va = __find_vmap_area(addr, &vmap_area_root);
> > +	spin_unlock(&vmap_area_lock);
> > +
> > +	return va ? va->vm : NULL;
> > +}
> > +
> >  bool vmalloc_dump_obj(void *object)
> >  {
> >  	struct vm_struct *vm;
> >  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> >
> > -	vm = find_vm_area(objp);
> > +	vm = find_vmap_area_trylock((unsigned long)objp);
> >  	if (!vm)
> >  		return false;
> >  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> >
> > If double call_rcu() appears in different threads at the same time,
> > It is possible to dump only one of the vmalloc information.
>
> Great to hear and thanks for checking! So what I'll do is create 2
> patches -- one for the vmalloc change, and one for the patch you
> submitted to use the vmalloc change. For the second patch, I'll keep
> your authorship for attribution. I'll also queue it on my staging
> trees for further testing. Let me know if that does not sound okay.
>

Thanks, Joel. It's okay :)

> Today is a lighter work day for me, but allow me 1-2 days for sending
> this out for review.
>
> Thanks,
>
> - Joel
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..d7c4e112a4ad 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -289,10 +289,5 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 int register_vmap_purge_notifier(struct notifier_block *nb);
 int unregister_vmap_purge_notifier(struct notifier_block *nb);
 
-#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
-bool vmalloc_dump_obj(void *object);
-#else
-static inline bool vmalloc_dump_obj(void *object) { return false; }
-#endif
 
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/util.c b/mm/util.c
index ddfbb22dc187..fbc007adc108 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1066,10 +1066,9 @@ void mem_dump_obj(void *object)
 	if (kmem_dump_obj(object))
 		return;
 
-	if (vmalloc_dump_obj(object))
-		return;
-
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
 	else if (object == NULL)
 		type = "NULL pointer";
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..224af955bcb5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 }
 #endif /* CONFIG_SMP */
 
-#ifdef CONFIG_PRINTK
-bool vmalloc_dump_obj(void *object)
-{
-	struct vm_struct *vm;
-	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
-
-	vm = find_vm_area(objp);
-	if (!vm)
-		return false;
-	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
-	return true;
-}
-#endif
 
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
Currently, a double invocation of call_rcu() dumps memory info for the
rcu_head object. If the object was not allocated from the slab
allocator, vmalloc_dump_obj() is invoked, which needs to hold the
vmap_area_lock spinlock. Since call_rcu() can be invoked in interrupt
context, this opens up the possibility of spinlock deadlock scenarios.

In addition, on PREEMPT_RT kernels, the rcutorture test triggers the
following lockdep warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
#0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
#1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
#2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xb0
dump_stack+0x14/0x20
__might_resched+0x1aa/0x280
? __pfx_rcu_torture_err_cb+0x10/0x10
rt_spin_lock+0x53/0x130
? find_vmap_area+0x1f/0x70
find_vmap_area+0x1f/0x70
vmalloc_dump_obj+0x20/0x60
mem_dump_obj+0x22/0x90
__call_rcu_common+0x5bf/0x940
? debug_smp_processor_id+0x1b/0x30
call_rcu_hurry+0x14/0x20
rcu_torture_init+0x1f82/0x2370
? __pfx_rcu_torture_leak_cb+0x10/0x10
? __pfx_rcu_torture_leak_cb+0x10/0x10
? __pfx_rcu_torture_init+0x10/0x10
do_one_initcall+0x6c/0x300
? debug_smp_processor_id+0x1b/0x30
kernel_init_freeable+0x2b9/0x540
? __pfx_kernel_init+0x10/0x10
kernel_init+0x1f/0x150
ret_from_fork+0x40/0x50
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>

According to statistics on the source of 'rhp' (gathered by leizhen),
kmem_valid_obj() accounts for more than 97.5% of cases and vmalloc for
less than 1%. This commit therefore drops vmalloc_dump_obj() from
mem_dump_obj() and only checks whether the object is a vmalloc address.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 include/linux/vmalloc.h |  5 -----
 mm/util.c               |  7 +++----
 mm/vmalloc.c            | 14 --------------
 3 files changed, 3 insertions(+), 23 deletions(-)