
rcu: Drop vmalloc memory info dump when double call_rcu()

Message ID 20230817063429.31454-1-qiang.zhang1211@gmail.com (mailing list archive)
State Accepted
Commit 1cc178e9a20997e3a36ae1ff40800b49525eb3ff
Series rcu: Drop vmalloc memory info dump when double call_rcu()

Commit Message

Zqiang Aug. 17, 2023, 6:34 a.m. UTC
Currently, a double invocation of call_rcu() dumps memory info for the
rcu_head object. If the object was not allocated from the slab allocator,
vmalloc_dump_obj() is invoked, which needs to hold the vmap_area_lock
spinlock. Since call_rcu() can be invoked in interrupt context, this opens
up the possibility of spinlock deadlock scenarios.

In addition, in the PREEMPT_RT kernel, the rcutorture test also triggers
the following lockdep warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
 #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xb0
 dump_stack+0x14/0x20
 __might_resched+0x1aa/0x280
 ? __pfx_rcu_torture_err_cb+0x10/0x10
 rt_spin_lock+0x53/0x130
 ? find_vmap_area+0x1f/0x70
 find_vmap_area+0x1f/0x70
 vmalloc_dump_obj+0x20/0x60
 mem_dump_obj+0x22/0x90
 __call_rcu_common+0x5bf/0x940
 ? debug_smp_processor_id+0x1b/0x30
 call_rcu_hurry+0x14/0x20
 rcu_torture_init+0x1f82/0x2370
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_init+0x10/0x10
 do_one_initcall+0x6c/0x300
 ? debug_smp_processor_id+0x1b/0x30
 kernel_init_freeable+0x2b9/0x540
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x1f/0x150
 ret_from_fork+0x40/0x50
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

According to statistics on the source of 'rhp' collected by leizhen,
kmem_valid_obj() accounts for more than 97.5% of cases and vmalloc for
less than 1%. This commit therefore drops vmalloc_dump_obj() from
mem_dump_obj() and only checks whether the object is a vmalloc address.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 include/linux/vmalloc.h |  5 -----
 mm/util.c               |  7 +++----
 mm/vmalloc.c            | 14 --------------
 3 files changed, 3 insertions(+), 23 deletions(-)

Comments

Paul E. McKenney Aug. 22, 2023, 12:49 p.m. UTC | #1
On Thu, Aug 17, 2023 at 02:34:29PM +0800, Zqiang wrote:
> Currently, for double invoke call_rcu(), will dump rcu_head objects
> memory info, if the objects is not allocated from the slab allocator,
> the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> need to be held, since the call_rcu() can be invoked in interrupt context,
> therefore, there is a possibility of spinlock deadlock scenarios.
> 
> And in Preempt-RT kernel, the rcutorture test also trigger the following
> lockdep warning:
> 
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> RCU nest depth: 1, expected: 1
> 3 locks held by swapper/0/1:
>  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
>  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
>  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> irq event stamp: 565512
> hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> Preemption disabled at:
> [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x68/0xb0
>  dump_stack+0x14/0x20
>  __might_resched+0x1aa/0x280
>  ? __pfx_rcu_torture_err_cb+0x10/0x10
>  rt_spin_lock+0x53/0x130
>  ? find_vmap_area+0x1f/0x70
>  find_vmap_area+0x1f/0x70
>  vmalloc_dump_obj+0x20/0x60
>  mem_dump_obj+0x22/0x90
>  __call_rcu_common+0x5bf/0x940
>  ? debug_smp_processor_id+0x1b/0x30
>  call_rcu_hurry+0x14/0x20
>  rcu_torture_init+0x1f82/0x2370
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_init+0x10/0x10
>  do_one_initcall+0x6c/0x300
>  ? debug_smp_processor_id+0x1b/0x30
>  kernel_init_freeable+0x2b9/0x540
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x1f/0x150
>  ret_from_fork+0x40/0x50
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
> 
> The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> mem_dump_obj() and only check whether is vmalloc address.
> 
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>

Hearing no objections, I pulled this into -rcu, but only for testing.

It should either go up via mm or accumulate maintainer acks for me to
take it, as the case may be.

							Thanx, Paul

> ---
>  include/linux/vmalloc.h |  5 -----
>  mm/util.c               |  7 +++----
>  mm/vmalloc.c            | 14 --------------
>  3 files changed, 3 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index c720be70c8dd..d7c4e112a4ad 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -289,10 +289,5 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  int register_vmap_purge_notifier(struct notifier_block *nb);
>  int unregister_vmap_purge_notifier(struct notifier_block *nb);
>  
> -#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
> -bool vmalloc_dump_obj(void *object);
> -#else
> -static inline bool vmalloc_dump_obj(void *object) { return false; }
> -#endif
>  
>  #endif /* _LINUX_VMALLOC_H */
> diff --git a/mm/util.c b/mm/util.c
> index ddfbb22dc187..fbc007adc108 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1066,10 +1066,9 @@ void mem_dump_obj(void *object)
>  	if (kmem_dump_obj(object))
>  		return;
>  
> -	if (vmalloc_dump_obj(object))
> -		return;
> -
> -	if (virt_addr_valid(object))
> +	if (is_vmalloc_addr(object))
> +		type = "vmalloc memory";
> +	else if (virt_addr_valid(object))
>  		type = "non-slab/vmalloc memory";
>  	else if (object == NULL)
>  		type = "NULL pointer";
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..224af955bcb5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  }
>  #endif	/* CONFIG_SMP */
>  
> -#ifdef CONFIG_PRINTK
> -bool vmalloc_dump_obj(void *object)
> -{
> -	struct vm_struct *vm;
> -	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> -
> -	vm = find_vm_area(objp);
> -	if (!vm)
> -		return false;
> -	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> -	return true;
> -}
> -#endif
>  
>  #ifdef CONFIG_PROC_FS
>  static void *s_start(struct seq_file *m, loff_t *pos)
> -- 
> 2.17.1
>
Joel Fernandes Aug. 22, 2023, 7:09 p.m. UTC | #2
On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
>
> Currently, for double invoke call_rcu(), will dump rcu_head objects
> memory info, if the objects is not allocated from the slab allocator,
> the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> need to be held, since the call_rcu() can be invoked in interrupt context,
> therefore, there is a possibility of spinlock deadlock scenarios.
>
> And in Preempt-RT kernel, the rcutorture test also trigger the following
> lockdep warning:
>
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[...]
>
> The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> mem_dump_obj() and only check whether is vmalloc address.
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> ---
>  include/linux/vmalloc.h |  5 -----
>  mm/util.c               |  7 +++----
>  mm/vmalloc.c            | 14 --------------
>  3 files changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index c720be70c8dd..d7c4e112a4ad 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -289,10 +289,5 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  int register_vmap_purge_notifier(struct notifier_block *nb);
>  int unregister_vmap_purge_notifier(struct notifier_block *nb);
>
> -#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
> -bool vmalloc_dump_obj(void *object);
> -#else
> -static inline bool vmalloc_dump_obj(void *object) { return false; }
> -#endif
>
>  #endif /* _LINUX_VMALLOC_H */
> diff --git a/mm/util.c b/mm/util.c
> index ddfbb22dc187..fbc007adc108 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1066,10 +1066,9 @@ void mem_dump_obj(void *object)
>         if (kmem_dump_obj(object))
>                 return;
>
> -       if (vmalloc_dump_obj(object))
> -               return;
> -
> -       if (virt_addr_valid(object))
> +       if (is_vmalloc_addr(object))
> +               type = "vmalloc memory";
> +       else if (virt_addr_valid(object))
>                 type = "non-slab/vmalloc memory";
>         else if (object == NULL)
>                 type = "NULL pointer";
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..224af955bcb5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  }
>  #endif /* CONFIG_SMP */
>
> -#ifdef CONFIG_PRINTK
> -bool vmalloc_dump_obj(void *object)
> -{
> -       struct vm_struct *vm;
> -       void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> -
> -       vm = find_vm_area(objp);

What about adding something like the following to vmalloc.c? Would it work?

+static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
+{
+       struct vmap_area *va;
+
+       if (!spin_trylock(&vmap_area_lock))
+               return NULL;
+       va = __find_vmap_area(addr);
+       spin_unlock(&vmap_area_lock);
+
+       return va;
+}

and call it from vmalloc_dump_obj():

vm = find_vmap_area_trylock((unsigned long)objp);

And then keep the rest of your patch as it is but add back the call to
vmalloc_dump_obj() in mem_dump_obj():

       if (vmalloc_dump_obj(object))
               return;

 -       if (virt_addr_valid(object))
 +       if (is_vmalloc_addr(object))
 +               type = "vmalloc memory";
 +       else if (virt_addr_valid(object))
                 type = "non-slab/vmalloc memory";


If that still throws "sleeping function called" type warnings, then
that is a preempt RT issue because spin_trylock() on conversion to a
sleeping lock should not be sleeping as it is uncontended.

But I could be missing something and I'll look more later. But for now
off to pick up a rental car here...

thanks,

  - Joel
Joel Fernandes Aug. 22, 2023, 7:10 p.m. UTC | #3
On Tue, Aug 22, 2023 at 8:50 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Aug 17, 2023 at 02:34:29PM +0800, Zqiang wrote:
> > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > memory info, if the objects is not allocated from the slab allocator,
> > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > need to be held, since the call_rcu() can be invoked in interrupt context,
> > therefore, there is a possibility of spinlock deadlock scenarios.
> >
> > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > lockdep warning:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 1, expected: 0
> > RCU nest depth: 1, expected: 1
> > 3 locks held by swapper/0/1:
> >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> > irq event stamp: 565512
> > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> > Preemption disabled at:
> > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x68/0xb0
> >  dump_stack+0x14/0x20
> >  __might_resched+0x1aa/0x280
> >  ? __pfx_rcu_torture_err_cb+0x10/0x10
> >  rt_spin_lock+0x53/0x130
> >  ? find_vmap_area+0x1f/0x70
> >  find_vmap_area+0x1f/0x70
> >  vmalloc_dump_obj+0x20/0x60
> >  mem_dump_obj+0x22/0x90
> >  __call_rcu_common+0x5bf/0x940
> >  ? debug_smp_processor_id+0x1b/0x30
> >  call_rcu_hurry+0x14/0x20
> >  rcu_torture_init+0x1f82/0x2370
> >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >  ? __pfx_rcu_torture_init+0x10/0x10
> >  do_one_initcall+0x6c/0x300
> >  ? debug_smp_processor_id+0x1b/0x30
> >  kernel_init_freeable+0x2b9/0x540
> >  ? __pfx_kernel_init+0x10/0x10
> >  kernel_init+0x1f/0x150
> >  ret_from_fork+0x40/0x50
> >  ? __pfx_kernel_init+0x10/0x10
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> >
> > The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> > for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> > comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> > mem_dump_obj() and only check whether is vmalloc address.
> >
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
>
> Hearing no objections, I pulled this into -rcu, but only for testing.
>
> It should either go up via mm or accumulate maintainer acks for me to
> take it, as the case may be.

I was wondering if we can opportunistically acquire the vmap_area_lock
and keep the debug output some of the time (I suggested this idea in
another thread).

Thanks.
Zqiang Aug. 23, 2023, 2:27 a.m. UTC | #4
>
> On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
> >
> > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > memory info, if the objects is not allocated from the slab allocator,
> > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > need to be held, since the call_rcu() can be invoked in interrupt context,
> > therefore, there is a possibility of spinlock deadlock scenarios.
> >
> > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > lockdep warning:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> [...]
> >
> > The statistics about the source of 'rhp', the kmem_valid_obj() accounts
> > for more than 97.5%, and vmalloc accounts for less than 1%, this statistic
> > comes from leizhen. this commit therefore drop vmalloc_dump_obj() from
> > mem_dump_obj() and only check whether is vmalloc address.
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > ---
> >  include/linux/vmalloc.h |  5 -----
> >  mm/util.c               |  7 +++----
> >  mm/vmalloc.c            | 14 --------------
> >  3 files changed, 3 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index c720be70c8dd..d7c4e112a4ad 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -289,10 +289,5 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  int register_vmap_purge_notifier(struct notifier_block *nb);
> >  int unregister_vmap_purge_notifier(struct notifier_block *nb);
> >
> > -#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
> > -bool vmalloc_dump_obj(void *object);
> > -#else
> > -static inline bool vmalloc_dump_obj(void *object) { return false; }
> > -#endif
> >
> >  #endif /* _LINUX_VMALLOC_H */
> > diff --git a/mm/util.c b/mm/util.c
> > index ddfbb22dc187..fbc007adc108 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -1066,10 +1066,9 @@ void mem_dump_obj(void *object)
> >         if (kmem_dump_obj(object))
> >                 return;
> >
> > -       if (vmalloc_dump_obj(object))
> > -               return;
> > -
> > -       if (virt_addr_valid(object))
> > +       if (is_vmalloc_addr(object))
> > +               type = "vmalloc memory";
> > +       else if (virt_addr_valid(object))
> >                 type = "non-slab/vmalloc memory";
> >         else if (object == NULL)
> >                 type = "NULL pointer";
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 93cf99aba335..224af955bcb5 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  }
> >  #endif /* CONFIG_SMP */
> >
> > -#ifdef CONFIG_PRINTK
> > -bool vmalloc_dump_obj(void *object)
> > -{
> > -       struct vm_struct *vm;
> > -       void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > -
> > -       vm = find_vm_area(objp);
>
> What about adding something like the following to vmalloc.c. Would it work?
>
> +static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
> +{
> +       struct vmap_area *va;
> +
> +       if (!spin_trylock(&vmap_area_lock))
> +               return NULL;
> +       va = __find_vmap_area(addr);
> +       spin_unlock(&vmap_area_lock);
> +
> +       return va;
> +}
>
> and call it from vmalloc_dump_obj():
>
> vm = find_vm_area_trylock(objp);
>
> And then keep the rest of your patch as it is but add back the call to
> vmalloc_dump_obj() in mem_dump_obj():
>
>        if (vmalloc_dump_obj(object))
>                return;
>
>  -       if (virt_addr_valid(object))
>  +       if (is_vmalloc_addr(object))
>  +               type = "vmalloc memory";
>  +       else if (virt_addr_valid(object))
>                  type = "non-slab/vmalloc memory";
>
>
> If that still throws "sleeping function called" type warnings, then
> that is a preempt RT issue because spin_trylock() on conversion to a
> sleeping lock should not be sleeping as it is uncontended.
>

Hello, Joel

I did a test in the RT kernel with the following modifications, and it
didn't trigger the sleeping-function-called warning.


diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..e07b0e34249b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4272,12 +4272,24 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #endif /* CONFIG_SMP */

 #ifdef CONFIG_PRINTK
+static struct vm_struct *find_vmap_area_trylock(unsigned long addr)
+{
+       struct vmap_area *va;
+
+       if (!spin_trylock(&vmap_area_lock))
+               return NULL;
+       va = __find_vmap_area(addr, &vmap_area_root);
+       spin_unlock(&vmap_area_lock);
+
+       return va ? va->vm : NULL;
+}
+
 bool vmalloc_dump_obj(void *object)
 {
        struct vm_struct *vm;
        void *objp = (void *)PAGE_ALIGN((unsigned long)object);

-       vm = find_vm_area(objp);
+       vm = find_vmap_area_trylock((unsigned long)objp);
        if (!vm)
                return false;
        pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",


If double call_rcu() invocations appear in different threads at the same
time, it is possible that only one of them dumps the vmalloc information.


Thanks
Zqiang


>
> But I could be missing something and I'll look more later. But for now
> off to pick up a rental car here...
>
> thanks,
>
>   - Joel
Joel Fernandes Aug. 23, 2023, 1:20 p.m. UTC | #5
On Tue, Aug 22, 2023 at 10:27 PM Z qiang <qiang.zhang1211@gmail.com> wrote:
>
> >
> > On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
> > >
> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > > memory info, if the objects is not allocated from the slab allocator,
> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > > need to be held, since the call_rcu() can be invoked in interrupt context,
> > > therefore, there is a possibility of spinlock deadlock scenarios.
[..]
> > > -
> > > -       if (virt_addr_valid(object))
> > > +       if (is_vmalloc_addr(object))
> > > +               type = "vmalloc memory";
> > > +       else if (virt_addr_valid(object))
> > >                 type = "non-slab/vmalloc memory";
> > >         else if (object == NULL)
> > >                 type = "NULL pointer";
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 93cf99aba335..224af955bcb5 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > >  }
> > >  #endif /* CONFIG_SMP */
> > >
> > > -#ifdef CONFIG_PRINTK
> > > -bool vmalloc_dump_obj(void *object)
> > > -{
> > > -       struct vm_struct *vm;
> > > -       void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > -
> > > -       vm = find_vm_area(objp);
> >
> > What about adding something like the following to vmalloc.c. Would it work?
> >
> > +static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
> > +{
> > +       struct vmap_area *va;
> > +
> > +       if (!spin_trylock(&vmap_area_lock))
> > +               return NULL;
> > +       va = __find_vmap_area(addr);
> > +       spin_unlock(&vmap_area_lock);
> > +
> > +       return va;
> > +}
> >
> > and call it from vmalloc_dump_obj():
> >
> > vm = find_vm_area_trylock(objp);
> >
> > And then keep the rest of your patch as it is but add back the call to
> > vmalloc_dump_obj() in mem_dump_obj():
> >
> >        if (vmalloc_dump_obj(object))
> >                return;
> >
> >  -       if (virt_addr_valid(object))
> >  +       if (is_vmalloc_addr(object))
> >  +               type = "vmalloc memory";
> >  +       else if (virt_addr_valid(object))
> >                  type = "non-slab/vmalloc memory";
> >
> >
> > If that still throws "sleeping function called" type warnings, then
> > that is a preempt RT issue because spin_trylock() on conversion to a
> > sleeping lock should not be sleeping as it is uncontended.
> >
>
> Hello, Joel
>
> I did a test in the RT kernel with the following modifications, and
> didn't trigger
> sleeping function called warning.
>
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..e07b0e34249b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4272,12 +4272,24 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  #endif /* CONFIG_SMP */
>
>  #ifdef CONFIG_PRINTK
> +static struct vm_struct *find_vmap_area_trylock(unsigned long addr)
> +{
> +       struct vmap_area *va;
> +
> +       if (!spin_trylock(&vmap_area_lock))
> +               return NULL;
> +       va = __find_vmap_area(addr, &vmap_area_root);
> +       spin_unlock(&vmap_area_lock);
> +
> +       return va ? va->vm : NULL;
> +}
> +
>  bool vmalloc_dump_obj(void *object)
>  {
>         struct vm_struct *vm;
>         void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>
> -       vm = find_vm_area(objp);
> +       vm = find_vmap_area_trylock((unsigned long)objp);
>         if (!vm)
>                 return false;
>         pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
>
>
> If double call_rcu() appears in different threads at the same time,
> It is possible to dump only one of the vmalloc information.

Great to hear and thanks for checking! So what I'll do is create 2
patches -- one for the vmalloc change, and one for the patch you
submitted to use the vmalloc change. For the second patch, I'll keep
your authorship for attribution.  I'll also queue it on my staging
trees for further testing. Let me know if that does not sound okay.

Today is a lighter work day for me, but allow me 1-2 days for sending
this out for review. Thanks,

 - Joel
Zqiang Aug. 24, 2023, 2:09 a.m. UTC | #6
>
> On Tue, Aug 22, 2023 at 10:27 PM Z qiang <qiang.zhang1211@gmail.com> wrote:
> >
> > >
> > > On Thu, Aug 17, 2023 at 2:34 AM Zqiang <qiang.zhang1211@gmail.com> wrote:
> > > >
> > > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > > > memory info, if the objects is not allocated from the slab allocator,
> > > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > > > need to be held, since the call_rcu() can be invoked in interrupt context,
> > > > therefore, there is a possibility of spinlock deadlock scenarios.
> [..]
> > > > -
> > > > -       if (virt_addr_valid(object))
> > > > +       if (is_vmalloc_addr(object))
> > > > +               type = "vmalloc memory";
> > > > +       else if (virt_addr_valid(object))
> > > >                 type = "non-slab/vmalloc memory";
> > > >         else if (object == NULL)
> > > >                 type = "NULL pointer";
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 93cf99aba335..224af955bcb5 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -4271,20 +4271,6 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > >  }
> > > >  #endif /* CONFIG_SMP */
> > > >
> > > > -#ifdef CONFIG_PRINTK
> > > > -bool vmalloc_dump_obj(void *object)
> > > > -{
> > > > -       struct vm_struct *vm;
> > > > -       void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > -
> > > > -       vm = find_vm_area(objp);
> > >
> > > What about adding something like the following to vmalloc.c. Would it work?
> > >
> > > +static struct vmap_area *find_vmap_area_trylock(unsigned long addr)
> > > +{
> > > +       struct vmap_area *va;
> > > +
> > > +       if (!spin_trylock(&vmap_area_lock))
> > > +               return NULL;
> > > +       va = __find_vmap_area(addr);
> > > +       spin_unlock(&vmap_area_lock);
> > > +
> > > +       return va;
> > > +}
> > >
> > > and call it from vmalloc_dump_obj():
> > >
> > > vm = find_vm_area_trylock(objp);
> > >
> > > And then keep the rest of your patch as it is but add back the call to
> > > vmalloc_dump_obj() in mem_dump_obj():
> > >
> > >        if (vmalloc_dump_obj(object))
> > >                return;
> > >
> > >  -       if (virt_addr_valid(object))
> > >  +       if (is_vmalloc_addr(object))
> > >  +               type = "vmalloc memory";
> > >  +       else if (virt_addr_valid(object))
> > >                  type = "non-slab/vmalloc memory";
> > >
> > >
> > > If that still throws "sleeping function called" type warnings, then
> > > that is a preempt RT issue because spin_trylock() on conversion to a
> > > sleeping lock should not be sleeping as it is uncontended.
> > >
> >
> > Hello, Joel
> >
> > I did a test in the RT kernel with the following modifications, and
> > didn't trigger
> > sleeping function called warning.
> >
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 93cf99aba335..e07b0e34249b 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4272,12 +4272,24 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  #endif /* CONFIG_SMP */
> >
> >  #ifdef CONFIG_PRINTK
> > +static struct vm_struct *find_vmap_area_trylock(unsigned long addr)
> > +{
> > +       struct vmap_area *va;
> > +
> > +       if (!spin_trylock(&vmap_area_lock))
> > +               return NULL;
> > +       va = __find_vmap_area(addr, &vmap_area_root);
> > +       spin_unlock(&vmap_area_lock);
> > +
> > +       return va ? va->vm : NULL;
> > +}
> > +
> >  bool vmalloc_dump_obj(void *object)
> >  {
> >         struct vm_struct *vm;
> >         void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> >
> > -       vm = find_vm_area(objp);
> > +       vm = find_vmap_area_trylock((unsigned long)objp);
> >         if (!vm)
> >                 return false;
> >         pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> >
> >
> > If double call_rcu() appears in different threads at the same time,
> > It is possible to dump only one of the vmalloc information.
>
> Great to hear and thanks for checking! So what I'll do is create 2
> patches -- one for the vmalloc change, and one for the patch you
> submitted to use the vmalloc change. For the second patch, I'll keep
> your authorship for attribution.  I'll also queue it on my staging
> trees for further testing. Let me know if that does not sound okay.
>


Thanks, Joel. It's okay :)


> Today is a lighter work day for me, but allow me 1-2 days for sending
> this out for review. Thanks,
>
>  - Joel

Patch

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..d7c4e112a4ad 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -289,10 +289,5 @@  pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 int register_vmap_purge_notifier(struct notifier_block *nb);
 int unregister_vmap_purge_notifier(struct notifier_block *nb);
 
-#if defined(CONFIG_MMU) && defined(CONFIG_PRINTK)
-bool vmalloc_dump_obj(void *object);
-#else
-static inline bool vmalloc_dump_obj(void *object) { return false; }
-#endif
 
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/util.c b/mm/util.c
index ddfbb22dc187..fbc007adc108 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1066,10 +1066,9 @@  void mem_dump_obj(void *object)
 	if (kmem_dump_obj(object))
 		return;
 
-	if (vmalloc_dump_obj(object))
-		return;
-
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
 	else if (object == NULL)
 		type = "NULL pointer";
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..224af955bcb5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4271,20 +4271,6 @@  void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 }
 #endif	/* CONFIG_SMP */
 
-#ifdef CONFIG_PRINTK
-bool vmalloc_dump_obj(void *object)
-{
-	struct vm_struct *vm;
-	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
-
-	vm = find_vm_area(objp);
-	if (!vm)
-		return false;
-	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
-	return true;
-}
-#endif
 
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)