diff mbox series

[RFC] mm/kmemleak: Fix sleeping function called from invalid context at print message

Message ID 20241115145410.114376-1-acarmina@redhat.com (mailing list archive)
State New
Headers show
Series [RFC] mm/kmemleak: Fix sleeping function called from invalid context at print message | expand

Commit Message

Alessandro Carminati Nov. 15, 2024, 2:54 p.m. UTC
Address a bug in the kernel that triggers a "sleeping function called from
invalid context" warning when /sys/kernel/debug/kmemleak is printed under
specific conditions:
- CONFIG_PREEMPT_RT=y
- Set SELinux as the LSM for the system
- Set kptr_restrict to 1
- kmemleak buffer contains at least one item
Ensure the kmemleak buffer contains at least one item
Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
where kmemleak_seq_show is executed in atomic context within the RT kernel.
However, the SELinux capability check in this function flow still relies on
regular spinlocks, leading to potential race conditions that trigger an
error when printing the kmemleak backtrace.
Move the backtrace printing out of the critical section. Use a stack
variable to store the backtrace pointers, avoiding spinlocks in the atomic
context.
Implement delta encoding to minimize the stack memory footprint,
addressing the potentially large memory demands for storing these pointers
on 64-bit systems.

Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
---
```
[  159.247069] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[  159.247193] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
[  159.247241] preempt_count: 1, expected: 0
[  159.247277] RCU nest depth: 2, expected: 2
[  159.247388] 6 locks held by cat/136:
[  159.247438]  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
[  159.248835]  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
[  159.249053]  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
[  159.249127]  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
[  159.249205]  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
[  159.249364] irq event stamp: 136660
[  159.249407] hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
[  159.249465] hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
[  159.249518] softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
[  159.249571] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  159.249970] Preemption disabled at:
[  159.249988] [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
[  159.250609] CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
[  159.250797] Tainted: [E]=UNSIGNED_MODULE
[  159.250822] Hardware name: linux,dummy-virt (DT)
[  159.251050] Call trace:
[  159.251079]  dump_backtrace+0xa0/0x128
[  159.251132]  show_stack+0x1c/0x30
[  159.251156]  dump_stack_lvl+0xe8/0x198
[  159.251180]  dump_stack+0x18/0x20
[  159.251227]  rt_spin_lock+0x8c/0x1a8
[  159.251273]  avc_perm_nonode+0xa0/0x150
[  159.251316]  cred_has_capability.isra.0+0x118/0x218
[  159.251340]  selinux_capable+0x50/0x80
[  159.251363]  security_capable+0x7c/0xd0
[  159.251388]  has_ns_capability_noaudit+0x94/0x1b0
[  159.251412]  has_capability_noaudit+0x20/0x30
[  159.251437]  restricted_pointer+0x21c/0x4b0
[  159.251461]  pointer+0x298/0x760
[  159.251482]  vsnprintf+0x330/0xf70
[  159.251504]  seq_printf+0x178/0x218
[  159.251526]  print_unreferenced+0x1a4/0x2d0
[  159.251551]  kmemleak_seq_show+0xd0/0x1e0
[  159.251576]  seq_read_iter+0x354/0xe30
[  159.251599]  seq_read+0x250/0x378
[  159.251622]  full_proxy_read+0xd8/0x148
[  159.251649]  vfs_read+0x190/0x918
[  159.251672]  ksys_read+0xf0/0x1e0
[  159.251693]  __arm64_sys_read+0x70/0xa8
[  159.251716]  invoke_syscall.constprop.0+0xd4/0x1d8
[  159.251767]  el0_svc+0x50/0x158
[  159.251813]  el0t_64_sync+0x17c/0x180
```
I have considered three potential approaches to address this matter:

1. Remove Raw Pointer Printing
The simplest solution is to eliminate raw pointer printing from the report.
This approach involves minimal changes to the kernel code and is 
straightforward to implement.

While I am confident that omitting the raw address would result in
negligible information loss in most scenarios, some may perceive it as a
feature regression. Below is an example of the modification:
```
- warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
+ warn_or_seq_printf(seq, "    %pS\n", ptr);
```
This change may be acceptable since the %pS format outputs a hex string
if no kallsyms are available. However, it modifies the original behavior,
and in the kallsyms scenario, the raw pointer would no longer be present.

2. Modify SELinux to Avoid Sleeping Spinlocks
Another option is to alter the SELinux capability check to use
non-sleeping spinlocks.
However, this approach is not advisable. The SELinux capability check is
extensively used across the kernel and is far more critical than the
kmemleak reporting feature.
Adapting it to address this rare issue could unnecessarily introduce
latency across the entire kernel, particularly as kmemleak is rarely used
in production environments.

3. Move Stack Trace Printing Outside the Atomic Section
The third and preferred approach is to move the stack trace printing
outside the atomic section. This would preserve the current functionality
without modifying SELinux.

The primary challenge here lies in making the backtrace pointers available
after exiting the critical section, as they are captured within it.
To address this, the backtrace pointers can be copied to a safe location,
enabling access once the raw_spinlock is released.

Options for Creating a Safe Location for Backtrace Pointers
Several strategies have been considered for storing the backtrace pointers
safely:
* Dynamic Allocation
    * Allocating memory with kmalloc cannot be done within a raw_spinlock
      area. Using GFP_ATOMIC is also infeasible.
    * Since the code that prints the message is inside a loop, executed
      potentially multiple times, it is only within the raw_spinlock 
      section that we can determine whether allocation is needed.
    * Allocating and deallocating memory on every loop iteration would be
      prohibitively expensive.
* Global Data Section
    * In this strategy, the message would be printed after exiting the
      raw_spinlock protected section.
    * However, this approach risks data corruption if another occurrence
      of the issue arises before the first operation completes.
* Per-CPU Data
    * The same concerns as with global data apply here. While data 
      corruption is less likely, it is not impossible.
* Stack
    * Using the stack is the best option since each thread has its own 
      stack, ensuring data isolation. However, the size of the data poses
      a challenge.
    * Exporting a full stack trace pointer list requires considerable space.
      A 32-level stack trace in a 64-bit system would require 256 bytes, 
      which is contrary to best practices for stack size management.

To mitigate this, I propose using delta encoding to store the addresses.
This method reduces the size of each address from 8 bytes to 4 bytes on
64-bit systems. While this introduces some complexity, it significantly
reduces memory usage and allows us to preserve the kmemleak reports in their
current form.
 mm/kmemleak.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 71 insertions(+), 7 deletions(-)

Comments

Thomas Weißschuh Nov. 18, 2024, 2:22 p.m. UTC | #1
On Fri, Nov 15, 2024 at 02:54:10PM +0000, Alessandro Carminati wrote:
> Address a bug in the kernel that triggers a "sleeping function called from
> invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> specific conditions:
> - CONFIG_PREEMPT_RT=y
> - Set SELinux as the LSM for the system
> - Set kptr_restrict to 1
> - kmemleak buffer contains at least one item
> Ensure the kmemleak buffer contains at least one item
> Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
> kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
> where kmemleak_seq_show is executed in atomic context within the RT kernel.
> However, the SELinux capability check in this function flow still relies on
> regular spinlocks, leading to potential race conditions that trigger an
> error when printing the kmemleak backtrace.
> Move the backtrace printing out of the critical section. Use a stack
> variable to store the backtrace pointers, avoiding spinlocks in the atomic
> context.
>
> Implement delta encoding to minimize the stack memory footprint,
> addressing the potentially large memory demands for storing these pointers
> on 64-bit systems.

The stacktrace is already stored in the stackdepot.
Shouldn't it be possible to take a reference to the stackdepot entry
inside the critical section and then use that reference outside of the
critical section for printing?

> Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
> ---
> ```
> [  159.247069] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> [  159.247193] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
> [  159.247241] preempt_count: 1, expected: 0
> [  159.247277] RCU nest depth: 2, expected: 2
> [  159.247388] 6 locks held by cat/136:
> [  159.247438]  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
> [  159.248835]  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
> [  159.249053]  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
> [  159.249127]  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
> [  159.249205]  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
> [  159.249364] irq event stamp: 136660
> [  159.249407] hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
> [  159.249465] hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
> [  159.249518] softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
> [  159.249571] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  159.249970] Preemption disabled at:
> [  159.249988] [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> [  159.250609] CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
> [  159.250797] Tainted: [E]=UNSIGNED_MODULE
> [  159.250822] Hardware name: linux,dummy-virt (DT)
> [  159.251050] Call trace:
> [  159.251079]  dump_backtrace+0xa0/0x128
> [  159.251132]  show_stack+0x1c/0x30
> [  159.251156]  dump_stack_lvl+0xe8/0x198
> [  159.251180]  dump_stack+0x18/0x20
> [  159.251227]  rt_spin_lock+0x8c/0x1a8
> [  159.251273]  avc_perm_nonode+0xa0/0x150
> [  159.251316]  cred_has_capability.isra.0+0x118/0x218
> [  159.251340]  selinux_capable+0x50/0x80
> [  159.251363]  security_capable+0x7c/0xd0
> [  159.251388]  has_ns_capability_noaudit+0x94/0x1b0
> [  159.251412]  has_capability_noaudit+0x20/0x30
> [  159.251437]  restricted_pointer+0x21c/0x4b0
> [  159.251461]  pointer+0x298/0x760
> [  159.251482]  vsnprintf+0x330/0xf70
> [  159.251504]  seq_printf+0x178/0x218
> [  159.251526]  print_unreferenced+0x1a4/0x2d0
> [  159.251551]  kmemleak_seq_show+0xd0/0x1e0
> [  159.251576]  seq_read_iter+0x354/0xe30
> [  159.251599]  seq_read+0x250/0x378
> [  159.251622]  full_proxy_read+0xd8/0x148
> [  159.251649]  vfs_read+0x190/0x918
> [  159.251672]  ksys_read+0xf0/0x1e0
> [  159.251693]  __arm64_sys_read+0x70/0xa8
> [  159.251716]  invoke_syscall.constprop.0+0xd4/0x1d8
> [  159.251767]  el0_svc+0x50/0x158
> [  159.251813]  el0t_64_sync+0x17c/0x180
> ```
> I have considered three potential approaches to address this matter:
> 
> 1. Remove Raw Pointer Printing
> The simplest solution is to eliminate raw pointer printing from the report.
> This approach involves minimal changes to the kernel code and is 
> straightforward to implement.
> 
> While I am confident that omitting the raw address would result in
> negligible information loss in most scenarios, some may perceive it as a
> feature regression. Below is an example of the modification:
> ```
> - warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> + warn_or_seq_printf(seq, "    %pS\n", ptr);
> ```
> This change may be acceptable since the %pS format outputs a hex string
> if no kallsyms are available. However, it modifies the original behavior,
> and in the kallsyms scenario, the raw pointer would no longer be present.
> 
> 2. Modify SELinux to Avoid Sleeping Spinlocks
> Another option is to alter the SELinux capability check to use
> non-sleeping spinlocks.
> However, this approach is not advisable. The SELinux capability check is
> extensively used across the kernel and is far more critical than the
> kmemleak reporting feature.
> Adapting it to address this rare issue could unnecessarily introduce
> latency across the entire kernel, particularly as kmemleak is rarely used
> in production environments.
> 
> 3. Move Stack Trace Printing Outside the Atomic Section
> The third and preferred approach is to move the stack trace printing
> outside the atomic section. This would preserve the current functionality
> without modifying SELinux.
> 
> The primary challenge here lies in making the backtrace pointers available
> after exiting the critical section, as they are captured within it.
> To address this, the backtrace pointers can be copied to a safe location,
> enabling access once the raw_spinlock is released.
> 
> Options for Creating a Safe Location for Backtrace Pointers
> Several strategies have been considered for storing the backtrace pointers
> safely:
> * Dynamic Allocation
>     * Allocating memory with kmalloc cannot be done within a raw_spinlock
>       area. Using GFP_ATOMIC is also infeasible.
>     * Since the code that prints the message is inside a loop, executed
>       potentially multiple times, it is only within the raw_spinlock 
>       section that we can determine whether allocation is needed.
>     * Allocating and deallocating memory on every loop iteration would be
>       prohibitively expensive.
> * Global Data Section
>     * In this strategy, the message would be printed after exiting the
>       raw_spinlock protected section.
>     * However, this approach risks data corruption if another occurrence
>       of the issue arises before the first operation completes.
> * Per-CPU Data
>     * The same concerns as with global data apply here. While data 
>       corruption is less likely, it is not impossible.
> * Stack
>     * Using the stack is the best option since each thread has its own 
>       stack, ensuring data isolation. However, the size of the data poses
>       a challenge.
>     * Exporting a full stack trace pointer list requires considerable space.
>       A 32-level stack trace in a 64-bit system would require 256 bytes, 
>       which is contrary to best practices for stack size management.
> 
> To mitigate this, I propose using delta encoding to store the addresses.
> This method reduces the size of each address from 8 bytes to 4 bytes on
> 64-bit systems. While this introduces some complexity, it significantly
> reduces memory usage and allows us to preserve the kmemleak reports in their
> current form.
>  mm/kmemleak.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 71 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 0400f5e8ac60..fc5869e09280 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -274,6 +274,44 @@ static void kmemleak_disable(void);
>  		pr_warn(fmt, ##__VA_ARGS__);		\
>  } while (0)
>  
> +#define PTR_STORAGE_OP_OK	-1
> +#define PTR_STORAGE_OP_FAIL	0
> +#define PTR_STORAGE_CAPACITY	32
> +
> +struct ptr_storage {
> +	unsigned long	base;
> +	u32		data[PTR_STORAGE_CAPACITY];
> +	int		nr_entries;
> +};
> +
> +static int ptr_storage_insert(unsigned long p, struct ptr_storage *s)
> +{
> +	unsigned long diff_data;
> +
> +	if (s->nr_entries != 0) {
> +		diff_data = s->base - p;
> +		if (s->nr_entries < PTR_STORAGE_CAPACITY) {
> +			s->data[((s->nr_entries - 1))] = diff_data & 0xffffffff;
> +			s->nr_entries++;
> +			return PTR_STORAGE_OP_OK;
> +		}
> +		return PTR_STORAGE_OP_FAIL;
> +	}
> +	s->base = p;
> +	s->nr_entries++;
> +	return PTR_STORAGE_OP_OK;
> +}
> +
> +static void *ptr_storage_get(struct ptr_storage *s, int item_no)
> +{
> +	if (item_no < s->nr_entries && item_no > 0)
> +		return (void *)s->base - (s32)s->data[(item_no - 1)];
> +
> +	if (item_no == 0)
> +		return (void *)s->base;
> +	return NULL;
> +}
> +
>  static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
>  				 int rowsize, int groupsize, const void *buf,
>  				 size_t len, bool ascii)
> @@ -357,11 +395,13 @@ static bool unreferenced_object(struct kmemleak_object *object)
>   * print_unreferenced function must be called with the object->lock held.
>   */
>  static void print_unreferenced(struct seq_file *seq,
> -			       struct kmemleak_object *object)
> +			       struct kmemleak_object *object,
> +			       struct ptr_storage *s)
>  {
>  	int i;
>  	unsigned long *entries;
>  	unsigned int nr_entries;
> +	unsigned long tmp;
>  
>  	nr_entries = stack_depot_fetch(object->trace_handle, &entries);
>  	warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
> @@ -372,8 +412,8 @@ static void print_unreferenced(struct seq_file *seq,
>  	warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
>  
>  	for (i = 0; i < nr_entries; i++) {
> -		void *ptr = (void *)entries[i];
> -		warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> +		tmp = (unsigned long)entries[i];
> +		ptr_storage_insert(tmp, s);
>  	}
>  }
>  
> @@ -1625,6 +1665,10 @@ static void kmemleak_scan(void)
>  	struct zone *zone;
>  	int __maybe_unused i;
>  	int new_leaks = 0;
> +	struct ptr_storage s = {0};
> +	bool do_print = false;
> +	void *tmp;
> +	int inx;
>  
>  	jiffies_last_scan = jiffies;
>  
> @@ -1783,12 +1827,20 @@ static void kmemleak_scan(void)
>  		    !(object->flags & OBJECT_REPORTED)) {
>  			object->flags |= OBJECT_REPORTED;
>  
> -			if (kmemleak_verbose)
> -				print_unreferenced(NULL, object);
> +			if (kmemleak_verbose) {
> +				print_unreferenced(NULL, object, &s);
> +				do_print = true;
> +			}
>  
>  			new_leaks++;
>  		}
>  		raw_spin_unlock_irq(&object->lock);
> +		if (kmemleak_verbose && do_print) {
> +			for (inx = 0; inx < s.nr_entries; inx++) {
> +				tmp = ptr_storage_get(&s, i);
> +				warn_or_seq_printf(NULL, "    [<%pK>] %pS\n", tmp, tmp);
> +			}
> +		}
>  	}
>  	rcu_read_unlock();
>  
> @@ -1939,11 +1991,23 @@ static int kmemleak_seq_show(struct seq_file *seq, void *v)
>  {
>  	struct kmemleak_object *object = v;
>  	unsigned long flags;
> +	struct ptr_storage s = {0};
> +	void *tmp;
> +	int i;
> +	bool do_print = false;
>  
>  	raw_spin_lock_irqsave(&object->lock, flags);
> -	if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object))
> -		print_unreferenced(seq, object);
> +	if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object)) {
> +		print_unreferenced(seq, object, &s);
> +		do_print = true;
> +	}
>  	raw_spin_unlock_irqrestore(&object->lock, flags);
> +	if (do_print) {
> +		for (i = 0; i < s.nr_entries; i++) {
> +			tmp = ptr_storage_get(&s, i);
> +			warn_or_seq_printf(seq, "    [<%pK>] %pS\n", tmp, tmp);
> +		}
> +	}
>  	return 0;
>  }
>  
> -- 
> 2.34.1
>
Alessandro Carminati Nov. 19, 2024, 3:45 p.m. UTC | #2
Hello Thomas,
Thanks for your suggestion.

On Mon, Nov 18, 2024 at 3:29 PM Thomas Weißschuh <
thomas.weissschuh@linutronix.de> wrote:

> On Fri, Nov 15, 2024 at 02:54:10PM +0000, Alessandro Carminati wrote:
> > Address a bug in the kernel that triggers a "sleeping function called
> from
> > invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> > specific conditions:
> > - CONFIG_PREEMPT_RT=y
> > - Set SELinux as the LSM for the system
> > - Set kptr_restrict to 1
> > - kmemleak buffer contains at least one item
> > Ensure the kmemleak buffer contains at least one item
> > Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
> > kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
> > where kmemleak_seq_show is executed in atomic context within the RT
> kernel.
> > However, the SELinux capability check in this function flow still relies
> on
> > regular spinlocks, leading to potential race conditions that trigger an
> > error when printing the kmemleak backtrace.
> > Move the backtrace printing out of the critical section. Use a stack
> > variable to store the backtrace pointers, avoiding spinlocks in the
> atomic
> > context.
> >
> > Implement delta encoding to minimize the stack memory footprint,
> > addressing the potentially large memory demands for storing these
> pointers
> > on 64-bit systems.
>
> The stacktrace is already stored in the stackdepot.
> Shouldn't it be possible to take a reference to the stackdepot entry
> inside the critical section and then use that reference outside of the
> critical section for printing?
>

Yes, it is indeed possible to do that.

However, kmemleak operates in such a way that entries can be deleted, for
example, if they are found to be false positives.
My concern was that using a reference to a potentially deleted entry could
cause problems. But after considering your suggestion, I verified that
stack_depot_put, which is used to delete a stack depot entry, does not
appear to be called when a kmemleak object is deleted.
This makes me question whether that is the intended behavior.

As things currently stand, it seems possible to remove all the code I added
to store the trace in the stack and instead directly use the stack_depot
handle.

I would appreciate feedback from kmemleak and stackdepot experts regarding
this approach.


>
> > Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
> > ---
> > ```
> > [  159.247069] BUG: sleeping function called from invalid context at
> kernel/locking/spinlock_rt.c:48
> > [  159.247193] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid:
> 136, name: cat
> > [  159.247241] preempt_count: 1, expected: 0
> > [  159.247277] RCU nest depth: 2, expected: 2
> > [  159.247388] 6 locks held by cat/136:
> > [  159.247438]  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at:
> seq_read_iter+0xb8/0xe30
> > [  159.248835]  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at:
> kmemleak_seq_start+0x34/0x128
> > [  159.249053]  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at:
> kmemleak_seq_show+0x3c/0x1e0
> > [  159.249127]  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at:
> has_ns_capability_noaudit+0x8/0x1b0
> > [  159.249205]  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at:
> avc_compute_av+0xc4/0x3d0
> > [  159.249364] irq event stamp: 136660
> > [  159.249407] hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>]
> _raw_spin_unlock_irqrestore+0xa8/0xd8
> > [  159.249465] hardirqs last disabled at (136660): [<ffffafe6a80fd85c>]
> _raw_spin_lock_irqsave+0x8c/0xb0
> > [  159.249518] softirqs last  enabled at (0): [<ffffafe6a5d50b28>]
> copy_process+0x11d8/0x3df8
> > [  159.249571] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [  159.249970] Preemption disabled at:
> > [  159.249988] [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> > [  159.250609] CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E
>     6.11.0-rt7+ #34
> > [  159.250797] Tainted: [E]=UNSIGNED_MODULE
> > [  159.250822] Hardware name: linux,dummy-virt (DT)
> > [  159.251050] Call trace:
> > [  159.251079]  dump_backtrace+0xa0/0x128
> > [  159.251132]  show_stack+0x1c/0x30
> > [  159.251156]  dump_stack_lvl+0xe8/0x198
> > [  159.251180]  dump_stack+0x18/0x20
> > [  159.251227]  rt_spin_lock+0x8c/0x1a8
> > [  159.251273]  avc_perm_nonode+0xa0/0x150
> > [  159.251316]  cred_has_capability.isra.0+0x118/0x218
> > [  159.251340]  selinux_capable+0x50/0x80
> > [  159.251363]  security_capable+0x7c/0xd0
> > [  159.251388]  has_ns_capability_noaudit+0x94/0x1b0
> > [  159.251412]  has_capability_noaudit+0x20/0x30
> > [  159.251437]  restricted_pointer+0x21c/0x4b0
> > [  159.251461]  pointer+0x298/0x760
> > [  159.251482]  vsnprintf+0x330/0xf70
> > [  159.251504]  seq_printf+0x178/0x218
> > [  159.251526]  print_unreferenced+0x1a4/0x2d0
> > [  159.251551]  kmemleak_seq_show+0xd0/0x1e0
> > [  159.251576]  seq_read_iter+0x354/0xe30
> > [  159.251599]  seq_read+0x250/0x378
> > [  159.251622]  full_proxy_read+0xd8/0x148
> > [  159.251649]  vfs_read+0x190/0x918
> > [  159.251672]  ksys_read+0xf0/0x1e0
> > [  159.251693]  __arm64_sys_read+0x70/0xa8
> > [  159.251716]  invoke_syscall.constprop.0+0xd4/0x1d8
> > [  159.251767]  el0_svc+0x50/0x158
> > [  159.251813]  el0t_64_sync+0x17c/0x180
> > ```
> > I have considered three potential approaches to address this matter:
> >
> > 1. Remove Raw Pointer Printing
> > The simplest solution is to eliminate raw pointer printing from the
> report.
> > This approach involves minimal changes to the kernel code and is
> > straightforward to implement.
> >
> > While I am confident that omitting the raw address would result in
> > negligible information loss in most scenarios, some may perceive it as a
> > feature regression. Below is an example of the modification:
> > ```
> > - warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> > + warn_or_seq_printf(seq, "    %pS\n", ptr);
> > ```
> > This change may be acceptable since the %pS format outputs a hex string
> > if no kallsyms are available. However, it modifies the original behavior,
> > and in the kallsyms scenario, the raw pointer would no longer be present.
> >
> > 2. Modify SELinux to Avoid Sleeping Spinlocks
> > Another option is to alter the SELinux capability check to use
> > non-sleeping spinlocks.
> > However, this approach is not advisable. The SELinux capability check is
> > extensively used across the kernel and is far more critical than the
> > kmemleak reporting feature.
> > Adapting it to address this rare issue could unnecessarily introduce
> > latency across the entire kernel, particularly as kmemleak is rarely used
> > in production environments.
> >
> > 3. Move Stack Trace Printing Outside the Atomic Section
> > The third and preferred approach is to move the stack trace printing
> > outside the atomic section. This would preserve the current functionality
> > without modifying SELinux.
> >
> > The primary challenge here lies in making the backtrace pointers
> available
> > after exiting the critical section, as they are captured within it.
> > To address this, the backtrace pointers can be copied to a safe location,
> > enabling access once the raw_spinlock is released.
> >
> > Options for Creating a Safe Location for Backtrace Pointers
> > Several strategies have been considered for storing the backtrace
> pointers
> > safely:
> > * Dynamic Allocation
> >     * Allocating memory with kmalloc cannot be done within a raw_spinlock
> >       area. Using GFP_ATOMIC is also infeasible.
> >     * Since the code that prints the message is inside a loop, executed
> >       potentially multiple times, it is only within the raw_spinlock
> >       section that we can determine whether allocation is needed.
> >     * Allocating and deallocating memory on every loop iteration would be
> >       prohibitively expensive.
> > * Global Data Section
> >     * In this strategy, the message would be printed after exiting the
> >       raw_spinlock protected section.
> >     * However, this approach risks data corruption if another occurrence
> >       of the issue arises before the first operation completes.
> > * Per-CPU Data
> >     * The same concerns as with global data apply here. While data
> >       corruption is less likely, it is not impossible.
> > * Stack
> >     * Using the stack is the best option since each thread has its own
> >       stack, ensuring data isolation. However, the size of the data poses
> >       a challenge.
> >     * Exporting a full stack trace pointer list requires considerable
> space.
> >       A 32-level stack trace in a 64-bit system would require 256 bytes,
> >       which is contrary to best practices for stack size management.
> >
> > To mitigate this, I propose using delta encoding to store the addresses.
> > This method reduces the size of each address from 8 bytes to 4 bytes on
> > 64-bit systems. While this introduces some complexity, it significantly
> > reduces memory usage and allows us to preserve the kmemleak reports in
> their
> > current form.
> >  mm/kmemleak.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 71 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> > index 0400f5e8ac60..fc5869e09280 100644
> > --- a/mm/kmemleak.c
> > +++ b/mm/kmemleak.c
> > @@ -274,6 +274,44 @@ static void kmemleak_disable(void);
> >               pr_warn(fmt, ##__VA_ARGS__);            \
> >  } while (0)
> >
> > +#define PTR_STORAGE_OP_OK    -1
> > +#define PTR_STORAGE_OP_FAIL  0
> > +#define PTR_STORAGE_CAPACITY 32
> > +
> > +struct ptr_storage {
> > +     unsigned long   base;
> > +     u32             data[PTR_STORAGE_CAPACITY];
> > +     int             nr_entries;
> > +};
> > +
> > +static int ptr_storage_insert(unsigned long p, struct ptr_storage *s)
> > +{
> > +     unsigned long diff_data;
> > +
> > +     if (s->nr_entries != 0) {
> > +             diff_data = s->base - p;
> > +             if (s->nr_entries < PTR_STORAGE_CAPACITY) {
> > +                     s->data[((s->nr_entries - 1))] = diff_data &
> 0xffffffff;
> > +                     s->nr_entries++;
> > +                     return PTR_STORAGE_OP_OK;
> > +             }
> > +             return PTR_STORAGE_OP_FAIL;
> > +     }
> > +     s->base = p;
> > +     s->nr_entries++;
> > +     return PTR_STORAGE_OP_OK;
> > +}
> > +
> > +static void *ptr_storage_get(struct ptr_storage *s, int item_no)
> > +{
> > +     if (item_no < s->nr_entries && item_no > 0)
> > +             return (void *)s->base - (s32)s->data[(item_no - 1)];
> > +
> > +     if (item_no == 0)
> > +             return (void *)s->base;
> > +     return NULL;
> > +}
> > +
> >  static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
> >                                int rowsize, int groupsize, const void
> *buf,
> >                                size_t len, bool ascii)
> > @@ -357,11 +395,13 @@ static bool unreferenced_object(struct
> kmemleak_object *object)
> >   * print_unreferenced function must be called with the object->lock
> held.
> >   */
> >  static void print_unreferenced(struct seq_file *seq,
> > -                            struct kmemleak_object *object)
> > +                            struct kmemleak_object *object,
> > +                            struct ptr_storage *s)
> >  {
> >       int i;
> >       unsigned long *entries;
> >       unsigned int nr_entries;
> > +     unsigned long tmp;
> >
> >       nr_entries = stack_depot_fetch(object->trace_handle, &entries);
> >       warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size
> %zu):\n",
> > @@ -372,8 +412,8 @@ static void print_unreferenced(struct seq_file *seq,
> >       warn_or_seq_printf(seq, "  backtrace (crc %x):\n",
> object->checksum);
> >
> >       for (i = 0; i < nr_entries; i++) {
> > -             void *ptr = (void *)entries[i];
> > -             warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> > +             tmp = (unsigned long)entries[i];
> > +             ptr_storage_insert(tmp, s);
> >       }
> >  }
> >
> > @@ -1625,6 +1665,10 @@ static void kmemleak_scan(void)
> >       struct zone *zone;
> >       int __maybe_unused i;
> >       int new_leaks = 0;
> > +     struct ptr_storage s = {0};
> > +     bool do_print = false;
> > +     void *tmp;
> > +     int inx;
> >
> >       jiffies_last_scan = jiffies;
> >
> > @@ -1783,12 +1827,20 @@ static void kmemleak_scan(void)
> >                   !(object->flags & OBJECT_REPORTED)) {
> >                       object->flags |= OBJECT_REPORTED;
> >
> > -                     if (kmemleak_verbose)
> > -                             print_unreferenced(NULL, object);
> > +                     if (kmemleak_verbose) {
> > +                             print_unreferenced(NULL, object, &s);
> > +                             do_print = true;
> > +                     }
> >
> >                       new_leaks++;
> >               }
> >               raw_spin_unlock_irq(&object->lock);
> > +             if (kmemleak_verbose && do_print) {
> > +                     for (inx = 0; inx < s.nr_entries; inx++) {
> > +                             tmp = ptr_storage_get(&s, i);
> > +                             warn_or_seq_printf(NULL, "    [<%pK>]
> %pS\n", tmp, tmp);
> > +                     }
> > +             }
> >       }
> >       rcu_read_unlock();
> >
> > @@ -1939,11 +1991,23 @@ static int kmemleak_seq_show(struct seq_file
> *seq, void *v)
> >  {
> >       struct kmemleak_object *object = v;
> >       unsigned long flags;
> > +     struct ptr_storage s = {0};
> > +     void *tmp;
> > +     int i;
> > +     bool do_print = false;
> >
> >       raw_spin_lock_irqsave(&object->lock, flags);
> > -     if ((object->flags & OBJECT_REPORTED) &&
> unreferenced_object(object))
> > -             print_unreferenced(seq, object);
> > +     if ((object->flags & OBJECT_REPORTED) &&
> unreferenced_object(object)) {
> > +             print_unreferenced(seq, object, &s);
> > +             do_print = true;
> > +     }
> >       raw_spin_unlock_irqrestore(&object->lock, flags);
> > +     if (do_print) {
> > +             for (i = 0; i < s.nr_entries; i++) {
> > +                     tmp = ptr_storage_get(&s, i);
> > +                     warn_or_seq_printf(seq, "    [<%pK>] %pS\n", tmp,
> tmp);
> > +             }
> > +     }
> >       return 0;
> >  }
> >
> > --
> > 2.34.1
> >
>
>
Thomas Weißschuh Nov. 19, 2024, 4:11 p.m. UTC | #3
Hi Alessandro,

On Tue, Nov 19, 2024 at 04:45:03PM +0100, Alessandro Carminati wrote:
> Hello Thomas,
> Thanks for your suggestion.
> 
> On Mon, Nov 18, 2024 at 3:29 PM Thomas Weißschuh <
> thomas.weissschuh@linutronix.de> wrote:
> 
> > On Fri, Nov 15, 2024 at 02:54:10PM +0000, Alessandro Carminati wrote:
> > > Address a bug in the kernel that triggers a "sleeping function called
> > from
> > > invalid context" warning when /sys/kernel/debug/kmemleak is printed under
> > > specific conditions:
> > > - CONFIG_PREEMPT_RT=y
> > > - Set SELinux as the LSM for the system
> > > - Set kptr_restrict to 1
> > > - kmemleak buffer contains at least one item
> > > Ensure the kmemleak buffer contains at least one item
> > > Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
> > > kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
> > > where kmemleak_seq_show is executed in atomic context within the RT
> > kernel.
> > > However, the SELinux capability check in this function flow still relies
> > on
> > > regular spinlocks, leading to potential race conditions that trigger an
> > > error when printing the kmemleak backtrace.
> > > Move the backtrace printing out of the critical section. Use a stack
> > > variable to store the backtrace pointers, avoiding spinlocks in the
> > atomic
> > > context.
> > >
> > > Implement delta encoding to minimize the stack memory footprint,
> > > addressing the potentially large memory demands for storing these
> > pointers
> > > on 64-bit systems.
> >
> > The stacktrace is already stored in the stackdepot.
> > Shouldn't it be possible to take a reference to the stackdepot entry
> > inside the critical section and then use that reference outside of the
> > critical section for printing?
> >
> 
> Yes, it is indeed possible to do that.
> 
> However, kmemleak operates in such a way that entries can be deleted, for
> example, if they are found to be false positives.
> My concern was that using a reference to a potentially deleted entry could
> cause problems. But after considering your suggestion, I verified that
> stack_depot_put, which is used to delete a stack depot entry, does not
> appear to be called when a kmemleak object is deleted.
> This makes me question whether that is the intended behavior.

I also looked a bit more into stackdepot.

It seems that it is intentional that by default no stacks are ever
evicted. If actual refcounting and eviction is desired then
stack_depot_save_flags(STACK_DEPOT_FLAG_GET) and stack_depot_put()
have to be used.
If kmemleak wants to switch to that API, there would be a slight problem
when printing the stacktrace outside of the locked section, as currently
there is no API available to increment the refcount of an existing
depot_stack_handle_t.
But that's not relevant for now anyways and such a function could be
implemented, even now in a roundabout way through
stack_depot_save_flags(stack_depot_fetch(handle)).

> As things currently stand, it seems possible to remove all the code I added
> to store the trace in the stack and instead directly use the stack_depot
> handle.
> 
> I would appreciate feedback from kmemleak and stackdepot experts regarding
> this approach.
> 
> 
> >
> > > Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
> > > ---
> > > ```
> > > [  159.247069] BUG: sleeping function called from invalid context at
> > kernel/locking/spinlock_rt.c:48
> > > [  159.247193] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid:
> > 136, name: cat
> > > [  159.247241] preempt_count: 1, expected: 0
> > > [  159.247277] RCU nest depth: 2, expected: 2
> > > [  159.247388] 6 locks held by cat/136:
> > > [  159.247438]  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at:
> > seq_read_iter+0xb8/0xe30
> > > [  159.248835]  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at:
> > kmemleak_seq_start+0x34/0x128
> > > [  159.249053]  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at:
> > kmemleak_seq_show+0x3c/0x1e0
> > > [  159.249127]  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at:
> > has_ns_capability_noaudit+0x8/0x1b0
> > > [  159.249205]  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at:
> > avc_compute_av+0xc4/0x3d0
> > > [  159.249364] irq event stamp: 136660
> > > [  159.249407] hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>]
> > _raw_spin_unlock_irqrestore+0xa8/0xd8
> > > [  159.249465] hardirqs last disabled at (136660): [<ffffafe6a80fd85c>]
> > _raw_spin_lock_irqsave+0x8c/0xb0
> > > [  159.249518] softirqs last  enabled at (0): [<ffffafe6a5d50b28>]
> > copy_process+0x11d8/0x3df8
> > > [  159.249571] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > > [  159.249970] Preemption disabled at:
> > > [  159.249988] [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> > > [  159.250609] CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E
> >     6.11.0-rt7+ #34
> > > [  159.250797] Tainted: [E]=UNSIGNED_MODULE
> > > [  159.250822] Hardware name: linux,dummy-virt (DT)
> > > [  159.251050] Call trace:
> > > [  159.251079]  dump_backtrace+0xa0/0x128
> > > [  159.251132]  show_stack+0x1c/0x30
> > > [  159.251156]  dump_stack_lvl+0xe8/0x198
> > > [  159.251180]  dump_stack+0x18/0x20
> > > [  159.251227]  rt_spin_lock+0x8c/0x1a8
> > > [  159.251273]  avc_perm_nonode+0xa0/0x150
> > > [  159.251316]  cred_has_capability.isra.0+0x118/0x218
> > > [  159.251340]  selinux_capable+0x50/0x80
> > > [  159.251363]  security_capable+0x7c/0xd0
> > > [  159.251388]  has_ns_capability_noaudit+0x94/0x1b0
> > > [  159.251412]  has_capability_noaudit+0x20/0x30
> > > [  159.251437]  restricted_pointer+0x21c/0x4b0
> > > [  159.251461]  pointer+0x298/0x760
> > > [  159.251482]  vsnprintf+0x330/0xf70
> > > [  159.251504]  seq_printf+0x178/0x218
> > > [  159.251526]  print_unreferenced+0x1a4/0x2d0
> > > [  159.251551]  kmemleak_seq_show+0xd0/0x1e0
> > > [  159.251576]  seq_read_iter+0x354/0xe30
> > > [  159.251599]  seq_read+0x250/0x378
> > > [  159.251622]  full_proxy_read+0xd8/0x148
> > > [  159.251649]  vfs_read+0x190/0x918
> > > [  159.251672]  ksys_read+0xf0/0x1e0
> > > [  159.251693]  __arm64_sys_read+0x70/0xa8
> > > [  159.251716]  invoke_syscall.constprop.0+0xd4/0x1d8
> > > [  159.251767]  el0_svc+0x50/0x158
> > > [  159.251813]  el0t_64_sync+0x17c/0x180
> > > ```
> > > I have considered three potential approaches to address this matter:
> > >
> > > 1. Remove Raw Pointer Printing
> > > The simplest solution is to eliminate raw pointer printing from the
> > report.
> > > This approach involves minimal changes to the kernel code and is
> > > straightforward to implement.
> > >
> > > While I am confident that omitting the raw address would result in
> > > negligible information loss in most scenarios, some may perceive it as a
> > > feature regression. Below is an example of the modification:
> > > ```
> > > - warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> > > + warn_or_seq_printf(seq, "    %pS\n", ptr);
> > > ```
> > > This change may be acceptable since the %pS format outputs a hex string
> > > if no kallsyms are available. However, it modifies the original behavior,
> > > and in the kallsyms scenario, the raw pointer would no longer be present.
> > >
> > > 2. Modify SELinux to Avoid Sleeping Spinlocks
> > > Another option is to alter the SELinux capability check to use
> > > non-sleeping spinlocks.
> > > However, this approach is not advisable. The SELinux capability check is
> > > extensively used across the kernel and is far more critical than the
> > > kmemleak reporting feature.
> > > Adapting it to address this rare issue could unnecessarily introduce
> > > latency across the entire kernel, particularly as kmemleak is rarely used
> > > in production environments.
> > >
> > > 3. Move Stack Trace Printing Outside the Atomic Section
> > > The third and preferred approach is to move the stack trace printing
> > > outside the atomic section. This would preserve the current functionality
> > > without modifying SELinux.
> > >
> > > The primary challenge here lies in making the backtrace pointers
> > available
> > > after exiting the critical section, as they are captured within it.
> > > To address this, the backtrace pointers can be copied to a safe location,
> > > enabling access once the raw_spinlock is released.
> > >
> > > Options for Creating a Safe Location for Backtrace Pointers
> > > Several strategies have been considered for storing the backtrace
> > pointers
> > > safely:
> > > * Dynamic Allocation
> > >     * Allocating memory with kmalloc cannot be done within a raw_spinlock
> > >       area. Using GFP_ATOMIC is also infeasible.
> > >     * Since the code that prints the message is inside a loop, executed
> > >       potentially multiple times, it is only within the raw_spinlock
> > >       section that we can determine whether allocation is needed.
> > >     * Allocating and deallocating memory on every loop iteration would be
> > >       prohibitively expensive.
> > > * Global Data Section
> > >     * In this strategy, the message would be printed after exiting the
> > >       raw_spinlock protected section.
> > >     * However, this approach risks data corruption if another occurrence
> > >       of the issue arises before the first operation completes.
> > > * Per-CPU Data
> > >     * The same concerns as with global data apply here. While data
> > >       corruption is less likely, it is not impossible.
> > > * Stack
> > >     * Using the stack is the best option since each thread has its own
> > >       stack, ensuring data isolation. However, the size of the data poses
> > >       a challenge.
> > >     * Exporting a full stack trace pointer list requires considerable
> > space.
> > >       A 32-level stack trace in a 64-bit system would require 256 bytes,
> > >       which is contrary to best practices for stack size management.
> > >
> > > To mitigate this, I propose using delta encoding to store the addresses.
> > > This method reduces the size of each address from 8 bytes to 4 bytes on
> > > 64-bit systems. While this introduces some complexity, it significantly
> > > reduces memory usage and allows us to preserve the kmemleak reports in
> > their
> > > current form.
> > >  mm/kmemleak.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 71 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> > > index 0400f5e8ac60..fc5869e09280 100644
> > > --- a/mm/kmemleak.c
> > > +++ b/mm/kmemleak.c
> > > @@ -274,6 +274,44 @@ static void kmemleak_disable(void);
> > >               pr_warn(fmt, ##__VA_ARGS__);            \
> > >  } while (0)
> > >
> > > +#define PTR_STORAGE_OP_OK    -1
> > > +#define PTR_STORAGE_OP_FAIL  0
> > > +#define PTR_STORAGE_CAPACITY 32
> > > +
> > > +struct ptr_storage {
> > > +     unsigned long   base;
> > > +     u32             data[PTR_STORAGE_CAPACITY];
> > > +     int             nr_entries;
> > > +};
> > > +
> > > +static int ptr_storage_insert(unsigned long p, struct ptr_storage *s)
> > > +{
> > > +     unsigned long diff_data;
> > > +
> > > +     if (s->nr_entries != 0) {
> > > +             diff_data = s->base - p;
> > > +             if (s->nr_entries < PTR_STORAGE_CAPACITY) {
> > > +                     s->data[((s->nr_entries - 1))] = diff_data &
> > 0xffffffff;
> > > +                     s->nr_entries++;
> > > +                     return PTR_STORAGE_OP_OK;
> > > +             }
> > > +             return PTR_STORAGE_OP_FAIL;
> > > +     }
> > > +     s->base = p;
> > > +     s->nr_entries++;
> > > +     return PTR_STORAGE_OP_OK;
> > > +}
> > > +
> > > +static void *ptr_storage_get(struct ptr_storage *s, int item_no)
> > > +{
> > > +     if (item_no < s->nr_entries && item_no > 0)
> > > +             return (void *)s->base - (s32)s->data[(item_no - 1)];
> > > +
> > > +     if (item_no == 0)
> > > +             return (void *)s->base;
> > > +     return NULL;
> > > +}
> > > +
> > >  static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
> > >                                int rowsize, int groupsize, const void
> > *buf,
> > >                                size_t len, bool ascii)
> > > @@ -357,11 +395,13 @@ static bool unreferenced_object(struct
> > kmemleak_object *object)
> > >   * print_unreferenced function must be called with the object->lock
> > held.
> > >   */
> > >  static void print_unreferenced(struct seq_file *seq,
> > > -                            struct kmemleak_object *object)
> > > +                            struct kmemleak_object *object,
> > > +                            struct ptr_storage *s)
> > >  {
> > >       int i;
> > >       unsigned long *entries;
> > >       unsigned int nr_entries;
> > > +     unsigned long tmp;
> > >
> > >       nr_entries = stack_depot_fetch(object->trace_handle, &entries);
> > >       warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size
> > %zu):\n",
> > > @@ -372,8 +412,8 @@ static void print_unreferenced(struct seq_file *seq,
> > >       warn_or_seq_printf(seq, "  backtrace (crc %x):\n",
> > object->checksum);
> > >
> > >       for (i = 0; i < nr_entries; i++) {
> > > -             void *ptr = (void *)entries[i];
> > > -             warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> > > +             tmp = (unsigned long)entries[i];
> > > +             ptr_storage_insert(tmp, s);
> > >       }
> > >  }
> > >
> > > @@ -1625,6 +1665,10 @@ static void kmemleak_scan(void)
> > >       struct zone *zone;
> > >       int __maybe_unused i;
> > >       int new_leaks = 0;
> > > +     struct ptr_storage s = {0};
> > > +     bool do_print = false;
> > > +     void *tmp;
> > > +     int inx;
> > >
> > >       jiffies_last_scan = jiffies;
> > >
> > > @@ -1783,12 +1827,20 @@ static void kmemleak_scan(void)
> > >                   !(object->flags & OBJECT_REPORTED)) {
> > >                       object->flags |= OBJECT_REPORTED;
> > >
> > > -                     if (kmemleak_verbose)
> > > -                             print_unreferenced(NULL, object);
> > > +                     if (kmemleak_verbose) {
> > > +                             print_unreferenced(NULL, object, &s);
> > > +                             do_print = true;
> > > +                     }
> > >
> > >                       new_leaks++;
> > >               }
> > >               raw_spin_unlock_irq(&object->lock);
> > > +             if (kmemleak_verbose && do_print) {
> > > +                     for (inx = 0; inx < s.nr_entries; inx++) {
> > > +                             tmp = ptr_storage_get(&s, i);
> > > +                             warn_or_seq_printf(NULL, "    [<%pK>]
> > %pS\n", tmp, tmp);
> > > +                     }
> > > +             }
> > >       }
> > >       rcu_read_unlock();
> > >
> > > @@ -1939,11 +1991,23 @@ static int kmemleak_seq_show(struct seq_file
> > *seq, void *v)
> > >  {
> > >       struct kmemleak_object *object = v;
> > >       unsigned long flags;
> > > +     struct ptr_storage s = {0};
> > > +     void *tmp;
> > > +     int i;
> > > +     bool do_print = false;
> > >
> > >       raw_spin_lock_irqsave(&object->lock, flags);
> > > -     if ((object->flags & OBJECT_REPORTED) &&
> > unreferenced_object(object))
> > > -             print_unreferenced(seq, object);
> > > +     if ((object->flags & OBJECT_REPORTED) &&
> > unreferenced_object(object)) {
> > > +             print_unreferenced(seq, object, &s);
> > > +             do_print = true;
> > > +     }
> > >       raw_spin_unlock_irqrestore(&object->lock, flags);
> > > +     if (do_print) {
> > > +             for (i = 0; i < s.nr_entries; i++) {
> > > +                     tmp = ptr_storage_get(&s, i);
> > > +                     warn_or_seq_printf(seq, "    [<%pK>] %pS\n", tmp,
> > tmp);
> > > +             }
> > > +     }
> > >       return 0;
> > >  }
> > >
> > > --
> > > 2.34.1
> > >
> >
> >
> 
> -- 
> ---
> 172
diff mbox series

Patch

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 0400f5e8ac60..fc5869e09280 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -274,6 +274,44 @@  static void kmemleak_disable(void);
 		pr_warn(fmt, ##__VA_ARGS__);		\
 } while (0)
 
+#define PTR_STORAGE_OP_OK	-1
+#define PTR_STORAGE_OP_FAIL	0
+#define PTR_STORAGE_CAPACITY	32
+
+struct ptr_storage {
+	unsigned long	base;
+	u32		data[PTR_STORAGE_CAPACITY];
+	int		nr_entries;
+};
+
+static int ptr_storage_insert(unsigned long p, struct ptr_storage *s)
+{
+	unsigned long diff_data;
+
+	if (s->nr_entries != 0) {
+		diff_data = s->base - p;
+		if (s->nr_entries < PTR_STORAGE_CAPACITY) {
+			s->data[((s->nr_entries - 1))] = diff_data & 0xffffffff;
+			s->nr_entries++;
+			return PTR_STORAGE_OP_OK;
+		}
+		return PTR_STORAGE_OP_FAIL;
+	}
+	s->base = p;
+	s->nr_entries++;
+	return PTR_STORAGE_OP_OK;
+}
+
+static void *ptr_storage_get(struct ptr_storage *s, int item_no)
+{
+	if (item_no < s->nr_entries && item_no > 0)
+		return (void *)s->base - (s32)s->data[(item_no - 1)];
+
+	if (item_no == 0)
+		return (void *)s->base;
+	return NULL;
+}
+
 static void warn_or_seq_hex_dump(struct seq_file *seq, int prefix_type,
 				 int rowsize, int groupsize, const void *buf,
 				 size_t len, bool ascii)
@@ -357,11 +395,13 @@  static bool unreferenced_object(struct kmemleak_object *object)
  * print_unreferenced function must be called with the object->lock held.
  */
 static void print_unreferenced(struct seq_file *seq,
-			       struct kmemleak_object *object)
+			       struct kmemleak_object *object,
+			       struct ptr_storage *s)
 {
 	int i;
 	unsigned long *entries;
 	unsigned int nr_entries;
+	unsigned long tmp;
 
 	nr_entries = stack_depot_fetch(object->trace_handle, &entries);
 	warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
@@ -372,8 +412,8 @@  static void print_unreferenced(struct seq_file *seq,
 	warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
 
 	for (i = 0; i < nr_entries; i++) {
-		void *ptr = (void *)entries[i];
-		warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
+		tmp = (unsigned long)entries[i];
+		ptr_storage_insert(tmp, s);
 	}
 }
 
@@ -1625,6 +1665,10 @@  static void kmemleak_scan(void)
 	struct zone *zone;
 	int __maybe_unused i;
 	int new_leaks = 0;
+	struct ptr_storage s = {0};
+	bool do_print = false;
+	void *tmp;
+	int inx;
 
 	jiffies_last_scan = jiffies;
 
@@ -1783,12 +1827,20 @@  static void kmemleak_scan(void)
 		    !(object->flags & OBJECT_REPORTED)) {
 			object->flags |= OBJECT_REPORTED;
 
-			if (kmemleak_verbose)
-				print_unreferenced(NULL, object);
+			if (kmemleak_verbose) {
+				print_unreferenced(NULL, object, &s);
+				do_print = true;
+			}
 
 			new_leaks++;
 		}
 		raw_spin_unlock_irq(&object->lock);
+		if (kmemleak_verbose && do_print) {
+			for (inx = 0; inx < s.nr_entries; inx++) {
+				tmp = ptr_storage_get(&s, i);
+				warn_or_seq_printf(NULL, "    [<%pK>] %pS\n", tmp, tmp);
+			}
+		}
 	}
 	rcu_read_unlock();
 
@@ -1939,11 +1991,23 @@  static int kmemleak_seq_show(struct seq_file *seq, void *v)
 {
 	struct kmemleak_object *object = v;
 	unsigned long flags;
+	struct ptr_storage s = {0};
+	void *tmp;
+	int i;
+	bool do_print = false;
 
 	raw_spin_lock_irqsave(&object->lock, flags);
-	if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object))
-		print_unreferenced(seq, object);
+	if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object)) {
+		print_unreferenced(seq, object, &s);
+		do_print = true;
+	}
 	raw_spin_unlock_irqrestore(&object->lock, flags);
+	if (do_print) {
+		for (i = 0; i < s.nr_entries; i++) {
+			tmp = ptr_storage_get(&s, i);
+			warn_or_seq_printf(seq, "    [<%pK>] %pS\n", tmp, tmp);
+		}
+	}
 	return 0;
 }