
audit: fix possible soft lockup in __audit_inode_child()

Message ID 20230805023934.347828-1-cuigaosheng1@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Paul Moore

Commit Message

cuigaosheng Aug. 5, 2023, 2:39 a.m. UTC
Tracefs or debugfs may generate hundreds to thousands of PATH records,
and too many PATH records may cause a soft lockup.

For example:
  1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
  2. auditctl -a exit,always -S open -k key
  3. sysctl -w kernel.watchdog_thresh=5
  4. mkdir /sys/kernel/debug/tracing/instances/test

There may be a soft lockup as follows:
  watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
  Kernel panic - not syncing: softlockup: hung tasks
  Call trace:
   dump_backtrace+0x0/0x30c
   show_stack+0x20/0x30
   dump_stack+0x11c/0x174
   panic+0x27c/0x494
   watchdog_timer_fn+0x2bc/0x390
   __run_hrtimer+0x148/0x4fc
   __hrtimer_run_queues+0x154/0x210
   hrtimer_interrupt+0x2c4/0x760
   arch_timer_handler_phys+0x48/0x60
   handle_percpu_devid_irq+0xe0/0x340
   __handle_domain_irq+0xbc/0x130
   gic_handle_irq+0x78/0x460
   el1_irq+0xb8/0x140
   __audit_inode_child+0x240/0x7bc
   tracefs_create_file+0x1b8/0x2a0
   trace_create_file+0x18/0x50
   event_create_dir+0x204/0x30c
   __trace_add_new_event+0xac/0x100
   event_trace_add_tracer+0xa0/0x130
   trace_array_create_dir+0x60/0x140
   trace_array_create+0x1e0/0x370
   instance_mkdir+0x90/0xd0
   tracefs_syscall_mkdir+0x68/0xa0
   vfs_mkdir+0x21c/0x34c
   do_mkdirat+0x1b4/0x1d4
   __arm64_sys_mkdirat+0x4c/0x60
   el0_svc_common.constprop.0+0xa8/0x240
   do_el0_svc+0x8c/0xc0
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

Therefore, we add cond_resched() to __audit_inode_child() to fix it.

Fixes: 5195d8e217a7 ("audit: dynamically allocate audit_names when not enough space is in the names array")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
---
 kernel/auditsc.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Paul Moore Aug. 7, 2023, 6:32 p.m. UTC | #1
On Fri, Aug 4, 2023 at 10:39 PM Gaosheng Cui <cuigaosheng1@huawei.com> wrote:
>
> Tracefs or debugfs maybe cause hundreds to thousands of PATH records,
> too many PATH records maybe cause soft lockup.
>
> For example:
>   1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
>   2. auditctl -a exit,always -S open -k key
>   3. sysctl -w kernel.watchdog_thresh=5
>   4. mkdir /sys/kernel/debug/tracing/instances/test
>
> There may be a soft lockup as follows:
>   watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
>   Kernel panic - not syncing: softlockup: hung tasks
>   Call trace:
>    dump_backtrace+0x0/0x30c
>    show_stack+0x20/0x30
>    dump_stack+0x11c/0x174
>    panic+0x27c/0x494
>    watchdog_timer_fn+0x2bc/0x390
>    __run_hrtimer+0x148/0x4fc
>    __hrtimer_run_queues+0x154/0x210
>    hrtimer_interrupt+0x2c4/0x760
>    arch_timer_handler_phys+0x48/0x60
>    handle_percpu_devid_irq+0xe0/0x340
>    __handle_domain_irq+0xbc/0x130
>    gic_handle_irq+0x78/0x460
>    el1_irq+0xb8/0x140
>    __audit_inode_child+0x240/0x7bc
>    tracefs_create_file+0x1b8/0x2a0
>    trace_create_file+0x18/0x50
>    event_create_dir+0x204/0x30c
>    __trace_add_new_event+0xac/0x100
>    event_trace_add_tracer+0xa0/0x130
>    trace_array_create_dir+0x60/0x140
>    trace_array_create+0x1e0/0x370
>    instance_mkdir+0x90/0xd0
>    tracefs_syscall_mkdir+0x68/0xa0
>    vfs_mkdir+0x21c/0x34c
>    do_mkdirat+0x1b4/0x1d4
>    __arm64_sys_mkdirat+0x4c/0x60
>    el0_svc_common.constprop.0+0xa8/0x240
>    do_el0_svc+0x8c/0xc0
>    el0_svc+0x20/0x30
>    el0_sync_handler+0xb0/0xb4
>    el0_sync+0x160/0x180
>
> Therefore, we add cond_resched() to __audit_inode_child() to fix it.
>
> Fixes: 5195d8e217a7 ("audit: dynamically allocate audit_names when not enough space is in the names array")
> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
> ---
>  kernel/auditsc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index addeed3df15d..fce37545644b 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2454,6 +2454,7 @@ void __audit_inode_child(struct inode *parent,
>                         found_parent = n;
>                         break;
>                 }
> +               cond_resched();
>         }
>
>         /* is there a matching child entry? */
> @@ -2473,6 +2474,7 @@ void __audit_inode_child(struct inode *parent,
>                         found_child = n;
>                         break;
>                 }
> +               cond_resched();
>         }
>
>         if (!found_parent) {

Exactly how many PATH entries were there in audit_context::names_list
when you ran into this problem?

I'm not sure we want to jump immediately into putting a cond_resched()
into both loops, especially since tracing/debugfs is a known corner
case and the default kernel.watchdog_thresh value appears to work
without error.  How about adding *one* cond_resched() between the two
loops?

cuigaosheng Aug. 8, 2023, 11:58 a.m. UTC | #2
Thanks for taking time to review this patch!

> Exactly how many PATH entries were there in audit_context::names_list
> when you ran into this problem?

In my test environment, the maximum length of names_list is 26094 entries,
and the length varies with the number of CPUs.

> I'm not sure we want to jump immediately into putting a cond_resched()
> into both loops, especially since tracing/debugfs is a known corner
> case and the default kernel.watchdog_thresh value appears to work
> without error.  How about adding *one* cond_resched() between the two
> loops?

The default kernel.watchdog_thresh value is 12, but some scenarios require
it to be set to 5, which is how we hit this problem. I enabled KASAN to make
the problem easier to reproduce. I have tested that adding a single
cond_resched() between the two loops works for us, and I think that is
enough for most scenarios.

I will submit a patch v2, thanks again!
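
For illustration only, the single-yield variant discussed above can be
sketched in userspace. This is not the v2 patch and not kernel code:
struct name_entry, build_list(), yield_point() and find_parent_and_child()
are hypothetical names, sched_yield() stands in for cond_resched(), and the
match conditions are simplified compared to the real loops in
__audit_inode_child().

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct audit_names: only what the scan needs. */
struct name_entry {
	unsigned long ino;
	int is_parent;
	struct name_entry *next;
};

/* Counts how often the scan gave up the CPU voluntarily. */
static unsigned long yield_calls;

/* Userspace stand-in for cond_resched() (assumption, not the kernel API). */
static void yield_point(void)
{
	yield_calls++;
	sched_yield();
}

/* Build a list with inode numbers 1..count; even inodes act as parents. */
static struct name_entry *build_list(size_t count)
{
	struct name_entry *head = NULL;

	for (size_t i = count; i > 0; i--) {
		struct name_entry *n = malloc(sizeof(*n));

		if (!n)
			abort();
		n->ino = i;
		n->is_parent = (i % 2 == 0);
		n->next = head;
		head = n;
	}
	return head;
}

static void free_list(struct name_entry *head)
{
	while (head) {
		struct name_entry *next = head->next;

		free(head);
		head = next;
	}
}

/* Two full scans with exactly one yield between them. */
static void find_parent_and_child(struct name_entry *head,
				  unsigned long parent_ino,
				  unsigned long child_ino,
				  struct name_entry **found_parent,
				  struct name_entry **found_child)
{
	struct name_entry *n;

	assert(found_parent != NULL && found_child != NULL);
	*found_parent = NULL;
	*found_child = NULL;

	/* is there a matching parent entry? */
	for (n = head; n; n = n->next) {
		if (n->is_parent && n->ino == parent_ino) {
			*found_parent = n;
			break;
		}
	}

	/* single yield between the loops, instead of one per iteration */
	yield_point();

	/* is there a matching child entry? */
	for (n = head; n; n = n->next) {
		if (!n->is_parent && n->ino == child_ino) {
			*found_child = n;
			break;
		}
	}
}
```

With a list of 26094 entries, as in my test environment, the function
yields once per call regardless of list length. The per-iteration variant
in v1 would yield up to once per entry in each loop.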


Patch

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index addeed3df15d..fce37545644b 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2454,6 +2454,7 @@  void __audit_inode_child(struct inode *parent,
 			found_parent = n;
 			break;
 		}
+		cond_resched();
 	}
 
 	/* is there a matching child entry? */
@@ -2473,6 +2474,7 @@  void __audit_inode_child(struct inode *parent,
 			found_child = n;
 			break;
 		}
+		cond_resched();
 	}
 
 	if (!found_parent) {