Message ID | 20220803222343.31673-1-yepeilin.cs@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() | expand |
On Wed, Aug 3, 2022 at 6:24 PM Peilin Ye <yepeilin.cs@gmail.com> wrote: > > From: Peilin Ye <peilin.ye@bytedance.com> > > Currently @audit_context is allocated twice for io_uring workers: > > 1. copy_process() calls audit_alloc(); > 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which > is effectively audit_alloc()) and overwrites @audit_context, > causing: > > BUG: memory leak > unreferenced object 0xffff888144547400 (size 1024): > <...> > hex dump (first 32 bytes): > 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > backtrace: > [<ffffffff8135cfc3>] audit_alloc+0x133/0x210 > [<ffffffff81239e63>] copy_process+0xcd3/0x2340 > [<ffffffff8123b5f3>] create_io_thread+0x63/0x90 > [<ffffffff81686604>] create_io_worker+0xb4/0x230 > [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0 > [<ffffffff8167663a>] io_queue_iowq+0xba/0x200 > [<ffffffff816768b3>] io_queue_async+0x113/0x180 > [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0 > [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120 > [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570 > [<ffffffff81272c4e>] task_work_run+0x7e/0xc0 > [<ffffffff8125a688>] get_signal+0xc18/0xf10 > [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730 > [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180 > [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20 > [<ffffffff844a7e80>] do_syscall_64+0x40/0x80 > > Then, > > 3. io_sq_thread() or io_wqe_worker() frees @audit_context using > audit_free(); > 4. do_exit() eventually calls audit_free() again, which is okay > because audit_free() does a NULL check. > > As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and > redundant audit_free() calls. > > Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") > Suggested-by: Paul Moore <paul@paul-moore.com> > Cc: stable@vger.kernel.org > Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> > --- > Change since v1: > - Delete audit_alloc_kernel() (Paul Moore) > > fs/io-wq.c | 3 --- > fs/io_uring.c | 4 ---- > include/linux/audit.h | 5 ----- > kernel/auditsc.c | 25 ------------------------- > 4 files changed, 37 deletions(-) This looks good to me, thanks! Although it looks like the io_uring related changes will need to be applied by hand as they are pointing to the old layout under fs/ as opposed to the newer layout in io_uring/ introduced during this merge window. Jens, did you want to take this via the io_uring tree or should I take it via the audit tree? If the latter, an ACK would be appreciated, if the former my ACK is below. Acked-by: Paul Moore <paul@paul-moore.com>
On 8/4/22 7:51 AM, Paul Moore wrote: > On Wed, Aug 3, 2022 at 6:24 PM Peilin Ye <yepeilin.cs@gmail.com> wrote: >> >> From: Peilin Ye <peilin.ye@bytedance.com> >> >> Currently @audit_context is allocated twice for io_uring workers: >> >> 1. copy_process() calls audit_alloc(); >> 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which >> is effectively audit_alloc()) and overwrites @audit_context, >> causing: >> >> BUG: memory leak >> unreferenced object 0xffff888144547400 (size 1024): >> <...> >> hex dump (first 32 bytes): >> 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> backtrace: >> [<ffffffff8135cfc3>] audit_alloc+0x133/0x210 >> [<ffffffff81239e63>] copy_process+0xcd3/0x2340 >> [<ffffffff8123b5f3>] create_io_thread+0x63/0x90 >> [<ffffffff81686604>] create_io_worker+0xb4/0x230 >> [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0 >> [<ffffffff8167663a>] io_queue_iowq+0xba/0x200 >> [<ffffffff816768b3>] io_queue_async+0x113/0x180 >> [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0 >> [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120 >> [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570 >> [<ffffffff81272c4e>] task_work_run+0x7e/0xc0 >> [<ffffffff8125a688>] get_signal+0xc18/0xf10 >> [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730 >> [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180 >> [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20 >> [<ffffffff844a7e80>] do_syscall_64+0x40/0x80 >> >> Then, >> >> 3. io_sq_thread() or io_wqe_worker() frees @audit_context using >> audit_free(); >> 4. do_exit() eventually calls audit_free() again, which is okay >> because audit_free() does a NULL check. >> >> As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and >> redundant audit_free() calls. >> >> Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") >> Suggested-by: Paul Moore <paul@paul-moore.com> >> Cc: stable@vger.kernel.org >> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> >> --- >> Change since v1: >> - Delete audit_alloc_kernel() (Paul Moore) >> >> fs/io-wq.c | 3 --- >> fs/io_uring.c | 4 ---- >> include/linux/audit.h | 5 ----- >> kernel/auditsc.c | 25 ------------------------- >> 4 files changed, 37 deletions(-) > > This looks good to me, thanks! Although it looks like the io_uring > related changes will need to be applied by hand as they are pointing > to the old layout under fs/ as opposed to the newer layout in > io_uring/ introduced during this merge window. > > Jens, did you want to take this via the io_uring tree or should I take > it via the audit tree? If the latter, an ACK would be appreciated, if > the former my ACK is below. > > Acked-by: Paul Moore <paul@paul-moore.com> Probably better if I take it, since I need to massage it into the current tree anyway. We can then use this one as the base for the stable backports that are going to be required.
On Wed, 3 Aug 2022 15:23:43 -0700, Peilin Ye wrote: > From: Peilin Ye <peilin.ye@bytedance.com> > > Currently @audit_context is allocated twice for io_uring workers: > > 1. copy_process() calls audit_alloc(); > 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which > is effectively audit_alloc()) and overwrites @audit_context, > causing: > > [...] Applied, thanks! [1/1] audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() commit: f482aa98652795846cc55da98ebe331eb74f3d0b Best regards,
On Thu, Aug 4, 2022 at 10:32 AM Jens Axboe <axboe@kernel.dk> wrote: > On 8/4/22 7:51 AM, Paul Moore wrote: > > On Wed, Aug 3, 2022 at 6:24 PM Peilin Ye <yepeilin.cs@gmail.com> wrote: > >> > >> From: Peilin Ye <peilin.ye@bytedance.com> > >> > >> Currently @audit_context is allocated twice for io_uring workers: > >> > >> 1. copy_process() calls audit_alloc(); > >> 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which > >> is effectively audit_alloc()) and overwrites @audit_context, > >> causing: > >> > >> BUG: memory leak > >> unreferenced object 0xffff888144547400 (size 1024): > >> <...> > >> hex dump (first 32 bytes): > >> 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > >> backtrace: > >> [<ffffffff8135cfc3>] audit_alloc+0x133/0x210 > >> [<ffffffff81239e63>] copy_process+0xcd3/0x2340 > >> [<ffffffff8123b5f3>] create_io_thread+0x63/0x90 > >> [<ffffffff81686604>] create_io_worker+0xb4/0x230 > >> [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0 > >> [<ffffffff8167663a>] io_queue_iowq+0xba/0x200 > >> [<ffffffff816768b3>] io_queue_async+0x113/0x180 > >> [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0 > >> [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120 > >> [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570 > >> [<ffffffff81272c4e>] task_work_run+0x7e/0xc0 > >> [<ffffffff8125a688>] get_signal+0xc18/0xf10 > >> [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730 > >> [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180 > >> [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20 > >> [<ffffffff844a7e80>] do_syscall_64+0x40/0x80 > >> > >> Then, > >> > >> 3. io_sq_thread() or io_wqe_worker() frees @audit_context using > >> audit_free(); > >> 4. do_exit() eventually calls audit_free() again, which is okay > >> because audit_free() does a NULL check. > >> > >> As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and > >> redundant audit_free() calls. > >> > >> Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") > >> Suggested-by: Paul Moore <paul@paul-moore.com> > >> Cc: stable@vger.kernel.org > >> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> > >> --- > >> Change since v1: > >> - Delete audit_alloc_kernel() (Paul Moore) > >> > >> fs/io-wq.c | 3 --- > >> fs/io_uring.c | 4 ---- > >> include/linux/audit.h | 5 ----- > >> kernel/auditsc.c | 25 ------------------------- > >> 4 files changed, 37 deletions(-) > > > > This looks good to me, thanks! Although it looks like the io_uring > > related changes will need to be applied by hand as they are pointing > > to the old layout under fs/ as opposed to the newer layout in > > io_uring/ introduced during this merge window. > > > > Jens, did you want to take this via the io_uring tree or should I take > > it via the audit tree? If the latter, an ACK would be appreciated, if > > the former my ACK is below. > > > > Acked-by: Paul Moore <paul@paul-moore.com> > > Probably better if I take it, since I need to massage it into the > current tree anyway. We can then use this one as the base for the stable > backports that are going to be required. Sounds good to me, thanks everyone.
diff --git a/fs/io-wq.c b/fs/io-wq.c index 824623bcf1a5..4d8a77a2a150 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -634,8 +634,6 @@ static int io_wqe_worker(void *data) snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid); set_task_comm(current, buf); - audit_alloc_kernel(current); - while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) { long ret; @@ -670,7 +668,6 @@ static int io_wqe_worker(void *data) if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) io_worker_handle_work(worker); - audit_free(current); io_worker_exit(worker); return 0; } diff --git a/fs/io_uring.c b/fs/io_uring.c index e8e769be9ed0..e0da0d2f71f2 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -9208,8 +9208,6 @@ static int io_sq_thread(void *data) set_cpus_allowed_ptr(current, cpu_online_mask); current->flags |= PF_NO_SETAFFINITY; - audit_alloc_kernel(current); - mutex_lock(&sqd->lock); while (1) { bool cap_entries, sqt_spin = false; @@ -9283,8 +9281,6 @@ static int io_sq_thread(void *data) io_run_task_work(); mutex_unlock(&sqd->lock); - audit_free(current); - complete(&sqd->exited); do_exit(0); } diff --git a/include/linux/audit.h b/include/linux/audit.h index cece70231138..c313df466e6e 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -287,7 +287,6 @@ static inline int audit_signal_info(int sig, struct task_struct *t) /* These are defined in auditsc.c */ /* Public API */ extern int audit_alloc(struct task_struct *task); -extern int audit_alloc_kernel(struct task_struct *task); extern void __audit_free(struct task_struct *task); extern void __audit_uring_entry(u8 op); extern void __audit_uring_exit(int success, long code); @@ -580,10 +579,6 @@ static inline int audit_alloc(struct task_struct *task) { return 0; } -static inline int audit_alloc_kernel(struct task_struct *task) -{ - return 0; -} static inline void audit_free(struct task_struct *task) { } static inline void audit_uring_entry(u8 op) diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 3a8c9d744800..dd8d9ab747c3 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1073,31 +1073,6 @@ int audit_alloc(struct task_struct *tsk) return 0; } -/** - * audit_alloc_kernel - allocate an audit_context for a kernel task - * @tsk: the kernel task - * - * Similar to the audit_alloc() function, but intended for kernel private - * threads. Returns zero on success, negative values on failure. - */ -int audit_alloc_kernel(struct task_struct *tsk) -{ - /* - * At the moment we are just going to call into audit_alloc() to - * simplify the code, but there two things to keep in mind with this - * approach: - * - * 1. Filtering internal kernel tasks is a bit laughable in almost all - * cases, but there is at least one case where there is a benefit: - * the '-a task,never' case allows the admin to effectively disable - * task auditing at runtime. - * - * 2. The {set,clear}_task_syscall_work() ops likely have zero effect - * on these internal kernel tasks, but they probably don't hurt either. - */ - return audit_alloc(tsk); -} - static inline void audit_free_context(struct audit_context *context) { /* resetting is extra work, but it is likely just noise */