Message ID | 20210401000747.3648767-1-davemarchevsky@fb.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | BPF |
Series | [bpf] bpf: refcount task stack in bpf_get_task_stack |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | fail | 1 blamed authors not CCed: andrii@kernel.org; 6 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org andrii@kernel.org kafai@fb.com john.fastabend@gmail.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 1 this patch: 1 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 19 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1 this patch: 1 |
netdev/header_inline | success | Link |

> On Mar 31, 2021, at 5:07 PM, Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
> offset of task->stack. The pt_regs are later dereferenced in
> __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
> the task in question exits while bpf_get_task_stack is executing, as
> warned by task_stack_page's comment:
>
>      * When accessing the stack of a non-current task that might exit, use
>      * try_get_task_stack() instead. task_stack_page will return a pointer
>      * that could get freed out from under you.
>
> Taking the comment's advice and using try_get_task_stack() and
> put_task_stack() to hold task->stack refcount, or bail early if it's
> already 0. Incrementing stack_refcount will ensure the task's stack
> sticks around while we're using its data.
>
> I noticed this bug while testing a bpf task iter similar to
> bpf_iter_task_stack in selftests, except mine grabbed user stack, and
> getting intermittent crashes, which resulted in dumps like:
>
>     BUG: unable to handle page fault for address: 0000000000003fe0
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     RIP: 0010:__bpf_get_stack+0xd0/0x230
>     <snip...>
>     Call Trace:
>     bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
>     bpf_iter_run_prog+0x24/0x81
>     __task_seq_show+0x58/0x80
>     bpf_seq_read+0xf7/0x3d0
>     vfs_read+0x91/0x140
>     ksys_read+0x59/0xd0
>     do_syscall_64+0x48/0x120
>     entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Thanks for the fix!

Acked-by: Song Liu <songliubraving@fb.com>

Could you please extend bpf_iter_task_stack to also grab user stack?

Thanks,
Song

[...]

> On Mar 31, 2021, at 11:48 PM, Song Liu <songliubraving@fb.com> wrote:
>
>> On Mar 31, 2021, at 5:07 PM, Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>
>> On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
>> offset of task->stack. The pt_regs are later dereferenced in
>> __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
>> the task in question exits while bpf_get_task_stack is executing, as
>> warned by task_stack_page's comment:
>>
>>      * When accessing the stack of a non-current task that might exit, use
>>      * try_get_task_stack() instead. task_stack_page will return a pointer
>>      * that could get freed out from under you.
>>
>> Taking the comment's advice and using try_get_task_stack() and
>> put_task_stack() to hold task->stack refcount, or bail early if it's
>> already 0. Incrementing stack_refcount will ensure the task's stack
>> sticks around while we're using its data.
>>
>> I noticed this bug while testing a bpf task iter similar to
>> bpf_iter_task_stack in selftests, except mine grabbed user stack, and
>> getting intermittent crashes, which resulted in dumps like:
>>
>>     BUG: unable to handle page fault for address: 0000000000003fe0
>>     #PF: supervisor read access in kernel mode
>>     #PF: error_code(0x0000) - not-present page
>>     RIP: 0010:__bpf_get_stack+0xd0/0x230
>>     <snip...>
>>     Call Trace:
>>     bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
>>     bpf_iter_run_prog+0x24/0x81
>>     __task_seq_show+0x58/0x80
>>     bpf_seq_read+0xf7/0x3d0
>>     vfs_read+0x91/0x140
>>     ksys_read+0x59/0xd0
>>     do_syscall_64+0x48/0x120
>>     entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>
> Thanks for the fix!
>
> Acked-by: Song Liu <songliubraving@fb.com>
>
> Could you please extend bpf_iter_task_stack to also grab user stack?

I think we can extend bpf_iter_task_stack in a follow up patch.
It is not necessary to bundle these two patches in the same set.

Thanks,
Song

On Wed, Mar 31, 2021 at 5:08 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
> offset of task->stack. The pt_regs are later dereferenced in
> __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
> the task in question exits while bpf_get_task_stack is executing, as
> warned by task_stack_page's comment:
>
>      * When accessing the stack of a non-current task that might exit, use
>      * try_get_task_stack() instead. task_stack_page will return a pointer
>      * that could get freed out from under you.
>
> Taking the comment's advice and using try_get_task_stack() and
> put_task_stack() to hold task->stack refcount, or bail early if it's
> already 0. Incrementing stack_refcount will ensure the task's stack
> sticks around while we're using its data.
>
> I noticed this bug while testing a bpf task iter similar to
> bpf_iter_task_stack in selftests, except mine grabbed user stack, and
> getting intermittent crashes, which resulted in dumps like:
>
>     BUG: unable to handle page fault for address: 0000000000003fe0
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     RIP: 0010:__bpf_get_stack+0xd0/0x230
>     <snip...>
>     Call Trace:
>     bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
>     bpf_iter_run_prog+0x24/0x81
>     __task_seq_show+0x58/0x80
>     bpf_seq_read+0xf7/0x3d0
>     vfs_read+0x91/0x140
>     ksys_read+0x59/0xd0
>     do_syscall_64+0x48/0x120
>     entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Applied. Thanks

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index be35bfb7fb13..6fbc2abe9c91 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -517,9 +517,17 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
 	   u32, size, u64, flags)
 {
-	struct pt_regs *regs = task_pt_regs(task);
+	struct pt_regs *regs;
+	long res;
 
-	return __bpf_get_stack(regs, task, NULL, buf, size, flags);
+	if (!try_get_task_stack(task))
+		return -EFAULT;
+
+	regs = task_pt_regs(task);
+	res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+	put_task_stack(task);
+
+	return res;
 }
 
 BTF_ID_LIST_SINGLE(bpf_get_task_stack_btf_ids, struct, task_struct)

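For readers who want to poke at the pin/read/unpin shape of the patched helper outside the kernel, here is a rough user-space sketch. It is only an analogy: `struct fake_task`, `fake_try_get_stack()`, `fake_put_stack()` and `fake_get_task_stack()` are hypothetical names made up for illustration, and C11 atomics stand in for the kernel's refcount primitives. It assumes, as the commit message describes, that the stack stays valid while the count is non-zero and that a count of 0 means the caller must bail out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct; only what the pattern needs. */
struct fake_task {
	atomic_int stack_refcount;	/* models task->stack_refcount */
	void *stack;			/* models task->stack */
};

/* Take a reference unless the count has already dropped to zero. */
static bool fake_try_get_stack(struct fake_task *t)
{
	int old = atomic_load(&t->stack_refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded, so the zero check repeats. */
		if (atomic_compare_exchange_weak(&t->stack_refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one frees the stack. */
static void fake_put_stack(struct fake_task *t)
{
	if (atomic_fetch_sub(&t->stack_refcount, 1) == 1) {
		free(t->stack);
		t->stack = NULL;
	}
}

/* Same shape as the patched helper: pin, read, unpin, return result. */
static long fake_get_task_stack(struct fake_task *t, unsigned char *out)
{
	if (!fake_try_get_stack(t))
		return -EFAULT;

	*out = *(unsigned char *)t->stack;	/* safe: stack is pinned */
	fake_put_stack(t);

	return 0;
}
```

A caller holding the initial reference can call `fake_get_task_stack()` any number of times; once that initial reference is dropped with `fake_put_stack()`, later calls return `-EFAULT` instead of touching freed memory.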
On x86 the struct pt_regs * grabbed by task_pt_regs() points to an
offset of task->stack. The pt_regs are later dereferenced in
__bpf_get_stack (e.g. by user_mode() check). This can cause a fault if
the task in question exits while bpf_get_task_stack is executing, as
warned by task_stack_page's comment:

     * When accessing the stack of a non-current task that might exit, use
     * try_get_task_stack() instead. task_stack_page will return a pointer
     * that could get freed out from under you.

Taking the comment's advice and using try_get_task_stack() and
put_task_stack() to hold task->stack refcount, or bail early if it's
already 0. Incrementing stack_refcount will ensure the task's stack
sticks around while we're using its data.

I noticed this bug while testing a bpf task iter similar to
bpf_iter_task_stack in selftests, except mine grabbed user stack, and
getting intermittent crashes, which resulted in dumps like:

    BUG: unable to handle page fault for address: 0000000000003fe0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    RIP: 0010:__bpf_get_stack+0xd0/0x230
    <snip...>
    Call Trace:
    bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec
    bpf_iter_run_prog+0x24/0x81
    __task_seq_show+0x58/0x80
    bpf_seq_read+0xf7/0x3d0
    vfs_read+0x91/0x140
    ksys_read+0x59/0xd0
    do_syscall_64+0x48/0x120
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/stackmap.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
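
A side note on the fault address in the dump: 0x3fe0 is what you would get if the pt_regs pointer had been computed from a NULL task->stack and user_mode() then read regs->cs. This is an inference, not something the patch states. The sketch below just does the arithmetic under assumptions I am supplying: a 16 KiB kernel stack, no padding at the top of the stack, and an x86-64 pt_regs of 21 eight-byte registers with cs at index 17. `struct model_pt_regs`, `MODEL_THREAD_SIZE` and the `model_*` functions are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 register layout: 21 eight-byte slots, cs at index 17. */
struct model_pt_regs {
	uint64_t r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di, orig_ax;
	uint64_t ip, cs, flags, sp, ss;
};

/* Assumption: 16 KiB kernel stack with no padding at the top. */
#define MODEL_THREAD_SIZE 0x4000u

/* pt_regs is modelled as the last struct-sized slot of the stack area. */
static uintptr_t model_task_pt_regs(uintptr_t stack_base)
{
	return stack_base + MODEL_THREAD_SIZE - sizeof(struct model_pt_regs);
}

/* Address a user_mode()-style check would read, i.e. &regs->cs. */
static uintptr_t model_cs_address(uintptr_t stack_base)
{
	return model_task_pt_regs(stack_base) +
	       offsetof(struct model_pt_regs, cs);
}
```

With `stack_base` set to 0, `model_cs_address()` yields 0x4000 - 0xa8 + 0x88 = 0x3fe0, which matches the address in the BUG line. If the assumptions hold, that is consistent with the stack having already been released by the time the helper dereferenced regs.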