Message ID | 20241101114410.234311-1-dmantipov@yandex.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] Bluetooth: fix use-after-free in device_for_each_child() | expand |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
tedd_an/CheckPatch | warning | WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?) #77: CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1 total: 0 errors, 1 warnings, 0 checks, 8 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. /github/workspace/src/src/13859262.patch has style problems, please review. NOTE: Ignored message types: UNKNOWN_COMMIT_ID NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. |
tedd_an/GitLint | fail | WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search 8: B1 Line exceeds max length (88>80): "CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1" 9: B1 Line exceeds max length (84>80): "Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014" |
tedd_an/SubjectPrefix | success | Gitlint PASS |
tedd_an/BuildKernel | success | BuildKernel PASS |
tedd_an/CheckAllWarning | success | CheckAllWarning PASS |
tedd_an/CheckSparse | success | CheckSparse PASS |
#syz test On Fri, Nov 1, 2024 at 7:44 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > Syzbot has reported the following KASAN splat: > > BUG: KASAN: slab-use-after-free in device_for_each_child+0x18f/0x1a0 > Read of size 8 at addr ffff88801f605308 by task kbnepd bnep0/4980 > > CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 > Call Trace: > <TASK> > dump_stack_lvl+0x100/0x190 > ? device_for_each_child+0x18f/0x1a0 > print_report+0x13a/0x4cb > ? __virt_addr_valid+0x5e/0x590 > ? __phys_addr+0xc6/0x150 > ? device_for_each_child+0x18f/0x1a0 > kasan_report+0xda/0x110 > ? device_for_each_child+0x18f/0x1a0 > ? __pfx_dev_memalloc_noio+0x10/0x10 > device_for_each_child+0x18f/0x1a0 > ? __pfx_device_for_each_child+0x10/0x10 > pm_runtime_set_memalloc_noio+0xf2/0x180 > netdev_unregister_kobject+0x1ed/0x270 > unregister_netdevice_many_notify+0x123c/0x1d80 > ? __mutex_trylock_common+0xde/0x250 > ? __pfx_unregister_netdevice_many_notify+0x10/0x10 > ? trace_contention_end+0xe6/0x140 > ? __mutex_lock+0x4e7/0x8f0 > ? __pfx_lock_acquire.part.0+0x10/0x10 > ? rcu_is_watching+0x12/0xc0 > ? unregister_netdev+0x12/0x30 > unregister_netdevice_queue+0x30d/0x3f0 > ? __pfx_unregister_netdevice_queue+0x10/0x10 > ? __pfx_down_write+0x10/0x10 > unregister_netdev+0x1c/0x30 > bnep_session+0x1fb3/0x2ab0 > ? __pfx_bnep_session+0x10/0x10 > ? __pfx_lock_release+0x10/0x10 > ? __pfx_woken_wake_function+0x10/0x10 > ? __kthread_parkme+0x132/0x200 > ? __pfx_bnep_session+0x10/0x10 > ? kthread+0x13a/0x370 > ? __pfx_bnep_session+0x10/0x10 > kthread+0x2b7/0x370 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x48/0x80 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1a/0x30 > </TASK> > > Allocated by task 4974: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > __kasan_kmalloc+0xaa/0xb0 > __kmalloc_noprof+0x1d1/0x440 > hci_alloc_dev_priv+0x1d/0x2820 > __vhci_create_device+0xef/0x7d0 > vhci_write+0x2c7/0x480 > vfs_write+0x6a0/0xfc0 > ksys_write+0x12f/0x260 > do_syscall_64+0xc7/0x250 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > Freed by task 4979: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > kasan_save_free_info+0x3b/0x60 > __kasan_slab_free+0x4f/0x70 > kfree+0x141/0x490 > hci_release_dev+0x4d9/0x600 > bt_host_release+0x6a/0xb0 > device_release+0xa4/0x240 > kobject_put+0x1ec/0x5a0 > put_device+0x1f/0x30 > vhci_release+0x81/0xf0 > __fput+0x3f6/0xb30 > task_work_run+0x151/0x250 > do_exit+0xa79/0x2c30 > do_group_exit+0xd5/0x2a0 > get_signal+0x1fcd/0x2210 > arch_do_signal_or_restart+0x93/0x780 > syscall_exit_to_user_mode+0x140/0x290 > do_syscall_64+0xd4/0x250 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > In 'hci_conn_del_sysfs()', 'device_unregister()' may be called when > an underlying (kobject) reference counter is greater than 1. This > means that reparenting (happened when the device is actually freed) > is delayed and, during that delay, parent controller device (hciX) > may be deleted. Since the latter may create a dangling pointer to > freed parent, avoid that scenario by reparenting to NULL explicitly. > > Reported-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f > Fixes: a85fb91e3d72 ("Bluetooth: Fix double free in hci_conn_cleanup") > Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> > --- > v2: reparent per-connection 'struct device' explicitly > --- > net/bluetooth/hci_sysfs.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c > index 367e32fe30eb..80ac537fa500 100644 > --- a/net/bluetooth/hci_sysfs.c > +++ b/net/bluetooth/hci_sysfs.c > @@ -73,6 +73,8 @@ void hci_conn_del_sysfs(struct hci_conn *conn) > return; > } > > + device_move(&conn->dev, NULL, DPM_ORDER_DEV_LAST); > + > while (1) { > struct device *dev; > > -- > 2.47.0 >
On 11/1/24 6:01 PM, Luiz Augusto von Dentz wrote:
> #syz test
Please see at https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f just as usual.
Dmitry
Hi Dmitry, On Fri, Nov 1, 2024 at 11:06 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > On 11/1/24 6:01 PM, Luiz Augusto von Dentz wrote: > > > #syz test > > Please see at https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f just as usual. There is no Tested-by thus why I assumed it wasn't tested by syzbot yet. > Dmitry >
On 11/1/24 6:12 PM, Luiz Augusto von Dentz wrote:
> There is no Tested-by thus why I assumed it wasn't tested by syzbot yet.
Ugh. Until today I've assumed that Tested-by: is applicable to human-driven
testing only :-).
Dmitry
Hi Dmitry, On Fri, Nov 1, 2024 at 11:17 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > On 11/1/24 6:12 PM, Luiz Augusto von Dentz wrote: > > > There is no Tested-by thus why I assumed it wasn't tested by syzbot yet. > > Ugh. Until today I've assumed that Tested-by: is applicable to human-driven > testing only :-). Nope, in fact it is very handy to have syzbot test your changes since it may hit other problems as well. > Dmitry >
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: slab-use-after-free Read in netdev_unregister_kobject
==================================================================
BUG: KASAN: slab-use-after-free in device_for_each_child+0xbb/0x1b0 drivers/base/core.c:3999
Read of size 8 at addr ffff888021c81308 by task kbnepd bnep0/5597
CPU: 0 UID: 0 PID: 5597 Comm: kbnepd bnep0 Not tainted 6.12.0-rc5-syzkaller-00181-g6c52d4da1c74 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
device_for_each_child+0xbb/0x1b0 drivers/base/core.c:3999
pm_runtime_set_memalloc_noio+0x114/0x260 drivers/base/power/runtime.c:248
netdev_unregister_kobject+0x178/0x250 net/core/net-sysfs.c:2109
unregister_netdevice_many_notify+0x1851/0x1da0 net/core/dev.c:11441
unregister_netdevice_many net/core/dev.c:11469 [inline]
unregister_netdevice_queue+0x303/0x370 net/core/dev.c:11343
unregister_netdevice include/linux/netdevice.h:3118 [inline]
unregister_netdev+0x1c/0x30 net/core/dev.c:11487
bnep_session+0x2e0e/0x3000 net/bluetooth/bnep/core.c:525
kthread+0x2f2/0x390 kernel/kthread.c:389
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 5425:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:257 [inline]
__do_kmalloc_node mm/slub.c:4264 [inline]
__kmalloc_noprof+0x1fc/0x400 mm/slub.c:4276
kmalloc_noprof include/linux/slab.h:882 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
hci_alloc_dev_priv+0x27/0x2030 net/bluetooth/hci_core.c:2440
hci_alloc_dev include/net/bluetooth/hci_core.h:1621 [inline]
__vhci_create_device drivers/bluetooth/hci_vhci.c:399 [inline]
vhci_create_device+0x116/0x6a0 drivers/bluetooth/hci_vhci.c:470
vhci_get_user drivers/bluetooth/hci_vhci.c:527 [inline]
vhci_write+0x3cf/0x490 drivers/bluetooth/hci_vhci.c:607
new_sync_write fs/read_write.c:590 [inline]
vfs_write+0xaed/0xd30 fs/read_write.c:683
ksys_write+0x183/0x2b0 fs/read_write.c:736
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5425:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:2342 [inline]
slab_free mm/slub.c:4579 [inline]
kfree+0x1a0/0x440 mm/slub.c:4727
hci_release_dev+0x1525/0x16b0 net/bluetooth/hci_core.c:2759
bt_host_release+0x83/0x90 net/bluetooth/hci_sysfs.c:94
device_release+0x9b/0x1c0
kobject_cleanup lib/kobject.c:689 [inline]
kobject_release lib/kobject.c:720 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x231/0x480 lib/kobject.c:737
vhci_release+0x88/0xd0 drivers/bluetooth/hci_vhci.c:665
__fput+0x241/0x880 fs/file_table.c:431
task_work_run+0x251/0x310 kernel/task_work.c:239
exit_task_work include/linux/task_work.h:43 [inline]
do_exit+0xa2f/0x28e0 kernel/exit.c:939
do_group_exit+0x207/0x2c0 kernel/exit.c:1088
__do_sys_exit_group kernel/exit.c:1099 [inline]
__se_sys_exit_group kernel/exit.c:1097 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Last potentially related work creation:
kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
__kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
insert_work+0x3e/0x330 kernel/workqueue.c:2183
__queue_work+0xb66/0xf50 kernel/workqueue.c:2343
queue_work_on+0x1c2/0x380 kernel/workqueue.c:2390
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f2/0x390 kernel/kthread.c:389
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Second to last potentially related work creation:
kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
__kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
insert_work+0x3e/0x330 kernel/workqueue.c:2183
__queue_work+0xc8b/0xf50 kernel/workqueue.c:2339
call_timer_fn+0x190/0x650 kernel/time/timer.c:1794
expire_timers kernel/time/timer.c:1840 [inline]
__run_timers kernel/time/timer.c:2419 [inline]
__run_timer_base+0x695/0x8e0 kernel/time/timer.c:2430
run_timer_base kernel/time/timer.c:2439 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2449
handle_softirqs+0x2c7/0x980 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1049
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
The buggy address belongs to the object at ffff888021c80000
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 4872 bytes inside of
freed 8192-byte region [ffff888021c80000, ffff888021c82000)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x21c80
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000040 ffff888015442280 0000000000000000 0000000000000001
raw: 0000000000000000 0000000000020002 00000001f5000000 0000000000000000
head: 00fff00000000040 ffff888015442280 0000000000000000 0000000000000001
head: 0000000000000000 0000000000020002 00000001f5000000 0000000000000000
head: 00fff00000000003 ffffea0000872001 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4752, tgid 4752 (S40network), ts 32094841667, free_ts 32026397912
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
prep_new_page mm/page_alloc.c:1545 [inline]
get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457
__alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733
alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
alloc_slab_page+0x6a/0x120 mm/slub.c:2412
allocate_slab+0x5a/0x2f0 mm/slub.c:2578
new_slab mm/slub.c:2631 [inline]
___slab_alloc+0xcd1/0x14b0 mm/slub.c:3818
__slab_alloc+0x58/0xa0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
__kmalloc_cache_noprof+0x1d5/0x2c0 mm/slub.c:4290
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
tomoyo_print_bprm security/tomoyo/audit.c:26 [inline]
tomoyo_init_log+0x11cd/0x2050 security/tomoyo/audit.c:264
tomoyo_supervisor+0x38a/0x11f0 security/tomoyo/common.c:2089
tomoyo_audit_env_log security/tomoyo/environ.c:36 [inline]
tomoyo_env_perm+0x178/0x210 security/tomoyo/environ.c:63
tomoyo_environ security/tomoyo/domain.c:672 [inline]
tomoyo_find_next_domain+0x146e/0x1d40 security/tomoyo/domain.c:881
tomoyo_bprm_check_security+0x114/0x180 security/tomoyo/tomoyo.c:102
security_bprm_check+0x86/0x250 security/security.c:1297
search_binary_handler fs/exec.c:1740 [inline]
exec_binprm fs/exec.c:1794 [inline]
bprm_execve+0xa56/0x1770 fs/exec.c:1845
page last free pid 4751 tgid 4751 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1108 [inline]
free_unref_page+0xcfb/0xf20 mm/page_alloc.c:2638
__slab_free+0x31b/0x3d0 mm/slub.c:4490
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
kasan_slab_alloc include/linux/kasan.h:247 [inline]
slab_post_alloc_hook mm/slub.c:4085 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4141
vm_area_alloc+0x24/0x1d0 kernel/fork.c:472
mmap_region+0x1134/0x2940 mm/mmap.c:1436
do_mmap+0x8f0/0x1000 mm/mmap.c:496
vm_mmap_pgoff+0x1dd/0x3d0 mm/util.c:588
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
ffff888021c81200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888021c81280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888021c81300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888021c81380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888021c81400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Tested on:
commit: 6c52d4da Merge tag 'for-linus' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15e81630580000
kernel config: https://syzkaller.appspot.com/x/.config?x=2cf68159adbdf217
dashboard link: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
Note: no patches were applied.
Hi Dmitry, On Fri, Nov 1, 2024 at 11:31 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Hi Dmitry, > > On Fri, Nov 1, 2024 at 11:17 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > > > On 11/1/24 6:12 PM, Luiz Augusto von Dentz wrote: > > > > > There is no Tested-by thus why I assumed it wasn't tested by syzbot yet. > > > > Ugh. Until today I've assumed that Tested-by: is applicable to human-driven > > testing only :-). > > Nope, in fact it is very handy to have syzbot test your changes since > it may hit other problems as well. Looks like this doesn't solve the problem, in fact I think you are getting it backwards, you are trying to reparent the parent dev not the child and I assume by destroying the parent device there should be some way to reset the parent which seems to be the intent the following code in hci_conn_del_sysfs: while (1) { struct device *dev; dev = device_find_child(&conn->dev, NULL, __match_tty); if (!dev) break; device_move(dev, NULL, DPM_ORDER_DEV_LAST); put_device(dev); } But note that it only does that after matching tty, but I guess we want to do it regardless otherwise we may have the child objects still access it, that said we should probably use device_for_each_child though if that is safe to do calls to device_move under its callback. > > Dmitry > > > > > -- > Luiz Augusto von Dentz
On Fri, Nov 1, 2024 at 1:37 PM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Hi Dmitry, > > On Fri, Nov 1, 2024 at 11:31 AM Luiz Augusto von Dentz > <luiz.dentz@gmail.com> wrote: > > > > Hi Dmitry, > > > > On Fri, Nov 1, 2024 at 11:17 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > > > > > On 11/1/24 6:12 PM, Luiz Augusto von Dentz wrote: > > > > > > > There is no Tested-by thus why I assumed it wasn't tested by syzbot yet. > > > > > > Ugh. Until today I've assumed that Tested-by: is applicable to human-driven > > > testing only :-). > > > > Nope, in fact it is very handy to have syzbot test your changes since > > it may hit other problems as well. > > Looks like this doesn't solve the problem, in fact I think you are > getting it backwards, you are trying to reparent the parent dev not > the child and I assume by destroying the parent device there should be > some way to reset the parent which seems to be the intent the > following code in hci_conn_del_sysfs: > > while (1) { > struct device *dev; > > dev = device_find_child(&conn->dev, NULL, __match_tty); > if (!dev) > break; > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > put_device(dev); > } > > But note that it only does that after matching tty, but I guess we > want to do it regardless otherwise we may have the child objects still > access it, that said we should probably use device_for_each_child > though if that is safe to do calls to device_move under its callback. #syz test > > > Dmitry > > > > > > > > > -- > > Luiz Augusto von Dentz > > > > -- > Luiz Augusto von Dentz
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com
Tested-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com
Tested on:
commit: c4264568 Merge tag 'acpi-6.12-rc6' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=148ad340580000
kernel config: https://syzkaller.appspot.com/x/.config?x=2cf68159adbdf217
dashboard link: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1122d340580000
Note: testing is done by a robot and is best-effort only.
On 11/1/24 8:37 PM, Luiz Augusto von Dentz wrote: > Looks like this doesn't solve the problem, in fact I think you are > getting it backwards, you are trying to reparent the parent dev not > the child and I assume by destroying the parent device there should be > some way to reset the parent which seems to be the intent the > following code in hci_conn_del_sysfs: > > while (1) { > struct device *dev; > > dev = device_find_child(&conn->dev, NULL, __match_tty); > if (!dev) > break; > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > put_device(dev); > } > > But note that it only does that after matching tty, but I guess we > want to do it regardless otherwise we may have the child objects still > access it, that said we should probably use device_for_each_child > though if that is safe to do calls to device_move under its callback. I'm not sure that I've got your point. IIUC the problem is that a controller (parent) instance may be freed _after_ the child (connection) has passed 'device_unregister(&conn->dev)' but _before_ an actual removal of 'conn->dev' from the devices hierarchy, thus leaving the dangling 'conn->dev.parent' pointer. The latter may be fixed by reparenting 'conn->dev' to NULL explicitly. And handling children of 'conn->dev' (i.e. the grandchilren of the controller) is out of this problem's scope at all. And nothing to say about syzbot's innards but manual testing shows that the following thing: void hci_conn_del_sysfs(struct hci_conn *conn) { struct hci_dev *hdev = conn->hdev; bt_dev_dbg(hdev, "conn %p", conn); if (!device_is_registered(&conn->dev)) { /* If device_add() has *not* succeeded, use *only* put_device() * to drop the reference count. */ put_device(&conn->dev); return; } while (1) { struct device *dev; dev = device_find_any_child(&conn->dev); if (!dev) break; printk(KERN_ERR "%s:%d: reparent dev@%p(%s) with parent@%p(%s)\n", __FILE__, __LINE__, dev, dev_name(dev), dev->parent, (dev->parent ? dev_name(dev->parent) : "<none>")); device_move(dev, NULL, DPM_ORDER_DEV_LAST); put_device(dev); } device_unregister(&conn->dev); } occasionally triggers the following crash: net/bluetooth/hci_sysfs.c:82: reparent dev@ffff888114be86f8(bnep0) with parent@ffff888111c64b68(hci4:200) Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASI KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f] CPU: 3 UID: 0 PID: 6033 Comm: repro Not tainted 6.12.0-rc5-00299-g11066801dd4b-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 RIP: 0010:klist_put+0x4d/0x1d0 Code: c1 ea 03 80 3c 02 00 0f 85 74 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 23 49 83 e4 fe 49 8d 7c 24 58 49 RSP: 0018:ffffc9000423f9b0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88810f43ac60 RCX: 0000000000000000 RDX: 000000000000000b RSI: ffffffff8a92d415 RDI: 0000000000000058 RBP: 0000000000000001 R08: 0000000000000000 R09: fffffbfff1fad62c R10: ffffffff8fd6b163 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000001 R14: ffffc9000423fa30 R15: ffffffff8a92d805 FS: 00007f24ebb78740(0000) GS:ffff888135f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c52afb6950 CR3: 0000000020c1c000 CR4: 00000000000006f0 Call Trace: <TASK> ? die_addr.cold+0x8/0xd ? exc_general_protection+0x147/0x240 ? asm_exc_general_protection+0x26/0x30 ? klist_remove+0x155/0x2b0 ? klist_put+0x15/0x1d0 ? klist_put+0x4d/0x1d0 klist_remove+0x15a/0x2b0 ? __pfx_klist_remove+0x10/0x10 device_move+0x12d/0x10b0 hci_conn_del_sysfs.cold+0xcf/0x14a hci_conn_del+0x467/0xd60 hci_conn_hash_flush+0x18f/0x270 hci_dev_close_sync+0x549/0x1260 hci_dev_do_close+0x2e/0x90 hci_unregister_dev+0x213/0x630 vhci_release+0x79/0xf0 ? __pfx_vhci_release+0x10/0x10 __fput+0x3f6/0xb30 task_work_run+0x151/0x250 ? __pfx_task_work_run+0x10/0x10 do_exit+0xa79/0x2c30 ? do_raw_spin_lock+0x12a/0x2b0 ? __pfx_do_exit+0x10/0x10 do_group_exit+0xd5/0x2a0 __x64_sys_exit_group+0x3e/0x50 x64_sys_call+0x14af/0x14b0 do_syscall_64+0xc7/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f Dmitry
Hi Dmitry, On Sat, Nov 2, 2024 at 5:51 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > On 11/1/24 8:37 PM, Luiz Augusto von Dentz wrote: > > > Looks like this doesn't solve the problem, in fact I think you are > > getting it backwards, you are trying to reparent the parent dev not > > the child and I assume by destroying the parent device there should be > > some way to reset the parent which seems to be the intent the > > following code in hci_conn_del_sysfs: > > > > while (1) { > > struct device *dev; > > > > dev = device_find_child(&conn->dev, NULL, __match_tty); > > if (!dev) > > break; > > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > > put_device(dev); > > } > > > > But note that it only does that after matching tty, but I guess we > > want to do it regardless otherwise we may have the child objects still > > access it, that said we should probably use device_for_each_child > > though if that is safe to do calls to device_move under its callback. > > I'm not sure that I've got your point. IIUC the problem is that a controller > (parent) instance may be freed _after_ the child (connection) has passed > 'device_unregister(&conn->dev)' but _before_ an actual removal of 'conn->dev' > from the devices hierarchy, thus leaving the dangling 'conn->dev.parent' > pointer. The latter may be fixed by reparenting 'conn->dev' to NULL explicitly. > And handling children of 'conn->dev' (i.e. the grandchilren of the controller) > is out of this problem's scope at all. > > And nothing to say about syzbot's innards but manual testing shows that the > following thing: > > void hci_conn_del_sysfs(struct hci_conn *conn) > { > struct hci_dev *hdev = conn->hdev; > > bt_dev_dbg(hdev, "conn %p", conn); > > if (!device_is_registered(&conn->dev)) { > /* If device_add() has *not* succeeded, use *only* put_device() > * to drop the reference count. > */ > put_device(&conn->dev); > return; > } > > while (1) { > struct device *dev; > > dev = device_find_any_child(&conn->dev); > if (!dev) > break; > printk(KERN_ERR "%s:%d: reparent dev@%p(%s) with parent@%p(%s)\n", > __FILE__, __LINE__, dev, dev_name(dev), dev->parent, > (dev->parent ? dev_name(dev->parent) : "<none>")); > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > put_device(dev); > } > > device_unregister(&conn->dev); > } > > occasionally triggers the following crash: > > net/bluetooth/hci_sysfs.c:82: reparent dev@ffff888114be86f8(bnep0) with parent@ffff888111c64b68(hci4:200) > Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASI > KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f] > CPU: 3 UID: 0 PID: 6033 Comm: repro Not tainted 6.12.0-rc5-00299-g11066801dd4b-dirty #8 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 > RIP: 0010:klist_put+0x4d/0x1d0 > Code: c1 ea 03 80 3c 02 00 0f 85 74 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 23 49 83 e4 fe 49 8d 7c 24 58 49 > RSP: 0018:ffffc9000423f9b0 EFLAGS: 00010202 > RAX: dffffc0000000000 RBX: ffff88810f43ac60 RCX: 0000000000000000 > RDX: 000000000000000b RSI: ffffffff8a92d415 RDI: 0000000000000058 > RBP: 0000000000000001 R08: 0000000000000000 R09: fffffbfff1fad62c > R10: ffffffff8fd6b163 R11: 0000000000000000 R12: 0000000000000000 > R13: 0000000000000001 R14: ffffc9000423fa30 R15: ffffffff8a92d805 > FS: 00007f24ebb78740(0000) GS:ffff888135f00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055c52afb6950 CR3: 0000000020c1c000 CR4: 00000000000006f0 > Call Trace: > <TASK> > ? die_addr.cold+0x8/0xd > ? exc_general_protection+0x147/0x240 > ? asm_exc_general_protection+0x26/0x30 > ? klist_remove+0x155/0x2b0 > ? klist_put+0x15/0x1d0 > ? klist_put+0x4d/0x1d0 > klist_remove+0x15a/0x2b0 > ? __pfx_klist_remove+0x10/0x10 > device_move+0x12d/0x10b0 > hci_conn_del_sysfs.cold+0xcf/0x14a > hci_conn_del+0x467/0xd60 > hci_conn_hash_flush+0x18f/0x270 > hci_dev_close_sync+0x549/0x1260 > hci_dev_do_close+0x2e/0x90 > hci_unregister_dev+0x213/0x630 > vhci_release+0x79/0xf0 > ? __pfx_vhci_release+0x10/0x10 > __fput+0x3f6/0xb30 > task_work_run+0x151/0x250 > ? __pfx_task_work_run+0x10/0x10 > do_exit+0xa79/0x2c30 > ? do_raw_spin_lock+0x12a/0x2b0 > ? __pfx_do_exit+0x10/0x10 > do_group_exit+0xd5/0x2a0 > __x64_sys_exit_group+0x3e/0x50 > x64_sys_call+0x14af/0x14b0 > do_syscall_64+0xc7/0x250 > entry_SYSCALL_64_after_hwframe+0x77/0x7f Weird, this was not reproduced by syzbot when I asked it to test, how are you reproducing this? This looks like to be a problem with klist being corrupted somehow, anyway if we can't unparent the objects we need a way to unregister the child without waiting for the bnep thread to clean it up. > Dmitry
On 11/4/24 5:06 PM, Luiz Augusto von Dentz wrote: > Weird, this was not reproduced by syzbot when I asked it to test, how > are you reproducing this? This is not regular, looks like a race, and I'm not sure how many runs are required to reproduce. Anyway we can't blame syzbot just because it was unable to reproduce some weird thing. Dmitry
Hi, On Mon, Nov 4, 2024 at 9:58 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > On 11/4/24 5:06 PM, Luiz Augusto von Dentz wrote: > > > Weird, this was not reproduced by syzbot when I asked it to test, how > > are you reproducing this? > > This is not regular, looks like a race, and I'm not sure how many runs > are required to reproduce. Anyway we can't blame syzbot just because > it was unable to reproduce some weird thing. But how you reproducing, or you just running syzbot test over and over locally? Anyway I will look into how we can actually unregister the likes of bnep net_dev in a more direct manner rather than doing via bnep kthread. > Dmitry >
diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c index 367e32fe30eb..80ac537fa500 100644 --- a/net/bluetooth/hci_sysfs.c +++ b/net/bluetooth/hci_sysfs.c @@ -73,6 +73,8 @@ void hci_conn_del_sysfs(struct hci_conn *conn) return; } + device_move(&conn->dev, NULL, DPM_ORDER_DEV_LAST); + while (1) { struct device *dev;
Syzbot has reported the following KASAN splat: BUG: KASAN: slab-use-after-free in device_for_each_child+0x18f/0x1a0 Read of size 8 at addr ffff88801f605308 by task kbnepd bnep0/4980 CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x100/0x190 ? device_for_each_child+0x18f/0x1a0 print_report+0x13a/0x4cb ? __virt_addr_valid+0x5e/0x590 ? __phys_addr+0xc6/0x150 ? device_for_each_child+0x18f/0x1a0 kasan_report+0xda/0x110 ? device_for_each_child+0x18f/0x1a0 ? __pfx_dev_memalloc_noio+0x10/0x10 device_for_each_child+0x18f/0x1a0 ? __pfx_device_for_each_child+0x10/0x10 pm_runtime_set_memalloc_noio+0xf2/0x180 netdev_unregister_kobject+0x1ed/0x270 unregister_netdevice_many_notify+0x123c/0x1d80 ? __mutex_trylock_common+0xde/0x250 ? __pfx_unregister_netdevice_many_notify+0x10/0x10 ? trace_contention_end+0xe6/0x140 ? __mutex_lock+0x4e7/0x8f0 ? __pfx_lock_acquire.part.0+0x10/0x10 ? rcu_is_watching+0x12/0xc0 ? unregister_netdev+0x12/0x30 unregister_netdevice_queue+0x30d/0x3f0 ? __pfx_unregister_netdevice_queue+0x10/0x10 ? __pfx_down_write+0x10/0x10 unregister_netdev+0x1c/0x30 bnep_session+0x1fb3/0x2ab0 ? __pfx_bnep_session+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? __pfx_woken_wake_function+0x10/0x10 ? __kthread_parkme+0x132/0x200 ? __pfx_bnep_session+0x10/0x10 ? kthread+0x13a/0x370 ? __pfx_bnep_session+0x10/0x10 kthread+0x2b7/0x370 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x48/0x80 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 4974: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 __kmalloc_noprof+0x1d1/0x440 hci_alloc_dev_priv+0x1d/0x2820 __vhci_create_device+0xef/0x7d0 vhci_write+0x2c7/0x480 vfs_write+0x6a0/0xfc0 ksys_write+0x12f/0x260 do_syscall_64+0xc7/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 4979: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x4f/0x70 kfree+0x141/0x490 hci_release_dev+0x4d9/0x600 bt_host_release+0x6a/0xb0 device_release+0xa4/0x240 kobject_put+0x1ec/0x5a0 put_device+0x1f/0x30 vhci_release+0x81/0xf0 __fput+0x3f6/0xb30 task_work_run+0x151/0x250 do_exit+0xa79/0x2c30 do_group_exit+0xd5/0x2a0 get_signal+0x1fcd/0x2210 arch_do_signal_or_restart+0x93/0x780 syscall_exit_to_user_mode+0x140/0x290 do_syscall_64+0xd4/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f In 'hci_conn_del_sysfs()', 'device_unregister()' may be called when an underlying (kobject) reference counter is greater than 1. This means that reparenting (happened when the device is actually freed) is delayed and, during that delay, parent controller device (hciX) may be deleted. Since the latter may create a dangling pointer to freed parent, avoid that scenario by reparenting to NULL explicitly. Reported-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f Fixes: a85fb91e3d72 ("Bluetooth: Fix double free in hci_conn_cleanup") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> --- v2: reparent per-connection 'struct device' explicitly --- net/bluetooth/hci_sysfs.c | 2 ++ 1 file changed, 2 insertions(+)