Message ID | 20240725134741.27281-2-yskelg@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] Bluetooth: hci_core: fix suspicious RCU usage in hci_conn_drop() |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
tedd_an/CheckPatch | success | CheckPatch PASS |
tedd_an/GitLint | success | Gitlint PASS |
tedd_an/SubjectPrefix | success | Gitlint PASS |
tedd_an/BuildKernel | success | BuildKernel PASS |
tedd_an/CheckAllWarning | success | CheckAllWarning PASS |
tedd_an/CheckSparse | success | CheckSparse PASS |
tedd_an/CheckSmatch | success | CheckSparse PASS |
tedd_an/BuildKernel32 | success | BuildKernel32 PASS |
tedd_an/TestRunnerSetup | success | TestRunnerSetup PASS |
tedd_an/TestRunner_l2cap-tester | success | TestRunner PASS |
tedd_an/TestRunner_iso-tester | fail | TestRunner_iso-tester: Total: 122, Passed: 117 (95.9%), Failed: 1, Not Run: 4 |
tedd_an/TestRunner_bnep-tester | success | TestRunner PASS |
tedd_an/TestRunner_mgmt-tester | fail | TestRunner_mgmt-tester: Total: 492, Passed: 489 (99.4%), Failed: 1, Not Run: 2 |
tedd_an/TestRunner_rfcomm-tester | success | TestRunner PASS |
tedd_an/TestRunner_sco-tester | success | TestRunner PASS |
tedd_an/TestRunner_ioctl-tester | success | TestRunner PASS |
tedd_an/TestRunner_mesh-tester | success | TestRunner PASS |
tedd_an/TestRunner_smp-tester | success | TestRunner PASS |
tedd_an/TestRunner_userchan-tester | success | TestRunner PASS |
tedd_an/IncrementalBuild | success | Incremental Build PASS |
On 2024/07/25 22:47, Yunseong Kim wrote:
> =============================
> WARNING: suspicious RCU usage
> 6.10.0-rc6-01340-gf14c0bb78769 #5 Not tainted
> -----------------------------
> net/mac80211/util.c:4000 RCU-list traversed in non-reader section!!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 2, debug_locks = 1
> 2 locks held by syz-executor/798:
>  #0: ffff800089a3de50 (rtnl_mutex){+.+.}-{4:4},
>      at: rtnl_lock+0x28/0x40 net/core/rtnetlink.c:79
>
> stack backtrace:
> CPU: 0 PID: 798 Comm: syz-executor Not tainted
>  6.10.0-rc6-01340-gf14c0bb78769 #5
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x1b8/0x1d0 arch/arm64/kernel/stacktrace.c:317
>  dump_backtrace arch/arm64/kernel/stacktrace.c:323 [inline]
>  show_stack+0x34/0x50 arch/arm64/kernel/stacktrace.c:324
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xf0/0x170 lib/dump_stack.c:114
>  dump_stack+0x20/0x30 lib/dump_stack.c:123
>  lockdep_rcu_suspicious+0x204/0x2f8 kernel/locking/lockdep.c:6712
>  ieee80211_check_combinations+0x71c/0x828 [mac80211]
>  ieee80211_check_concurrent_iface+0x494/0x700 [mac80211]
>  ieee80211_open+0x140/0x238 [mac80211]
>  __dev_open+0x270/0x498 net/core/dev.c:1474
>  __dev_change_flags+0x47c/0x610 net/core/dev.c:8837
>  dev_change_flags+0x98/0x170 net/core/dev.c:8909
>  devinet_ioctl+0xdf0/0x18d0 net/ipv4/devinet.c:1177
>  inet_ioctl+0x34c/0x388 net/ipv4/af_inet.c:1003
>  sock_do_ioctl+0xe4/0x240 net/socket.c:1222
>  sock_ioctl+0x4cc/0x740 net/socket.c:1341
>  vfs_ioctl fs/ioctl.c:51 [inline]
>  __do_sys_ioctl fs/ioctl.c:907 [inline]
>  __se_sys_ioctl fs/ioctl.c:893 [inline]
>  __arm64_sys_ioctl+0x184/0x218 fs/ioctl.c:893
>  __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
>  invoke_syscall+0x90/0x2e8 arch/arm64/kernel/syscall.c:48
>  el0_svc_common.constprop.0+0x200/0x2a8 arch/arm64/kernel/syscall.c:131
>  el0_svc+0x48/0xc0 arch/arm64/kernel/entry-common.c:712
>  el0t_64_sync_handler+0x120/0x130 arch/arm64/kernel/entry-common.c:730
>  el0t_64_sync+0x190/0x198 arch/arm64/kernel/entry.S:598
>
> This patch attempts to fix that issue with the same convention.

Excuse me, but I can't interpret why this patch solves the warning.

The warning says that list_for_each_entry_rcu() { } in
ieee80211_check_combinations() is called outside of rcu_read_lock() and
rcu_read_unlock() pair, doesn't it? How is that connected to
guarding hci_dev_test_flag() and queue_delayed_work() with rcu_read_lock()
and rcu_read_unlock() pair? Unless you guard list_for_each_entry_rcu() { }
in ieee80211_check_combinations() with rcu_read_lock() and rcu_read_unlock()
pair (or annotate that appropriate locks are already held), I can't expect
that the warning will be solved...

Also, what guarantees that drain_workqueue() won't be disturbed by
queue_work(disc_work) which will be called after "timeo" delay, for you are
not explicitly cancelling scheduled "disc_work" (unlike "cmd_timer" work
and "ncmd_timer" work shown below) before calling drain_workqueue() ?

	/* Cancel these to avoid queueing non-chained pending work */
	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
	/* Wait for
	 *
	 *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
	 *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
	 *
	 * inside RCU section to see the flag or complete scheduling.
	 */
	synchronize_rcu();
	/* Explicitly cancel works in case scheduled after setting the flag. */
	cancel_delayed_work(&hdev->cmd_timer);
	cancel_delayed_work(&hdev->ncmd_timer);

	/* Avoid potential lockdep warnings from the *_flush() calls by
	 * ensuring the workqueue is empty up front.
	 */
	drain_workqueue(hdev->workqueue);
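The first objection above is that the warning is raised at the traversal site, so a reader section added around unrelated code cannot silence it. A minimal single-threaded userspace sketch of that reasoning follows. Every `model_*` name is hypothetical and stands in for the kernel primitive it is named after; this is not kernel code and does not reproduce lockdep itself, only the rule that the check passes when the traversal runs inside a reader section or when the caller states that a suitable lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int reader_depth; /* models rcu_read_lock() nesting */
static int warnings;     /* models "suspicious RCU usage" reports */

static void model_rcu_read_lock(void)
{
	reader_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(reader_depth > 0);
	reader_depth--;
}

struct node {
	int value;
	struct node *next;
};

/*
 * Models an RCU list traversal: the check is made here, at the traversal
 * site. It passes if we are inside a reader section, or if the caller
 * asserts that a suitable lock is already held.
 */
static int model_traverse(const struct node *head, bool lock_held)
{
	int sum = 0;

	if (reader_depth == 0 && !lock_held)
		warnings++;
	for (const struct node *n = head; n; n = n->next)
		sum += n->value;
	return sum;
}

/* Models a reader section placed around unrelated code. */
static void model_unrelated_guarded_path(void)
{
	model_rcu_read_lock();
	/* a flag test and work queueing would sit here */
	model_rcu_read_unlock();
}

/* Runs one scenario and returns how many warnings it produced. */
static int model_warnings_for(int scenario)
{
	struct node b = { 2, NULL }, a = { 1, &b };

	warnings = 0;
	switch (scenario) {
	case 0: /* guard added elsewhere, traversal still unguarded */
		model_unrelated_guarded_path();
		model_traverse(&a, false);
		break;
	case 1: /* traversal wrapped in a reader section */
		model_rcu_read_lock();
		model_traverse(&a, false);
		model_rcu_read_unlock();
		break;
	case 2: /* traversal annotated: a suitable lock is held */
		model_traverse(&a, true);
		break;
	}
	return warnings;
}
```

In this model only scenario 0 still reports a warning, which matches the point that the fix has to be applied where the list is walked, either as a reader section or as an annotation about the lock already held.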
This is an automated email; please do not reply to it.

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
These are the CI test results for your patch series:
PW Link: https://patchwork.kernel.org/project/bluetooth/list/?series=873851

---Test result---

Test Summary:

Test | Result | Time |
---|---|---|
CheckPatch | PASS | 0.69 seconds |
GitLint | PASS | 0.32 seconds |
SubjectPrefix | PASS | 0.13 seconds |
BuildKernel | PASS | 29.69 seconds |
CheckAllWarning | PASS | 31.92 seconds |
CheckSparse | PASS | 36.93 seconds |
CheckSmatch | PASS | 102.67 seconds |
BuildKernel32 | PASS | 28.92 seconds |
TestRunnerSetup | PASS | 545.76 seconds |
TestRunner_l2cap-tester | PASS | 20.45 seconds |
TestRunner_iso-tester | FAIL | 38.60 seconds |
TestRunner_bnep-tester | PASS | 4.97 seconds |
TestRunner_mgmt-tester | FAIL | 122.62 seconds |
TestRunner_rfcomm-tester | PASS | 7.65 seconds |
TestRunner_sco-tester | PASS | 15.24 seconds |
TestRunner_ioctl-tester | PASS | 8.12 seconds |
TestRunner_mesh-tester | PASS | 6.04 seconds |
TestRunner_smp-tester | PASS | 7.04 seconds |
TestRunner_userchan-tester | PASS | 5.12 seconds |
IncrementalBuild | PASS | 29.77 seconds |

Details

##############################
Test: TestRunner_iso-tester - FAIL
Desc: Run iso-tester with test-runner
Output: Total: 122, Passed: 117 (95.9%), Failed: 1, Not Run: 4

Failed Test Cases
ISO Connect Suspend - Success    Failed    4.194 seconds

##############################
Test: TestRunner_mgmt-tester - FAIL
Desc: Run mgmt-tester with test-runner
Output: Total: 492, Passed: 489 (99.4%), Failed: 1, Not Run: 2

Failed Test Cases
LL Privacy - Remove Device 4 (Disable Adv)    Timed out    1.898 seconds

---
Regards,
Linux Bluetooth
Hi Tetsuo,

> Excuse me, but I can't interpret why this patch solves the warning.
>
> The warning says that list_for_each_entry_rcu() { } in
> ieee80211_check_combinations() is called outside of rcu_read_lock() and
> rcu_read_unlock() pair, doesn't it? How is that connected to
> guarding hci_dev_test_flag() and queue_delayed_work() with rcu_read_lock()
> and rcu_read_unlock() pair? Unless you guard list_for_each_entry_rcu() { }
> in ieee80211_check_combinations() with rcu_read_lock() and rcu_read_unlock()
> pair (or annotate that appropriate locks are already held), I can't expect
> that the warning will be solved...

Thank you for the code review. I apologize for attaching the wrong kernel
dump.

> Also, what guarantees that drain_workqueue() won't be disturbed by
> queue_work(disc_work) which will be called after "timeo" delay, for you are
> not explicitly cancelling scheduled "disc_work" (unlike "cmd_timer" work
> and "ncmd_timer" work shown below) before calling drain_workqueue() ?
>
> 	/* Cancel these to avoid queueing non-chained pending work */
> 	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> 	/* Wait for
> 	 *
> 	 *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> 	 *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> 	 *
> 	 * inside RCU section to see the flag or complete scheduling.
> 	 */
> 	synchronize_rcu();
> 	/* Explicitly cancel works in case scheduled after setting the flag. */
> 	cancel_delayed_work(&hdev->cmd_timer);
> 	cancel_delayed_work(&hdev->ncmd_timer);
>
> 	/* Avoid potential lockdep warnings from the *_flush() calls by
> 	 * ensuring the workqueue is empty up front.
> 	 */
> 	drain_workqueue(hdev->workqueue);

Please bear with me for a moment. I'll attach the correct kernel dump and
resend the patch email.

Warm regards,

Yunseong Kim
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 31020891fc68..111509dc1a23 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1572,8 +1572,13 @@ static inline void hci_conn_drop(struct hci_conn *conn)
 		}
 
 		cancel_delayed_work(&conn->disc_work);
-		queue_delayed_work(conn->hdev->workqueue,
-				   &conn->disc_work, timeo);
+
+		rcu_read_lock();
+		if (!hci_dev_test_flag(conn->hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+			queue_delayed_work(conn->hdev->workqueue,
+					   &conn->disc_work, timeo);
+		}
+		rcu_read_unlock();
 	}
 }
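The review in this thread asks what happens to a disc_work that was armed before HCI_CMD_DRAIN_WORKQUEUE was set, given that the quoted shutdown sequence cancels only cmd_timer and ncmd_timer. The following single-threaded userspace sketch models that question under simplifying assumptions: each delayed work is reduced to a "pending" boolean, synchronize_rcu() is a no-op because there are no concurrent readers, and all `model_*` names are hypothetical. It illustrates the ordering concern only and says nothing about how the real workqueue code behaves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; names echo the thread, not kernel APIs. */
struct model_hdev {
	bool drain_flag;         /* models HCI_CMD_DRAIN_WORKQUEUE */
	bool cmd_timer_pending;
	bool ncmd_timer_pending;
	bool disc_work_pending;  /* delayed work armed by the drop path */
};

/* Models the patched tail of hci_conn_drop(): cancel, re-arm unless draining. */
static void model_conn_drop(struct model_hdev *hdev)
{
	hdev->disc_work_pending = false;
	if (!hdev->drain_flag)
		hdev->disc_work_pending = true;
}

/* Models the quoted shutdown steps that run before drain_workqueue(). */
static void model_prepare_drain(struct model_hdev *hdev)
{
	hdev->drain_flag = true;
	/* synchronize_rcu() would sit here; this model has no concurrent readers. */
	hdev->cmd_timer_pending = false;
	hdev->ncmd_timer_pending = false;
	/* disc_work is not cancelled in the quoted sequence. */
}

/* Counts delayed works that could still fire while the queue is drained. */
static int model_pending_during_drain(const struct model_hdev *hdev)
{
	return hdev->cmd_timer_pending + hdev->ncmd_timer_pending +
	       hdev->disc_work_pending;
}

/* Runs the drop either before or after the drain flag is set. */
static int model_scenario(bool drop_before_flag)
{
	struct model_hdev hdev = {
		.cmd_timer_pending = true,
		.ncmd_timer_pending = true,
	};

	if (drop_before_flag)
		model_conn_drop(&hdev);
	model_prepare_drain(&hdev);
	if (!drop_before_flag)
		model_conn_drop(&hdev);
	return model_pending_during_drain(&hdev);
}
```

In this model the flag check covers a drop that happens after the flag is set, while a drop that happened earlier leaves one delayed work pending when draining starts. That is the case the reviewer asks the patch to account for.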