diff mbox series

[v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Message ID 20210514071452.25220-1-kai.heng.feng@canonical.com (mailing list archive)
State New, archived
Headers show
Series [v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled | expand

Commit Message

Kai-Heng Feng May 14, 2021, 7:14 a.m. UTC
Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
stops working:
[  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
[  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
[  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[  545.660350] usb 3-10: device descriptor read/64, error -110
[  561.283530] usb 3-10: device descriptor read/64, error -110
[  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
[  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
[  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
[  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)

Or kernel panics because other workqueues already freed skb:
[ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2048.663775] #PF: supervisor read access in kernel mode
[ 2048.663779] #PF: error_code(0x0000) - not-present page
[ 2048.663782] PGD 0 P4D 0
[ 2048.663787] Oops: 0000 [#1] SMP NOPTI
[ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
[ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
[ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
[ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
07 83 f8 01
[ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
[ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
[ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
[ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
[ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
[ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
[ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
[ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
[ 2048.663856] PKRU: 55555554
[ 2048.663859] Call Trace:
[ 2048.663865]  ? skb_release_head_state+0x5e/0x80
[ 2048.663873]  kfree_skb+0x2f/0xb0
[ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
[ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
[ 2048.663954]  ? __cond_resched+0x1a/0x50
[ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
[ 2048.664007]  rfkill_set_block+0x98/0x170
[ 2048.664016]  rfkill_fop_write+0x136/0x1e0
[ 2048.664022]  vfs_write+0xc7/0x260
[ 2048.664030]  ksys_write+0xb1/0xe0
[ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
[ 2048.664042]  __x64_sys_write+0x1a/0x20
[ 2048.664048]  do_syscall_64+0x40/0xb0
[ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2048.664060] RIP: 0033:0x7fe02ac23c27
[ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
[ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
[ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
[ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff

So move the shutdown callback to a place where workqueues are either
flushed or cancelled to resolve the issue.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v2:
 - Rebased on bluetooth-next.

 net/bluetooth/hci_core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Comments

bluez.test.bot@gmail.com May 14, 2021, 8:12 a.m. UTC | #1
This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=482277

---Test result---

Test Summary:
CheckPatch                    PASS      0.75 seconds
GitLint                       FAIL      0.11 seconds
BuildKernel                   PASS      523.24 seconds
TestRunner: Setup             PASS      352.71 seconds
TestRunner: l2cap-tester      PASS      2.68 seconds
TestRunner: bnep-tester       PASS      1.88 seconds
TestRunner: mgmt-tester       PASS      27.21 seconds
TestRunner: rfcomm-tester     PASS      2.03 seconds
TestRunner: sco-tester        PASS      2.04 seconds
TestRunner: smp-tester        PASS      2.18 seconds
TestRunner: userchan-tester   PASS      1.90 seconds

Details
##############################
Test: CheckPatch - PASS - 0.75 seconds
Run checkpatch.pl script with rule in .checkpatch.conf


##############################
Test: GitLint - FAIL - 0.11 seconds
Run gitlint with rule in .gitlint
Bluetooth: Shutdown controller after workqueues are flushed or cancelled
23: B1 Line exceeds max length (102>80): "[ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20"
24: B1 Line exceeds max length (104>80): "[ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020"
26: B1 Line exceeds max length (202>80): "[ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1"
34: B1 Line exceeds max length (91>80): "[ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000"
54: B1 Line exceeds max length (214>80): "[ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24"
55: B1 Line exceeds max length (85>80): "[ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001"


##############################
Test: BuildKernel - PASS - 523.24 seconds
Build Kernel with minimal configuration supports Bluetooth


##############################
Test: TestRunner: Setup - PASS - 352.71 seconds
Setup environment for running Test Runner


##############################
Test: TestRunner: l2cap-tester - PASS - 2.68 seconds
Run test-runner with l2cap-tester
Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: bnep-tester - PASS - 1.88 seconds
Run test-runner with bnep-tester
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: mgmt-tester - PASS - 27.21 seconds
Run test-runner with mgmt-tester
Total: 416, Passed: 401 (96.4%), Failed: 0, Not Run: 15

##############################
Test: TestRunner: rfcomm-tester - PASS - 2.03 seconds
Run test-runner with rfcomm-tester
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: sco-tester - PASS - 2.04 seconds
Run test-runner with sco-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: smp-tester - PASS - 2.18 seconds
Run test-runner with smp-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: userchan-tester - PASS - 1.90 seconds
Run test-runner with userchan-tester
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth
Marcel Holtmann May 14, 2021, 6:15 p.m. UTC | #2
Hi Kai-Heng,

> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> stops working:
> [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> [  545.660350] usb 3-10: device descriptor read/64, error -110
> [  561.283530] usb 3-10: device descriptor read/64, error -110
> [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> 
> Or kernel panics because other workqueues already freed skb:
> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 2048.663775] #PF: supervisor read access in kernel mode
> [ 2048.663779] #PF: error_code(0x0000) - not-present page
> [ 2048.663782] PGD 0 P4D 0
> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> 07 83 f8 01
> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> [ 2048.663856] PKRU: 55555554
> [ 2048.663859] Call Trace:
> [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
> [ 2048.663873]  kfree_skb+0x2f/0xb0
> [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
> [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> [ 2048.663954]  ? __cond_resched+0x1a/0x50
> [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> [ 2048.664007]  rfkill_set_block+0x98/0x170
> [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
> [ 2048.664022]  vfs_write+0xc7/0x260
> [ 2048.664030]  ksys_write+0xb1/0xe0
> [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
> [ 2048.664042]  __x64_sys_write+0x1a/0x20
> [ 2048.664048]  do_syscall_64+0x40/0xb0
> [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> 
> So move the shutdown callback to a place where workqueues are either
> flushed or cancelled to resolve the issue.
> 
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
> - Rebased on bluetooth-next.
> 
> net/bluetooth/hci_core.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel
Mattijs Korpershoek July 28, 2021, 1:50 p.m. UTC | #3
Hi Kai-Heng, Marcel,

Marcel Holtmann <marcel@holtmann.org> writes:

> Hi Kai-Heng,
>
>> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
>> stops working:
>> [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
>> [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
>> [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> [  545.660350] usb 3-10: device descriptor read/64, error -110
>> [  561.283530] usb 3-10: device descriptor read/64, error -110
>> [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
>> [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
>> [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
>> [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> 
>> Or kernel panics because other workqueues already freed skb:
>> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> [ 2048.663775] #PF: supervisor read access in kernel mode
>> [ 2048.663779] #PF: error_code(0x0000) - not-present page
>> [ 2048.663782] PGD 0 P4D 0
>> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
>> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
>> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
>> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
>> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
>> 07 83 f8 01
>> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
>> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
>> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
>> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
>> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
>> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
>> [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
>> [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
>> [ 2048.663856] PKRU: 55555554
>> [ 2048.663859] Call Trace:
>> [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
>> [ 2048.663873]  kfree_skb+0x2f/0xb0
>> [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
>> [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
>> [ 2048.663954]  ? __cond_resched+0x1a/0x50
>> [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
>> [ 2048.664007]  rfkill_set_block+0x98/0x170
>> [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
>> [ 2048.664022]  vfs_write+0xc7/0x260
>> [ 2048.664030]  ksys_write+0xb1/0xe0
>> [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
>> [ 2048.664042]  __x64_sys_write+0x1a/0x20
>> [ 2048.664048]  do_syscall_64+0x40/0xb0
>> [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [ 2048.664060] RIP: 0033:0x7fe02ac23c27
>> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
>> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
>> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
>> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
>> 
>> So move the shutdown callback to a place where workqueues are either
>> flushed or cancelled to resolve the issue.
>> 
>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>> ---
>> v2:
>> - Rebased on bluetooth-next.
>> 
>> net/bluetooth/hci_core.c | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> patch has been applied to bluetooth-next tree.

This patch seems to introduce a regression in the btmtksdio driver.
With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin:

i500-pumpkin login: root
root@i500-pumpkin:~# uname -a
Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux
root@i500-pumpkin:~# hciconfig hci0 up
Can't init device hci0: Connection timed out (110)
root@i500-pumpkin:~# hciconfig hci0 down
root@i500-pumpkin:~# hciconfig hci0 up
Can't init device hci0: Input/output error (5)

Reverting it fixes the above issue.
Any suggestion on how to fix this without touching hci_core ?
Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips.

Thanks,
Mattijs Korpershoek


>
> Regards
>
> Marcel
Kai-Heng Feng July 28, 2021, 3:25 p.m. UTC | #4
On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>
> Hi Kai-Heng, Marcel,
>
> Marcel Holtmann <marcel@holtmann.org> writes:
>
> > Hi Kai-Heng,
> >
> >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> >> stops working:
> >> [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> >> [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> >> [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> >> [  545.660350] usb 3-10: device descriptor read/64, error -110
> >> [  561.283530] usb 3-10: device descriptor read/64, error -110
> >> [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> >> [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> >> [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> >> [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> >> [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> >> [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> >>
> >> Or kernel panics because other workqueues already freed skb:
> >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> [ 2048.663775] #PF: supervisor read access in kernel mode
> >> [ 2048.663779] #PF: error_code(0x0000) - not-present page
> >> [ 2048.663782] PGD 0 P4D 0
> >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
> >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> >> 07 83 f8 01
> >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> >> [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> >> [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> >> [ 2048.663856] PKRU: 55555554
> >> [ 2048.663859] Call Trace:
> >> [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
> >> [ 2048.663873]  kfree_skb+0x2f/0xb0
> >> [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
> >> [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> >> [ 2048.663954]  ? __cond_resched+0x1a/0x50
> >> [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> >> [ 2048.664007]  rfkill_set_block+0x98/0x170
> >> [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
> >> [ 2048.664022]  vfs_write+0xc7/0x260
> >> [ 2048.664030]  ksys_write+0xb1/0xe0
> >> [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
> >> [ 2048.664042]  __x64_sys_write+0x1a/0x20
> >> [ 2048.664048]  do_syscall_64+0x40/0xb0
> >> [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> >>
> >> So move the shutdown callback to a place where workqueues are either
> >> flushed or cancelled to resolve the issue.
> >>
> >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> >> ---
> >> v2:
> >> - Rebased on bluetooth-next.
> >>
> >> net/bluetooth/hci_core.c | 16 ++++++++--------
> >> 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > patch has been applied to bluetooth-next tree.
>
> This patch seems to introduce a regression in the btmtksdio driver.
> With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin:
>
> i500-pumpkin login: root
> root@i500-pumpkin:~# uname -a
> Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux
> root@i500-pumpkin:~# hciconfig hci0 up
> Can't init device hci0: Connection timed out (110)
> root@i500-pumpkin:~# hciconfig hci0 down
> root@i500-pumpkin:~# hciconfig hci0 up
> Can't init device hci0: Input/output error (5)
>
> Reverting it fixes the above issue.
> Any suggestion on how to fix this without touching hci_core ?
> Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips.

Can you please attach dmesg?  Also, full ftrace log on btmtksdio can
also be helpful.

Kai-Heng

>
> Thanks,
> Mattijs Korpershoek
>
>
> >
> > Regards
> >
> > Marcel
Mattijs Korpershoek July 30, 2021, 11:40 a.m. UTC | #5
Hi Kai-Heng,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek
> <mkorpershoek@baylibre.com> wrote:
>>
>> Hi Kai-Heng, Marcel,
>>
>> Marcel Holtmann <marcel@holtmann.org> writes:
>>
>> > Hi Kai-Heng,
>> >
>> >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
>> >> stops working:
>> >> [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
>> >> [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
>> >> [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> >> [  545.660350] usb 3-10: device descriptor read/64, error -110
>> >> [  561.283530] usb 3-10: device descriptor read/64, error -110
>> >> [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> >> [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
>> >> [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
>> >> [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> >> [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
>> >> [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> >>
>> >> Or kernel panics because other workqueues already freed skb:
>> >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> >> [ 2048.663775] #PF: supervisor read access in kernel mode
>> >> [ 2048.663779] #PF: error_code(0x0000) - not-present page
>> >> [ 2048.663782] PGD 0 P4D 0
>> >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
>> >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
>> >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
>> >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
>> >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
>> >> 07 83 f8 01
>> >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
>> >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
>> >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
>> >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
>> >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
>> >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
>> >> [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
>> >> [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
>> >> [ 2048.663856] PKRU: 55555554
>> >> [ 2048.663859] Call Trace:
>> >> [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
>> >> [ 2048.663873]  kfree_skb+0x2f/0xb0
>> >> [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
>> >> [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
>> >> [ 2048.663954]  ? __cond_resched+0x1a/0x50
>> >> [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
>> >> [ 2048.664007]  rfkill_set_block+0x98/0x170
>> >> [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
>> >> [ 2048.664022]  vfs_write+0xc7/0x260
>> >> [ 2048.664030]  ksys_write+0xb1/0xe0
>> >> [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
>> >> [ 2048.664042]  __x64_sys_write+0x1a/0x20
>> >> [ 2048.664048]  do_syscall_64+0x40/0xb0
>> >> [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27
>> >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>> >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>> >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
>> >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
>> >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
>> >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
>> >>
>> >> So move the shutdown callback to a place where workqueues are either
>> >> flushed or cancelled to resolve the issue.
>> >>
>> >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>> >> ---
>> >> v2:
>> >> - Rebased on bluetooth-next.
>> >>
>> >> net/bluetooth/hci_core.c | 16 ++++++++--------
>> >> 1 file changed, 8 insertions(+), 8 deletions(-)
>> >
>> > patch has been applied to bluetooth-next tree.
>>
>> This patch seems to introduce a regression in the btmtksdio driver.
>> With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin:
>>
>> i500-pumpkin login: root
>> root@i500-pumpkin:~# uname -a
>> Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Connection timed out (110)
>> root@i500-pumpkin:~# hciconfig hci0 down
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Input/output error (5)
>>
>> Reverting it fixes the above issue.
>> Any suggestion on how to fix this without touching hci_core ?
>> Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips.
>
> Can you please attach dmesg?  Also, full ftrace log on btmtksdio can
> also be helpful.
Thank you for your help. Sorry I did not post the logs previously.

dmesg: https://pastebin.com/tpWDNyQr
ftrace on btmtksdio: https://pastebin.com/jmhvmwUw

Mattijs
>
> Kai-Heng
>
>>
>> Thanks,
>> Mattijs Korpershoek
>>
>>
>> >
>> > Regards
>> >
>> > Marcel
Kai-Heng Feng Aug. 3, 2021, 6:42 a.m. UTC | #6
Hi Mattijs,

On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>
> Hi Kai-Heng,

[snipped]

> Thank you for your help. Sorry I did not post the logs previously.
>
> dmesg: https://pastebin.com/tpWDNyQr
> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw

Seems like btmtksdio needs shudown() to be called before flush().
Since the order was there for a very long time, changing the calling
order indeed can break what driver expects.
Can you please test the following patch:
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d..a61e610a400c 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
        aosp_do_close(hdev);
        msft_do_close(hdev);

+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        if (hdev->flush)
                hdev->flush(hdev);

@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }

-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);

Kai-Heng

>
> Mattijs
> >
> > Kai-Heng
> >
> >>
> >> Thanks,
> >> Mattijs Korpershoek
> >>
> >>
> >> >
> >> > Regards
> >> >
> >> > Marcel
Kai-Heng Feng Aug. 3, 2021, 6:45 a.m. UTC | #7
On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote:
> > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> > stops working:
> > [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> > [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> > [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > [  545.660350] usb 3-10: device descriptor read/64, error -110
> > [  561.283530] usb 3-10: device descriptor read/64, error -110
> > [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> > [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> > [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> > [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> > [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> >
> > Or kernel panics because other workqueues already freed skb:
> > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 2048.663775] #PF: supervisor read access in kernel mode
> > [ 2048.663779] #PF: error_code(0x0000) - not-present page
> > [ 2048.663782] PGD 0 P4D 0
> > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
> > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> > 07 83 f8 01
> > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> > [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> > [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> > [ 2048.663856] PKRU: 55555554
> > [ 2048.663859] Call Trace:
> > [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
> > [ 2048.663873]  kfree_skb+0x2f/0xb0
> > [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
> > [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> > [ 2048.663954]  ? __cond_resched+0x1a/0x50
> > [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> > [ 2048.664007]  rfkill_set_block+0x98/0x170
> > [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
> > [ 2048.664022]  vfs_write+0xc7/0x260
> > [ 2048.664030]  ksys_write+0xb1/0xe0
> > [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
> > [ 2048.664042]  __x64_sys_write+0x1a/0x20
> > [ 2048.664048]  do_syscall_64+0x40/0xb0
> > [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> >
> > So move the shutdown callback to a place where workqueues are either
> > flushed or cancelled to resolve the issue.
> >
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> > v2:
> >  - Rebased on bluetooth-next.
> >
> >  net/bluetooth/hci_core.c | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 7baf93eda936..6eedf334f943 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >
> >       BT_DBG("%s %p", hdev->name, hdev);
> >
> > -     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > -         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > -         test_bit(HCI_UP, &hdev->flags)) {
> > -             /* Execute vendor specific shutdown routine */
> > -             if (hdev->shutdown)
> > -                     hdev->shutdown(hdev);
> > -     }
> > -
> >       cancel_delayed_work(&hdev->power_off);
> >       cancel_delayed_work(&hdev->ncmd_timer);
> >
> > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >               clear_bit(HCI_INIT, &hdev->flags);
> >       }
> >
> > +     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > +         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > +         test_bit(HCI_UP, &hdev->flags)) {
> > +             /* Execute vendor specific shutdown routine */
> > +             if (hdev->shutdown)
> > +                     hdev->shutdown(hdev);
> > +     }
> > +
> >       /* flush cmd  work */
> >       flush_work(&hdev->cmd_work);
> >
> > --
> > 2.30.2
>
> btusb_shutdown_intel_new
>   __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
>     __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout);
>       hci_req_run_skb(&req, hci_req_sync_complete);
>
> hci_req_sync_complete
>   if (skb)
>         hdev->req_skb = skb_get(skb);
>
> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> driver side, I doubt this patch is the correct fix as it is.

Some workqueues are still active.
The shutdown() should be called at least after hci_request_cancel_all().

Kai-Heng
Mattijs Korpershoek Aug. 3, 2021, 8:21 a.m. UTC | #8
Hi Kai-Heng,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> Hi Mattijs,
>
> On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
> <mkorpershoek@baylibre.com> wrote:
>>
>> Hi Kai-Heng,
>
> [snipped]
>
>> Thank you for your help. Sorry I did not post the logs previously.
>>
>> dmesg: https://pastebin.com/tpWDNyQr
>> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
>
> Seems like btmtksdio needs shudown() to be called before flush().
> Since the order was there for a very long time, changing the calling
> order indeed can break what driver expects.
> Can you please test the following patch:
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 2560ed2f144d..a61e610a400c 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
>         aosp_do_close(hdev);
>         msft_do_close(hdev);
>
> +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> +           test_bit(HCI_UP, &hdev->flags)) {
> +               /* Execute vendor specific shutdown routine */
> +               if (hdev->shutdown)
> +                       hdev->shutdown(hdev);
> +       }
> +
>         if (hdev->flush)
>                 hdev->flush(hdev);
>
> @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
>                 clear_bit(HCI_INIT, &hdev->flags);
>         }
>
> -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> -           test_bit(HCI_UP, &hdev->flags)) {
> -               /* Execute vendor specific shutdown routine */
> -               if (hdev->shutdown)
> -                       hdev->shutdown(hdev);
> -       }
> -
>         /* flush cmd  work */
>         flush_work(&hdev->cmd_work);

Thanks for the patch and your help.
I've tried it, but it seems that it does not improve for me.
I'm still observing:

i500-pumpkin login: root                                                                  
root@i500-pumpkin:~# hciconfig hci0 up                                                    
Can't init device hci0: Connection timed out (110)

Logs for this session:
dmesg:   https://pastebin.com/iAFk5Tzi
ftrace:  https://pastebin.com/kEMWSYrE


>
> Kai-Heng
>
>>
>> Mattijs
>> >
>> > Kai-Heng
>> >
>> >>
>> >> Thanks,
>> >> Mattijs Korpershoek
>> >>
>> >>
>> >> >
>> >> > Regards
>> >> >
>> >> > Marcel
Kai-Heng Feng Aug. 4, 2021, 2:35 p.m. UTC | #9
On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote:
> >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote:
> >>
> >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> >> driver side, I doubt this patch is the correct fix as it is.
> >
> >Some workqueues are still active.
> >The shutdown() should be called at least after hci_request_cancel_all().
>
> What is muddy then is how active workqueues prevent skb_get from protecting
> kfree_skb. Can you spot what workqueue it is?

I managed to reproduce the issue with another kernel splat:
------------[ cut here ]------------
kernel BUG at mm/slub.c:321!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16
Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver.
01.03.04 01/07/2021
Workqueue: hci0 discov_update [bluetooth]
RIP: 0010:__slab_free+0x20c/0x3a0
Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24
1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b
49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24
RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246
RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600
RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500
RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e
R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80
R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600
FS:  0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
 ? psi_task_switch+0xc3/0x1e0
 ? __switch_to_asm+0x36/0x70
 ? skb_free_head+0x67/0x80
 kmem_cache_free+0x370/0x3d0
 ? kfree_skbmem+0x4e/0x90
 kfree_skbmem+0x4e/0x90
 kfree_skb+0x47/0xb0
 __hci_req_sync+0x134/0x2a0 [bluetooth]
 ? wait_woken+0x70/0x70
 discov_update+0x2ae/0x310 [bluetooth]
 process_one_work+0x21d/0x3c0
 worker_thread+0x53/0x420
 ? process_one_work+0x3c0/0x3c0
 kthread+0x127/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x1f/0x30
Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg bnep
nls_iso8859_1 snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common
snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl
snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci snd_sof
snd_sof_xtensa_dsp snd_sof_intel_hda snd_hda_ext_core
snd_soc_acpi_intel_match snd_soc_acpi ledtrig_audio snd_soc_core
snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core
snd_pcm snd_seq i915 snd_seq_device snd_timer hp_wmi intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp i2c_algo_bit coretemp joydev
kvm_intel ttm mei_hdcp intel_rapl_msr platform_profile wmi_bmof kvm
uvcvideo crct10dif_pclmul btusb videobuf2_vmalloc videobuf2_memops
drm_kms_helper btrtl videobuf2_v4l2 crc32_pclmul btbcm
ghash_clmulni_intel input_leds videobuf2_common btintel snd videodev
syscopyarea sysfillrect sysimgblt aesni_intel mc serio_raw fb_sys_fops
bluetooth crypto_simd
 cec cryptd intel_cstate ecdh_generic efi_pstore ecc rc_core
hid_multitouch processor_thermal_device_pci_legacy mei_me
intel_soc_dts_iosf processor_thermal_device mei soundcore
processor_thermal_rfim ee1004 processor_thermal_mbox
processor_thermal_rapl intel_pmt_telemetry intel_rapl_common
intel_pmt_class ucsi_acpi typec_ucsi typec wmi soc_button_array
int3403_thermal int340x_thermal_zone video int3400_thermal
acpi_thermal_rel acpi_pad mac_hid intel_hid sparse_keymap sch_fq_codel
msr parport_pc ppdev lp drm parport ip_tables x_tables autofs4
hid_generic nvme nvme_core intel_lpss_pci e1000e intel_lpss i2c_i801
idma64 i2c_smbus xhci_pci xhci_pci_renesas vmd intel_pmt i2c_hid_acpi
i2c_hid hid pinctrl_tigerlake
---[ end trace c09445d4697039ed ]---

So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update)
and can prevent the race from happening.

And the kernel splat is just one symptom of the issue, most of the
time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in
dmesg.

Kai-Heng
Kai-Heng Feng Aug. 4, 2021, 2:42 p.m. UTC | #10
On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>
> Hi Kai-Heng,
>
> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>
> > Hi Mattijs,
> >
> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
> > <mkorpershoek@baylibre.com> wrote:
> >>
> >> Hi Kai-Heng,
> >
> > [snipped]
> >
> >> Thank you for your help. Sorry I did not post the logs previously.
> >>
> >> dmesg: https://pastebin.com/tpWDNyQr
> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
> >
> > Seems like btmtksdio needs shudown() to be called before flush().
> > Since the order was there for a very long time, changing the calling
> > order indeed can break what driver expects.
> > Can you please test the following patch:
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 2560ed2f144d..a61e610a400c 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >         aosp_do_close(hdev);
> >         msft_do_close(hdev);
> >
> > +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > +           test_bit(HCI_UP, &hdev->flags)) {
> > +               /* Execute vendor specific shutdown routine */
> > +               if (hdev->shutdown)
> > +                       hdev->shutdown(hdev);
> > +       }
> > +
> >         if (hdev->flush)
> >                 hdev->flush(hdev);
> >
> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >                 clear_bit(HCI_INIT, &hdev->flags);
> >         }
> >
> > -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > -           test_bit(HCI_UP, &hdev->flags)) {
> > -               /* Execute vendor specific shutdown routine */
> > -               if (hdev->shutdown)
> > -                       hdev->shutdown(hdev);
> > -       }
> > -
> >         /* flush cmd  work */
> >         flush_work(&hdev->cmd_work);
>
> Thanks for the patch and your help.
> I've tried it, but it seems that it does not improve for me.
> I'm still observing:
>
> i500-pumpkin login: root
> root@i500-pumpkin:~# hciconfig hci0 up
> Can't init device hci0: Connection timed out (110)
>
> Logs for this session:
> dmesg:   https://pastebin.com/iAFk5Tzi
> ftrace:  https://pastebin.com/kEMWSYrE

Thanks for the testing!
What about moving the shutdown() part right after hci_req_sync_lock()
so tx/rx can still work:

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d4..be3113fb7d4b0 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
        hci_request_cancel_all(hdev);
        hci_req_sync_lock(hdev);

+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
                cancel_delayed_work_sync(&hdev->cmd_timer);
                hci_req_sync_unlock(hdev);
@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }

-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);





>
>
> >
> > Kai-Heng
> >
> >>
> >> Mattijs
> >> >
> >> > Kai-Heng
> >> >
> >> >>
> >> >> Thanks,
> >> >> Mattijs Korpershoek
> >> >>
> >> >>
> >> >> >
> >> >> > Regards
> >> >> >
> >> >> > Marcel
Kai-Heng Feng Aug. 5, 2021, 3:44 a.m. UTC | #11
On Thu, Aug 5, 2021 at 11:00 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Wed, 4 Aug 2021 22:35:43 +0800 Kai-Heng Feng wrote:
> >On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <hdanton@sina.com> wrote:
> >>
> >> On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote:
> >> >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote:
> >> >>
> >> >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> >> >> driver side, I doubt this patch is the correct fix as it is.
> >> >
> >> >Some workqueues are still active.
> >> >The shutdown() should be called at least after hci_request_cancel_all().
> >>
> >> What is muddy then is how active workqueues prevent skb_get from protecting
> >> kfree_skb. Can you spot what workqueue it is?
> >
> >I managed to reproduce the issue with another kernel splat:
> >------------[ cut here ]------------
> >kernel BUG at mm/slub.c:321!
> >invalid opcode: 0000 [#1] SMP NOPTI
> >CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16
> >Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver.
> >01.03.04 01/07/2021
> >Workqueue: hci0 discov_update [bluetooth]
> >RIP: 0010:__slab_free+0x20c/0x3a0
> >Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24
> >1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b
> >49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24
> >RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246
> >RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600
> >RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500
> >RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e
> >R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80
> >R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600
> >FS:  0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000
> >CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0
> >PKRU: 55555554
> >Call Trace:
> > ? psi_task_switch+0xc3/0x1e0
> > ? __switch_to_asm+0x36/0x70
> > ? skb_free_head+0x67/0x80
> > kmem_cache_free+0x370/0x3d0
> > ? kfree_skbmem+0x4e/0x90
> > kfree_skbmem+0x4e/0x90
> > kfree_skb+0x47/0xb0
> > __hci_req_sync+0x134/0x2a0 [bluetooth]
> > ? wait_woken+0x70/0x70
> > discov_update+0x2ae/0x310 [bluetooth]
> > process_one_work+0x21d/0x3c0
> > worker_thread+0x53/0x420
> > ? process_one_work+0x3c0/0x3c0
> > kthread+0x127/0x150
> > ? set_kthread_struct+0x50/0x50
> > ret_from_fork+0x1f/0x30
>
> [...]
>
> >
> >So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update)
> >and can prevent the race from happening.
>
> Given
>
> __hci_req_sync
>   err = hci_req_run_skb(&req, hci_req_sync_complete);
>   kfree_skb(hdev->req_skb);
>
> hci_req_sync_complete
>   if (skb)
>         hdev->req_skb = skb_get(skb);
>
> once more skb_get makes the above race hard to understand, though it should
> better check error before freeing skb there to avoid blind free.
>
> >
> >And the kernel splat is just one symptom of the issue, most of the
>
> It fails to support your reasoning so far.

How? Most of the time the BT controller can't be brought up again
after shutdown(), and we need to stop other activities before that.
What other reasoning is expected?

Kai-Heng

>
> >time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in
> >dmesg.
> >
> >Kai-Heng
>
>
> +++ x/net/bluetooth/hci_request.c
> @@ -257,8 +257,10 @@ int __hci_req_sync(struct hci_dev *hdev,
>                 break;
>         }
>
> -       kfree_skb(hdev->req_skb);
> -       hdev->req_skb = NULL;
> +       if (!err) {
> +               kfree_skb(hdev->req_skb);
> +               hdev->req_skb = NULL;
> +       }
>         hdev->req_status = hdev->req_result = 0;
>
>         bt_dev_dbg(hdev, "end: err %d", err);
Hsin-Yi Wang Aug. 5, 2021, 6:12 a.m. UTC | #12
On Tue, Aug 3, 2021 at 2:45 PM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote:
> >
> > On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote:
> > > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> > > stops working:
> > > [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> > > [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> > > [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > > [  545.660350] usb 3-10: device descriptor read/64, error -110
> > > [  561.283530] usb 3-10: device descriptor read/64, error -110
> > > [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > > [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> > > [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> > > [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> > > [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> > > [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> > >
> > > Or kernel panics because other workqueues already freed skb:
> > > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > [ 2048.663775] #PF: supervisor read access in kernel mode
> > > [ 2048.663779] #PF: error_code(0x0000) - not-present page
> > > [ 2048.663782] PGD 0 P4D 0
> > > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> > > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
> > > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> > > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> > > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> > > 07 83 f8 01
> > > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> > > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> > > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> > > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> > > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> > > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> > > [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> > > [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> > > [ 2048.663856] PKRU: 55555554
> > > [ 2048.663859] Call Trace:
> > > [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
> > > [ 2048.663873]  kfree_skb+0x2f/0xb0
> > > [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
> > > [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> > > [ 2048.663954]  ? __cond_resched+0x1a/0x50
> > > [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> > > [ 2048.664007]  rfkill_set_block+0x98/0x170
> > > [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
> > > [ 2048.664022]  vfs_write+0xc7/0x260
> > > [ 2048.664030]  ksys_write+0xb1/0xe0
> > > [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
> > > [ 2048.664042]  __x64_sys_write+0x1a/0x20
> > > [ 2048.664048]  do_syscall_64+0x40/0xb0
> > > [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> > > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> > > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> > > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> > > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> > > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> > >
> > > So move the shutdown callback to a place where workqueues are either
> > > flushed or cancelled to resolve the issue.
> > >
> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > ---
> > > v2:
> > >  - Rebased on bluetooth-next.
> > >
> > >  net/bluetooth/hci_core.c | 16 ++++++++--------
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > > index 7baf93eda936..6eedf334f943 100644
> > > --- a/net/bluetooth/hci_core.c
> > > +++ b/net/bluetooth/hci_core.c
> > > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > >
> > >       BT_DBG("%s %p", hdev->name, hdev);
> > >
> > > -     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > > -         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > -         test_bit(HCI_UP, &hdev->flags)) {
> > > -             /* Execute vendor specific shutdown routine */
> > > -             if (hdev->shutdown)
> > > -                     hdev->shutdown(hdev);
> > > -     }
> > > -
> > >       cancel_delayed_work(&hdev->power_off);
> > >       cancel_delayed_work(&hdev->ncmd_timer);
> > >
> > > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > >               clear_bit(HCI_INIT, &hdev->flags);
> > >       }
> > >
> > > +     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > > +         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > +         test_bit(HCI_UP, &hdev->flags)) {
> > > +             /* Execute vendor specific shutdown routine */
> > > +             if (hdev->shutdown)
> > > +                     hdev->shutdown(hdev);
> > > +     }
> > > +
> > >       /* flush cmd  work */
> > >       flush_work(&hdev->cmd_work);
> > >
> > > --
> > > 2.30.2
> >
> > btusb_shutdown_intel_new
> >   __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
> >     __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout);
> >       hci_req_run_skb(&req, hci_req_sync_complete);
> >
> > hci_req_sync_complete
> >   if (skb)
> >         hdev->req_skb = skb_get(skb);
> >
> > Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> > driver side, I doubt this patch is the correct fix as it is.
>
> Some workqueues are still active.
> The shutdown() should be called at least after hci_request_cancel_all().
>
Hello,

The original patch 60789afc02f592b8d91217b60930e7a76271ae07
("Bluetooth: Shutdown controller after workqueues are flushed or
cancelled") is causing the tx fail when downloading fw on mediatek
mt8183 device using QCA bluetooth:

[  225.205061] Bluetooth: qca_download_firmware() hci0: QCA
Downloading qca/rampatch_00440302.bin
[  227.252653] Bluetooth: hci_cmd_timeout() hci0: command 0xfc00 tx timeout
...
follows by a lot of:
[  223.604971] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84)
[  223.605027] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84)

After applying the fixup to allow tx, the issue is solved.


> Kai-Heng
Mattijs Korpershoek Aug. 5, 2021, 6:55 a.m. UTC | #13
Hi Kai-Heng,

Thanks for your patch,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek
> <mkorpershoek@baylibre.com> wrote:
>>
>> Hi Kai-Heng,
>>
>> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>>
>> > Hi Mattijs,
>> >
>> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
>> > <mkorpershoek@baylibre.com> wrote:
>> >>
>> >> Hi Kai-Heng,
>> >
>> > [snipped]
>> >
>> >> Thank you for your help. Sorry I did not post the logs previously.
>> >>
>> >> dmesg: https://pastebin.com/tpWDNyQr
>> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
>> >
>> > Seems like btmtksdio needs shudown() to be called before flush().
>> > Since the order was there for a very long time, changing the calling
>> > order indeed can break what driver expects.
>> > Can you please test the following patch:
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index 2560ed2f144d..a61e610a400c 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> >         aosp_do_close(hdev);
>> >         msft_do_close(hdev);
>> >
>> > +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
>> > +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
>> > +           test_bit(HCI_UP, &hdev->flags)) {
>> > +               /* Execute vendor specific shutdown routine */
>> > +               if (hdev->shutdown)
>> > +                       hdev->shutdown(hdev);
>> > +       }
>> > +
>> >         if (hdev->flush)
>> >                 hdev->flush(hdev);
>> >
>> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> >                 clear_bit(HCI_INIT, &hdev->flags);
>> >         }
>> >
>> > -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
>> > -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
>> > -           test_bit(HCI_UP, &hdev->flags)) {
>> > -               /* Execute vendor specific shutdown routine */
>> > -               if (hdev->shutdown)
>> > -                       hdev->shutdown(hdev);
>> > -       }
>> > -
>> >         /* flush cmd  work */
>> >         flush_work(&hdev->cmd_work);
>>
>> Thanks for the patch and your help.
>> I've tried it, but it seems that it does not improve for me.
>> I'm still observing:
>>
>> i500-pumpkin login: root
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Connection timed out (110)
>>
>> Logs for this session:
>> dmesg:   https://pastebin.com/iAFk5Tzi
>> ftrace:  https://pastebin.com/kEMWSYrE
>
> Thanks for the testing!
> What about moving the shutdown() part right after hci_req_sync_lock()
> so tx/rx can still work:
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 2560ed2f144d4..be3113fb7d4b0 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
>         hci_request_cancel_all(hdev);
>         hci_req_sync_lock(hdev);
>
> +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> +           test_bit(HCI_UP, &hdev->flags)) {
> +               /* Execute vendor specific shutdown routine */
> +               if (hdev->shutdown)
> +                       hdev->shutdown(hdev);
> +       }
> +
>         if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
>                 cancel_delayed_work_sync(&hdev->cmd_timer);
>                 hci_req_sync_unlock(hdev);
> @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
>                 clear_bit(HCI_INIT, &hdev->flags);
>         }
>
> -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> -           test_bit(HCI_UP, &hdev->flags)) {
> -               /* Execute vendor specific shutdown routine */
> -               if (hdev->shutdown)
> -                       hdev->shutdown(hdev);
> -       }
> -
>         /* flush cmd  work */
>         flush_work(&hdev->cmd_work);
I confirm this diff works for me:

root@i500-pumpkin:~# hciconfig hci0 up
root@i500-pumpkin:~# hciconfig hci0 down
root@i500-pumpkin:~# hciconfig hci0 up
root@i500-pumpkin:~# hciconfig hci0
hci0:   Type: Primary  Bus: SDIO
        BD Address: 00:0C:E7:55:FF:12  ACL MTU: 1021:8  SCO MTU: 244:4
        UP RUNNING 
        RX bytes:11268 acl:0 sco:0 events:829 errors:0
        TX bytes:182569 acl:0 sco:0 commands:829 errors:0

root@i500-pumpkin:~# hcitool scan 
Scanning ...
        <redacted>       Pixel 3 XL

Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
>
>
>
>
>
>>
>>
>> >
>> > Kai-Heng
>> >
>> >>
>> >> Mattijs
>> >> >
>> >> > Kai-Heng
>> >> >
>> >> >>
>> >> >> Thanks,
>> >> >> Mattijs Korpershoek
>> >> >>
>> >> >>
>> >> >> >
>> >> >> > Regards
>> >> >> >
>> >> >> > Marcel
Hsin-Yi Wang Aug. 5, 2021, 7:04 a.m. UTC | #14
On Thu, Aug 5, 2021 at 3:01 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Thu, 5 Aug 2021 14:12:10 +0800 Hsin-Yi Wang wrote:
> >
> >The original patch 60789afc02f592b8d91217b60930e7a76271ae07
> >("Bluetooth: Shutdown controller after workqueues are flushed or
> >cancelled") is causing the tx fail when downloading fw on mediatek
> >mt8183 device using QCA bluetooth:
>
> Want to know if reverting 60789afc02f59 is a cure.

Reverting 60789afc02f592b8d91217b60930e7a76271ae07 can also fixed the issue.
>
> >
> >[  225.205061] Bluetooth: qca_download_firmware() hci0: QCA
> >Downloading qca/rampatch_00440302.bin
> >[  227.252653] Bluetooth: hci_cmd_timeout() hci0: command 0xfc00 tx timeout
> >...
> >follows by a lot of:
> >[  223.604971] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84)
> >[  223.605027] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84)
> >
> >After applying the fixup to allow tx, the issue is solved.
>
> Feel free to specify the fixup and its link to 60789afc02f59.

The fixup is pasted by Kai-Heng here:
https://lkml.org/lkml/2021/8/3/103
Kai-Heng Feng Aug. 5, 2021, 7:19 a.m. UTC | #15
[snipped]

> >How? Most of the time the BT controller can't be brought up again
> >after shutdown(), and we need to stop other activities before that.
> >What other reasoning is expected?
>
> See below.
> >
> >Kai-Heng
> >
> >>
> >> >time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in
> >> >dmesg.
>
> In drivers/bluetooth/btusb.c, there are three cases of
>
>         bt_dev_err(hdev, "HCI reset during shutdown failed");
>
> and in btusb_shutdown_intel_new() it has nothing to do with kfree_skb()
> because of IS_ERR(skb).

No, kfree_skb() doesn't gets called in this case. But when that
happens the BT controller won't work anymore.

>
> Feel free to specify why an skb error links to the race you are trying to fix.

The race here is that the btusb_shutdown_intel_new() is trying to
reset the controller while other works like scanning or discovering
are still underway.
So the patch is to ensure that shutdown() callback is invoked after
other works are cancelled.

I think I understand what you are trying to ask, you want to know
where the double kfree_skb() race happens.
I didn't really investigate that because quiesce the other activities
then call shutdown() is the right thing to do and I haven't seen the
kernel splat since.

Kai-Heng

>
> >> >
> >> >Kai-Heng
> >>
> >>
> >> +++ x/net/bluetooth/hci_request.c
> >> @@ -257,8 +257,10 @@ int __hci_req_sync(struct hci_dev *hdev,
> >>                 break;
> >>         }
> >>
> >> -       kfree_skb(hdev->req_skb);
> >> -       hdev->req_skb = NULL;
> >> +       if (!err) {
> >> +               kfree_skb(hdev->req_skb);
> >> +               hdev->req_skb = NULL;
> >> +       }
> >>         hdev->req_status = hdev->req_result = 0;
> >>
> >>         bt_dev_dbg(hdev, "end: err %d", err);
Kai-Heng Feng Aug. 5, 2021, 3:50 p.m. UTC | #16
Hi Mattijs,

On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>
> Hi Kai-Heng,
>
> Thanks for your patch,
>
> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>

[snipped]

> I confirm this diff works for me:
>
> root@i500-pumpkin:~# hciconfig hci0 up
> root@i500-pumpkin:~# hciconfig hci0 down
> root@i500-pumpkin:~# hciconfig hci0 up
> root@i500-pumpkin:~# hciconfig hci0
> hci0:   Type: Primary  Bus: SDIO
>         BD Address: 00:0C:E7:55:FF:12  ACL MTU: 1021:8  SCO MTU: 244:4
>         UP RUNNING
>         RX bytes:11268 acl:0 sco:0 events:829 errors:0
>         TX bytes:182569 acl:0 sco:0 commands:829 errors:0
>
> root@i500-pumpkin:~# hcitool scan
> Scanning ...
>         <redacted>       Pixel 3 XL
>
> Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>

I found that btmtksdio_flush() only cancels the work instead of doing
flush_work(). That probably explains why putting ->shutdown right
before ->flush doesn't work.
So can you please test the following again:
diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c
index 9872ef18f9fea..b33c05ad2150b 100644
--- a/drivers/bluetooth/btmtksdio.c
+++ b/drivers/bluetooth/btmtksdio.c
@@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev)
 {
        struct btmtksdio_dev *bdev = hci_get_drvdata(hdev);

-       skb_queue_purge(&bdev->txq);
+       flush_work(&bdev->tx_work);

-       cancel_work_sync(&bdev->tx_work);
+       skb_queue_purge(&bdev->txq);

        return 0;
 }
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d4..a61e610a400cb 100644

--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
        aosp_do_close(hdev);
        msft_do_close(hdev);

+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        if (hdev->flush)
                hdev->flush(hdev);

@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }

-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);

Kai-Heng
Mattijs Korpershoek Aug. 6, 2021, 8:51 a.m. UTC | #17
Hi Kai-Heng,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> Hi Mattijs,
>
> On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek
> <mkorpershoek@baylibre.com> wrote:
>>
>> Hi Kai-Heng,
>>
>> Thanks for your patch,
>>
>> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>>
>
> [snipped]
>
>> I confirm this diff works for me:
>>
>> root@i500-pumpkin:~# hciconfig hci0 up
>> root@i500-pumpkin:~# hciconfig hci0 down
>> root@i500-pumpkin:~# hciconfig hci0 up
>> root@i500-pumpkin:~# hciconfig hci0
>> hci0:   Type: Primary  Bus: SDIO
>>         BD Address: 00:0C:E7:55:FF:12  ACL MTU: 1021:8  SCO MTU: 244:4
>>         UP RUNNING
>>         RX bytes:11268 acl:0 sco:0 events:829 errors:0
>>         TX bytes:182569 acl:0 sco:0 commands:829 errors:0
>>
>> root@i500-pumpkin:~# hcitool scan
>> Scanning ...
>>         <redacted>       Pixel 3 XL
>>
>> Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
>
> I found that btmtksdio_flush() only cancels the work instead of doing
> flush_work(). That probably explains why putting ->shutdown right
> before ->flush doesn't work.
> So can you please test the following again:
> diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c
> index 9872ef18f9fea..b33c05ad2150b 100644
> --- a/drivers/bluetooth/btmtksdio.c
> +++ b/drivers/bluetooth/btmtksdio.c
> @@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev)
>  {
>         struct btmtksdio_dev *bdev = hci_get_drvdata(hdev);
>
> -       skb_queue_purge(&bdev->txq);
> +       flush_work(&bdev->tx_work);
>
> -       cancel_work_sync(&bdev->tx_work);
> +       skb_queue_purge(&bdev->txq);
>
>         return 0;
>  }
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 2560ed2f144d4..a61e610a400cb 100644
>
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
>         aosp_do_close(hdev);
>         msft_do_close(hdev);
>
> +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> +           test_bit(HCI_UP, &hdev->flags)) {
> +               /* Execute vendor specific shutdown routine */
> +               if (hdev->shutdown)
> +                       hdev->shutdown(hdev);
> +       }
> +
>         if (hdev->flush)
>                 hdev->flush(hdev);
>
> @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
>                 clear_bit(HCI_INIT, &hdev->flags);
>         }
>
> -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> -           test_bit(HCI_UP, &hdev->flags)) {
> -               /* Execute vendor specific shutdown routine */
> -               if (hdev->shutdown)
> -                       hdev->shutdown(hdev);
> -       }
> -
>         /* flush cmd  work */
>         flush_work(&hdev->cmd_work);
I've tried this but I have the same (broken) symptoms as before.

Here are some logs of v3:
dmesg: https://pastebin.com/1x4UHkzy
ftrace: https://pastebin.com/Lm1d6AWy

Mattijs

>
> Kai-Heng
Kai-Heng Feng Aug. 6, 2021, 3:36 p.m. UTC | #18
Hi Mattijs,

On Fri, Aug 6, 2021 at 4:51 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>
> Hi Kai-Heng,
>
> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>
> > Hi Mattijs,
> >
> > On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek
> > <mkorpershoek@baylibre.com> wrote:
> >>
> >> Hi Kai-Heng,
> >>
> >> Thanks for your patch,
> >>
> >> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
> >>
> >
> > [snipped]
> >
> >> I confirm this diff works for me:
> >>
> >> root@i500-pumpkin:~# hciconfig hci0 up
> >> root@i500-pumpkin:~# hciconfig hci0 down
> >> root@i500-pumpkin:~# hciconfig hci0 up
> >> root@i500-pumpkin:~# hciconfig hci0
> >> hci0:   Type: Primary  Bus: SDIO
> >>         BD Address: 00:0C:E7:55:FF:12  ACL MTU: 1021:8  SCO MTU: 244:4
> >>         UP RUNNING
> >>         RX bytes:11268 acl:0 sco:0 events:829 errors:0
> >>         TX bytes:182569 acl:0 sco:0 commands:829 errors:0
> >>
> >> root@i500-pumpkin:~# hcitool scan
> >> Scanning ...
> >>         <redacted>       Pixel 3 XL
> >>
> >> Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
> >
> > I found that btmtksdio_flush() only cancels the work instead of doing
> > flush_work(). That probably explains why putting ->shutdown right
> > before ->flush doesn't work.
> > So can you please test the following again:
> > diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c
> > index 9872ef18f9fea..b33c05ad2150b 100644
> > --- a/drivers/bluetooth/btmtksdio.c
> > +++ b/drivers/bluetooth/btmtksdio.c
> > @@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev)
> >  {
> >         struct btmtksdio_dev *bdev = hci_get_drvdata(hdev);
> >
> > -       skb_queue_purge(&bdev->txq);
> > +       flush_work(&bdev->tx_work);
> >
> > -       cancel_work_sync(&bdev->tx_work);
> > +       skb_queue_purge(&bdev->txq);
> >
> >         return 0;
> >  }
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 2560ed2f144d4..a61e610a400cb 100644
> >
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >         aosp_do_close(hdev);
> >         msft_do_close(hdev);
> >
> > +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > +           test_bit(HCI_UP, &hdev->flags)) {
> > +               /* Execute vendor specific shutdown routine */
> > +               if (hdev->shutdown)
> > +                       hdev->shutdown(hdev);
> > +       }
> > +
> >         if (hdev->flush)
> >                 hdev->flush(hdev);
> >
> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >                 clear_bit(HCI_INIT, &hdev->flags);
> >         }
> >
> > -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > -           test_bit(HCI_UP, &hdev->flags)) {
> > -               /* Execute vendor specific shutdown routine */
> > -               if (hdev->shutdown)
> > -                       hdev->shutdown(hdev);
> > -       }
> > -
> >         /* flush cmd  work */
> >         flush_work(&hdev->cmd_work);
> I've tried this but I have the same (broken) symptoms as before.
>
> Here are some logs of v3:
> dmesg: https://pastebin.com/1x4UHkzy
> ftrace: https://pastebin.com/Lm1d6AWy

Thanks for your testing. I think I finally got it:
btmtksdio_shutdown()
-> mtk_hci_wmt_sync()
 -> __hci_cmd_send()
  then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in
btmtksdio_recv_event():
btmtksdio_recv_event()
-> hci_recv_frame()
 -> queue_work(hdev->workqueue, &hdev->rx_work);

That means it has to be done before the following drain_workqueue() call.
Can you please see if moving the ->shutdown() part right before
drain_workqueue() can fix the issue?

Kai-Heng

>
> Mattijs
>
> >
> > Kai-Heng
Mattijs Korpershoek Aug. 9, 2021, 9:19 a.m. UTC | #19
Hi Kai-Heng,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> Hi Mattijs,
>
>> [snipped]
>
> Thanks for your testing. I think I finally got it:
> btmtksdio_shutdown()
> -> mtk_hci_wmt_sync()
>  -> __hci_cmd_send()
>   then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in
> btmtksdio_recv_event():
> btmtksdio_recv_event()
> -> hci_recv_frame()
>  -> queue_work(hdev->workqueue, &hdev->rx_work);
>
> That means it has to be done before the following drain_workqueue() call.
> Can you please see if moving the ->shutdown() part right before
> drain_workqueue() can fix the issue?
I've tested the following patch:

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d..131e69a9a66a 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1757,6 +1757,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
                        cancel_delayed_work_sync(&adv_instance->rpa_expired_cb);
        }
 
+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        /* Avoid potential lockdep warnings from the *_flush() calls by
         * ensuring the workqueue is empty up front.
         */
@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }
 
-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);

It does not seem to fix the issue.
Adding the bits in btmtksdio_flush() does not change the result of the
above patch.

Here are the logs. These are just with the above patch (not with the
btmtksdio_flush() changes.

dmesg: https://pastebin.com/FZZBkqGC
btmtksdio_ftrace: https://pastebin.com/JQ0UWenY

Mattijs
>
> Kai-Heng
>
>>
>> Mattijs
>>
>> >
>> > Kai-Heng
diff mbox series

Patch

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 7baf93eda936..6eedf334f943 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1716,14 +1716,6 @@  int hci_dev_do_close(struct hci_dev *hdev)
 
 	BT_DBG("%s %p", hdev->name, hdev);
 
-	if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-	    !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-	    test_bit(HCI_UP, &hdev->flags)) {
-		/* Execute vendor specific shutdown routine */
-		if (hdev->shutdown)
-			hdev->shutdown(hdev);
-	}
-
 	cancel_delayed_work(&hdev->power_off);
 	cancel_delayed_work(&hdev->ncmd_timer);
 
@@ -1801,6 +1793,14 @@  int hci_dev_do_close(struct hci_dev *hdev)
 		clear_bit(HCI_INIT, &hdev->flags);
 	}
 
+	if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+	    !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+	    test_bit(HCI_UP, &hdev->flags)) {
+		/* Execute vendor specific shutdown routine */
+		if (hdev->shutdown)
+			hdev->shutdown(hdev);
+	}
+
 	/* flush cmd  work */
 	flush_work(&hdev->cmd_work);