Message ID | 20240409112758.467112-2-rickaran@axis.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | smb client hang | expand |
I tried this on 6.8-rc7 and current 6.9-rc3 and it didn't fail on either. I tried (to Samba, and using the mount parms you suggested for cifs.ko) with both "killall -s SIGSTOP smbd" and also with "killall smbd" - but unmount worked fine in both cases. Do you know if this fails to ksmbd or Samba or ...? Other ideas for how to repro this? On Tue, Apr 9, 2024 at 6:29 AM Rickard Andersson <rickaran@axis.com> wrote: > > From: Rickard x Andersson <rickaran@axis.com> > > Test case: > mount -t cifs //192.168.0.1/test y -o > port=19999,ro,vers=2.1,sec=none,echo_interval=1 > kill smbd with SIGSTOP > umount /tmp/y > > Gives the following error: > INFO: task umount:466 blocked for more than 122 seconds. > Not tainted 6.8.2-axis9-devel #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > message. > task:umount state:D stack:0 pid:466 tgid:466 ppid:464 flags:0x00000004 > __schedule from schedule+0x34/0x108 > schedule from schedule_preempt_disabled+0x24/0x34 > schedule_preempt_disabled from __mutex_lock.constprop.0+0x330/0x8b0 > __mutex_lock.constprop.0 from smb2_reconnect+0x278/0x8fc [cifs] > smb2_reconnect [cifs] from SMB2_open_init+0x54/0x9f4 [cifs] > SMB2_open_init [cifs] from smb2_query_info_compound+0x1a0/0x500[cifs] > smb2_query_info_compound [cifs] from smb2_queryfs+0x64/0x134 [cifs] > smb2_queryfs [cifs] from cifs_statfs+0xc8/0x318 [cifs] > cifs_statfs [cifs] from statfs_by_dentry+0x60/0x84 > statfs_by_dentry from fd_statfs+0x30/0x74 > fd_statfs from sys_fstatfs64+0x40/0x6c > sys_fstatfs64 from ret_fast_syscall+0x0/0x54 > > The umount task is blocked waiting on the session mutex. The reason it > never gets the session mutex is because 'kworker/0:3' is holding the > mutex and is waiting for response (see line 1209 in > fs/smb/client/smb2pdu.c. > > Stack trace of 'kworker/0:3' just before calling wait_for_response: > CPU: 0 PID: 220 Comm: kworker/0:3 Not tainted 6.8.2-axis9-devel #1 > Hardware name: Freescale i.MX6 SoloX (Device Tree) > Workqueue: cifsiod smb2_reconnect_server [cifs] > unwind_backtrace from show_stack+0x18/0x1c > show_stack from dump_stack_lvl+0x24/0x2c > dump_stack_lvl from compound_send_recv+0x7bc/0xac8 [cifs] > compound_send_recv [cifs] from cifs_send_recv+0x34/0x3c [cifs] > cifs_send_recv [cifs] from SMB2_negotiate+0x410/0x13dc [cifs] > SMB2_negotiate [cifs] from smb2_negotiate+0x4c/0x58 [cifs] > smb2_negotiate [cifs] from cifs_negotiate_protocol+0x9c/0x100 [cifs] > cifs_negotiate_protocol [cifs] from smb2_reconnect+0x418/0x8fc [cifs] > smb2_reconnect [cifs] from smb2_reconnect_server+0x1dc/0x514 [cifs] > smb2_reconnect_server [cifs] from process_one_work+0x188/0x3ec > process_one_work from worker_thread+0x1fc/0x430 > worker_thread from kthread+0x110/0x130 > kthread from ret_from_fork+0x14/0x28 > > Change-Id: I53439ffb007c9c51d77ce40fb655a34e5ca291ec > Signed-off-by: Rickard x Andersson <rickaran@axis.com> > --- > fs/smb/client/transport.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c > index 994d70193432..96b8f8757ddc 100644 > --- a/fs/smb/client/transport.c > +++ b/fs/smb/client/transport.c > @@ -32,6 +32,8 @@ > /* Max number of iovectors we can use off the stack when sending requests. */ > #define CIFS_MAX_IOV_SIZE 8 > > +#define RESPONSE_TIMEOUT_SECS 50 > + > void > cifs_wake_up_task(struct mid_q_entry *mid) > { > @@ -735,13 +737,14 @@ static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf, > static int > wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ) > { > - int error; > + int ret; > + > + ret = wait_event_killable_timeout(server->response_q, > + midQ->mid_state != MID_REQUEST_SUBMITTED && > + midQ->mid_state != MID_RESPONSE_RECEIVED, > + RESPONSE_TIMEOUT_SECS*HZ); > > - error = wait_event_state(server->response_q, > - midQ->mid_state != MID_REQUEST_SUBMITTED && > - midQ->mid_state != MID_RESPONSE_RECEIVED, > - (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE)); > - if (error < 0) > + if ((ret < 0) || (ret == 0)) > return -ERESTARTSYS; > > return 0; > -- > 2.30.2 > >
Shyam and I tried some experiments and there are two cases where the patch breaks: 1) ChangeNotify will time out 2) Certainly byte range lock calls (they can be allowed to block) will timeout An obvious alternative would be to not make this change for the commands like ChangeNotify and blocking locks but allow it for the others. On Tue, Apr 9, 2024 at 6:29 AM Rickard Andersson <rickaran@axis.com> wrote: > > From: Rickard x Andersson <rickaran@axis.com> > > Test case: > mount -t cifs //192.168.0.1/test y -o > port=19999,ro,vers=2.1,sec=none,echo_interval=1 > kill smbd with SIGSTOP > umount /tmp/y > > Gives the following error: > INFO: task umount:466 blocked for more than 122 seconds. > Not tainted 6.8.2-axis9-devel #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > message. > task:umount state:D stack:0 pid:466 tgid:466 ppid:464 flags:0x00000004 > __schedule from schedule+0x34/0x108 > schedule from schedule_preempt_disabled+0x24/0x34 > schedule_preempt_disabled from __mutex_lock.constprop.0+0x330/0x8b0 > __mutex_lock.constprop.0 from smb2_reconnect+0x278/0x8fc [cifs] > smb2_reconnect [cifs] from SMB2_open_init+0x54/0x9f4 [cifs] > SMB2_open_init [cifs] from smb2_query_info_compound+0x1a0/0x500[cifs] > smb2_query_info_compound [cifs] from smb2_queryfs+0x64/0x134 [cifs] > smb2_queryfs [cifs] from cifs_statfs+0xc8/0x318 [cifs] > cifs_statfs [cifs] from statfs_by_dentry+0x60/0x84 > statfs_by_dentry from fd_statfs+0x30/0x74 > fd_statfs from sys_fstatfs64+0x40/0x6c > sys_fstatfs64 from ret_fast_syscall+0x0/0x54 > > The umount task is blocked waiting on the session mutex. The reason it > never gets the session mutex is because 'kworker/0:3' is holding the > mutex and is waiting for response (see line 1209 in > fs/smb/client/smb2pdu.c. > > Stack trace of 'kworker/0:3' just before calling wait_for_response: > CPU: 0 PID: 220 Comm: kworker/0:3 Not tainted 6.8.2-axis9-devel #1 > Hardware name: Freescale i.MX6 SoloX (Device Tree) > Workqueue: cifsiod smb2_reconnect_server [cifs] > unwind_backtrace from show_stack+0x18/0x1c > show_stack from dump_stack_lvl+0x24/0x2c > dump_stack_lvl from compound_send_recv+0x7bc/0xac8 [cifs] > compound_send_recv [cifs] from cifs_send_recv+0x34/0x3c [cifs] > cifs_send_recv [cifs] from SMB2_negotiate+0x410/0x13dc [cifs] > SMB2_negotiate [cifs] from smb2_negotiate+0x4c/0x58 [cifs] > smb2_negotiate [cifs] from cifs_negotiate_protocol+0x9c/0x100 [cifs] > cifs_negotiate_protocol [cifs] from smb2_reconnect+0x418/0x8fc [cifs] > smb2_reconnect [cifs] from smb2_reconnect_server+0x1dc/0x514 [cifs] > smb2_reconnect_server [cifs] from process_one_work+0x188/0x3ec > process_one_work from worker_thread+0x1fc/0x430 > worker_thread from kthread+0x110/0x130 > kthread from ret_from_fork+0x14/0x28 > > Change-Id: I53439ffb007c9c51d77ce40fb655a34e5ca291ec > Signed-off-by: Rickard x Andersson <rickaran@axis.com> > --- > fs/smb/client/transport.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c > index 994d70193432..96b8f8757ddc 100644 > --- a/fs/smb/client/transport.c > +++ b/fs/smb/client/transport.c > @@ -32,6 +32,8 @@ > /* Max number of iovectors we can use off the stack when sending requests. */ > #define CIFS_MAX_IOV_SIZE 8 > > +#define RESPONSE_TIMEOUT_SECS 50 > + > void > cifs_wake_up_task(struct mid_q_entry *mid) > { > @@ -735,13 +737,14 @@ static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf, > static int > wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ) > { > - int error; > + int ret; > + > + ret = wait_event_killable_timeout(server->response_q, > + midQ->mid_state != MID_REQUEST_SUBMITTED && > + midQ->mid_state != MID_RESPONSE_RECEIVED, > + RESPONSE_TIMEOUT_SECS*HZ); > > - error = wait_event_state(server->response_q, > - midQ->mid_state != MID_REQUEST_SUBMITTED && > - midQ->mid_state != MID_RESPONSE_RECEIVED, > - (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE)); > - if (error < 0) > + if ((ret < 0) || (ret == 0)) > return -ERESTARTSYS; > > return 0; > -- > 2.30.2 > >
On 4/13/24 21:40, Steve French wrote: > Is there any way to verify if the same problem can be reproduced with > 6.9-rc3 or later I have now reproduced the problem also on 6.9.0-rc3: INFO: task umount:511 blocked for more than 614 seconds. Not tainted 6.9.0-rc3-axis9-devel #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:umount state:D stack:0 pid:511 tgid:511 ppid:509 flags:0x00000004 Call trace: __schedule from schedule+0x34/0x14c schedule from schedule_preempt_disabled+0x24/0x34 schedule_preempt_disabled from __mutex_lock.constprop.0+0x330/0x8b0 __mutex_lock.constprop.0 from smb2_reconnect+0x114/0x8fc [cifs] smb2_reconnect [cifs] from SMB2_open_init+0x58/0xa5c [cifs] SMB2_open_init [cifs] from smb2_query_info_compound+0x1a4/0x50c [cifs] smb2_query_info_compound [cifs] from smb2_queryfs+0x64/0x134 [cifs] smb2_queryfs [cifs] from cifs_statfs+0xc8/0x318 [cifs] cifs_statfs [cifs] from statfs_by_dentry+0x60/0x84 statfs_by_dentry from fd_statfs+0x30/0x74 fd_statfs from sys_fstatfs64+0x40/0x6c sys_fstatfs64 from ret_fast_syscall+0x0/0x54 Exception stack(0xc749ffa8 to 0xc749fff0) ffa0: 01716218 00000000 00000003 00000058 beb62b90 00000000 ffc0: 01716218 00000000 00000003 0000010b b6fbd138 01716448 017162a4 beb62b90 ffe0: 0000010b beb62ad4 b6f0861f b6e87866 Best regards, Rickard
On 4/13/24 11:02, Steve French wrote: > Shyam and I tried some experiments and there are two cases where the > patch breaks: > 1) ChangeNotify will time out > 2) Certainly byte range lock calls (they can be allowed to block) will timeout > > An obvious alternative would be to not make this change for the > commands like ChangeNotify and blocking locks but allow it for the > others. > Would it make sense to make the patch less intrusive by doing something like the following?: @@ -1209,7 +1216,12 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses, spin_unlock(&ses->ses_lock); for (i = 0; i < num_rqst; i++) { - rc = wait_for_response(server, midQ[i]); + if (flags == CIFS_NEG_OP) { + rc = wait_for_response(server, midQ[i], 50*1000); + } + else + rc = wait_for_response(server, midQ[i], -1); if (rc != 0) break; So, we are just waiting with timeout in the case it is a "CIFS_NEG_OP". Note that I am not familiar at all with this code base. Best regards, Rickard
diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c index 994d70193432..96b8f8757ddc 100644 --- a/fs/smb/client/transport.c +++ b/fs/smb/client/transport.c @@ -32,6 +32,8 @@ /* Max number of iovectors we can use off the stack when sending requests. */ #define CIFS_MAX_IOV_SIZE 8 +#define RESPONSE_TIMEOUT_SECS 50 + void cifs_wake_up_task(struct mid_q_entry *mid) { @@ -735,13 +737,14 @@ static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf, static int wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ) { - int error; + int ret; + + ret = wait_event_killable_timeout(server->response_q, + midQ->mid_state != MID_REQUEST_SUBMITTED && + midQ->mid_state != MID_RESPONSE_RECEIVED, + RESPONSE_TIMEOUT_SECS*HZ); - error = wait_event_state(server->response_q, - midQ->mid_state != MID_REQUEST_SUBMITTED && - midQ->mid_state != MID_RESPONSE_RECEIVED, - (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE)); - if (error < 0) + if ((ret < 0) || (ret == 0)) return -ERESTARTSYS; return 0;