diff mbox series

fuse: Prevent hung task warning if FUSE server gets stuck

Message ID 20241204164316.219105-1-etmartin4313@gmail.com (mailing list archive)
State New
Headers show
Series fuse: Prevent hung task warning if FUSE server gets stuck | expand

Commit Message

Etienne Martineau Dec. 4, 2024, 4:43 p.m. UTC
From: Etienne Martineau <etmartin4313@gmail.com>

If hung task checking is enabled and FUSE server stops responding for a
long period of time, the hung task timer may fire towards the FUSE clients
and trigger stack dumps that unnecessarily alarm the user.

So, if hung task checking is enabled, we should wake up periodically to
prevent it from triggering stack dumps. This patch uses a wake-up interval
equal to half the hung_task_timeout_secs timer period, which keeps overhead
low.

Without this patch, an unresponsive FUSE server can leave the FUSE clients
in D state and produce stack dumps like below when the hung task timer
expire.

 INFO: task cp:2780 blocked for more than 30 seconds.
       Not tainted 6.13.0-rc1 #4
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:cat state:D stack:0 pid:2598 tgid:2598 ppid:2583 flags:0x00004006
  ......
  <TASK>
  __schedule+0x443/0x16b0
  schedule+0x2b/0x140
  request_wait_answer+0x143/0x220
  __fuse_simple_request+0xd8/0x2c0
  fuse_send_open+0xc5/0x130
  fuse_file_open+0x117/0x1a0
  fuse_open+0x92/0x2f0
  do_dentry_open+0x25d/0x5c0
  vfs_open+0x2a/0x100
  path_openat+0x2f5/0x11d0
  do_filp_open+0xbe/0x180
  do_sys_openat2+0xa1/0xd0
  __x64_sys_openat+0x55/0xa0
  x64_sys_call+0x1998/0x26f0
  do_syscall_64+0x7c/0x170
  ......
  </TASK>

Signed-off-by: Etienne Martineau <etmartin4313@gmail.com>
---
 fs/fuse/dev.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Jingbo Xu Dec. 5, 2024, 1:51 a.m. UTC | #1
On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote:
> From: Etienne Martineau <etmartin4313@gmail.com>
> 
> If hung task checking is enabled and FUSE server stops responding for a
> long period of time, the hung task timer may fire towards the FUSE clients
> and trigger stack dumps that unnecessarily alarm the user.

Isn't that expected that users shall be notified that there's something
wrong with the FUSE service (because of either buggy implementation or
malicious purpose)?  Or is it expected that the normal latency of
handling a FUSE request is more than 30 seconds?

> 
> So, if hung task checking is enabled, we should wake up periodically to
> prevent it from triggering stack dumps. This patch uses a wake-up interval
> equal to half the hung_task_timeout_secs timer period, which keeps overhead
> low.
> 
> Without this patch, an unresponsive FUSE server can leave the FUSE clients
> in D state and produce stack dumps like below when the hung task timer
> expire.
> 
>  INFO: task cp:2780 blocked for more than 30 seconds.
>        Not tainted 6.13.0-rc1 #4
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:cat state:D stack:0 pid:2598 tgid:2598 ppid:2583 flags:0x00004006
>   ......
>   <TASK>
>   __schedule+0x443/0x16b0
>   schedule+0x2b/0x140
>   request_wait_answer+0x143/0x220
>   __fuse_simple_request+0xd8/0x2c0
>   fuse_send_open+0xc5/0x130
>   fuse_file_open+0x117/0x1a0
>   fuse_open+0x92/0x2f0
>   do_dentry_open+0x25d/0x5c0
>   vfs_open+0x2a/0x100
>   path_openat+0x2f5/0x11d0
>   do_filp_open+0xbe/0x180
>   do_sys_openat2+0xa1/0xd0
>   __x64_sys_openat+0x55/0xa0
>   x64_sys_call+0x1998/0x26f0
>   do_syscall_64+0x7c/0x170
>   ......
>   </TASK>
> 
> Signed-off-by: Etienne Martineau <etmartin4313@gmail.com>
> ---
>  fs/fuse/dev.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 27ccae63495d..29e0c9adb799 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -21,6 +21,7 @@
>  #include <linux/swap.h>
>  #include <linux/splice.h>
>  #include <linux/sched.h>
> +#include <linux/sched/sysctl.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include "fuse_trace.h"
> @@ -422,6 +423,8 @@ static void request_wait_answer(struct fuse_req *req)
>  {
>  	struct fuse_conn *fc = req->fm->fc;
>  	struct fuse_iqueue *fiq = &fc->iq;
> +	/* Prevent hung task timer from firing at us */
> +	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
>  	int err;
>  
>  	if (!fc->no_interrupt) {
> @@ -461,7 +464,12 @@ static void request_wait_answer(struct fuse_req *req)
>  	 * Either request is already in userspace, or it was forced.
>  	 * Wait it out.
>  	 */
> -	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> +	if (timeout)
> +		while (!wait_event_timeout(req->waitq,
> +				test_bit(FR_FINISHED, &req->flags), timeout))
> +			;
> +	else
> +		wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
>  }
>  
>  static void __fuse_request_send(struct fuse_req *req)
Etienne Martineau Dec. 5, 2024, 5:09 p.m. UTC | #2
On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote:
> > From: Etienne Martineau <etmartin4313@gmail.com>
> >
> > If hung task checking is enabled and FUSE server stops responding for a
> > long period of time, the hung task timer may fire towards the FUSE clients
> > and trigger stack dumps that unnecessarily alarm the user.
>
> Isn't that expected that users shall be notified that there's something
> wrong with the FUSE service (because of either buggy implementation or
> malicious purpose)?  Or is it expected that the normal latency of
> handling a FUSE request is more than 30 seconds?

In one way you're right because seeing those stack dumps tells you
right away that something is wrong with a FUSE service.
Having said that, with many FUSE services running, those stack dumps
are not helpful at pointing out which of the FUSE services is having
issues.

Maybe we should instead have proper debug in place to dump the FUSE
connection so that user can abort via
/sys/fs/fuse/connections/'nn'/abort
Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe?

Also, now that you are pointing out a malicious implementation, I
realized that on a system with 'hung_task_panic' set, a non-privileged
user can easily trip the hung task timer and force a panic.

I just tried the following sequence using FUSE sshfs and without this
patch my system went down.

 sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs'
 sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic'
 sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
 kill -STOP `pidof /usr/lib/openssh/sftp-server`
 ls ./mnt/
 ^C

thanks,
Etienne
Joanne Koong Dec. 5, 2024, 10:53 p.m. UTC | #3
On Thu, Dec 5, 2024 at 9:10 AM Etienne <etmartin4313@gmail.com> wrote:
>
> On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
> >
> >
> >
> > On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote:
> > > From: Etienne Martineau <etmartin4313@gmail.com>
> > >
> > > If hung task checking is enabled and FUSE server stops responding for a
> > > long period of time, the hung task timer may fire towards the FUSE clients
> > > and trigger stack dumps that unnecessarily alarm the user.
> >
> > Isn't that expected that users shall be notified that there's something
> > wrong with the FUSE service (because of either buggy implementation or
> > malicious purpose)?  Or is it expected that the normal latency of
> > handling a FUSE request is more than 30 seconds?
>
> In one way you're right because seeing those stack dumps tells you
> right away that something is wrong with a FUSE service.
> Having said that, with many FUSE services running, those stack dumps
> are not helpful at pointing out which of the FUSE services is having
> issues.
>
> Maybe we should instead have proper debug in place to dump the FUSE
> connection so that user can abort via
> /sys/fs/fuse/connections/'nn'/abort
> Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe?

Having some identifying information about which connection is
unresponsive seems useful, but I don't see a straightforward way of
implementing this without adding additional per-request overhead.

>
> Also, now that you are pointing out a malicious implementation, I
> realized that on a system with 'hung_task_panic' set, a non-privileged
> user can easily trip the hung task timer and force a panic.
>
> I just tried the following sequence using FUSE sshfs and without this
> patch my system went down.
>
>  sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs'
>  sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic'
>  sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
>  kill -STOP `pidof /usr/lib/openssh/sftp-server`
>  ls ./mnt/
>  ^C

I'm not sure if this addresses your particular use case, but there's a
patch upstream that adds request timeouts
https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/

This can be set globally via sysctls (eg
"/proc/sys/fs/fuse/max_request_timeout") or on a per-server basis. If
the timeout elapses and the request has not been fulfilled (eg
malicious or buggy fuse server), the kernel will abort the connection
automatically.


Thanks,
Joanne

>
> thanks,
> Etienne
>
Jingbo Xu Dec. 6, 2024, 1:47 a.m. UTC | #4
On 12/6/24 1:09 AM, Etienne wrote:
> On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>>
>>
>>
>> On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote:
>>> From: Etienne Martineau <etmartin4313@gmail.com>
>>>
>>> If hung task checking is enabled and FUSE server stops responding for a
>>> long period of time, the hung task timer may fire towards the FUSE clients
>>> and trigger stack dumps that unnecessarily alarm the user.
>>
>> Isn't that expected that users shall be notified that there's something
>> wrong with the FUSE service (because of either buggy implementation or
>> malicious purpose)?  Or is it expected that the normal latency of
>> handling a FUSE request is more than 30 seconds?
> 
> In one way you're right because seeing those stack dumps tells you
> right away that something is wrong with a FUSE service.
> Having said that, with many FUSE services running, those stack dumps
> are not helpful at pointing out which of the FUSE services is having
> issues.
> 
> Maybe we should instead have proper debug in place to dump the FUSE
> connection so that user can abort via
> /sys/fs/fuse/connections/'nn'/abort
> Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe?

If the goal is to identifying which fuse connection is problematic, then
yes, it is not that easy to do that as the hung task has no concept of
underlying filesystem.  It is not what the hung task mechanism needs to do.

To do that, at least you should record the per-request timestamp when
the request is submitted, or a complete timeout mechanism in FUSE as
pointed by Joanne [1].

[1]
https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/


> 
> Also, now that you are pointing out a malicious implementation, I
> realized that on a system with 'hung_task_panic' set, a non-privileged
> user can easily trip the hung task timer and force a panic.
> 
> I just tried the following sequence using FUSE sshfs and without this
> patch my system went down.
> 
>  sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs'
>  sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic'
>  sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
>  kill -STOP `pidof /usr/lib/openssh/sftp-server`
>  ls ./mnt/
>  ^C

IMHO hung_task_panic shall not be enabled in productive environment.
Etienne Martineau Dec. 6, 2024, 4:56 p.m. UTC | #5
On Thu, Dec 5, 2024 at 8:47 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 12/6/24 1:09 AM, Etienne wrote:
> > On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote:
> >>> From: Etienne Martineau <etmartin4313@gmail.com>
> >>>
> >>> If hung task checking is enabled and FUSE server stops responding for a
> >>> long period of time, the hung task timer may fire towards the FUSE clients
> >>> and trigger stack dumps that unnecessarily alarm the user.
> >>
> >> Isn't that expected that users shall be notified that there's something
> >> wrong with the FUSE service (because of either buggy implementation or
> >> malicious purpose)?  Or is it expected that the normal latency of
> >> handling a FUSE request is more than 30 seconds?
> >
> > In one way you're right because seeing those stack dumps tells you
> > right away that something is wrong with a FUSE service.
> > Having said that, with many FUSE services running, those stack dumps
> > are not helpful at pointing out which of the FUSE services is having
> > issues.
> >
> > Maybe we should instead have proper debug in place to dump the FUSE
> > connection so that user can abort via
> > /sys/fs/fuse/connections/'nn'/abort
> > Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe?
>
> If the goal is to identifying which fuse connection is problematic, then
> yes, it is not that easy to do that as the hung task has no concept of
> underlying filesystem.  It is not what the hung task mechanism needs to do.
>
> To do that, at least you should record the per-request timestamp when
> the request is submitted, or a complete timeout mechanism in FUSE as
> pointed by Joanne [1].
>
> [1]
> https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/
>
>
> >
> > Also, now that you are pointing out a malicious implementation, I
> > realized that on a system with 'hung_task_panic' set, a non-privileged
> > user can easily trip the hung task timer and force a panic.
> >
> > I just tried the following sequence using FUSE sshfs and without this
> > patch my system went down.
> >
> >  sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs'
> >  sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic'
> >  sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
> >  kill -STOP `pidof /usr/lib/openssh/sftp-server`
> >  ls ./mnt/
> >  ^C
>
> IMHO hung_task_panic shall not be enabled in productive environment.
>
>
>
> --
> Thanks,
> Jingbo

Thanks Jingbo and Joanne for the pointers. I left some comments on
that other thread.
Thanks,
Etienne
diff mbox series

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 27ccae63495d..29e0c9adb799 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -21,6 +21,7 @@ 
 #include <linux/swap.h>
 #include <linux/splice.h>
 #include <linux/sched.h>
+#include <linux/sched/sysctl.h>
 
 #define CREATE_TRACE_POINTS
 #include "fuse_trace.h"
@@ -422,6 +423,8 @@  static void request_wait_answer(struct fuse_req *req)
 {
 	struct fuse_conn *fc = req->fm->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
+	/* Prevent hung task timer from firing at us */
+	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
 	int err;
 
 	if (!fc->no_interrupt) {
@@ -461,7 +464,12 @@  static void request_wait_answer(struct fuse_req *req)
 	 * Either request is already in userspace, or it was forced.
 	 * Wait it out.
 	 */
-	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
+	if (timeout)
+		while (!wait_event_timeout(req->waitq,
+				test_bit(FR_FINISHED, &req->flags), timeout))
+			;
+	else
+		wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
 }
 
 static void __fuse_request_send(struct fuse_req *req)