Message ID | 20241204164316.219105-1-etmartin4313@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fuse: Prevent hung task warning if FUSE server gets stuck | expand |
On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote: > From: Etienne Martineau <etmartin4313@gmail.com> > > If hung task checking is enabled and FUSE server stops responding for a > long period of time, the hung task timer may fire towards the FUSE clients > and trigger stack dumps that unnecessarily alarm the user. Isn't that expected that users shall be notified that there's something wrong with the FUSE service (because of either buggy implementation or malicious purpose)? Or is it expected that the normal latency of handling a FUSE request is more than 30 seconds? > > So, if hung task checking is enabled, we should wake up periodically to > prevent it from triggering stack dumps. This patch uses a wake-up interval > equal to half the hung_task_timeout_secs timer period, which keeps overhead > low. > > Without this patch, an unresponsive FUSE server can leave the FUSE clients > in D state and produce stack dumps like below when the hung task timer > expire. > > INFO: task cp:2780 blocked for more than 30 seconds. > Not tainted 6.13.0-rc1 #4 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > task:cat state:D stack:0 pid:2598 tgid:2598 ppid:2583 flags:0x00004006 > ...... > <TASK> > __schedule+0x443/0x16b0 > schedule+0x2b/0x140 > request_wait_answer+0x143/0x220 > __fuse_simple_request+0xd8/0x2c0 > fuse_send_open+0xc5/0x130 > fuse_file_open+0x117/0x1a0 > fuse_open+0x92/0x2f0 > do_dentry_open+0x25d/0x5c0 > vfs_open+0x2a/0x100 > path_openat+0x2f5/0x11d0 > do_filp_open+0xbe/0x180 > do_sys_openat2+0xa1/0xd0 > __x64_sys_openat+0x55/0xa0 > x64_sys_call+0x1998/0x26f0 > do_syscall_64+0x7c/0x170 > ...... > </TASK> > > Signed-off-by: Etienne Martineau <etmartin4313@gmail.com> > --- > fs/fuse/dev.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 27ccae63495d..29e0c9adb799 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -21,6 +21,7 @@ > #include <linux/swap.h> > #include <linux/splice.h> > #include <linux/sched.h> > +#include <linux/sched/sysctl.h> > > #define CREATE_TRACE_POINTS > #include "fuse_trace.h" > @@ -422,6 +423,8 @@ static void request_wait_answer(struct fuse_req *req) > { > struct fuse_conn *fc = req->fm->fc; > struct fuse_iqueue *fiq = &fc->iq; > + /* Prevent hung task timer from firing at us */ > + unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2; > int err; > > if (!fc->no_interrupt) { > @@ -461,7 +464,12 @@ static void request_wait_answer(struct fuse_req *req) > * Either request is already in userspace, or it was forced. > * Wait it out. > */ > - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > + if (timeout) > + while (!wait_event_timeout(req->waitq, > + test_bit(FR_FINISHED, &req->flags), timeout)) > + ; > + else > + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > } > > static void __fuse_request_send(struct fuse_req *req)
On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote: > > > > On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote: > > From: Etienne Martineau <etmartin4313@gmail.com> > > > > If hung task checking is enabled and FUSE server stops responding for a > > long period of time, the hung task timer may fire towards the FUSE clients > > and trigger stack dumps that unnecessarily alarm the user. > > Isn't that expected that users shall be notified that there's something > wrong with the FUSE service (because of either buggy implementation or > malicious purpose)? Or is it expected that the normal latency of > handling a FUSE request is more than 30 seconds? In one way you're right because seeing those stack dumps tells you right away that something is wrong with a FUSE service. Having said that, with many FUSE services running, those stack dumps are not helpful at pointing out which of the FUSE services is having issues. Maybe we should instead have proper debug in place to dump the FUSE connection so that user can abort via /sys/fs/fuse/connections/'nn'/abort Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe? Also, now that you are pointing out a malicious implementation, I realized that on a system with 'hung_task_panic' set, a non-privileged user can easily trip the hung task timer and force a panic. I just tried the following sequence using FUSE sshfs and without this patch my system went down. sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs' sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic' sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt kill -STOP `pidof /usr/lib/openssh/sftp-server` ls ./mnt/ ^C thanks, Etienne
On Thu, Dec 5, 2024 at 9:10 AM Etienne <etmartin4313@gmail.com> wrote: > > On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote: > > > > > > > > On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote: > > > From: Etienne Martineau <etmartin4313@gmail.com> > > > > > > If hung task checking is enabled and FUSE server stops responding for a > > > long period of time, the hung task timer may fire towards the FUSE clients > > > and trigger stack dumps that unnecessarily alarm the user. > > > > Isn't that expected that users shall be notified that there's something > > wrong with the FUSE service (because of either buggy implementation or > > malicious purpose)? Or is it expected that the normal latency of > > handling a FUSE request is more than 30 seconds? > > In one way you're right because seeing those stack dumps tells you > right away that something is wrong with a FUSE service. > Having said that, with many FUSE services running, those stack dumps > are not helpful at pointing out which of the FUSE services is having > issues. > > Maybe we should instead have proper debug in place to dump the FUSE > connection so that user can abort via > /sys/fs/fuse/connections/'nn'/abort > Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe? Having some identifying information about which connection is unresponsive seems useful, but I don't see a straightforward way of implementing this without adding additional per-request overhead. > > Also, now that you are pointing out a malicious implementation, I > realized that on a system with 'hung_task_panic' set, a non-privileged > user can easily trip the hung task timer and force a panic. > > I just tried the following sequence using FUSE sshfs and without this > patch my system went down. > > sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs' > sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic' > sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt > kill -STOP `pidof /usr/lib/openssh/sftp-server` > ls ./mnt/ > ^C I'm not sure if this addresses your particular use case, but there's a patch upstream that adds request timeouts https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/ This can be set globally via sysctls (eg "/proc/sys/fs/fuse/max_request_timeout") or on a per-server basis. If the timeout elapses and the request has not been fulfilled (eg malicious or buggy fuse server), the kernel will abort the connection automatically. Thanks, Joanne > > thanks, > Etienne >
On 12/6/24 1:09 AM, Etienne wrote: > On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote: >> >> >> >> On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote: >>> From: Etienne Martineau <etmartin4313@gmail.com> >>> >>> If hung task checking is enabled and FUSE server stops responding for a >>> long period of time, the hung task timer may fire towards the FUSE clients >>> and trigger stack dumps that unnecessarily alarm the user. >> >> Isn't that expected that users shall be notified that there's something >> wrong with the FUSE service (because of either buggy implementation or >> malicious purpose)? Or is it expected that the normal latency of >> handling a FUSE request is more than 30 seconds? > > In one way you're right because seeing those stack dumps tells you > right away that something is wrong with a FUSE service. > Having said that, with many FUSE services running, those stack dumps > are not helpful at pointing out which of the FUSE services is having > issues. > > Maybe we should instead have proper debug in place to dump the FUSE > connection so that user can abort via > /sys/fs/fuse/connections/'nn'/abort > Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe? If the goal is to identifying which fuse connection is problematic, then yes, it is not that easy to do that as the hung task has no concept of underlying filesystem. It is not what the hung task mechanism needs to do. To do that, at least you should record the per-request timestamp when the request is submitted, or a complete timeout mechanism in FUSE as pointed by Joanne [1]. [1] https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/ > > Also, now that you are pointing out a malicious implementation, I > realized that on a system with 'hung_task_panic' set, a non-privileged > user can easily trip the hung task timer and force a panic. > > I just tried the following sequence using FUSE sshfs and without this > patch my system went down. > > sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs' > sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic' > sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt > kill -STOP `pidof /usr/lib/openssh/sftp-server` > ls ./mnt/ > ^C IMHO hung_task_panic shall not be enabled in productive environment.
On Thu, Dec 5, 2024 at 8:47 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote: > > > > On 12/6/24 1:09 AM, Etienne wrote: > > On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote: > >> > >> > >> > >> On 12/5/24 12:43 AM, etmartin4313@gmail.com wrote: > >>> From: Etienne Martineau <etmartin4313@gmail.com> > >>> > >>> If hung task checking is enabled and FUSE server stops responding for a > >>> long period of time, the hung task timer may fire towards the FUSE clients > >>> and trigger stack dumps that unnecessarily alarm the user. > >> > >> Isn't that expected that users shall be notified that there's something > >> wrong with the FUSE service (because of either buggy implementation or > >> malicious purpose)? Or is it expected that the normal latency of > >> handling a FUSE request is more than 30 seconds? > > > > In one way you're right because seeing those stack dumps tells you > > right away that something is wrong with a FUSE service. > > Having said that, with many FUSE services running, those stack dumps > > are not helpful at pointing out which of the FUSE services is having > > issues. > > > > Maybe we should instead have proper debug in place to dump the FUSE > > connection so that user can abort via > > /sys/fs/fuse/connections/'nn'/abort > > Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe? > > If the goal is to identifying which fuse connection is problematic, then > yes, it is not that easy to do that as the hung task has no concept of > underlying filesystem. It is not what the hung task mechanism needs to do. > > To do that, at least you should record the per-request timestamp when > the request is submitted, or a complete timeout mechanism in FUSE as > pointed by Joanne [1]. > > [1] > https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/ > > > > > > Also, now that you are pointing out a malicious implementation, I > > realized that on a system with 'hung_task_panic' set, a non-privileged > > user can easily trip the hung task timer and force a panic. > > > > I just tried the following sequence using FUSE sshfs and without this > > patch my system went down. > > > > sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs' > > sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic' > > sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt > > kill -STOP `pidof /usr/lib/openssh/sftp-server` > > ls ./mnt/ > > ^C > > IMHO hung_task_panic shall not be enabled in productive environment. > > > > -- > Thanks, > Jingbo Thanks Jingbo and Joanne for the pointers. I left some comments on that other thread. Thanks, Etienne
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 27ccae63495d..29e0c9adb799 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -21,6 +21,7 @@ #include <linux/swap.h> #include <linux/splice.h> #include <linux/sched.h> +#include <linux/sched/sysctl.h> #define CREATE_TRACE_POINTS #include "fuse_trace.h" @@ -422,6 +423,8 @@ static void request_wait_answer(struct fuse_req *req) { struct fuse_conn *fc = req->fm->fc; struct fuse_iqueue *fiq = &fc->iq; + /* Prevent hung task timer from firing at us */ + unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2; int err; if (!fc->no_interrupt) { @@ -461,7 +464,12 @@ static void request_wait_answer(struct fuse_req *req) * Either request is already in userspace, or it was forced. * Wait it out. */ - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); + if (timeout) + while (!wait_event_timeout(req->waitq, + test_bit(FR_FINISHED, &req->flags), timeout)) + ; + else + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); } static void __fuse_request_send(struct fuse_req *req)