Message ID | 20240830162649.3849586-2-joannelkoong@gmail.com (mailing list archive)
---|---
State | New
Series | fuse: add timeout option for requests
On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is in a deadlock. Currently, there's
> no good way to detect if a server is stuck and needs to be killed
> manually.
>
> This commit adds an option for enforcing a timeout (in seconds) on
> requests where if the timeout elapses without a reply from the server,
> the connection will be automatically aborted.

Okay.

I'm not sure what the overhead (scheduling and memory) of timers is, but
starting one for each request seems excessive.

Can we make the timeout per-connection instead of per-request?

I.e. when the first request is sent, the timer is started. When a
reply is received but there are still outstanding requests, the timer
is reset. When the last reply is received, the timer is stopped.

This should handle the frozen server case just as well. It may not
perfectly handle the case when the server is still alive but for some
reason one or more requests get stuck while others are still being
processed. The latter case is unlikely to be an issue in practice,
IMO.

Thanks,
Miklos
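[Editorial note: for illustration, the per-connection scheme described above might look roughly like the following sketch. This is not the posted patch; the "outstanding" counter and the req_timer/req_timeout fields are made-up names, not existing struct fuse_conn members.]

```c
/* Sketch only: fields and helpers below are hypothetical. */
static void fuse_conn_timeout(struct timer_list *t)
{
	struct fuse_conn *fc = from_timer(fc, t, req_timer);

	/* no reply arrived within req_timeout while requests were pending */
	fuse_abort_conn(fc);
}

static void fuse_conn_request_sent(struct fuse_conn *fc)
{
	/* first outstanding request: arm the watchdog */
	if (atomic_inc_return(&fc->outstanding) == 1)
		mod_timer(&fc->req_timer, jiffies + fc->req_timeout);
}

static void fuse_conn_reply_received(struct fuse_conn *fc)
{
	if (atomic_dec_return(&fc->outstanding) == 0)
		timer_delete(&fc->req_timer);	/* last reply: stop the timer */
	else
		mod_timer(&fc->req_timer, jiffies + fc->req_timeout); /* reset */
}
```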
On 9/2/24 12:37, Miklos Szeredi wrote:
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> There are situations where fuse servers can become unresponsive or
>> stuck, for example if the server is in a deadlock. Currently, there's
>> no good way to detect if a server is stuck and needs to be killed
>> manually.
>>
>> This commit adds an option for enforcing a timeout (in seconds) on
>> requests where if the timeout elapses without a reply from the server,
>> the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.
>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset. When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well. It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed. The latter case is unlikely to be an issue in practice,
> IMO.

In case of distributed servers, it can easily happen that one server has
an issue while other servers still process requests, especially when
these are just requests that read/getattr/etc and do not write, i.e.
accessing the stuck server is not needed by the other servers. So in my
opinion it is not so unlikely. Although for such cases it is not
difficult to handle the timeout within the fuse server.

Thanks,
Bernd
On Mon, 2 Sept 2024 at 12:50, Bernd Schubert <bernd.schubert@fastmail.fm> wrote: > In case of distributed servers, it can easily happen that one server has > an issue, while other servers still process requests. Especially when > these are just requests that read/getattr/etc and do not write, i.e. > accessing the stuck server is not needed by other servers. So in my > opinion not so unlikely. Although for such cases not difficult to > timeout within the fuse server. Exactly. Normally the kernel should not need to time out fuse requests, and it might be actively detrimental to do so. The main case this wants to solve is a deadlocked server due to programming error, AFAICS. And this would only work in environments where requests are guaranteed to complete within some time period. So if the server needs to handle request timeouts, then it should *not* rely on the kernel timeout. The kernel timeout should be a safeguard against broken or malicious servers, not an aid to implement request timeouts. Thanks, Miklos
On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is in a deadlock. Currently, there's
> > no good way to detect if a server is stuck and needs to be killed
> > manually.
> >
> > This commit adds an option for enforcing a timeout (in seconds) on
> > requests where if the timeout elapses without a reply from the server,
> > the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.
>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset. When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well. It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed. The latter case is unlikely to be an issue in practice,
> IMO.

In that case, if the timeout is per-connection instead of per-request
and we're not stringent about some requests getting stuck, maybe it
makes more sense to just do this in userspace (libfuse) then? That
seems pretty simple with a watchdog thread that periodically (according
to whatever timeout is specified) checks whether the number of requests
serviced is increasing while /sys/fs/fuse/connections/*/waiting is
non-zero.

If there are multiple server threads (eg libfuse's fuse_loop_mt
interface) and, say, all of them are deadlocked except for 1 that is
actively servicing requests, then this wouldn't catch that case, but
even if this per-connection timeout was enforced in the kernel instead,
it wouldn't catch that case either.

So maybe this logic should just be moved to libfuse then? For this we'd
need to pass the connection's device id (fc->dev) to userspace, which I
don't think we currently do, but that seems pretty simple. The one
downside I see is that this doesn't let sysadmins enforce an automatic
system-wide "max timeout" against malicious fuse servers, but if the
timeout is per-connection instead of per-request, then a malicious
server could still be malicious anyways (eg deadlocking all threads
except for 1).

Curious to hear what your and Bernd's thoughts on this are.

Thanks,
Joanne

>
> Thanks,
> Miklos
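[Editorial note: the userspace watchdog described above could be as small as the following sketch. It is illustrative only: it assumes the connection id (fc->dev) has already been exported to the server and that the server keeps a "completed" counter its worker threads bump per reply; the fusectl "waiting" and "abort" files do exist under /sys/fs/fuse/connections/<dev>/.]

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical watchdog loop; not part of libfuse today. */
static void watchdog_loop(unsigned int conn_id, unsigned int timeout_secs,
			  const volatile unsigned long *completed)
{
	char path[128];
	unsigned long waiting, last = *completed;
	FILE *f;

	for (;;) {
		sleep(timeout_secs);

		snprintf(path, sizeof(path),
			 "/sys/fs/fuse/connections/%u/waiting", conn_id);
		f = fopen(path, "r");
		if (!f || fscanf(f, "%lu", &waiting) != 1)
			waiting = 0;
		if (f)
			fclose(f);

		/* requests pending in the kernel, but nothing got serviced */
		if (waiting > 0 && *completed == last) {
			snprintf(path, sizeof(path),
				 "/sys/fs/fuse/connections/%u/abort", conn_id);
			f = fopen(path, "w");
			if (f) {
				fputs("1\n", f);  /* abort the connection */
				fclose(f);
			}
			return;
		}
		last = *completed;
	}
}
```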
On 9/3/24 19:25, Joanne Koong wrote:
> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> There are situations where fuse servers can become unresponsive or
>>> stuck, for example if the server is in a deadlock. Currently, there's
>>> no good way to detect if a server is stuck and needs to be killed
>>> manually.
>>>
>>> This commit adds an option for enforcing a timeout (in seconds) on
>>> requests where if the timeout elapses without a reply from the server,
>>> the connection will be automatically aborted.
>>
>> Okay.
>>
>> I'm not sure what the overhead (scheduling and memory) of timers, but
>> starting one for each request seems excessive.
>>
>> Can we make the timeout per-connection instead of per request?
>>
>> I.e. When the first request is sent, the timer is started. When a
>> reply is received but there are still outstanding requests, the timer
>> is reset. When the last reply is received, the timer is stopped.
>>
>> This should handle the frozen server case just as well. It may not
>> perfectly handle the case when the server is still alive but for some
>> reason one or more requests get stuck, while others are still being
>> processed. The latter case is unlikely to be an issue in practice,
>> IMO.
>
> In that case, if the timeout is per-connection instead of per-request
> and we're not stringent about some requests getting stuck, maybe it
> makes more sense to just do this in userspace (libfuse) then? That
> seems pretty simple with having a watchdog thread that periodically
> (according to whatever specified timeout) checks if the number of
> requests serviced is increasing when
> /sys/fs/fuse/connections/*/waiting is non-zero.
>
> If there are multiple server threads (eg libfuse's fuse_loop_mt
> interface) and say, all of them are deadlocked except for 1 that is
> actively servicing requests, then this wouldn't catch that case, but
> even if this per-connection timeout was enforced in the kernel
> instead, it wouldn't catch that case either.
>
> So maybe this logic should just be moved to libfuse then? For this
> we'd need to pass the connection's device id (fc->dev) to userspace
> which i don't think we currently do, but that seems pretty simple. The
> one downside I see is that this doesn't let sysadmins enforce an
> automatic system-wide "max timeout" against malicious fuse servers but
> if we are having the timeout be per-connection instead of per-request,
> then a malicious server could still be malicious anyways (eg
> deadlocking all threads except for 1).
>
> Curious to hear what your and Bernrd's thoughts on this are.

I have a question here: does it need to be an exact timeout, or could it
be an interval/epoch? Let's say you time out based on epoch lists.
Example:

1) epoch-a starts, requests are added to the epoch-a list.
2) epoch-b starts, the epoch-a list should get empty.
3) epoch-c starts, the epoch-b list should get empty; kill the
   connection if the epoch-a list is not empty (an epoch-c list should
   not be needed, as the epoch-a list can be reused once it is confirmed
   to be empty).

Here the timeout would be epoch-a + epoch-b, i.e.
max-timeout <= 2 * epoch-time.
We could have more epochs/list-heads to make it more fine-grained.

From my point of view that should be rather cheap, as it is just
adding/removing requests from a list and checking whether a list has
drained to detect a timeout. With the caveat that it is not precise
anymore.

That could be implemented in kernel and/or libfuse?

Thanks,
Bernd
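[Editorial note: a rough kernel-side sketch of the epoch-list idea above, using two rotating lists. "struct fuse_epochs" and the epoch_entry member are hypothetical, and a real version would need the _bh/_irqsave lock variants since the timer callback runs in softirq context.]

```c
#define FUSE_NR_EPOCHS	2

struct fuse_epochs {
	struct fuse_conn *fc;
	spinlock_t lock;
	struct list_head lists[FUSE_NR_EPOCHS];
	unsigned int cur;		/* list new requests are added to */
	unsigned long period;		/* epoch length in jiffies */
	struct timer_list timer;	/* fires once per epoch */
};

static void fuse_epoch_add(struct fuse_epochs *ep, struct fuse_req *req)
{
	spin_lock(&ep->lock);
	list_add_tail(&req->epoch_entry, &ep->lists[ep->cur]);
	spin_unlock(&ep->lock);
}

/* the reply path would list_del_init(&req->epoch_entry) under ep->lock */

static void fuse_epoch_tick(struct timer_list *t)
{
	struct fuse_epochs *ep = from_timer(ep, t, timer);
	bool stuck;

	spin_lock(&ep->lock);
	/*
	 * Rotate to the oldest list.  Anything still on it was sent between
	 * one and two epochs ago and has not been answered: abort.
	 */
	ep->cur = (ep->cur + 1) % FUSE_NR_EPOCHS;
	stuck = !list_empty(&ep->lists[ep->cur]);
	spin_unlock(&ep->lock);

	if (stuck)
		fuse_abort_conn(ep->fc);
	else
		mod_timer(&ep->timer, jiffies + ep->period);
}
```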
On Tue, Sep 3, 2024 at 3:38 PM Bernd Schubert <bernd.schubert@fastmail.fm> wrote: > > On 9/3/24 19:25, Joanne Koong wrote: > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > >> > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: > >>> > >>> There are situations where fuse servers can become unresponsive or > >>> stuck, for example if the server is in a deadlock. Currently, there's > >>> no good way to detect if a server is stuck and needs to be killed > >>> manually. > >>> > >>> This commit adds an option for enforcing a timeout (in seconds) on > >>> requests where if the timeout elapses without a reply from the server, > >>> the connection will be automatically aborted. > >> > >> Okay. > >> > >> I'm not sure what the overhead (scheduling and memory) of timers, but > >> starting one for each request seems excessive. > >> > >> Can we make the timeout per-connection instead of per request? > >> > >> I.e. When the first request is sent, the timer is started. When a > >> reply is received but there are still outstanding requests, the timer > >> is reset. When the last reply is received, the timer is stopped. > >> > >> This should handle the frozen server case just as well. It may not > >> perfectly handle the case when the server is still alive but for some > >> reason one or more requests get stuck, while others are still being > >> processed. The latter case is unlikely to be an issue in practice, > >> IMO. > > > > In that case, if the timeout is per-connection instead of per-request > > and we're not stringent about some requests getting stuck, maybe it > > makes more sense to just do this in userspace (libfuse) then? That > > seems pretty simple with having a watchdog thread that periodically > > (according to whatever specified timeout) checks if the number of > > requests serviced is increasing when > > /sys/fs/fuse/connections/*/waiting is non-zero. > > > > If there are multiple server threads (eg libfuse's fuse_loop_mt > > interface) and say, all of them are deadlocked except for 1 that is > > actively servicing requests, then this wouldn't catch that case, but > > even if this per-connection timeout was enforced in the kernel > > instead, it wouldn't catch that case either. > > > > So maybe this logic should just be moved to libfuse then? For this > > we'd need to pass the connection's device id (fc->dev) to userspace > > which i don't think we currently do, but that seems pretty simple. The > > one downside I see is that this doesn't let sysadmins enforce an > > automatic system-wide "max timeout" against malicious fuse servers but > > if we are having the timeout be per-connection instead of per-request, > > then a malicious server could still be malicious anyways (eg > > deadlocking all threads except for 1). > > > > Curious to hear what your and Bernrd's thoughts on this are. > > > I have question here, does it need to be an exact timeout or could it be > an interval/epoch? Let's say you timeout based on epoch lists? Example > > 1) epoch-a starts, requests are added to epoch-a list. > 2) epoch-b starts, epoch-a list should get empty > 3) epoch-c starts, epoch-b list should get empty, kill the connection if > epoch-a list is not empty (epoch-c list should not be needed, as epoch-a > list can be used, once confirmed it is empty.) > > > Here timeout would be epoch-a + epoch-b, i.e. > max-timeout <= 2 * epoch-time. > We could have more epochs/list-heads to make it more fine grained. 
> > > From my point of view that should be a rather cheap, as it just > adding/removing requests from list and checking for timeout if a list is > empty. With the caveat that it is not precise anymore. I like this idea a lot. I like that it enforces per-request behavior and guarantees that any stalled request will abort the connection. I think it's fine for the timeout to be an interval/epoch so long as the documentation explicitly makes that clear. I think this would need to be done in the kernel instead of libfuse because if the server is in a deadlock when there are no pending requests in the lists and then the kernel sends requests to the server, none of the requests will make it to the list for the timer handler to detect any issues. Before I make this change for v7, Miklos what are your thoughts on this direction? Thanks, Joanne > > That could be implemented in kernel and/or libfuse? > > > Thanks, > Bernd
Hi Joanne,

On 9/4/24 19:23, Joanne Koong wrote:
> On Tue, Sep 3, 2024 at 3:38 PM Bernd Schubert
>>
>>
>> I have question here, does it need to be an exact timeout or could it be
>> an interval/epoch? Let's say you timeout based on epoch lists? Example
>>
>> 1) epoch-a starts, requests are added to epoch-a list.
>> 2) epoch-b starts, epoch-a list should get empty
>> 3) epoch-c starts, epoch-b list should get empty, kill the connection if
>> epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
>> list can be used, once confirmed it is empty.)
>>
>> Here timeout would be epoch-a + epoch-b, i.e.
>> max-timeout <= 2 * epoch-time.
>> We could have more epochs/list-heads to make it more fine grained.
>>
>> From my point of view that should be a rather cheap, as it just
>> adding/removing requests from list and checking for timeout if a list is
>> empty. With the caveat that it is not precise anymore.
>
> I like this idea a lot. I like that it enforces per-request behavior
> and guarantees that any stalled request will abort the connection. I
> think it's fine for the timeout to be an interval/epoch so long as the
> documentation explicitly makes that clear. I think this would need to
> be done in the kernel instead of libfuse because if the server is in a
> deadlock when there are no pending requests in the lists and then the
> kernel sends requests to the server, none of the requests will make it
> to the list for the timer handler to detect any issues.
>
> Before I make this change for v7, Miklos what are your thoughts on
> this direction?

We briefly discussed this with Miklos, and he agreed that an epoch list
should be fine (would be great if you could quickly confirm, Miklos).

In the meantime I have another use case for timeout lists. Basically,
Jakob from CERN (in CC) is asking for a way to stop requests to the fuse
server and then to resume them. I think that can be done easily through
notifications and unsetting (and later setting) fc->initialized. A demo
patch follows around tomorrow, but Jakob actually wants to know when it
is safe to restart the fuse server (or part of it). That is where the
epoch timeout list would be handy - the reply to the notification should
happen when the lists have become empty, i.e. no requests are being
handled anymore. I think this is better than FUSE_NOTIFY_RESEND, as that
has an issue with non-idempotent requests.

Thanks,
Bernd
On Wed, 18 Sept 2024 at 00:00, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> > I like this idea a lot. I like that it enforces per-request behavior
> > and guarantees that any stalled request will abort the connection. I
> > think it's fine for the timeout to be an interval/epoch so long as the
> > documentation explicitly makes that clear. I think this would need to
> > be done in the kernel instead of libfuse because if the server is in a
> > deadlock when there are no pending requests in the lists and then the
> > kernel sends requests to the server, none of the requests will make it
> > to the list for the timer handler to detect any issues.
> >
> > Before I make this change for v7, Miklos what are your thoughts on
> > this direction?

I have no problem with epochs, but there are scalability issues with
shared lists. So I'm not sure. Maybe the individual timers would be the
best solution, but I don't know the overhead and the scalability of that
solution either.

Thanks,
Miklos
On September 18, 2024 9:29:44 AM GMT+02:00, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, 18 Sept 2024 at 00:00, Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>
>> > I like this idea a lot. I like that it enforces per-request behavior
>> > and guarantees that any stalled request will abort the connection. I
>> > think it's fine for the timeout to be an interval/epoch so long as the
>> > documentation explicitly makes that clear. I think this would need to
>> > be done in the kernel instead of libfuse because if the server is in a
>> > deadlock when there are no pending requests in the lists and then the
>> > kernel sends requests to the server, none of the requests will make it
>> > to the list for the timer handler to detect any issues.
>> >
>> > Before I make this change for v7, Miklos what are your thoughts on
>> > this direction?
>
> I have no problem with epochs, but there are scalability issue with
> shared lists. So I'm not sure. Maybe the individual timers would be
> the best solution but I don't know the overhead and the scalability of
> that solution either.
>
> Thanks,
> Miklos

In principle we could also have lists per device and then a bitmap
tracking which lists are empty. But that is certainly more complex,
especially as a device clone can happen at any time.

Thanks,
Bernd
On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is in a deadlock. Currently, there's
> > no good way to detect if a server is stuck and needs to be killed
> > manually.
> >
> > This commit adds an option for enforcing a timeout (in seconds) on
> > requests where if the timeout elapses without a reply from the server,
> > the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.

I ran some benchmarks on this using the passthrough_ll server and saw
roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):

fio --name randwrite --ioengine=sync --thread --invalidate=1
--runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
--bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
--group_reporting=1 --directory=/root/fuse_mount

Instead of attaching a timer to each request, I think we can do the
following:

* add a "start_time" field to each request tracking (in jiffies) when
  the request was started
* add a new list to the connection that all requests get enqueued onto.
  When a request is completed, it gets dequeued from this list
* have a timer for the connection that fires off every 10 seconds or so.
  When this timer fires, it checks "jiffies > req->start_time +
  fc->req_timeout" against the head of the list to see whether the
  timeout has expired and we need to abort the request. We only need to
  check against the head of the list because we know every other request
  after it was started later in time. I think we could even just use the
  fc->lock for this instead of needing a separate lock. In the worst
  case, this adds a 10-second upper bound on top of the timeout a user
  requests (eg if the user requests 2 minutes, in the worst case the
  timeout would trigger at 2 minutes and 10 seconds).

Also, now that we're aborting the connection entirely on a timeout
instead of just aborting the request, maybe it makes sense to change the
timeout granularity to minutes instead of seconds. I'm envisioning that
this timeout mechanism will mostly be used as a safeguard against
malicious or buggy servers with a high timeout configured (eg 10
minutes), and minutes seem like a nicer interface for users than them
having to convert that to seconds.

Let me know if I've missed anything with this approach but if not, then
I'll submit v7 with this change.

Thanks,
Joanne

>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset. When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well. It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed. The latter case is unlikely to be an issue in practice,
> IMO.
>
> Thanks,
> Miklos
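[Editorial note: a sketch of the single-list-plus-periodic-timer scheme described in the message above. The timeout_timer, timeout_list, timeout_entry and start_time names are placeholders for fields this approach would add; fc->lock and fc->req_timeout correspond to existing/proposed struct fuse_conn members.]

```c
static void fuse_check_timeout(struct timer_list *t)
{
	struct fuse_conn *fc = from_timer(fc, t, timeout_timer);
	struct fuse_req *req;
	bool expired = false;

	spin_lock(&fc->lock);
	/* requests are kept in send order, so the head is always the oldest */
	req = list_first_entry_or_null(&fc->timeout_list, struct fuse_req,
				       timeout_entry);
	if (req && time_after(jiffies, req->start_time + fc->req_timeout))
		expired = true;
	spin_unlock(&fc->lock);

	if (expired)
		fuse_abort_conn(fc);
	else
		mod_timer(&fc->timeout_timer, jiffies + 10 * HZ); /* re-check */
}
```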
Hi Joanne, On 9/27/24 21:36, Joanne Koong wrote: > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: >> >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: >>> >>> There are situations where fuse servers can become unresponsive or >>> stuck, for example if the server is in a deadlock. Currently, there's >>> no good way to detect if a server is stuck and needs to be killed >>> manually. >>> >>> This commit adds an option for enforcing a timeout (in seconds) on >>> requests where if the timeout elapses without a reply from the server, >>> the connection will be automatically aborted. >> >> Okay. >> >> I'm not sure what the overhead (scheduling and memory) of timers, but >> starting one for each request seems excessive. > > I ran some benchmarks on this using the passthrough_ll server and saw > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): > fio --name randwrite --ioengine=sync --thread --invalidate=1 > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > --group_reporting=1 --directory=/root/fuse_mount > > Instead of attaching a timer to each request, I think we can instead > do the following: > * add a "start_time" field to each request tracking (in jiffies) when > the request was started > * add a new list to the connection that all requests get enqueued > onto. When the request is completed, it gets dequeued from this list > * have a timer for the connection that fires off every 10 seconds or > so. When this timer is fired, it checks if "jiffies > req->start_time > + fc->req_timeout" against the head of the list to check if the > timeout has expired and we need to abort the request. We only need to > check against the head of the list because we know every other request > after this was started later in time. I think we could even just use > the fc->lock for this instead of needing a separate lock. In the worst > case, this grants a 10 second upper bound on the timeout a user > requests (eg if the user requests 2 minutes, in the worst case the > timeout would trigger at 2 minutes and 10 seconds). > > Also, now that we're aborting the connection entirely on a timeout > instead of just aborting the request, maybe it makes sense to change > the timeout granularity to minutes instead of seconds. I'm envisioning > that this timeout mechanism will mostly be used as a safeguard against > malicious or buggy servers with a high timeout configured (eg 10 > minutes), and minutes seems like a nicer interface for users than them > having to convert that to seconds. > > Let me know if I've missed anything with this approach but if not, > then I'll submit v7 with this change. sounds great to me. Just, could we do this per fuse_dev to avoid a single lock for all cores? Thanks, Bernd
On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert <bernd.schubert@fastmail.fm> wrote: > > Hi Joanne, > > On 9/27/24 21:36, Joanne Koong wrote: > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > >> > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: > >>> > >>> There are situations where fuse servers can become unresponsive or > >>> stuck, for example if the server is in a deadlock. Currently, there's > >>> no good way to detect if a server is stuck and needs to be killed > >>> manually. > >>> > >>> This commit adds an option for enforcing a timeout (in seconds) on > >>> requests where if the timeout elapses without a reply from the server, > >>> the connection will be automatically aborted. > >> > >> Okay. > >> > >> I'm not sure what the overhead (scheduling and memory) of timers, but > >> starting one for each request seems excessive. > > > > I ran some benchmarks on this using the passthrough_ll server and saw > > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): > > fio --name randwrite --ioengine=sync --thread --invalidate=1 > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > --group_reporting=1 --directory=/root/fuse_mount > > > > Instead of attaching a timer to each request, I think we can instead > > do the following: > > * add a "start_time" field to each request tracking (in jiffies) when > > the request was started > > * add a new list to the connection that all requests get enqueued > > onto. When the request is completed, it gets dequeued from this list > > * have a timer for the connection that fires off every 10 seconds or > > so. When this timer is fired, it checks if "jiffies > req->start_time > > + fc->req_timeout" against the head of the list to check if the > > timeout has expired and we need to abort the request. We only need to > > check against the head of the list because we know every other request > > after this was started later in time. I think we could even just use > > the fc->lock for this instead of needing a separate lock. In the worst > > case, this grants a 10 second upper bound on the timeout a user > > requests (eg if the user requests 2 minutes, in the worst case the > > timeout would trigger at 2 minutes and 10 seconds). > > > > Also, now that we're aborting the connection entirely on a timeout > > instead of just aborting the request, maybe it makes sense to change > > the timeout granularity to minutes instead of seconds. I'm envisioning > > that this timeout mechanism will mostly be used as a safeguard against > > malicious or buggy servers with a high timeout configured (eg 10 > > minutes), and minutes seems like a nicer interface for users than them > > having to convert that to seconds. > > > > Let me know if I've missed anything with this approach but if not, > > then I'll submit v7 with this change. > > > sounds great to me. Just, could we do this per fuse_dev to avoid a > single lock for all cores? > Will do! thanks for the suggestion - in that case, I'll add its own spinlock for it too then. Thanks, Joanne > > Thanks, > Bernd
On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert > <bernd.schubert@fastmail.fm> wrote: > > > > Hi Joanne, > > > > On 9/27/24 21:36, Joanne Koong wrote: > > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > >> > > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > >>> > > >>> There are situations where fuse servers can become unresponsive or > > >>> stuck, for example if the server is in a deadlock. Currently, there's > > >>> no good way to detect if a server is stuck and needs to be killed > > >>> manually. > > >>> > > >>> This commit adds an option for enforcing a timeout (in seconds) on > > >>> requests where if the timeout elapses without a reply from the server, > > >>> the connection will be automatically aborted. > > >> > > >> Okay. > > >> > > >> I'm not sure what the overhead (scheduling and memory) of timers, but > > >> starting one for each request seems excessive. > > > > > > I ran some benchmarks on this using the passthrough_ll server and saw > > > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): > > > fio --name randwrite --ioengine=sync --thread --invalidate=1 > > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > > --group_reporting=1 --directory=/root/fuse_mount > > > > > > Instead of attaching a timer to each request, I think we can instead > > > do the following: > > > * add a "start_time" field to each request tracking (in jiffies) when > > > the request was started > > > * add a new list to the connection that all requests get enqueued > > > onto. When the request is completed, it gets dequeued from this list > > > * have a timer for the connection that fires off every 10 seconds or > > > so. When this timer is fired, it checks if "jiffies > req->start_time > > > + fc->req_timeout" against the head of the list to check if the > > > timeout has expired and we need to abort the request. We only need to > > > check against the head of the list because we know every other request > > > after this was started later in time. I think we could even just use > > > the fc->lock for this instead of needing a separate lock. In the worst > > > case, this grants a 10 second upper bound on the timeout a user > > > requests (eg if the user requests 2 minutes, in the worst case the > > > timeout would trigger at 2 minutes and 10 seconds). > > > > > > Also, now that we're aborting the connection entirely on a timeout > > > instead of just aborting the request, maybe it makes sense to change > > > the timeout granularity to minutes instead of seconds. I'm envisioning > > > that this timeout mechanism will mostly be used as a safeguard against > > > malicious or buggy servers with a high timeout configured (eg 10 > > > minutes), and minutes seems like a nicer interface for users than them > > > having to convert that to seconds. > > > > > > Let me know if I've missed anything with this approach but if not, > > > then I'll submit v7 with this change. > > > > > > sounds great to me. Just, could we do this per fuse_dev to avoid a > > single lock for all cores? > > > > Will do! thanks for the suggestion - in that case, I'll add its own > spinlock for it too then. 
Actually, looking at this some more, we can just put this in the "struct fuse_pqueue" and use the fpq spinlock, since the check for whether any requests have timed out will be very quick (eg just checking against the first entry in the list).

>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Bernd
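[Editorial note: sketched out, the per-pqueue variant would look roughly like this, again with hypothetical timeout_list/timeout_entry/start_time fields; as the follow-up below explains, it was later dropped because a request is only tied to a device once the server reads it.]

```c
static bool fuse_pqueue_timed_out(struct fuse_pqueue *fpq,
				  unsigned long req_timeout)
{
	struct fuse_req *req;
	bool expired = false;

	spin_lock(&fpq->lock);
	/* only the oldest request on this device needs checking */
	req = list_first_entry_or_null(&fpq->timeout_list, struct fuse_req,
				       timeout_entry);
	if (req && time_after(jiffies, req->start_time + req_timeout))
		expired = true;
	spin_unlock(&fpq->lock);

	return expired;
}
```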
On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert > <bernd.schubert@fastmail.fm> wrote: > > > > Hi Joanne, > > > > On 9/27/24 21:36, Joanne Koong wrote: > > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > >> > > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > >>> > > >>> There are situations where fuse servers can become unresponsive or > > >>> stuck, for example if the server is in a deadlock. Currently, there's > > >>> no good way to detect if a server is stuck and needs to be killed > > >>> manually. > > >>> > > >>> This commit adds an option for enforcing a timeout (in seconds) on > > >>> requests where if the timeout elapses without a reply from the server, > > >>> the connection will be automatically aborted. > > >> > > >> Okay. > > >> > > >> I'm not sure what the overhead (scheduling and memory) of timers, but > > >> starting one for each request seems excessive. > > > > > > I ran some benchmarks on this using the passthrough_ll server and saw > > > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): > > > fio --name randwrite --ioengine=sync --thread --invalidate=1 > > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > > --group_reporting=1 --directory=/root/fuse_mount > > > > > > Instead of attaching a timer to each request, I think we can instead > > > do the following: > > > * add a "start_time" field to each request tracking (in jiffies) when > > > the request was started > > > * add a new list to the connection that all requests get enqueued > > > onto. When the request is completed, it gets dequeued from this list > > > * have a timer for the connection that fires off every 10 seconds or > > > so. When this timer is fired, it checks if "jiffies > req->start_time > > > + fc->req_timeout" against the head of the list to check if the > > > timeout has expired and we need to abort the request. We only need to > > > check against the head of the list because we know every other request > > > after this was started later in time. I think we could even just use > > > the fc->lock for this instead of needing a separate lock. In the worst > > > case, this grants a 10 second upper bound on the timeout a user > > > requests (eg if the user requests 2 minutes, in the worst case the > > > timeout would trigger at 2 minutes and 10 seconds). > > > > > > Also, now that we're aborting the connection entirely on a timeout > > > instead of just aborting the request, maybe it makes sense to change > > > the timeout granularity to minutes instead of seconds. I'm envisioning > > > that this timeout mechanism will mostly be used as a safeguard against > > > malicious or buggy servers with a high timeout configured (eg 10 > > > minutes), and minutes seems like a nicer interface for users than them > > > having to convert that to seconds. > > > > > > Let me know if I've missed anything with this approach but if not, > > > then I'll submit v7 with this change. > > > > > > sounds great to me. Just, could we do this per fuse_dev to avoid a > > single lock for all cores? > > > > Will do! thanks for the suggestion - in that case, I'll add its own > spinlock for it too then. I realized while working on v7 that we can't do this per fuse device because the request is only associated with a device once it's read in by the server (eg fuse_dev_do_read). 
I ran some rough preliminary benchmarks with ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/fstests ~/fuse_mount and fio --name randwrite --ioengine=sync --thread --invalidate=1 --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 --group_reporting=1 --directory=fuse_mount and didn't see any noticeable difference in throughput (~37 MiB/sec on my system) with vs without the timeout. Thanks, Joanne > > Thanks, > Joanne > > > > > Thanks, > > Bernd
On 10/7/24 20:39, Joanne Koong wrote: > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote: >> >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert >> <bernd.schubert@fastmail.fm> wrote: >>> >>> Hi Joanne, >>> >>> On 9/27/24 21:36, Joanne Koong wrote: >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: >>>>> >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: >>>>>> >>>>>> There are situations where fuse servers can become unresponsive or >>>>>> stuck, for example if the server is in a deadlock. Currently, there's >>>>>> no good way to detect if a server is stuck and needs to be killed >>>>>> manually. >>>>>> >>>>>> This commit adds an option for enforcing a timeout (in seconds) on >>>>>> requests where if the timeout elapses without a reply from the server, >>>>>> the connection will be automatically aborted. >>>>> >>>>> Okay. >>>>> >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but >>>>> starting one for each request seems excessive. >>>> >>>> I ran some benchmarks on this using the passthrough_ll server and saw >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1 >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 >>>> --group_reporting=1 --directory=/root/fuse_mount >>>> >>>> Instead of attaching a timer to each request, I think we can instead >>>> do the following: >>>> * add a "start_time" field to each request tracking (in jiffies) when >>>> the request was started >>>> * add a new list to the connection that all requests get enqueued >>>> onto. When the request is completed, it gets dequeued from this list >>>> * have a timer for the connection that fires off every 10 seconds or >>>> so. When this timer is fired, it checks if "jiffies > req->start_time >>>> + fc->req_timeout" against the head of the list to check if the >>>> timeout has expired and we need to abort the request. We only need to >>>> check against the head of the list because we know every other request >>>> after this was started later in time. I think we could even just use >>>> the fc->lock for this instead of needing a separate lock. In the worst >>>> case, this grants a 10 second upper bound on the timeout a user >>>> requests (eg if the user requests 2 minutes, in the worst case the >>>> timeout would trigger at 2 minutes and 10 seconds). >>>> >>>> Also, now that we're aborting the connection entirely on a timeout >>>> instead of just aborting the request, maybe it makes sense to change >>>> the timeout granularity to minutes instead of seconds. I'm envisioning >>>> that this timeout mechanism will mostly be used as a safeguard against >>>> malicious or buggy servers with a high timeout configured (eg 10 >>>> minutes), and minutes seems like a nicer interface for users than them >>>> having to convert that to seconds. >>>> >>>> Let me know if I've missed anything with this approach but if not, >>>> then I'll submit v7 with this change. >>> >>> >>> sounds great to me. Just, could we do this per fuse_dev to avoid a >>> single lock for all cores? >>> >> >> Will do! thanks for the suggestion - in that case, I'll add its own >> spinlock for it too then. > > I realized while working on v7 that we can't do this per fuse device > because the request is only associated with a device once it's read in > by the server (eg fuse_dev_do_read). 
>
> I ran some rough preliminary benchmarks with
> ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4
> -o source=~/fstests ~/fuse_mount
> and
> fio --name randwrite --ioengine=sync --thread --invalidate=1
> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> --group_reporting=1 --directory=fuse_mount
>
> and didn't see any noticeable difference in throughput (~37 MiB/sec on
> my system) with vs without the timeout.
>

That is not much; isn't your limit the backend? I wonder what would
happen with the 25GB/s I'm testing with. Wouldn't it make sense to test
this with large sequential IO? And possibly with the passthrough-hp
branch that bypasses IO? And a NUMA system would probably be helpful as
well, i.e. to test the effect on the kernel side without having an
IO-limited system.

With the io-uring interface, requests stay in queues on the incoming
CPU, so that is easier to achieve there. Although I wonder, for your
use case, whether it wouldn't be sufficient to start the timer only when
the request is on the way to the fuse server? One disadvantage I see is
that virtiofs would need to be handled specially.

Thanks,
Bernd
On Mon, Oct 7, 2024 at 1:02 PM Bernd Schubert <bernd.schubert@fastmail.fm> wrote: > > On 10/7/24 20:39, Joanne Koong wrote: > > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote: > >> > >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert > >> <bernd.schubert@fastmail.fm> wrote: > >>> > >>> Hi Joanne, > >>> > >>> On 9/27/24 21:36, Joanne Koong wrote: > >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > >>>>> > >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: > >>>>>> > >>>>>> There are situations where fuse servers can become unresponsive or > >>>>>> stuck, for example if the server is in a deadlock. Currently, there's > >>>>>> no good way to detect if a server is stuck and needs to be killed > >>>>>> manually. > >>>>>> > >>>>>> This commit adds an option for enforcing a timeout (in seconds) on > >>>>>> requests where if the timeout elapses without a reply from the server, > >>>>>> the connection will be automatically aborted. > >>>>> > >>>>> Okay. > >>>>> > >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but > >>>>> starting one for each request seems excessive. > >>>> > >>>> I ran some benchmarks on this using the passthrough_ll server and saw > >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): > >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1 > >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > >>>> --group_reporting=1 --directory=/root/fuse_mount > >>>> > >>>> Instead of attaching a timer to each request, I think we can instead > >>>> do the following: > >>>> * add a "start_time" field to each request tracking (in jiffies) when > >>>> the request was started > >>>> * add a new list to the connection that all requests get enqueued > >>>> onto. When the request is completed, it gets dequeued from this list > >>>> * have a timer for the connection that fires off every 10 seconds or > >>>> so. When this timer is fired, it checks if "jiffies > req->start_time > >>>> + fc->req_timeout" against the head of the list to check if the > >>>> timeout has expired and we need to abort the request. We only need to > >>>> check against the head of the list because we know every other request > >>>> after this was started later in time. I think we could even just use > >>>> the fc->lock for this instead of needing a separate lock. In the worst > >>>> case, this grants a 10 second upper bound on the timeout a user > >>>> requests (eg if the user requests 2 minutes, in the worst case the > >>>> timeout would trigger at 2 minutes and 10 seconds). > >>>> > >>>> Also, now that we're aborting the connection entirely on a timeout > >>>> instead of just aborting the request, maybe it makes sense to change > >>>> the timeout granularity to minutes instead of seconds. I'm envisioning > >>>> that this timeout mechanism will mostly be used as a safeguard against > >>>> malicious or buggy servers with a high timeout configured (eg 10 > >>>> minutes), and minutes seems like a nicer interface for users than them > >>>> having to convert that to seconds. > >>>> > >>>> Let me know if I've missed anything with this approach but if not, > >>>> then I'll submit v7 with this change. > >>> > >>> > >>> sounds great to me. Just, could we do this per fuse_dev to avoid a > >>> single lock for all cores? > >>> > >> > >> Will do! 
thanks for the suggestion - in that case, I'll add its own > >> spinlock for it too then. > > > > I realized while working on v7 that we can't do this per fuse device > > because the request is only associated with a device once it's read in > > by the server (eg fuse_dev_do_read). > > > > I ran some rough preliminary benchmarks with > > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 > > -o source=~/fstests ~/fuse_mount > > and > > fio --name randwrite --ioengine=sync --thread --invalidate=1 > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > --group_reporting=1 --directory=fuse_mount > > > > and didn't see any noticeable difference in throughput (~37 MiB/sec on > > my system) with vs without the timeout. > > > > > That is not much, isn't your limit the backend? I wonder what would happen > with 25GB/s I'm testing with. Wouldn't it make sense for this to test with > sequential large IO? And possibly with the passthrough-hp branch that > bypasses IO? And a NUMA system probably would be helpful as well. > I.e. to test the effect on the kernel side without having an IO limited > system? > The preliminary benchmarks yesterday were run on a VM because I had trouble getting consistent results between baseline runs (on origin w/out my changes) on my usual test machine. I'm going to get this sorted out and run some tests again. What are you testing on that's giving you 25 GB/s? > > With the io-uring interface requests stay in queues from the in-coming CPU > so easier to achieve there, although I wonder for your use-case if it > wouldn't be sufficient to start the timer only when the request is on > the way to fuse-server? One disadvantage I see is that virtiofs would need > to be specially handled. Unfortunately I don't think it suffices to only start the timer when the request is on the way to the fuse server. If there's a malicious or deadlocked server, they might not read from /dev/fuse, but we would want to abort the connection in those cases as well. Thanks, Joanne > > > Thanks, > Bernd > >
On Tue, Oct 8, 2024 at 9:26 AM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Mon, Oct 7, 2024 at 1:02 PM Bernd Schubert > <bernd.schubert@fastmail.fm> wrote: > > > > On 10/7/24 20:39, Joanne Koong wrote: > > > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote: > > >> > > >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert > > >> <bernd.schubert@fastmail.fm> wrote: > > >>> > > >>> Hi Joanne, > > >>> > > >>> On 9/27/24 21:36, Joanne Koong wrote: > > >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > >>>>> > > >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > >>>>>> > > >>>>>> There are situations where fuse servers can become unresponsive or > > >>>>>> stuck, for example if the server is in a deadlock. Currently, there's > > >>>>>> no good way to detect if a server is stuck and needs to be killed > > >>>>>> manually. > > >>>>>> > > >>>>>> This commit adds an option for enforcing a timeout (in seconds) on > > >>>>>> requests where if the timeout elapses without a reply from the server, > > >>>>>> the connection will be automatically aborted. > > >>>>> > > >>>>> Okay. > > >>>>> > > >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but > > >>>>> starting one for each request seems excessive. > > >>>> > > >>>> I ran some benchmarks on this using the passthrough_ll server and saw > > >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s): > > >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1 > > >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > >>>> --group_reporting=1 --directory=/root/fuse_mount > > >>>> > > >>>> Instead of attaching a timer to each request, I think we can instead > > >>>> do the following: > > >>>> * add a "start_time" field to each request tracking (in jiffies) when > > >>>> the request was started > > >>>> * add a new list to the connection that all requests get enqueued > > >>>> onto. When the request is completed, it gets dequeued from this list > > >>>> * have a timer for the connection that fires off every 10 seconds or > > >>>> so. When this timer is fired, it checks if "jiffies > req->start_time > > >>>> + fc->req_timeout" against the head of the list to check if the > > >>>> timeout has expired and we need to abort the request. We only need to > > >>>> check against the head of the list because we know every other request > > >>>> after this was started later in time. I think we could even just use > > >>>> the fc->lock for this instead of needing a separate lock. In the worst > > >>>> case, this grants a 10 second upper bound on the timeout a user > > >>>> requests (eg if the user requests 2 minutes, in the worst case the > > >>>> timeout would trigger at 2 minutes and 10 seconds). > > >>>> > > >>>> Also, now that we're aborting the connection entirely on a timeout > > >>>> instead of just aborting the request, maybe it makes sense to change > > >>>> the timeout granularity to minutes instead of seconds. I'm envisioning > > >>>> that this timeout mechanism will mostly be used as a safeguard against > > >>>> malicious or buggy servers with a high timeout configured (eg 10 > > >>>> minutes), and minutes seems like a nicer interface for users than them > > >>>> having to convert that to seconds. > > >>>> > > >>>> Let me know if I've missed anything with this approach but if not, > > >>>> then I'll submit v7 with this change. 
> > >>> > > >>> > > >>> sounds great to me. Just, could we do this per fuse_dev to avoid a > > >>> single lock for all cores? > > >>> > > >> > > >> Will do! thanks for the suggestion - in that case, I'll add its own > > >> spinlock for it too then. > > > > > > I realized while working on v7 that we can't do this per fuse device > > > because the request is only associated with a device once it's read in > > > by the server (eg fuse_dev_do_read). > > > > > > I ran some rough preliminary benchmarks with > > > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 > > > -o source=~/fstests ~/fuse_mount > > > and > > > fio --name randwrite --ioengine=sync --thread --invalidate=1 > > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4 > > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345 > > > --group_reporting=1 --directory=fuse_mount > > > > > > and didn't see any noticeable difference in throughput (~37 MiB/sec on > > > my system) with vs without the timeout. > > > > > > > > > That is not much, isn't your limit the backend? I wonder what would happen > > with 25GB/s I'm testing with. Wouldn't it make sense for this to test with > > sequential large IO? And possibly with the passthrough-hp branch that > > bypasses IO? And a NUMA system probably would be helpful as well. > > I.e. to test the effect on the kernel side without having an IO limited > > system? > > > > The preliminary benchmarks yesterday were run on a VM because I had > trouble getting consistent results between baseline runs (on origin > w/out my changes) on my usual test machine. I'm going to get this > sorted out and run some tests again. > I'll attach the updated benchmark numbers on this thread (v7 of the timeout patchset) so that everything's in one place: https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@gmail.com/T/#t > What are you testing on that's giving you 25 GB/s? > > > > > With the io-uring interface requests stay in queues from the in-coming CPU > > so easier to achieve there, although I wonder for your use-case if it > > wouldn't be sufficient to start the timer only when the request is on > > the way to fuse-server? One disadvantage I see is that virtiofs would need > > to be specially handled. > > Unfortunately I don't think it suffices to only start the timer when > the request is on the way to the fuse server. If there's a malicious > or deadlocked server, they might not read from /dev/fuse, but we would > want to abort the connection in those cases as well. > > > Thanks, > Joanne > > > > > > > Thanks, > > Bernd > > > >
```diff
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..a4ec817074a2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -40,6 +40,16 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
 	return READ_ONCE(file->private_data);
 }
 
+static void fuse_request_timeout(struct timer_list *timer)
+{
+	struct fuse_req *req = container_of(timer, struct fuse_req, timer);
+	struct fuse_conn *fc = req->fm->fc;
+
+	req->timer.function = NULL;
+
+	fuse_abort_conn(fc);
+}
+
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 {
 	INIT_LIST_HEAD(&req->list);
@@ -48,6 +58,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 	refcount_set(&req->count, 1);
 	__set_bit(FR_PENDING, &req->flags);
 	req->fm = fm;
+	if (fm->fc->req_timeout)
+		timer_setup(&req->timer, fuse_request_timeout, 0);
 }
 
 static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
@@ -283,6 +295,9 @@ void fuse_request_end(struct fuse_req *req)
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	if (req->timer.function)
+		timer_delete_sync(&req->timer);
+
 	if (test_and_set_bit(FR_FINISHED, &req->flags))
 		goto put_request;
 
@@ -393,6 +408,8 @@ static void request_wait_answer(struct fuse_req *req)
 		if (test_bit(FR_PENDING, &req->flags)) {
 			list_del(&req->list);
 			spin_unlock(&fiq->lock);
+			if (req->timer.function)
+				timer_delete_sync(&req->timer);
 			__fuse_put_request(req);
 			req->out.h.error = -EINTR;
 			return;
@@ -409,7 +426,8 @@ static void request_wait_answer(struct fuse_req *req)
 
 static void __fuse_request_send(struct fuse_req *req)
 {
-	struct fuse_iqueue *fiq = &req->fm->fc->iq;
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 
 	BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
 	spin_lock(&fiq->lock);
@@ -421,6 +439,8 @@ static void __fuse_request_send(struct fuse_req *req)
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fc->req_timeout);
 		queue_request_and_unlock(fiq, req);
 
 		request_wait_answer(req);
@@ -539,6 +559,8 @@ static bool fuse_request_queue_background(struct fuse_req *req)
 		if (fc->num_background == fc->max_background)
 			fc->blocked = 1;
 		list_add_tail(&req->list, &fc->bg_queue);
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fc->req_timeout);
 		flush_bg_queue(fc);
 		queued = true;
 	}
@@ -594,6 +616,8 @@ static int fuse_simple_notify_reply(struct fuse_mount *fm,
 
 	spin_lock(&fiq->lock);
 	if (fiq->connected) {
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fm->fc->req_timeout);
 		queue_request_and_unlock(fiq, req);
 	} else {
 		err = -ENODEV;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..97dacafa4289 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -435,6 +435,9 @@ struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+	/** timer for request replies, if timeout option is enabled */
+	struct timer_list timer;
 };
 
 struct fuse_iqueue;
@@ -574,6 +577,8 @@ struct fuse_fs_context {
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
+	/* Request timeout (in seconds). 0 = no timeout (infinite wait) */
+	unsigned int req_timeout;
 	const char *subtype;
 
 	/* DAX device, may be NULL */
@@ -633,6 +638,9 @@ struct fuse_conn {
 	/** Constrain ->max_pages to this value during feature negotiation */
 	unsigned int max_pages_limit;
 
+	/* Request timeout (in jiffies). 0 = no timeout (infinite wait) */
+	unsigned long req_timeout;
+
 	/** Input queue */
 	struct fuse_iqueue iq;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..9e69006fc026 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -733,6 +733,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_REQUEST_TIMEOUT,
 	OPT_ERR
 };
 
@@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u32	("request_timeout",	OPT_REQUEST_TIMEOUT),
 	{}
 };
 
@@ -830,6 +832,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_REQUEST_TIMEOUT:
+		ctx->req_timeout = result.uint_32;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1724,6 +1730,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->group_id = ctx->group_id;
 	fc->legacy_opts_show = ctx->legacy_opts_show;
 	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
+	fc->req_timeout = ctx->req_timeout * HZ;
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
```
There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect if a server is stuck and needs to be killed
manually.

This commit adds an option for enforcing a timeout (in seconds) on
requests where if the timeout elapses without a reply from the server,
the connection will be automatically aborted.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/dev.c    | 26 +++++++++++++++++++++++++-
 fs/fuse/fuse_i.h |  8 ++++++++
 fs/fuse/inode.c  |  7 +++++++
 3 files changed, 40 insertions(+), 1 deletion(-)