
[v6,1/2] fuse: add optional kernel-enforced timeout for requests

Message ID 20240830162649.3849586-2-joannelkoong@gmail.com (mailing list archive)
State New
Series fuse: add timeout option for requests

Commit Message

Joanne Koong Aug. 30, 2024, 4:26 p.m. UTC
There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect if a server is stuck and needs to be killed
manually.

This commit adds an option for enforcing a timeout (in seconds) on
requests where if the timeout elapses without a reply from the server,
the connection will be automatically aborted.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/dev.c    | 26 +++++++++++++++++++++++++-
 fs/fuse/fuse_i.h |  8 ++++++++
 fs/fuse/inode.c  |  7 +++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

Comments

Miklos Szeredi Sept. 2, 2024, 10:37 a.m. UTC | #1
On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is in a deadlock. Currently, there's
> no good way to detect if a server is stuck and needs to be killed
> manually.
>
> This commit adds an option for enforcing a timeout (in seconds) on
> requests where if the timeout elapses without a reply from the server,
> the connection will be automatically aborted.

Okay.

I'm not sure what the overhead (scheduling and memory) of timers is,
but starting one for each request seems excessive.

Can we make the timeout per-connection instead of per request?

I.e. When the first request is sent, the timer is started. When a
reply is received but there are still outstanding requests, the timer
is reset.  When the last reply is received, the timer is stopped.

This should handle the frozen server case just as well.  It may not
perfectly handle the case when the server is still alive but for some
reason one or more requests get stuck, while others are still being
processed.   The latter case is unlikely to be an issue in practice,
IMO.
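
For illustration, a minimal sketch of that per-connection scheme; the
fc->req_timer field, the outstanding-request counter and the helper
names below are assumptions, not something from the posted patch:

/* Hypothetical: one timer per connection, armed while any request is
 * outstanding.  fc->req_timer and fc->num_outstanding are made up. */
static void fuse_conn_request_sent(struct fuse_conn *fc)
{
        spin_lock(&fc->lock);
        if (fc->num_outstanding++ == 0)
                /* first outstanding request: start the timer */
                mod_timer(&fc->req_timer, jiffies + fc->req_timeout);
        spin_unlock(&fc->lock);
}

static void fuse_conn_reply_received(struct fuse_conn *fc)
{
        spin_lock(&fc->lock);
        if (--fc->num_outstanding == 0)
                /* last reply: stop the timer */
                timer_delete(&fc->req_timer);
        else
                /* replies are flowing: push the deadline out again */
                mod_timer(&fc->req_timer, jiffies + fc->req_timeout);
        spin_unlock(&fc->lock);
}

/* Timer callback: no reply for a whole timeout period, abort.
 * Set up at init with timer_setup(&fc->req_timer, fuse_conn_timeout, 0). */
static void fuse_conn_timeout(struct timer_list *t)
{
        struct fuse_conn *fc = from_timer(fc, t, req_timer);

        fuse_abort_conn(fc);
}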

Thanks,
Miklos
Bernd Schubert Sept. 2, 2024, 10:50 a.m. UTC | #2
On 9/2/24 12:37, Miklos Szeredi wrote:
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> There are situations where fuse servers can become unresponsive or
>> stuck, for example if the server is in a deadlock. Currently, there's
>> no good way to detect if a server is stuck and needs to be killed
>> manually.
>>
>> This commit adds an option for enforcing a timeout (in seconds) on
>> requests where if the timeout elapses without a reply from the server,
>> the connection will be automatically aborted.
> 
> Okay.
> 
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.
> 
> Can we make the timeout per-connection instead of per request?
> 
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset.  When the last reply is received, the timer is stopped.
> 
> This should handle the frozen server case just as well.  It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed.   The latter case is unlikely to be an issue in practice,
> IMO.

In the case of distributed servers, it can easily happen that one server
has an issue while other servers still process requests, especially when
those are just requests that read/getattr/etc. and do not write, i.e.
requests that do not need to access the stuck server. So in my opinion
this is not so unlikely. Although for such cases it is not difficult to
time out within the fuse server itself.


Thanks,
Bernd
Miklos Szeredi Sept. 2, 2024, 11:11 a.m. UTC | #3
On Mon, 2 Sept 2024 at 12:50, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:

> In case of distributed servers, it can easily happen that one server has
> an issue, while other servers still process requests. Especially when
> these are just requests that read/getattr/etc and do not write, i.e.
> accessing the stuck server is not needed by other servers. So in my
> opinion not so unlikely. Although for such cases not difficult to
> timeout within the fuse server.

Exactly.  Normally the kernel should not need to time out fuse
requests, and it might be actively detrimental to do so.

The main case this wants to solve is a deadlocked server due to
programming error, AFAICS.   And this would only work in environments
where requests are  guaranteed to complete within some time period.

So if the server needs to handle request timeouts, then it should
*not* rely on the kernel timeout.   The kernel timeout should be a
safeguard against broken or malicious servers, not an aid to implement
request timeouts.

Thanks,
Miklos
Joanne Koong Sept. 3, 2024, 5:25 p.m. UTC | #4
On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is in a deadlock. Currently, there's
> > no good way to detect if a server is stuck and needs to be killed
> > manually.
> >
> > This commit adds an option for enforcing a timeout (in seconds) on
> > requests where if the timeout elapses without a reply from the server,
> > the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.
>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset.  When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well.  It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed.   The latter case is unlikely to be an issue in practice,
> IMO.

In that case, if the timeout is per-connection instead of per-request
and we're not stringent about some requests getting stuck, maybe it
makes more sense to just do this in userspace (libfuse)? That seems
pretty simple: have a watchdog thread that periodically (according to
whatever timeout is specified) checks whether the number of requests
serviced is increasing while
/sys/fs/fuse/connections/*/waiting is non-zero.
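
As a rough sketch of that userspace idea (the connection directory path
and the "waiting count stopped moving" heuristic are assumptions;
fusectl does expose per-connection 'waiting' and 'abort' files):

/* Hypothetical libfuse-side watchdog.  It approximates "requests
 * serviced is not increasing" by checking that the 'waiting' count is
 * non-zero and unchanged over one timeout interval, then aborts the
 * connection via the fusectl 'abort' file. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define TIMEOUT_SECS 120

static void *watchdog(void *arg)
{
        const char *conn_dir = arg;  /* e.g. "/sys/fs/fuse/connections/42" */
        char path[256];
        long prev = -1;

        for (;;) {
                long waiting = 0;
                FILE *f;

                sleep(TIMEOUT_SECS);

                snprintf(path, sizeof(path), "%s/waiting", conn_dir);
                f = fopen(path, "r");
                if (!f)
                        break;
                if (fscanf(f, "%ld", &waiting) != 1)
                        waiting = 0;
                fclose(f);

                if (waiting > 0 && waiting == prev) {
                        /* Looks stuck: ask the kernel to abort the connection */
                        snprintf(path, sizeof(path), "%s/abort", conn_dir);
                        f = fopen(path, "w");
                        if (f) {
                                fputs("1", f);
                                fclose(f);
                        }
                        break;
                }
                prev = waiting;
        }
        return NULL;
}
/* started once after mount, e.g.:
 * pthread_create(&tid, NULL, watchdog, "/sys/fs/fuse/connections/42"); */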

If there are multiple server threads (eg libfuse's fuse_loop_mt
interface) and say, all of them are deadlocked except for 1 that is
actively servicing requests, then this wouldn't catch that case, but
even if this per-connection timeout was enforced in the kernel
instead, it wouldn't catch that case either.

So maybe this logic should just be moved to libfuse then? For this
we'd need to pass the connection's device id (fc->dev) to userspace
which I don't think we currently do, but that seems pretty simple. The
one downside I see is that this doesn't let sysadmins enforce an
automatic system-wide "max timeout" against malicious fuse servers but
if we are having the timeout be per-connection instead of per-request,
then a malicious server could still be malicious anyways (eg
deadlocking all threads except for 1).

Curious to hear what your and Bernd's thoughts on this are.

Thanks,
Joanne
>
> Thanks,
> Miklos
Bernd Schubert Sept. 3, 2024, 10:37 p.m. UTC | #5
On 9/3/24 19:25, Joanne Koong wrote:
> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> There are situations where fuse servers can become unresponsive or
>>> stuck, for example if the server is in a deadlock. Currently, there's
>>> no good way to detect if a server is stuck and needs to be killed
>>> manually.
>>>
>>> This commit adds an option for enforcing a timeout (in seconds) on
>>> requests where if the timeout elapses without a reply from the server,
>>> the connection will be automatically aborted.
>>
>> Okay.
>>
>> I'm not sure what the overhead (scheduling and memory) of timers, but
>> starting one for each request seems excessive.
>>
>> Can we make the timeout per-connection instead of per request?
>>
>> I.e. When the first request is sent, the timer is started. When a
>> reply is received but there are still outstanding requests, the timer
>> is reset.  When the last reply is received, the timer is stopped.
>>
>> This should handle the frozen server case just as well.  It may not
>> perfectly handle the case when the server is still alive but for some
>> reason one or more requests get stuck, while others are still being
>> processed.   The latter case is unlikely to be an issue in practice,
>> IMO.
> 
> In that case, if the timeout is per-connection instead of per-request
> and we're not stringent about some requests getting stuck, maybe it
> makes more sense to just do this in userspace (libfuse) then? That
> seems pretty simple with having a watchdog thread that periodically
> (according to whatever specified timeout) checks if the number of
> requests serviced is increasing when
> /sys/fs/fuse/connections/*/waiting is non-zero.
> 
> If there are multiple server threads (eg libfuse's fuse_loop_mt
> interface) and say, all of them are deadlocked except for 1 that is
> actively servicing requests, then this wouldn't catch that case, but
> even if this per-connection timeout was enforced in the kernel
> instead, it wouldn't catch that case either.
> 
> So maybe this logic should just be moved to libfuse then? For this
> we'd need to pass the connection's device id (fc->dev) to userspace
> which i don't think we currently do, but that seems pretty simple. The
> one downside I see is that this doesn't let sysadmins enforce an
> automatic system-wide "max timeout" against malicious fuse servers but
> if we are having the timeout be per-connection instead of per-request,
> then a malicious server could still be malicious anyways (eg
> deadlocking all threads except for 1).
> 
> Curious to hear what your and Bernrd's thoughts on this are.


I have a question here: does it need to be an exact timeout or could it
be an interval/epoch? Let's say you time out based on epoch lists? Example:

1) epoch-a starts, requests are added to epoch-a list.
2) epoch-b starts, the epoch-a list should get empty.
3) epoch-c starts, the epoch-b list should get empty; kill the connection
if the epoch-a list is not empty (a separate epoch-c list should not be
needed, as the epoch-a list can be reused once it is confirmed empty).


Here the timeout would be epoch-a + epoch-b, i.e.
max-timeout <= 2 * epoch-time.
We could have more epochs/list-heads to make it more fine-grained.


From my point of view that should be rather cheap, as it is just
adding/removing requests to/from a list and checking whether a list is
empty to detect a timeout. With the caveat that it is not precise anymore.

That could be implemented in kernel and/or libfuse?
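
In kernel terms, a minimal sketch of the two-list variant might look
like the following (the structure and names are made up; requests would
be removed from their epoch list when they are completed):

/* Hypothetical: two alternating epoch lists.  New requests are added to
 * list[cur]; the rotation below runs every epoch-time seconds. */
struct fuse_epochs {
        struct list_head list[2];
        int cur;
        spinlock_t lock;
};

static void fuse_epoch_rotate(struct fuse_conn *fc, struct fuse_epochs *ep)
{
        bool stuck;

        spin_lock(&ep->lock);
        /* The list about to be reused held requests queued two epochs
         * ago, and replies had a full epoch to drain it.  If it is still
         * non-empty, some request has been pending for between one and
         * two epoch-times. */
        stuck = !list_empty(&ep->list[!ep->cur]);
        if (!stuck)
                ep->cur = !ep->cur;
        spin_unlock(&ep->lock);

        if (stuck)
                fuse_abort_conn(fc);
}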


Thanks,
Bernd
Joanne Koong Sept. 4, 2024, 5:23 p.m. UTC | #6
On Tue, Sep 3, 2024 at 3:38 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 9/3/24 19:25, Joanne Koong wrote:
> > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>>
> >>> There are situations where fuse servers can become unresponsive or
> >>> stuck, for example if the server is in a deadlock. Currently, there's
> >>> no good way to detect if a server is stuck and needs to be killed
> >>> manually.
> >>>
> >>> This commit adds an option for enforcing a timeout (in seconds) on
> >>> requests where if the timeout elapses without a reply from the server,
> >>> the connection will be automatically aborted.
> >>
> >> Okay.
> >>
> >> I'm not sure what the overhead (scheduling and memory) of timers, but
> >> starting one for each request seems excessive.
> >>
> >> Can we make the timeout per-connection instead of per request?
> >>
> >> I.e. When the first request is sent, the timer is started. When a
> >> reply is received but there are still outstanding requests, the timer
> >> is reset.  When the last reply is received, the timer is stopped.
> >>
> >> This should handle the frozen server case just as well.  It may not
> >> perfectly handle the case when the server is still alive but for some
> >> reason one or more requests get stuck, while others are still being
> >> processed.   The latter case is unlikely to be an issue in practice,
> >> IMO.
> >
> > In that case, if the timeout is per-connection instead of per-request
> > and we're not stringent about some requests getting stuck, maybe it
> > makes more sense to just do this in userspace (libfuse) then? That
> > seems pretty simple with having a watchdog thread that periodically
> > (according to whatever specified timeout) checks if the number of
> > requests serviced is increasing when
> > /sys/fs/fuse/connections/*/waiting is non-zero.
> >
> > If there are multiple server threads (eg libfuse's fuse_loop_mt
> > interface) and say, all of them are deadlocked except for 1 that is
> > actively servicing requests, then this wouldn't catch that case, but
> > even if this per-connection timeout was enforced in the kernel
> > instead, it wouldn't catch that case either.
> >
> > So maybe this logic should just be moved to libfuse then? For this
> > we'd need to pass the connection's device id (fc->dev) to userspace
> > which i don't think we currently do, but that seems pretty simple. The
> > one downside I see is that this doesn't let sysadmins enforce an
> > automatic system-wide "max timeout" against malicious fuse servers but
> > if we are having the timeout be per-connection instead of per-request,
> > then a malicious server could still be malicious anyways (eg
> > deadlocking all threads except for 1).
> >
> > Curious to hear what your and Bernrd's thoughts on this are.
>
>
> I have question here, does it need to be an exact timeout or could it be
> an interval/epoch? Let's say you timeout based on epoch lists? Example
>
> 1) epoch-a starts, requests are added to epoch-a list.
> 2) epoch-b starts, epoch-a list should get empty
> 3) epoch-c starts, epoch-b list should get empty, kill the connection if
> epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
> list can be used, once confirmed it is empty.)
>
>
> Here timeout would be epoch-a + epoch-b, i.e.
> max-timeout <= 2 * epoch-time.
> We could have more epochs/list-heads to make it more fine grained.
>
>
> From my point of view that should be a rather cheap, as it just
> adding/removing requests from list and checking for timeout if a list is
> empty. With the caveat that it is not precise anymore.

I like this idea a lot. I like that it enforces per-request behavior
and guarantees that any stalled request will abort the connection. I
think it's fine for the timeout to be an interval/epoch so long as the
documentation explicitly makes that clear. I think this would need to
be done in the kernel instead of libfuse, because if the server
deadlocks while there are no pending requests on the lists, then when
the kernel later sends requests to the server, none of them will ever
make it onto a list for the timer handler to detect the problem.

Before I make this change for v7, Miklos what are your thoughts on
this direction?

Thanks,
Joanne

>
> That could be implemented in kernel and/or libfuse?
>
>
> Thanks,
> Bernd
Bernd Schubert Sept. 17, 2024, 10 p.m. UTC | #7
Hi Joanne,

On 9/4/24 19:23, Joanne Koong wrote:
> On Tue, Sep 3, 2024 at 3:38 PM Bernd Schubert

>>
>>
>> I have question here, does it need to be an exact timeout or could it be
>> an interval/epoch? Let's say you timeout based on epoch lists? Example
>>
>> 1) epoch-a starts, requests are added to epoch-a list.
>> 2) epoch-b starts, epoch-a list should get empty
>> 3) epoch-c starts, epoch-b list should get empty, kill the connection if
>> epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
>> list can be used, once confirmed it is empty.)
>>
>>
>> Here timeout would be epoch-a + epoch-b, i.e.
>> max-timeout <= 2 * epoch-time.
>> We could have more epochs/list-heads to make it more fine grained.
>>
>>
>> From my point of view that should be a rather cheap, as it just
>> adding/removing requests from list and checking for timeout if a list is
>> empty. With the caveat that it is not precise anymore.
> 
> I like this idea a lot. I like that it enforces per-request behavior
> and guarantees that any stalled request will abort the connection. I
> think it's fine for the timeout to be an interval/epoch so long as the
> documentation explicitly makes that clear. I think this would need to
> be done in the kernel instead of libfuse because if the server is in a
> deadlock when there are no pending requests in the lists and then the
> kernel sends requests to the server, none of the requests will make it
> to the list for the timer handler to detect any issues.
> 
> Before I make this change for v7, Miklos what are your thoughts on
> this direction?

We briefly discussed it with Miklos, and he agreed that an epoch list
should be fine (would be great if you could quickly confirm, Miklos).

In the meantime I have another use case for timeout lists. Basically
Jakob from CERN (in CC) is asking for a way to stop requests to the
fuse server and then to resume them. I think that can be done easily
through notifications and unsetting (and later setting) fc->initialized.
A demo patch follows around tomorrow, but Jakob actually wants to know
when it is safe to restart the fuse server (or part of it). That is
where the epoch timeout list would be handy - the reply to the
notification should happen once the lists are empty, i.e. no request is
being handled anymore. I think this is better than FUSE_NOTIFY_RESEND,
as that has an issue with non-idempotent requests.


Thanks,
Bernd
Miklos Szeredi Sept. 18, 2024, 7:29 a.m. UTC | #8
On Wed, 18 Sept 2024 at 00:00, Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:

> > I like this idea a lot. I like that it enforces per-request behavior
> > and guarantees that any stalled request will abort the connection. I
> > think it's fine for the timeout to be an interval/epoch so long as the
> > documentation explicitly makes that clear. I think this would need to
> > be done in the kernel instead of libfuse because if the server is in a
> > deadlock when there are no pending requests in the lists and then the
> > kernel sends requests to the server, none of the requests will make it
> > to the list for the timer handler to detect any issues.
> >
> > Before I make this change for v7, Miklos what are your thoughts on
> > this direction?

I have no problem with epochs, but there are scalability issues with
shared lists.  So I'm not sure.  Maybe the individual timers would be
the best solution but I don't know the overhead and the scalability of
that solution either.

Thanks,
Miklos
Bernd Schubert Sept. 18, 2024, 9:12 a.m. UTC | #9
On September 18, 2024 9:29:44 AM GMT+02:00, Miklos Szeredi <miklos@szeredi.hu> wrote:
>On Wed, 18 Sept 2024 at 00:00, Bernd Schubert
><bernd.schubert@fastmail.fm> wrote:
>
>> > I like this idea a lot. I like that it enforces per-request behavior
>> > and guarantees that any stalled request will abort the connection. I
>> > think it's fine for the timeout to be an interval/epoch so long as the
>> > documentation explicitly makes that clear. I think this would need to
>> > be done in the kernel instead of libfuse because if the server is in a
>> > deadlock when there are no pending requests in the lists and then the
>> > kernel sends requests to the server, none of the requests will make it
>> > to the list for the timer handler to detect any issues.
>> >
>> > Before I make this change for v7, Miklos what are your thoughts on
>> > this direction?
>
>I have no problem with epochs, but there are scalability issue with
>shared lists.  So I'm not sure.  Maybe the individual timers would be
>the best solution but I don't know the overhead and the scalability of
>that solution either.
>
>Thanks,
>Miklos

In principle we could also have lists per device and then a bitmap that tracks which lists are empty. But that is certainly more complex, especially as a device clone can happen at any time.


Thanks,
Bernd
Joanne Koong Sept. 27, 2024, 7:36 p.m. UTC | #10
On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is in a deadlock. Currently, there's
> > no good way to detect if a server is stuck and needs to be killed
> > manually.
> >
> > This commit adds an option for enforcing a timeout (in seconds) on
> > requests where if the timeout elapses without a reply from the server,
> > the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.

I ran some benchmarks on this using the passthrough_ll server and saw
roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
fio --name randwrite --ioengine=sync --thread --invalidate=1
--runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
--bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
--group_reporting=1 --directory=/root/fuse_mount

Instead of attaching a timer to each request, I think we can instead
do the following:
* add a "start_time" field to each request tracking (in jiffies) when
the request was started
* add a new list to the connection that all requests get enqueued
onto. When the request is completed, it gets dequeued from this list
* have a timer for the connection that fires every 10 seconds or so.
When this timer fires, it checks "jiffies > req->start_time +
fc->req_timeout" against the head of the list to see whether the
timeout has expired and we need to abort the request. We only need to
check against the head of the list because we know every other request
after it was started later in time. I think we could even just use
the fc->lock for this instead of needing a separate lock. In the worst
case, this adds up to 10 seconds of slack on top of the timeout a user
requests (eg if the user requests 2 minutes, in the worst case the
timeout would trigger at 2 minutes and 10 seconds). A rough sketch of
this is below.
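
(Sketch only; the field names fc->req_list, req->start_time,
req->timeout_entry and fc->req_timer are placeholders, not necessarily
what v7 will use.)

/* Hypothetical: a periodic per-connection timer that only inspects the
 * oldest entry of a FIFO list of in-flight requests. */
static void fuse_check_timeout(struct timer_list *t)
{
        struct fuse_conn *fc = from_timer(fc, t, req_timer);
        struct fuse_req *req;
        bool expired = false;

        spin_lock(&fc->lock);
        /* Requests are appended in start order, so only the head can
         * hold the oldest start_time. */
        req = list_first_entry_or_null(&fc->req_list, struct fuse_req,
                                       timeout_entry);
        if (req && time_after(jiffies, req->start_time + fc->req_timeout))
                expired = true;
        spin_unlock(&fc->lock);

        if (expired)
                fuse_abort_conn(fc);
        else
                /* re-check roughly every 10 seconds */
                mod_timer(&fc->req_timer, jiffies + 10 * HZ);
}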

Also, now that we're aborting the connection entirely on a timeout
instead of just aborting the request, maybe it makes sense to change
the timeout granularity to minutes instead of seconds. I'm envisioning
that this timeout mechanism will mostly be used as a safeguard against
malicious or buggy servers with a high timeout configured (eg 10
minutes), and minutes seems like a nicer interface for users than them
having to convert that to seconds.

Let me know if I've missed anything with this approach but if not,
then I'll submit v7 with this change.


Thanks,
Joanne

>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset.  When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well.  It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed.   The latter case is unlikely to be an issue in practice,
> IMO.
>
> Thanks,
> Miklos
Bernd Schubert Sept. 28, 2024, 8:43 a.m. UTC | #11
Hi Joanne,

On 9/27/24 21:36, Joanne Koong wrote:
> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> There are situations where fuse servers can become unresponsive or
>>> stuck, for example if the server is in a deadlock. Currently, there's
>>> no good way to detect if a server is stuck and needs to be killed
>>> manually.
>>>
>>> This commit adds an option for enforcing a timeout (in seconds) on
>>> requests where if the timeout elapses without a reply from the server,
>>> the connection will be automatically aborted.
>>
>> Okay.
>>
>> I'm not sure what the overhead (scheduling and memory) of timers, but
>> starting one for each request seems excessive.
> 
> I ran some benchmarks on this using the passthrough_ll server and saw
> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> fio --name randwrite --ioengine=sync --thread --invalidate=1
> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> --group_reporting=1 --directory=/root/fuse_mount
> 
> Instead of attaching a timer to each request, I think we can instead
> do the following:
> * add a "start_time" field to each request tracking (in jiffies) when
> the request was started
> * add a new list to the connection that all requests get enqueued
> onto. When the request is completed, it gets dequeued from this list
> * have a timer for the connection that fires off every 10 seconds or
> so. When this timer is fired, it checks if "jiffies > req->start_time
> + fc->req_timeout" against the head of the list to check if the
> timeout has expired and we need to abort the request. We only need to
> check against the head of the list because we know every other request
> after this was started later in time. I think we could even just use
> the fc->lock for this instead of needing a separate lock. In the worst
> case, this grants a 10 second upper bound on the timeout a user
> requests (eg if the user requests 2 minutes, in the worst case the
> timeout would trigger at 2 minutes and 10 seconds).
> 
> Also, now that we're aborting the connection entirely on a timeout
> instead of just aborting the request, maybe it makes sense to change
> the timeout granularity to minutes instead of seconds. I'm envisioning
> that this timeout mechanism will mostly be used as a safeguard against
> malicious or buggy servers with a high timeout configured (eg 10
> minutes), and minutes seems like a nicer interface for users than them
> having to convert that to seconds.
> 
> Let me know if I've missed anything with this approach but if not,
> then I'll submit v7 with this change.


sounds great to me. Just, could we do this per fuse_dev to avoid a
single lock for all cores?


Thanks,
Bernd
Joanne Koong Oct. 1, 2024, 5:03 p.m. UTC | #12
On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Joanne,
>
> On 9/27/24 21:36, Joanne Koong wrote:
> > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>>
> >>> There are situations where fuse servers can become unresponsive or
> >>> stuck, for example if the server is in a deadlock. Currently, there's
> >>> no good way to detect if a server is stuck and needs to be killed
> >>> manually.
> >>>
> >>> This commit adds an option for enforcing a timeout (in seconds) on
> >>> requests where if the timeout elapses without a reply from the server,
> >>> the connection will be automatically aborted.
> >>
> >> Okay.
> >>
> >> I'm not sure what the overhead (scheduling and memory) of timers, but
> >> starting one for each request seems excessive.
> >
> > I ran some benchmarks on this using the passthrough_ll server and saw
> > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > --group_reporting=1 --directory=/root/fuse_mount
> >
> > Instead of attaching a timer to each request, I think we can instead
> > do the following:
> > * add a "start_time" field to each request tracking (in jiffies) when
> > the request was started
> > * add a new list to the connection that all requests get enqueued
> > onto. When the request is completed, it gets dequeued from this list
> > * have a timer for the connection that fires off every 10 seconds or
> > so. When this timer is fired, it checks if "jiffies > req->start_time
> > + fc->req_timeout" against the head of the list to check if the
> > timeout has expired and we need to abort the request. We only need to
> > check against the head of the list because we know every other request
> > after this was started later in time. I think we could even just use
> > the fc->lock for this instead of needing a separate lock. In the worst
> > case, this grants a 10 second upper bound on the timeout a user
> > requests (eg if the user requests 2 minutes, in the worst case the
> > timeout would trigger at 2 minutes and 10 seconds).
> >
> > Also, now that we're aborting the connection entirely on a timeout
> > instead of just aborting the request, maybe it makes sense to change
> > the timeout granularity to minutes instead of seconds. I'm envisioning
> > that this timeout mechanism will mostly be used as a safeguard against
> > malicious or buggy servers with a high timeout configured (eg 10
> > minutes), and minutes seems like a nicer interface for users than them
> > having to convert that to seconds.
> >
> > Let me know if I've missed anything with this approach but if not,
> > then I'll submit v7 with this change.
>
>
> sounds great to me. Just, could we do this per fuse_dev to avoid a
> single lock for all cores?
>

Will do! Thanks for the suggestion - in that case, I'll add its own
spinlock for it too then.

Thanks,
Joanne

>
> Thanks,
> Bernd
Joanne Koong Oct. 1, 2024, 5:12 p.m. UTC | #13
On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > Hi Joanne,
> >
> > On 9/27/24 21:36, Joanne Koong wrote:
> > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >>
> > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>>
> > >>> There are situations where fuse servers can become unresponsive or
> > >>> stuck, for example if the server is in a deadlock. Currently, there's
> > >>> no good way to detect if a server is stuck and needs to be killed
> > >>> manually.
> > >>>
> > >>> This commit adds an option for enforcing a timeout (in seconds) on
> > >>> requests where if the timeout elapses without a reply from the server,
> > >>> the connection will be automatically aborted.
> > >>
> > >> Okay.
> > >>
> > >> I'm not sure what the overhead (scheduling and memory) of timers, but
> > >> starting one for each request seems excessive.
> > >
> > > I ran some benchmarks on this using the passthrough_ll server and saw
> > > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > > --group_reporting=1 --directory=/root/fuse_mount
> > >
> > > Instead of attaching a timer to each request, I think we can instead
> > > do the following:
> > > * add a "start_time" field to each request tracking (in jiffies) when
> > > the request was started
> > > * add a new list to the connection that all requests get enqueued
> > > onto. When the request is completed, it gets dequeued from this list
> > > * have a timer for the connection that fires off every 10 seconds or
> > > so. When this timer is fired, it checks if "jiffies > req->start_time
> > > + fc->req_timeout" against the head of the list to check if the
> > > timeout has expired and we need to abort the request. We only need to
> > > check against the head of the list because we know every other request
> > > after this was started later in time. I think we could even just use
> > > the fc->lock for this instead of needing a separate lock. In the worst
> > > case, this grants a 10 second upper bound on the timeout a user
> > > requests (eg if the user requests 2 minutes, in the worst case the
> > > timeout would trigger at 2 minutes and 10 seconds).
> > >
> > > Also, now that we're aborting the connection entirely on a timeout
> > > instead of just aborting the request, maybe it makes sense to change
> > > the timeout granularity to minutes instead of seconds. I'm envisioning
> > > that this timeout mechanism will mostly be used as a safeguard against
> > > malicious or buggy servers with a high timeout configured (eg 10
> > > minutes), and minutes seems like a nicer interface for users than them
> > > having to convert that to seconds.
> > >
> > > Let me know if I've missed anything with this approach but if not,
> > > then I'll submit v7 with this change.
> >
> >
> > sounds great to me. Just, could we do this per fuse_dev to avoid a
> > single lock for all cores?
> >
>
> Will do! thanks for the suggestion - in that case, I'll add its own
> spinlock for it too then.

Actually, looking at this some more, we can just put this in the
"struct fuse_pqueue" and use the fpq spinlock since the check for
whether any requests timed out will be very quick (eg just checking
against the first entry in the list).
>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Bernd
Joanne Koong Oct. 7, 2024, 6:39 p.m. UTC | #14
On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > Hi Joanne,
> >
> > On 9/27/24 21:36, Joanne Koong wrote:
> > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >>
> > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>>
> > >>> There are situations where fuse servers can become unresponsive or
> > >>> stuck, for example if the server is in a deadlock. Currently, there's
> > >>> no good way to detect if a server is stuck and needs to be killed
> > >>> manually.
> > >>>
> > >>> This commit adds an option for enforcing a timeout (in seconds) on
> > >>> requests where if the timeout elapses without a reply from the server,
> > >>> the connection will be automatically aborted.
> > >>
> > >> Okay.
> > >>
> > >> I'm not sure what the overhead (scheduling and memory) of timers, but
> > >> starting one for each request seems excessive.
> > >
> > > I ran some benchmarks on this using the passthrough_ll server and saw
> > > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > > --group_reporting=1 --directory=/root/fuse_mount
> > >
> > > Instead of attaching a timer to each request, I think we can instead
> > > do the following:
> > > * add a "start_time" field to each request tracking (in jiffies) when
> > > the request was started
> > > * add a new list to the connection that all requests get enqueued
> > > onto. When the request is completed, it gets dequeued from this list
> > > * have a timer for the connection that fires off every 10 seconds or
> > > so. When this timer is fired, it checks if "jiffies > req->start_time
> > > + fc->req_timeout" against the head of the list to check if the
> > > timeout has expired and we need to abort the request. We only need to
> > > check against the head of the list because we know every other request
> > > after this was started later in time. I think we could even just use
> > > the fc->lock for this instead of needing a separate lock. In the worst
> > > case, this grants a 10 second upper bound on the timeout a user
> > > requests (eg if the user requests 2 minutes, in the worst case the
> > > timeout would trigger at 2 minutes and 10 seconds).
> > >
> > > Also, now that we're aborting the connection entirely on a timeout
> > > instead of just aborting the request, maybe it makes sense to change
> > > the timeout granularity to minutes instead of seconds. I'm envisioning
> > > that this timeout mechanism will mostly be used as a safeguard against
> > > malicious or buggy servers with a high timeout configured (eg 10
> > > minutes), and minutes seems like a nicer interface for users than them
> > > having to convert that to seconds.
> > >
> > > Let me know if I've missed anything with this approach but if not,
> > > then I'll submit v7 with this change.
> >
> >
> > sounds great to me. Just, could we do this per fuse_dev to avoid a
> > single lock for all cores?
> >
>
> Will do! thanks for the suggestion - in that case, I'll add its own
> spinlock for it too then.

I realized while working on v7 that we can't do this per fuse device
because the request is only associated with a device once it's read in
by the server (eg fuse_dev_do_read).

I ran some rough preliminary benchmarks with
./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
-o source=~/fstests ~/fuse_mount
and
fio --name randwrite --ioengine=sync --thread --invalidate=1
--runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
--bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
--group_reporting=1 --directory=fuse_mount

and didn't see any noticeable difference in throughput (~37 MiB/sec on
my system) with vs without the timeout.


Thanks,
Joanne

>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Bernd
Bernd Schubert Oct. 7, 2024, 8:02 p.m. UTC | #15
On 10/7/24 20:39, Joanne Koong wrote:
> On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
>> <bernd.schubert@fastmail.fm> wrote:
>>>
>>> Hi Joanne,
>>>
>>> On 9/27/24 21:36, Joanne Koong wrote:
>>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>
>>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>>>>
>>>>>> There are situations where fuse servers can become unresponsive or
>>>>>> stuck, for example if the server is in a deadlock. Currently, there's
>>>>>> no good way to detect if a server is stuck and needs to be killed
>>>>>> manually.
>>>>>>
>>>>>> This commit adds an option for enforcing a timeout (in seconds) on
>>>>>> requests where if the timeout elapses without a reply from the server,
>>>>>> the connection will be automatically aborted.
>>>>>
>>>>> Okay.
>>>>>
>>>>> I'm not sure what the overhead (scheduling and memory) of timers, but
>>>>> starting one for each request seems excessive.
>>>>
>>>> I ran some benchmarks on this using the passthrough_ll server and saw
>>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
>>>> fio --name randwrite --ioengine=sync --thread --invalidate=1
>>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
>>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
>>>> --group_reporting=1 --directory=/root/fuse_mount
>>>>
>>>> Instead of attaching a timer to each request, I think we can instead
>>>> do the following:
>>>> * add a "start_time" field to each request tracking (in jiffies) when
>>>> the request was started
>>>> * add a new list to the connection that all requests get enqueued
>>>> onto. When the request is completed, it gets dequeued from this list
>>>> * have a timer for the connection that fires off every 10 seconds or
>>>> so. When this timer is fired, it checks if "jiffies > req->start_time
>>>> + fc->req_timeout" against the head of the list to check if the
>>>> timeout has expired and we need to abort the request. We only need to
>>>> check against the head of the list because we know every other request
>>>> after this was started later in time. I think we could even just use
>>>> the fc->lock for this instead of needing a separate lock. In the worst
>>>> case, this grants a 10 second upper bound on the timeout a user
>>>> requests (eg if the user requests 2 minutes, in the worst case the
>>>> timeout would trigger at 2 minutes and 10 seconds).
>>>>
>>>> Also, now that we're aborting the connection entirely on a timeout
>>>> instead of just aborting the request, maybe it makes sense to change
>>>> the timeout granularity to minutes instead of seconds. I'm envisioning
>>>> that this timeout mechanism will mostly be used as a safeguard against
>>>> malicious or buggy servers with a high timeout configured (eg 10
>>>> minutes), and minutes seems like a nicer interface for users than them
>>>> having to convert that to seconds.
>>>>
>>>> Let me know if I've missed anything with this approach but if not,
>>>> then I'll submit v7 with this change.
>>>
>>>
>>> sounds great to me. Just, could we do this per fuse_dev to avoid a
>>> single lock for all cores?
>>>
>>
>> Will do! thanks for the suggestion - in that case, I'll add its own
>> spinlock for it too then.
> 
> I realized while working on v7 that we can't do this per fuse device
> because the request is only associated with a device once it's read in
> by the server (eg fuse_dev_do_read).
> 
> I ran some rough preliminary benchmarks with
> ./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
> -o source=~/fstests ~/fuse_mount
> and
> fio --name randwrite --ioengine=sync --thread --invalidate=1
> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> --group_reporting=1 --directory=fuse_mount
> 
> and didn't see any noticeable difference in throughput (~37 MiB/sec on
> my system) with vs without the timeout.
> 


That is not much; isn't your limit the backend? I wonder what would happen
with the 25GB/s I'm testing with. Wouldn't it make sense to test this with
large sequential IO? And possibly with the passthrough-hp branch that
bypasses IO? A NUMA system would probably be helpful as well, i.e. to test
the effect on the kernel side without having an IO-limited system?


With the io-uring interface, requests stay in queues on the incoming CPU,
so this is easier to achieve there. Although I wonder, for your use case,
whether it wouldn't be sufficient to start the timer only when the request
is on its way to the fuse server? One disadvantage I see is that virtiofs
would need to be specially handled.


Thanks,
Bernd
Joanne Koong Oct. 8, 2024, 4:26 p.m. UTC | #16
On Mon, Oct 7, 2024 at 1:02 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 10/7/24 20:39, Joanne Koong wrote:
> > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >>
> >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> >> <bernd.schubert@fastmail.fm> wrote:
> >>>
> >>> Hi Joanne,
> >>>
> >>> On 9/27/24 21:36, Joanne Koong wrote:
> >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>>>>
> >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>>>>>
> >>>>>> There are situations where fuse servers can become unresponsive or
> >>>>>> stuck, for example if the server is in a deadlock. Currently, there's
> >>>>>> no good way to detect if a server is stuck and needs to be killed
> >>>>>> manually.
> >>>>>>
> >>>>>> This commit adds an option for enforcing a timeout (in seconds) on
> >>>>>> requests where if the timeout elapses without a reply from the server,
> >>>>>> the connection will be automatically aborted.
> >>>>>
> >>>>> Okay.
> >>>>>
> >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but
> >>>>> starting one for each request seems excessive.
> >>>>
> >>>> I ran some benchmarks on this using the passthrough_ll server and saw
> >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1
> >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> >>>> --group_reporting=1 --directory=/root/fuse_mount
> >>>>
> >>>> Instead of attaching a timer to each request, I think we can instead
> >>>> do the following:
> >>>> * add a "start_time" field to each request tracking (in jiffies) when
> >>>> the request was started
> >>>> * add a new list to the connection that all requests get enqueued
> >>>> onto. When the request is completed, it gets dequeued from this list
> >>>> * have a timer for the connection that fires off every 10 seconds or
> >>>> so. When this timer is fired, it checks if "jiffies > req->start_time
> >>>> + fc->req_timeout" against the head of the list to check if the
> >>>> timeout has expired and we need to abort the request. We only need to
> >>>> check against the head of the list because we know every other request
> >>>> after this was started later in time. I think we could even just use
> >>>> the fc->lock for this instead of needing a separate lock. In the worst
> >>>> case, this grants a 10 second upper bound on the timeout a user
> >>>> requests (eg if the user requests 2 minutes, in the worst case the
> >>>> timeout would trigger at 2 minutes and 10 seconds).
> >>>>
> >>>> Also, now that we're aborting the connection entirely on a timeout
> >>>> instead of just aborting the request, maybe it makes sense to change
> >>>> the timeout granularity to minutes instead of seconds. I'm envisioning
> >>>> that this timeout mechanism will mostly be used as a safeguard against
> >>>> malicious or buggy servers with a high timeout configured (eg 10
> >>>> minutes), and minutes seems like a nicer interface for users than them
> >>>> having to convert that to seconds.
> >>>>
> >>>> Let me know if I've missed anything with this approach but if not,
> >>>> then I'll submit v7 with this change.
> >>>
> >>>
> >>> sounds great to me. Just, could we do this per fuse_dev to avoid a
> >>> single lock for all cores?
> >>>
> >>
> >> Will do! thanks for the suggestion - in that case, I'll add its own
> >> spinlock for it too then.
> >
> > I realized while working on v7 that we can't do this per fuse device
> > because the request is only associated with a device once it's read in
> > by the server (eg fuse_dev_do_read).
> >
> > I ran some rough preliminary benchmarks with
> > ./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
> > -o source=~/fstests ~/fuse_mount
> > and
> > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > --group_reporting=1 --directory=fuse_mount
> >
> > and didn't see any noticeable difference in throughput (~37 MiB/sec on
> > my system) with vs without the timeout.
> >
>
>
> That is not much, isn't your limit the backend? I wonder what would happen
> with 25GB/s I'm testing with. Wouldn't it make sense for this to test with
> sequential large IO? And possibly with the passthrough-hp branch that
> bypasses IO? And a NUMA system probably would be helpful as well.
> I.e. to test the effect on the kernel side without having an IO limited
> system?
>

The preliminary benchmarks yesterday were run on a VM because I had
trouble getting consistent results between baseline runs (on origin
w/out my changes) on my usual test machine. I'm going to get this
sorted out and run some tests again.

What are you testing on that's giving you 25 GB/s?

>
> With the io-uring interface requests stay in queues from the in-coming CPU
> so easier to achieve there, although I wonder for your use-case if it
> wouldn't be sufficient to start the timer only when the request is on
> the way to fuse-server? One disadvantage I see is that virtiofs would need
> to be specially handled.

Unfortunately I don't think it suffices to only start the timer when
the request is on the way to the fuse server. If there's a malicious
or deadlocked server, it might not read from /dev/fuse, but we would
want to abort the connection in those cases as well.


Thanks,
Joanne

>
>
> Thanks,
> Bernd
>
>
Joanne Koong Oct. 8, 2024, 7 p.m. UTC | #17
On Tue, Oct 8, 2024 at 9:26 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, Oct 7, 2024 at 1:02 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > On 10/7/24 20:39, Joanne Koong wrote:
> > > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>
> > >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> > >> <bernd.schubert@fastmail.fm> wrote:
> > >>>
> > >>> Hi Joanne,
> > >>>
> > >>> On 9/27/24 21:36, Joanne Koong wrote:
> > >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >>>>>
> > >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>>>>>
> > >>>>>> There are situations where fuse servers can become unresponsive or
> > >>>>>> stuck, for example if the server is in a deadlock. Currently, there's
> > >>>>>> no good way to detect if a server is stuck and needs to be killed
> > >>>>>> manually.
> > >>>>>>
> > >>>>>> This commit adds an option for enforcing a timeout (in seconds) on
> > >>>>>> requests where if the timeout elapses without a reply from the server,
> > >>>>>> the connection will be automatically aborted.
> > >>>>>
> > >>>>> Okay.
> > >>>>>
> > >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but
> > >>>>> starting one for each request seems excessive.
> > >>>>
> > >>>> I ran some benchmarks on this using the passthrough_ll server and saw
> > >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1
> > >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > >>>> --group_reporting=1 --directory=/root/fuse_mount
> > >>>>
> > >>>> Instead of attaching a timer to each request, I think we can instead
> > >>>> do the following:
> > >>>> * add a "start_time" field to each request tracking (in jiffies) when
> > >>>> the request was started
> > >>>> * add a new list to the connection that all requests get enqueued
> > >>>> onto. When the request is completed, it gets dequeued from this list
> > >>>> * have a timer for the connection that fires off every 10 seconds or
> > >>>> so. When this timer is fired, it checks if "jiffies > req->start_time
> > >>>> + fc->req_timeout" against the head of the list to check if the
> > >>>> timeout has expired and we need to abort the request. We only need to
> > >>>> check against the head of the list because we know every other request
> > >>>> after this was started later in time. I think we could even just use
> > >>>> the fc->lock for this instead of needing a separate lock. In the worst
> > >>>> case, this grants a 10 second upper bound on the timeout a user
> > >>>> requests (eg if the user requests 2 minutes, in the worst case the
> > >>>> timeout would trigger at 2 minutes and 10 seconds).
> > >>>>
> > >>>> Also, now that we're aborting the connection entirely on a timeout
> > >>>> instead of just aborting the request, maybe it makes sense to change
> > >>>> the timeout granularity to minutes instead of seconds. I'm envisioning
> > >>>> that this timeout mechanism will mostly be used as a safeguard against
> > >>>> malicious or buggy servers with a high timeout configured (eg 10
> > >>>> minutes), and minutes seems like a nicer interface for users than them
> > >>>> having to convert that to seconds.
> > >>>>
> > >>>> Let me know if I've missed anything with this approach but if not,
> > >>>> then I'll submit v7 with this change.
> > >>>
> > >>>
> > >>> sounds great to me. Just, could we do this per fuse_dev to avoid a
> > >>> single lock for all cores?
> > >>>
> > >>
> > >> Will do! thanks for the suggestion - in that case, I'll add its own
> > >> spinlock for it too then.
> > >
> > > I realized while working on v7 that we can't do this per fuse device
> > > because the request is only associated with a device once it's read in
> > > by the server (eg fuse_dev_do_read).
> > >
> > > I ran some rough preliminary benchmarks with
> > > ./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
> > > -o source=~/fstests ~/fuse_mount
> > > and
> > > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > > --group_reporting=1 --directory=fuse_mount
> > >
> > > and didn't see any noticeable difference in throughput (~37 MiB/sec on
> > > my system) with vs without the timeout.
> > >
> >
> >
> > That is not much, isn't your limit the backend? I wonder what would happen
> > with 25GB/s I'm testing with. Wouldn't it make sense for this to test with
> > sequential large IO? And possibly with the passthrough-hp branch that
> > bypasses IO? And a NUMA system probably would be helpful as well.
> > I.e. to test the effect on the kernel side without having an IO limited
> > system?
> >
>
> The preliminary benchmarks yesterday were run on a VM because I had
> trouble getting consistent results between baseline runs (on origin
> w/out my changes) on my usual test machine. I'm going to get this
> sorted out and run some tests again.
>

I'll attach the updated benchmark numbers on this thread (v7 of the
timeout patchset) so that everything's in one place:
https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@gmail.com/T/#t

> What are you testing on that's giving you 25 GB/s?
>
> >
> > With the io-uring interface requests stay in queues from the in-coming CPU
> > so easier to achieve there, although I wonder for your use-case if it
> > wouldn't be sufficient to start the timer only when the request is on
> > the way to fuse-server? One disadvantage I see is that virtiofs would need
> > to be specially handled.
>
> Unfortunately I don't think it suffices to only start the timer when
> the request is on the way to the fuse server. If there's a malicious
> or deadlocked server, they might not read from /dev/fuse, but we would
> want to abort the connection in those cases as well.
>
>
> Thanks,
> Joanne
>
> >
> >
> > Thanks,
> > Bernd
> >
> >

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..a4ec817074a2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -40,6 +40,16 @@  static struct fuse_dev *fuse_get_dev(struct file *file)
 	return READ_ONCE(file->private_data);
 }
 
+static void fuse_request_timeout(struct timer_list *timer)
+{
+	struct fuse_req *req = container_of(timer, struct fuse_req, timer);
+	struct fuse_conn *fc = req->fm->fc;
+
+	req->timer.function = NULL;
+
+	fuse_abort_conn(fc);
+}
+
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 {
 	INIT_LIST_HEAD(&req->list);
@@ -48,6 +58,8 @@  static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 	refcount_set(&req->count, 1);
 	__set_bit(FR_PENDING, &req->flags);
 	req->fm = fm;
+	if (fm->fc->req_timeout)
+		timer_setup(&req->timer, fuse_request_timeout, 0);
 }
 
 static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
@@ -283,6 +295,9 @@  void fuse_request_end(struct fuse_req *req)
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	if (req->timer.function)
+		timer_delete_sync(&req->timer);
+
 	if (test_and_set_bit(FR_FINISHED, &req->flags))
 		goto put_request;
 
@@ -393,6 +408,8 @@  static void request_wait_answer(struct fuse_req *req)
 		if (test_bit(FR_PENDING, &req->flags)) {
 			list_del(&req->list);
 			spin_unlock(&fiq->lock);
+			if (req->timer.function)
+				timer_delete_sync(&req->timer);
 			__fuse_put_request(req);
 			req->out.h.error = -EINTR;
 			return;
@@ -409,7 +426,8 @@  static void request_wait_answer(struct fuse_req *req)
 
 static void __fuse_request_send(struct fuse_req *req)
 {
-	struct fuse_iqueue *fiq = &req->fm->fc->iq;
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 
 	BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
 	spin_lock(&fiq->lock);
@@ -421,6 +439,8 @@  static void __fuse_request_send(struct fuse_req *req)
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fc->req_timeout);
 		queue_request_and_unlock(fiq, req);
 
 		request_wait_answer(req);
@@ -539,6 +559,8 @@  static bool fuse_request_queue_background(struct fuse_req *req)
 		if (fc->num_background == fc->max_background)
 			fc->blocked = 1;
 		list_add_tail(&req->list, &fc->bg_queue);
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fc->req_timeout);
 		flush_bg_queue(fc);
 		queued = true;
 	}
@@ -594,6 +616,8 @@  static int fuse_simple_notify_reply(struct fuse_mount *fm,
 
 	spin_lock(&fiq->lock);
 	if (fiq->connected) {
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fm->fc->req_timeout);
 		queue_request_and_unlock(fiq, req);
 	} else {
 		err = -ENODEV;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..97dacafa4289 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -435,6 +435,9 @@  struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+	/** timer for request replies, if timeout option is enabled */
+	struct timer_list timer;
 };
 
 struct fuse_iqueue;
@@ -574,6 +577,8 @@  struct fuse_fs_context {
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
+	/*  Request timeout (in seconds). 0 = no timeout (infinite wait) */
+	unsigned int req_timeout;
 	const char *subtype;
 
 	/* DAX device, may be NULL */
@@ -633,6 +638,9 @@  struct fuse_conn {
 	/** Constrain ->max_pages to this value during feature negotiation */
 	unsigned int max_pages_limit;
 
+	/* Request timeout (in jiffies). 0 = no timeout (infinite wait) */
+	unsigned long req_timeout;
+
 	/** Input queue */
 	struct fuse_iqueue iq;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..9e69006fc026 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -733,6 +733,7 @@  enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_REQUEST_TIMEOUT,
 	OPT_ERR
 };
 
@@ -747,6 +748,7 @@  static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u32	("request_timeout",	OPT_REQUEST_TIMEOUT),
 	{}
 };
 
@@ -830,6 +832,10 @@  static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_REQUEST_TIMEOUT:
+		ctx->req_timeout = result.uint_32;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1724,6 +1730,7 @@  int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->group_id = ctx->group_id;
 	fc->legacy_opts_show = ctx->legacy_opts_show;
 	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
+	fc->req_timeout = ctx->req_timeout * HZ;
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
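
With the patch as posted, the timeout is a plain mount option given in
seconds, parsed by fuse_parse_param() like max_read or blksize. Whatever
performs the mount(2) just adds it to the option string, so the kernel
would see mount data along the lines of (the fd and ids here are
placeholders):

  fd=7,rootmode=40000,user_id=1000,group_id=1000,request_timeout=120

and would abort the connection if any request then goes unanswered for
roughly two minutes.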