
[RFC,v2,16/19] fuse: {uring} Wake requests on the current cpu

Message ID 20240529-fuse-uring-for-6-9-rfc2-out-v1-16-d149476b1d65@ddn.com (mailing list archive)
State New
Series fuse: fuse-over-io-uring

Commit Message

Bernd Schubert May 29, 2024, 6 p.m. UTC
Most of the performance improvement with fuse-over-io-uring for
synchronous requests comes from processing a request on the submitting
cpu core and also waking the submitting process on that same core -
avoiding switches between cpu cores.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fuse/dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Shachar Sharon May 30, 2024, 4:44 p.m. UTC | #1
On Wed, May 29, 2024 at 10:36 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> Most of the performance improvement with fuse-over-io-uring for
> synchronous requests comes from processing a request on the submitting
> cpu core and also waking the submitting process on that same core -
> avoiding switches between cpu cores.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> ---
>  fs/fuse/dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index c7fd3849a105..851c5fa99946 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -333,7 +333,10 @@ void fuse_request_end(struct fuse_req *req)
>                 spin_unlock(&fc->bg_lock);
>         } else {
>                 /* Wake up waiter sleeping in request_wait_answer() */
> -               wake_up(&req->waitq);
> +               if (fuse_per_core_queue(fc))
> +                       __wake_up_on_current_cpu(&req->waitq, TASK_NORMAL, NULL);
> +               else
> +                       wake_up(&req->waitq);

Would it be possible to apply this idea for regular FUSE connection?
What would happen if some (buggy or malicious) userspace FUSE server uses
sched_setaffinity(2) to run only on a subset of active CPUs?


>         }
>
>         if (test_bit(FR_ASYNC, &req->flags))
>
> --
> 2.40.1
>
>
Bernd Schubert May 30, 2024, 4:59 p.m. UTC | #2
On 5/30/24 18:44, Shachar Sharon wrote:
> On Wed, May 29, 2024 at 10:36 PM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> Most of the performance improvement with fuse-over-io-uring for
>> synchronous requests comes from processing a request on the submitting
>> cpu core and also waking the submitting process on that same core -
>> avoiding switches between cpu cores.
>>
>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>> ---
>>  fs/fuse/dev.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index c7fd3849a105..851c5fa99946 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -333,7 +333,10 @@ void fuse_request_end(struct fuse_req *req)
>>                 spin_unlock(&fc->bg_lock);
>>         } else {
>>                 /* Wake up waiter sleeping in request_wait_answer() */
>> -               wake_up(&req->waitq);
>> +               if (fuse_per_core_queue(fc))
>> +                       __wake_up_on_current_cpu(&req->waitq, TASK_NORMAL, NULL);
>> +               else
>> +                       wake_up(&req->waitq);
> 
> Would it be possible to apply this idea for regular FUSE connection?

I probably should have written it in the commit message: without uring,
performance is the same or slightly worse. With direct-IO reads:

jobs    /dev/fuse         /dev/fuse
        (migrate off)     (migrate on)
1           2023             1652
2           3375             2805
4           3823             4193
8           7796             8161
16          8520             8518
24          8361             8084
32          8717             8342

(in MB/s)

I think there is no improvement because daemon threads process requests
on random cores, i.e. request processing doesn't happen on the same
core the request was submitted from.


> What would happen if some (buggy or malicious) userspace FUSE server uses
> sched_setaffinity(2) to run only on a subset of active CPUs?


The request goes to the ring; which cpu eventually handles it should
not matter for correctness, though performance will not be optimal
then. That being said, the introduction mail points out an issue with
xfstest generic/650, which disables/enables CPUs in a loop - I need to
investigate what happens there.


Thanks,
Bernd

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c7fd3849a105..851c5fa99946 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -333,7 +333,10 @@  void fuse_request_end(struct fuse_req *req)
 		spin_unlock(&fc->bg_lock);
 	} else {
 		/* Wake up waiter sleeping in request_wait_answer() */
-		wake_up(&req->waitq);
+		if (fuse_per_core_queue(fc))
+			__wake_up_on_current_cpu(&req->waitq, TASK_NORMAL, NULL);
+		else
+			wake_up(&req->waitq);
 	}
 
 	if (test_bit(FR_ASYNC, &req->flags))