[fuse-devel] fuse: when are release requests queued?

Message ID	87mv9svnpk.fsf@thinkpad.rath.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> From: Nikolaus Rath <Nikolaus@rath.org> To: Maxim Patlasov <mpatlasov@virtuozzo.com>, fuse-devel <fuse-devel@lists.sourceforge.net>, linux-fsdevel <linux-fsdevel@vger.kernel.org> Subject: Re: [fuse-devel] fuse: when are release requests queued? References: <87a860r0v9.fsf@thinkpad.rath.org> <b7c495bd-6e24-7d67-1232-5e2bf15abf3e@virtuozzo.com> <87zidva9r3.fsf@vostro.rath.org> <87eb44a9-359f-68eb-fe42-614a0d9c8193@virtuozzo.com> Mail-Copies-To: never Mail-Followup-To: Maxim Patlasov <mpatlasov@virtuozzo.com>, fuse-devel <fuse-devel@lists.sourceforge.net>, linux-fsdevel <linux-fsdevel@vger.kernel.org> Date: Wed, 31 May 2017 12:19:19 -0700 In-Reply-To: <87eb44a9-359f-68eb-fe42-614a0d9c8193@virtuozzo.com> (Maxim Patlasov's message of "Wed, 31 May 2017 10:50:57 -0700") Message-ID: <87mv9svnpk.fsf@thinkpad.rath.org> User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk

Nikolaus Rath May 31, 2017, 7:19 p.m. UTC

On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>>> Can someone tell me at which point the fuse kernel module will send a
>>>> RELEASE request to userspace?
>>>
>>> Anytime after fuse_release(). It only puts request to background
>>> queue. Later, the request will be transferred to pending queue. And
>>> later, the userspace will fetch it by fuse_dev_do_read().
>>>
>>>> Is it possible that this is delayed until
>>>> after the close() syscall for the last fd has returned and userspace has
>>>> submitted a different fuse request for the same fs?
>>> I think it's possible. See how flush_bg_queue() does nothing if
>>> fc->active_background > fc->max_background.
>> Thanks Maxim! Not sure what I'd do with these issues without you :-).
>>
>>
>> Is there a way to deliberately trigger this behavior for debugging? For
>> example, is there a kernel equivalent of sleep(1) that I could put into
>> fuse_release()?
>
> schedule_timeout_interruptible(HZ).

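Hmm. I made the following change in linux 4.10:

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 2401c5..3568a8 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
         if (unlikely(!ff))
                 return;
 
+        // Wait a little to force race condition in userspace
+        schedule_timeout_interruptible(1);
+
         req = ff->reserved_req;
         fuse_prepare_release(ff, file->f_flags, opcode);
 
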
But when doing e.g. "echo test > newfile", the RELEASE request still
comes right away (judging from the libfuse debugging output).

Do I need to do something else?

> But it's better to instrument fuse
> userspace to postpone processing some i/o requests. Then you'll keep
> fc->active_background > fc->max_background for a while. During that
> period fuse_release may succeed with FUSE_RELEASE queued, but not
> passed to the userspace. Then you can try to sneak another request --
> something not involving fuse background queue.

I don't know.. why is this better? It seems a lot more complicated. I
need to generate the extra request, add some switch to tell libfuse when
to start processing again, synchronize this with sneaking in the other
request...



Best,
-Nikolaus

Maxim Patlasov May 31, 2017, 7:32 p.m. UTC | #1

On 05/31/2017 12:19 PM, Nikolaus Rath wrote:

> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>>>> Can someone tell me at which point the fuse kernel module will send a
>>>>> RELEASE request to userspace?
>>>> Anytime after fuse_release(). It only puts request to background
>>>> queue. Later, the request will be transferred to pending queue. And
>>>> later, the userspace will fetch it by fuse_dev_do_read().
>>>>
>>>>> Is it possible that this is delayed until
>>>>> after the close() syscall for the last fd has returned and userspace has
>>>>> submitted a different fuse request for the same fs?
>>>> I think it's possible. See how flush_bg_queue() does nothing if
>>>> fc->active_background > fc->max_background.
>>> Thanks Maxim! Not sure what I'd do with these issues without you :-).
>>>
>>>
>>> Is there a way to deliberately trigger this behavior for debugging? For
>>> example, is there a kernel equivalent of sleep(1) that I could put into
>>> fuse_release()?
>> schedule_timeout_interruptible(HZ).
> Hmm. I made the following change in linux 4.10:
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 2401c5..3568a8 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
>          if (unlikely(!ff))
>                  return;
>   
> +        // Wait a little to force race condition in userspace
> +        schedule_timeout_interruptible(1);
> +
>          req = ff->reserved_req;
>          fuse_prepare_release(ff, file->f_flags, opcode);
>   
>
> But when doing e.g. "echo test > newfile", the RELEASE request still
> comes right away (judging from the libfuse debugging output).
>
> Do I need to do something else?

Try HZ*10 instead of 1 as an argument of schedule_timeout_interruptible.
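
For context on why the argument matters: schedule_timeout_interruptible() takes its timeout in jiffies, so an argument of 1 is a single timer tick, while HZ*10 is roughly ten seconds (the sleep can also end early if a signal arrives). The userspace helper below is only a sketch of that arithmetic. The names jiffies_to_ms() and print_timeouts() are made up for this illustration, and the tick rates mentioned in the comments are common CONFIG_HZ settings assumed here, not values taken from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a timeout in jiffies to milliseconds for a given tick rate.
 * hz stands in for the kernel's CONFIG_HZ (ticks per second); typical
 * settings are 100, 250, 300 or 1000, but the real value is build-specific. */
unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}

/* Print the upper bound on the sleep for the two arguments discussed in
 * the thread: 1 jiffy versus HZ * 10 jiffies. */
void print_timeouts(unsigned long hz)
{
    printf("HZ=%lu: 1 jiffy = %lu ms, HZ*10 jiffies = %lu ms\n",
           hz, jiffies_to_ms(1, hz), jiffies_to_ms(hz * 10, hz));
}
```

Calling print_timeouts(250) from a main() shows that a one-jiffy sleep is only 4 ms at that tick rate, which would be hard to notice in the libfuse debug output.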

>
>> But it's better to instrument fuse
>> userspace to postpone processing some i/o requests. Then you'll keep
>> fc->active_background > fc->max_background for a while. During that
>> period fuse_release may succeed with FUSE_RELEASE queued, but not
>> passed to the userspace. Then you can try to sneak another request --
>> something not involving fuse background queue.
> I don't know.. why is this better? It seems a lot more complicated. I
> need to generate the extra request, add some switch to tell libfuse when
> to start processing again, synchronize this with sneaking in the other
> request...

I thought it's better because it would trigger delayed processing of 
FUSE_RELEASE: last __fput() succeeded, but fuse userspace will see 
FUSE_RELEASE only later. Adding sleep to fuse_release_common would only 
extend processing time of last __fput(), is that what you need?

>
>
>
> Best,
> -Nikolaus
>

Nikolaus Rath May 31, 2017, 7:41 p.m. UTC | #2

On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
> On 05/31/2017 12:19 PM, Nikolaus Rath wrote:
>
>> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>>>>> Can someone tell me at which point the fuse kernel module will send a
>>>>>> RELEASE request to userspace?
>>>>> Anytime after fuse_release(). It only puts request to background
>>>>> queue. Later, the request will be transferred to pending queue. And
>>>>> later, the userspace will fetch it by fuse_dev_do_read().
>>>>>
>>>>>> Is it possible that this is delayed until
>>>>>> after the close() syscall for the last fd has returned and userspace has
>>>>>> submitted a different fuse request for the same fs?
>>>>> I think it's possible. See how flush_bg_queue() does nothing if
>>>>> fc->active_background > fc->max_background.
>>>> Thanks Maxim! Not sure what I'd do with these issues without you :-).
>>>>
>>>>
>>>> Is there a way to deliberately trigger this behavior for debugging? For
>>>> example, is there a kernel equivalent of sleep(1) that I could put into
>>>> fuse_release()?
>>> schedule_timeout_interruptible(HZ).
>> Hmm. I made the following change in linux 4.10:
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 2401c5..3568a8 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
>>          if (unlikely(!ff))
>>                  return;
>>   
>> +        // Wait a little to force race condition in userspace
>> +        schedule_timeout_interruptible(1);
>> +
>>          req = ff->reserved_req;
>>          fuse_prepare_release(ff, file->f_flags, opcode);
>>   
>>
>> But when doing e.g. "echo test > newfile", the RELEASE request still
>> comes right away (judging from the libfuse debugging output).
>>
>> Do I need to do something else?
>
> Try HZ*10 instead of 1 as an argument of
> schedule_timeout_interruptible.

Ok, now the RELEASE comes a lot later. But now userspace is also
blocking until RELEASE comes in.

>>> But it's better to instrument fuse
>>> userspace to postpone processing some i/o requests. Then you'll keep
>>> fc->active_background > fc->max_background for a while. During that
>>> period fuse_release may succeed with FUSE_RELEASE queued, but not
>>> passed to the userspace. Then you can try to sneak another request --
>>> something not involving fuse background queue.
>>
>> I don't know.. why is this better? It seems a lot more complicated. I
>> need to generate the extra request, add some switch to tell libfuse when
>> to start processing again, synchronize this with sneaking in the other
>> request...
>
> I thought it's better because it would trigger delayed processing of
> FUSE_RELEASE: last __fput() succeeded, but fuse userspace will see
> FUSE_RELEASE only later. Adding sleep to fuse_release_common would
> only extend processing time of last __fput(), is that what you need?

I do not fully understand the difference you describe. What I would like
to construct is the following scenario:

1. Userspace calls close()
2. Userspace close() returns
3. Userspace calls unlink()
4. Userspace unlink() returns
5. libfuse reads UNLINK request from kernel pipe
6. libfuse reads RELEASE request from kernel pipe

What would be the simplest way to do that?

Thanks!
-Nikolaus

Maxim Patlasov May 31, 2017, 8:23 p.m. UTC | #3

On 05/31/2017 12:41 PM, Nikolaus Rath wrote:

> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>> On 05/31/2017 12:19 PM, Nikolaus Rath wrote:
>>
>>> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>>>>>> Can someone tell me at which point the fuse kernel module will send a
>>>>>>> RELEASE request to userspace?
>>>>>> Anytime after fuse_release(). It only puts request to background
>>>>>> queue. Later, the request will be transferred to pending queue. And
>>>>>> later, the userspace will fetch it by fuse_dev_do_read().
>>>>>>
>>>>>>> Is it possible that this is delayed until
>>>>>>> after the close() syscall for the last fd has returned and userspace has
>>>>>>> submitted a different fuse request for the same fs?
>>>>>> I think it's possible. See how flush_bg_queue() does nothing if
>>>>>> fc->active_background > fc->max_background.
>>>>> Thanks Maxim! Not sure what I'd do with these issues without you :-).
>>>>>
>>>>>
>>>>> Is there a way to deliberately trigger this behavior for debugging? For
>>>>> example, is there a kernel equivalent of sleep(1) that I could put into
>>>>> fuse_release()?
>>>> schedule_timeout_interruptible(HZ).
>>> Hmm. I made the following change in linux 4.10:
>>>
>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>> index 2401c5..3568a8 100644
>>> --- a/fs/fuse/file.c
>>> +++ b/fs/fuse/file.c
>>> @@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
>>>           if (unlikely(!ff))
>>>                   return;
>>>    
>>> +        // Wait a little to force race condition in userspace
>>> +        schedule_timeout_interruptible(1);
>>> +
>>>           req = ff->reserved_req;
>>>           fuse_prepare_release(ff, file->f_flags, opcode);
>>>    
>>>
>>> But when doing e.g. "echo test > newfile", the RELEASE request still
>>> comes right away (judging from the libfuse debugging output).
>>>
>>> Do I need to do something else?
>> Try HZ*10 instead of 1 as an argument of
>> schedule_timeout_interruptible.
> Ok, now the RELEASE comes a lot later. But now userspace is also
> blocking until RELEASE comes in.

That's exactly why I thought that adding sleep there wouldn't be very 
useful.

>
>>>> But it's better to instrument fuse
>>>> userspace to postpone processing some i/o requests. Then you'll keep
>>>> fc->active_background > fc->max_background for a while. During that
>>>> period fuse_release may succeed with FUSE_RELEASE queued, but not
>>>> passed to the userspace. Then you can try to sneak another request --
>>>> something not involving fuse background queue.
>>> I don't know.. why is this better? It seems a lot more complicated. I
>>> need to generate the extra request, add some switch to tell libfuse when
>>> to start processing again, synchronize this with sneaking in the other
>>> request...
>> I thought it's better because it would trigger delayed processing of
>> FUSE_RELEASE: last __fput() succeeded, but fuse userspace will see
>> FUSE_RELEASE only later. Adding sleep to fuse_release_common would
>> only extend processing time of last __fput(), is that what you need?
> I do not fully understand the difference you describe. What I would like
> to construct is the following scenario:
>
> 1. Userspace calls close()
> 2. Userspace close() returns
> 3. Userspace calls unlink()
> 4. Userspace unlink() returns
> 5. libfuse reads UNLINK request from kernel pipe
> 6. libfuse reads RELEASE request from kernel pipe
>
> What would be the simplest way to do that?

I would try to keep fc->active_background elevated somehow. For example,
you could add sleep(1) for every incoming write request in libfuse and
serialize their processing. Then you generate enough writes to reach
fc->max_background. If you call close() now, and if it really ends up in
the last __fput(), the corresponding FUSE_RELEASE will sit in the
background queue for a long while (as many seconds as there are elements
in the queue). But close() from your step 2 will return much earlier
because it doesn't wait for completion of FUSE_RELEASE. Hence unlink()
might succeed.
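
The "sleep(1) per write, serialized" idea could look roughly like the wrapper below. This is a sketch under stated assumptions, not libfuse code: write_fn, slow_serialized_write() and count_bytes() are hypothetical names, and a real filesystem would call such a wrapper from its own write handler instead of the stand-in callback used here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical signature for "the real write implementation"; it is not
 * a libfuse type. */
typedef int (*write_fn)(const char *buf, size_t size, void *ctx);

/* One global lock so that slow writes are handled strictly one at a time. */
static pthread_mutex_t slow_write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Delay every write by delay_seconds and serialize all writes, so that
 * replies to write requests trickle back to the kernel slowly. */
int slow_serialized_write(write_fn real_write, const char *buf, size_t size,
                          void *ctx, unsigned delay_seconds)
{
    int ret;

    assert(real_write != NULL);
    pthread_mutex_lock(&slow_write_lock);
    sleep(delay_seconds);               /* artificial delay for debugging */
    ret = real_write(buf, size, ctx);
    pthread_mutex_unlock(&slow_write_lock);
    return ret;
}

/* Stand-in for a real write implementation: it only counts bytes. */
int count_bytes(const char *buf, size_t size, void *ctx)
{
    (void)buf;
    *(size_t *)ctx += size;
    return (int)size;
}
```

With delay_seconds set to 1, each outstanding write holds its slot for about a second, which is the mechanism suggested above for keeping fc->active_background high.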

>
> Thanks!
> -Nikolaus
>

Nikolaus Rath May 31, 2017, 8:31 p.m. UTC | #4

On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>> I do not fully understand the difference you describe. What I would like
>> to construct is the following scenario:
>>
>> 1. Userspace calls close()
>> 2. Userspace close() returns
>> 3. Userspace calls unlink()
>> 4. Userspace unlink() returns
>> 5. libfuse reads UNLINK request from kernel pipe
>> 6. libfuse reads RELEASE request from kernel pipe
>>
>> What would be the simplest way to do that?
>
> I would try to keep fc->active_background elevated somehow. For
> example you add sleep(1) for every incoming write request to libfuse
> and serialize processing them. Then you generate enough writes to
> achieve fc->max_background. If you call close() now, and if it really
> ends up in last __fput(), corresponding FUSE_RELEASE will sit in
> background queue for long while (as many seconds as # elements in the
> queue). But close() from your 2. will return much earlier because it
> doesn't wait for completion of FUSE_RELEASE. Hence unlink() might
> succeed.

Ah, got it now, thanks!

Wouldn't a simpler solution be to just patch the kernel module to
*always* put FUSE_RELEASE requests into the background queue, so that I
don't have to manually keep fc->active_background elevated?

I just can't seem to find the code that does this check... I would
expect it in fuse_file_put(), but the condition in there does not seem to
look at the number of background requests at all.


Best,
-Nikolaus

Nikolaus Rath May 31, 2017, 8:34 p.m. UTC | #5

On May 31 2017, Michael Theall <pigman46-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I do not fully understand the difference you describe. What I would like
>> to construct is the following scenario:
>>
>> 1. Userspace calls close()
>> 2. Userspace close() returns
>> 3. Userspace calls unlink()
>> 4. Userspace unlink() returns
>> 5. libfuse reads UNLINK request from kernel pipe
>> 6. libfuse reads RELEASE request from kernel pipe
>>
>> What would be the simplest way to do that?
>>
> Won't unlink(2) block until the fuse server has responded?

Yes, you are right. It should be:

 1. Userspace calls close()
 2. Userspace close() returns
 3. Userspace calls unlink()
 4. libfuse reads UNLINK request from kernel pipe
 5. Userspace unlink() returns
 6. libfuse reads RELEASE request from kernel pipe
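
The application side of this scenario (steps 1 to 5) needs nothing more than a close() immediately followed by an unlink(). The function below is a minimal sketch of such a test driver; close_then_unlink() is a name invented here, and the path is whatever file on the FUSE mount you want to exercise. Whether RELEASE actually arrives after UNLINK depends on the kernel-side queueing discussed in the thread, not on this program.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create `path`, write a few bytes, close the only descriptor, then
 * unlink the file right away. Returns 0 on success, -1 on any failure. */
int close_then_unlink(const char *path)
{
    static const char data[] = "test\n";
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (write(fd, data, sizeof(data) - 1) != (ssize_t)(sizeof(data) - 1)) {
        perror("write");
        close(fd);
        return -1;
    }

    /* Steps 1-2: close the last descriptor for the file. */
    if (close(fd) != 0) {
        perror("close");
        return -1;
    }

    /* Steps 3-5: unlink while RELEASE may still be queued in the kernel. */
    if (unlink(path) != 0) {
        perror("unlink");
        return -1;
    }
    return 0;
}
```

Run against a file on the mounted filesystem with libfuse debugging enabled, the order of the UNLINK and RELEASE lines in the debug output shows whether the race was hit.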

> I'm pretty sure
> the close(2) should come back after the fuse server responds to FLUSH. It
> sounds like with your RELEASE delay in the kernel, you should get your
> steps as described, but steps 4 and 5 must be swapped.

No, the delay comes in between (1) and (2).

Best,
-Nikolaus

Maxim Patlasov May 31, 2017, 8:47 p.m. UTC | #6

On 05/31/2017 01:31 PM, Nikolaus Rath wrote:

> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>> I do not fully understand the difference you describe. What I would like
>>> to construct is the following scenario:
>>>
>>> 1. Userspace calls close()
>>> 2. Userspace close() returns
>>> 3. Userspace calls unlink()
>>> 4. Userspace unlink() returns
>>> 5. libfuse reads UNLINK request from kernel pipe
>>> 6. libfuse reads RELEASE request from kernel pipe
>>>
>>> What would be the simplest way to do that?
>> I would try to keep fc->active_background elevated somehow. For
>> example you add sleep(1) for every incoming write request to libfuse
>> and serialize processing them. Then you generate enough writes to
>> achieve fc->max_background. If you call close() now, and if it really
>> ends up in last __fput(), corresponding FUSE_RELEASE will sit in
>> background queue for long while (as many seconds as # elements in the
>> queue). But close() from your 2. will return much earlier because it
>> doesn't wait for completion of FUSE_RELEASE. Hence unlink() might
>> succeed.
> Ah, got it now, thanks!
>
> Wouldn't a simpler solution be to just patch the kernel module to
> *always* put FUSE_RELEASE requests into the background queue, so that I
> don't have to manually keep fc->active_background elevated?
>
> I just can't seem to find the code that does this check... I would
> expect it in fuse_file_put(), but the condition in there does not seem to
> look at the number of background requests at all.

The decision is made at mount time: it's either a fuseblk mount or not.
If it's not a fuseblk mount, the kernel always puts FUSE_RELEASE on the
background queue. And vice versa.

Keeping active_background elevated may help us win the race: you want
unlink to be queued and processed before userspace reads FUSE_RELEASE
from the kernel.
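
To make the race concrete, here is a toy userspace model of the two queues described in this thread. It is a simplified sketch, not the kernel code: the toy_* functions, struct toy_conn and the opcode enum are invented for illustration, and the only behavior modeled is that background requests wait in a background queue until active_background drops below max_background, while other requests go straight to the pending queue that userspace reads from.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_NONE, OP_WRITE, OP_RELEASE, OP_UNLINK };

#define QCAP 16

/* Small fixed-size FIFO of opcodes. */
struct fifo {
    enum op items[QCAP];
    size_t head, len;
};

/* Toy stand-in for the connection state discussed in the thread. */
struct toy_conn {
    unsigned max_background;
    unsigned active_background;
    struct fifo bg_queue;   /* background requests not yet sent */
    struct fifo pending;    /* requests userspace can read next */
};

static void fifo_push(struct fifo *q, enum op o)
{
    assert(q->len < QCAP);
    q->items[(q->head + q->len) % QCAP] = o;
    q->len++;
}

static enum op fifo_pop(struct fifo *q)
{
    enum op o;

    if (q->len == 0)
        return OP_NONE;
    o = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return o;
}

/* Move background requests to pending while there is a free slot. */
static void toy_flush_bg(struct toy_conn *c)
{
    while (c->active_background < c->max_background && c->bg_queue.len > 0) {
        c->active_background++;
        fifo_push(&c->pending, fifo_pop(&c->bg_queue));
    }
}

/* Queue a background request (e.g. RELEASE on a non-fuseblk mount). */
void toy_send_background(struct toy_conn *c, enum op o)
{
    fifo_push(&c->bg_queue, o);
    toy_flush_bg(c);
}

/* Queue a request that bypasses the background queue (e.g. UNLINK). */
void toy_send_foreground(struct toy_conn *c, enum op o)
{
    fifo_push(&c->pending, o);
}

/* Userspace fetches the next request; OP_NONE means nothing is pending. */
enum op toy_read(struct toy_conn *c)
{
    return fifo_pop(&c->pending);
}

/* Userspace finished one background request: free its slot. */
void toy_end_background(struct toy_conn *c)
{
    assert(c->active_background > 0);
    c->active_background--;
    toy_flush_bg(c);
}
```

In this model, with max_background set to 2 and two unanswered background writes, a RELEASE sent in the background stays in bg_queue, so an UNLINK sent afterwards is read by userspace first; RELEASE only becomes readable once a write completes.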


>
>
> Best,
> -Nikolaus
>
