Message ID | 87mv9svnpk.fsf@thinkpad.rath.org (mailing list archive) |
---|---|
State | New, archived |
On 05/31/2017 12:19 PM, Nikolaus Rath wrote:
> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>>>> Can someone tell me at which point the fuse kernel module will send a
>>>>> RELEASE request to userspace?
>>>> Anytime after fuse_release(). It only puts request to background
>>>> queue. Later, the request will be transferred to pending queue. And
>>>> later, the userspace will fetch it by fuse_dev_do_read().
>>>>
>>>>> Is it possible that this is delayed until
>>>>> after the close() syscall for the last fd has returned and userspace has
>>>>> submitted a different fuse request for the same fs?
>>>> I think it's possible. See how flush_bg_queue() does nothing if
>>>> fc->active_background > fc->max_background.
>>> Thanks Maxim! Not sure what I'd do with these issues without you :-).
>>>
>>>
>>> Is there a way to deliberately trigger this behavior for debugging? For
>>> example, is there a kernel equivalent of sleep(1) that I could put into
>>> fuse_release()?
>> schedule_timeout_interruptible(HZ).
> Hmm. I made the following change in Linux 4.10:
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 2401c5..3568a8 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
>  	if (unlikely(!ff))
>  		return;
>  
> +	// Wait a little to force race condition in userspace
> +	schedule_timeout_interruptible(1);
> +
>  	req = ff->reserved_req;
>  	fuse_prepare_release(ff, file->f_flags, opcode);
>  
>
> But when doing e.g. "echo test > newfile", the RELEASE request still
> comes right away (judging from the libfuse debugging output).
>
> Do I need to do something else?

Try HZ*10 instead of 1 as an argument of
schedule_timeout_interruptible.

>
>> But it's better to instrument fuse
>> userspace to postpone processing some i/o requests. Then you'll keep
>> fc->active_background > fc->max_background for a while. During that
>> period fuse_release may succeed with FUSE_RELEASE queued, but not
>> passed to the userspace. Then you can try to sneak another request --
>> something not involving fuse background queue.
> I don't know.. why is this better? It seems a lot more complicated. I
> need to generate the extra request, add some switch to tell libfuse when
> to start processing again, synchronize this with sneaking in the other
> request...

I thought it's better because it would trigger delayed processing of
FUSE_RELEASE: last __fput() succeeded, but fuse userspace will see
FUSE_RELEASE only later. Adding sleep to fuse_release_common would
only extend processing time of last __fput(), is that what you need?

>
>
>
> Best,
> -Nikolaus
>
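The queueing Maxim describes can be illustrated with a small userspace model. This is a sketch, not kernel code: `struct bg_model` and the `model_*` functions are hypothetical names, only counters are tracked, and the loop condition is taken from flush_bg_queue() as it reads in the 4.x sources I have looked at (`active_background < max_background`), which means dispatch already stops once the two counters are equal.

```c
/* Hypothetical, simplified model of the FUSE background queue.
 * Only counters are tracked; the kernel keeps real request lists. */
struct bg_model {
    unsigned active_background; /* background requests handed to userspace */
    unsigned max_background;    /* congestion limit */
    unsigned bg_queued;         /* requests parked on the background queue */
    unsigned pending;           /* requests visible to the daemon's read() */
};

/* Same shape as flush_bg_queue(): move parked requests to the pending
 * queue only while active_background < max_background. */
static void model_flush_bg_queue(struct bg_model *m)
{
    while (m->active_background < m->max_background && m->bg_queued > 0) {
        m->bg_queued--;
        m->active_background++;
        m->pending++;
    }
}

/* Background submission, e.g. FUSE_RELEASE on a non-fuseblk mount. */
static void model_send_background(struct bg_model *m)
{
    m->bg_queued++;
    model_flush_bg_queue(m);
}

/* The daemon replies to one background request, freeing a slot. */
static void model_complete_background(struct bg_model *m)
{
    if (m->active_background > 0)
        m->active_background--;
    model_flush_bg_queue(m);
}
```

With every slot busy (active_background equal to a limit of 12, an illustrative value), a RELEASE submitted through model_send_background() stays parked until model_complete_background() frees a slot.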
On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
> On 05/31/2017 12:19 PM, Nikolaus Rath wrote:
>
>> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>>>>> Can someone tell me at which point the fuse kernel module will send a
>>>>>> RELEASE request to userspace?
>>>>> Anytime after fuse_release(). It only puts request to background
>>>>> queue. Later, the request will be transferred to pending queue. And
>>>>> later, the userspace will fetch it by fuse_dev_do_read().
>>>>>
>>>>>> Is it possible that this is delayed until
>>>>>> after the close() syscall for the last fd has returned and userspace has
>>>>>> submitted a different fuse request for the same fs?
>>>>> I think it's possible. See how flush_bg_queue() does nothing if
>>>>> fc->active_background > fc->max_background.
>>>> Thanks Maxim! Not sure what I'd do with these issues without you :-).
>>>>
>>>>
>>>> Is there a way to deliberately trigger this behavior for debugging? For
>>>> example, is there a kernel equivalent of sleep(1) that I could put into
>>>> fuse_release()?
>>> schedule_timeout_interruptible(HZ).
>> Hmm. I made the following change in Linux 4.10:
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 2401c5..3568a8 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
>>  	if (unlikely(!ff))
>>  		return;
>> +	// Wait a little to force race condition in userspace
>> +	schedule_timeout_interruptible(1);
>> +
>>  	req = ff->reserved_req;
>>  	fuse_prepare_release(ff, file->f_flags, opcode);
>>
>>
>> But when doing e.g. "echo test > newfile", the RELEASE request still
>> comes right away (judging from the libfuse debugging output).
>>
>> Do I need to do something else?
>
> Try HZ*10 instead of 1 as an argument of
> schedule_timeout_interruptible.

Ok, now the RELEASE comes a lot later. But now userspace is also
blocking until RELEASE comes in.

>>>> But it's better to instrument fuse
>>>> userspace to postpone processing some i/o requests. Then you'll keep
>>>> fc->active_background > fc->max_background for a while. During that
>>>> period fuse_release may succeed with FUSE_RELEASE queued, but not
>>>> passed to the userspace. Then you can try to sneak another request --
>>>> something not involving fuse background queue.
>>
>> I don't know.. why is this better? It seems a lot more complicated. I
>> need to generate the extra request, add some switch to tell libfuse when
>> to start processing again, synchronize this with sneaking in the other
>> request...
>
> I thought it's better because it would trigger delayed processing of
> FUSE_RELEASE: last __fput() succeeded, but fuse userspace will see
> FUSE_RELEASE only later. Adding sleep to fuse_release_common would
> only extend processing time of last __fput(), is that what you need?

I do not fully understand the difference you describe. What I would like
to construct is the following scenario:

1. Userspace calls close()
2. Userspace close() returns
3. Userspace calls unlink()
4. Userspace unlink() returns
5. libfuse reads UNLINK request from kernel pipe
6. libfuse reads RELEASE request from kernel pipe

What would be the simplest way to do that?

Thanks!
-Nikolaus
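To see where the delay from the patched fuse_release_common() lands, it helps to time the user-visible steps of the scenario. A minimal driver, assuming the path passed in names a file on the FUSE mount (the function name `close_then_unlink` is hypothetical): it writes a few bytes as in the `echo test > newfile` experiment, then times close() and unlink() separately with a monotonic clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds from a monotonic clock. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* User side of the scenario: create path, close it (steps 1-2), then
 * unlink it (step 3 onward). Stores how long close() and unlink() took.
 * Returns 0 on success, -1 on any failure. */
static int close_then_unlink(const char *path, double *close_ms,
                             double *unlink_ms)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (write(fd, "test\n", 5) != 5) {
        perror("write");
        close(fd);
        return -1;
    }

    double t0 = now_ms();
    if (close(fd) != 0) {        /* last close: FLUSH, then RELEASE */
        perror("close");
        return -1;
    }
    double t1 = now_ms();
    if (unlink(path) != 0) {     /* synchronous UNLINK request */
        perror("unlink");
        return -1;
    }
    double t2 = now_ms();

    *close_ms = t1 - t0;
    *unlink_ms = t2 - t1;
    return 0;
}
```

If close_ms comes out close to the sleep added in the kernel, the delay is being spent inside the last __fput() before close() returns, rather than postponing delivery of RELEASE until after close() has returned.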
On 05/31/2017 12:41 PM, Nikolaus Rath wrote:
> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>> On 05/31/2017 12:19 PM, Nikolaus Rath wrote:
>>
>>> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>>>>>> Can someone tell me at which point the fuse kernel module will send a
>>>>>>> RELEASE request to userspace?
>>>>>> Anytime after fuse_release(). It only puts request to background
>>>>>> queue. Later, the request will be transferred to pending queue. And
>>>>>> later, the userspace will fetch it by fuse_dev_do_read().
>>>>>>
>>>>>>> Is it possible that this is delayed until
>>>>>>> after the close() syscall for the last fd has returned and userspace has
>>>>>>> submitted a different fuse request for the same fs?
>>>>>> I think it's possible. See how flush_bg_queue() does nothing if
>>>>>> fc->active_background > fc->max_background.
>>>>> Thanks Maxim! Not sure what I'd do with these issues without you :-).
>>>>>
>>>>>
>>>>> Is there a way to deliberately trigger this behavior for debugging? For
>>>>> example, is there a kernel equivalent of sleep(1) that I could put into
>>>>> fuse_release()?
>>>> schedule_timeout_interruptible(HZ).
>>> Hmm. I made the following change in Linux 4.10:
>>>
>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>> index 2401c5..3568a8 100644
>>> --- a/fs/fuse/file.c
>>> +++ b/fs/fuse/file.c
>>> @@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
>>>  	if (unlikely(!ff))
>>>  		return;
>>> +	// Wait a little to force race condition in userspace
>>> +	schedule_timeout_interruptible(1);
>>> +
>>>  	req = ff->reserved_req;
>>>  	fuse_prepare_release(ff, file->f_flags, opcode);
>>>
>>>
>>> But when doing e.g. "echo test > newfile", the RELEASE request still
>>> comes right away (judging from the libfuse debugging output).
>>>
>>> Do I need to do something else?
>> Try HZ*10 instead of 1 as an argument of
>> schedule_timeout_interruptible.
> Ok, now the RELEASE comes a lot later. But now userspace is also
> blocking until RELEASE comes in.

That's exactly why I thought that adding a sleep there wouldn't be very
useful.

>
>>>> But it's better to instrument fuse
>>>> userspace to postpone processing some i/o requests. Then you'll keep
>>>> fc->active_background > fc->max_background for a while. During that
>>>> period fuse_release may succeed with FUSE_RELEASE queued, but not
>>>> passed to the userspace. Then you can try to sneak another request --
>>>> something not involving fuse background queue.
>>> I don't know.. why is this better? It seems a lot more complicated. I
>>> need to generate the extra request, add some switch to tell libfuse when
>>> to start processing again, synchronize this with sneaking in the other
>>> request...
>> I thought it's better because it would trigger delayed processing of
>> FUSE_RELEASE: last __fput() succeeded, but fuse userspace will see
>> FUSE_RELEASE only later. Adding sleep to fuse_release_common would
>> only extend processing time of last __fput(), is that what you need?
> I do not fully understand the difference you describe. What I would like
> to construct is the following scenario:
>
> 1. Userspace calls close()
> 2. Userspace close() returns
> 3. Userspace calls unlink()
> 4. Userspace unlink() returns
> 5. libfuse reads UNLINK request from kernel pipe
> 6. libfuse reads RELEASE request from kernel pipe
>
> What would be the simplest way to do that?

I would try to keep fc->active_background elevated somehow. For
example you add sleep(1) for every incoming write request to libfuse
and serialize processing them. Then you generate enough writes to
achieve fc->max_background. If you call close() now, and if it really
ends up in last __fput(), corresponding FUSE_RELEASE will sit in
background queue for a long while (as many seconds as # elements in the
queue). But close() from your 2. will return much earlier because it
doesn't wait for completion of FUSE_RELEASE. Hence unlink() might
succeed.

>
> Thanks!
> -Nikolaus
>
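One way to make every write slow on the daemon side, as Maxim suggests, is to sleep before performing the write. The helper below is hypothetical (it is not part of libfuse) and assumes the filesystem's write handler has an open file descriptor it can pass to `pwrite()`; wiring it into the handler is not shown because that depends on which libfuse API the filesystem uses. Running the daemon single-threaded (libfuse's `-s` option) would provide the serialization. It is worth checking first which writes actually count toward `fc->active_background`: as far as I understand, plain write(2) calls on a FUSE file are sent synchronously, while page writeback and asynchronous reads go through the background queue.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper for a FUSE daemon's write handler: sleep for
 * delay_ms, then perform the write, so every write request is slow. */
static ssize_t delayed_pwrite(int fd, const void *buf, size_t len,
                              off_t off, unsigned delay_ms)
{
    struct timespec ts = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (long)(delay_ms % 1000) * 1000000L,
    };
    /* Resume the remaining sleep if a signal interrupts it. */
    while (nanosleep(&ts, &ts) != 0 && errno == EINTR)
        ;
    return pwrite(fd, buf, len, off);
}
```

With a 1000 ms delay and the background limit reached, a queued RELEASE would wait roughly as many seconds as there are requests ahead of it, which matches Maxim's estimate (about 12 s if the limit is at 12, which I believe is the kernel default).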
On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>> I do not fully understand the difference you describe. What I would like
>> to construct is the following scenario:
>>
>> 1. Userspace calls close()
>> 2. Userspace close() returns
>> 3. Userspace calls unlink()
>> 4. Userspace unlink() returns
>> 5. libfuse reads UNLINK request from kernel pipe
>> 6. libfuse reads RELEASE request from kernel pipe
>>
>> What would be the simplest way to do that?
>
> I would try to keep fc->active_background elevated somehow. For
> example you add sleep(1) for every incoming write request to libfuse
> and serialize processing them. Then you generate enough writes to
> achieve fc->max_background. If you call close() now, and if it really
> ends up in last __fput(), corresponding FUSE_RELEASE will sit in
> background queue for a long while (as many seconds as # elements in the
> queue). But close() from your 2. will return much earlier because it
> doesn't wait for completion of FUSE_RELEASE. Hence unlink() might
> succeed.

Ah, got it now, thanks!

Wouldn't a simpler solution be to just patch the kernel module to
*always* put FUSE_RELEASE requests into the background queue, so that I
don't have to manually keep fc->active_background elevated?

I just can't seem to find the code that does this check... I would
expect it in fuse_file_put(), but the condition in there does not seem to
look at the number of background requests at all.

Best,
-Nikolaus
On May 31 2017, Michael Theall <pigman46-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I do not fully understand the difference you describe. What I would like
>> to construct is the following scenario:
>>
>> 1. Userspace calls close()
>> 2. Userspace close() returns
>> 3. Userspace calls unlink()
>> 4. Userspace unlink() returns
>> 5. libfuse reads UNLINK request from kernel pipe
>> 6. libfuse reads RELEASE request from kernel pipe
>>
>> What would be the simplest way to do that?
>>
> Won't unlink(2) block until the fuse server has responded?

Yes, you are right. It should be:

1. Userspace calls close()
2. Userspace close() returns
3. Userspace calls unlink()
4. libfuse reads UNLINK request from kernel pipe
5. Userspace unlink() returns
6. libfuse reads RELEASE request from kernel pipe

> I'm pretty sure
> the close(2) should come back after the fuse server responds to FLUSH. It
> sounds like with your RELEASE delay in the kernel, you should get your
> steps as described, but steps 4 and 5 must be swapped.

No, the delay comes in between (1) and (2).

Best,
-Nikolaus
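Checking whether a run actually produced the corrected ordering (UNLINK read before RELEASE) can be automated by scanning the daemon's debug output. A minimal sketch, assuming each request is logged on a line containing `opcode: NAME (number)`, which is how libfuse's `-d` output looks in the versions I have seen; the function name is hypothetical. It only compares first occurrences, so it is meant for a log captured around a single close()/unlink() run.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the first UNLINK request in a libfuse
 * debug log appears before the first RELEASE request. The trailing " ("
 * keeps RELEASEDIR lines from matching. False if either is missing. */
static bool unlink_read_before_release(const char *log)
{
    const char *u = strstr(log, "opcode: UNLINK (");
    const char *r = strstr(log, "opcode: RELEASE (");
    return u != NULL && r != NULL && u < r;
}
```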
On 05/31/2017 01:31 PM, Nikolaus Rath wrote:
> On May 31 2017, Maxim Patlasov <mpatlasov@virtuozzo.com> wrote:
>>> I do not fully understand the difference you describe. What I would like
>>> to construct is the following scenario:
>>>
>>> 1. Userspace calls close()
>>> 2. Userspace close() returns
>>> 3. Userspace calls unlink()
>>> 4. Userspace unlink() returns
>>> 5. libfuse reads UNLINK request from kernel pipe
>>> 6. libfuse reads RELEASE request from kernel pipe
>>>
>>> What would be the simplest way to do that?
>> I would try to keep fc->active_background elevated somehow. For
>> example you add sleep(1) for every incoming write request to libfuse
>> and serialize processing them. Then you generate enough writes to
>> achieve fc->max_background. If you call close() now, and if it really
>> ends up in last __fput(), corresponding FUSE_RELEASE will sit in
>> background queue for a long while (as many seconds as # elements in the
>> queue). But close() from your 2. will return much earlier because it
>> doesn't wait for completion of FUSE_RELEASE. Hence unlink() might
>> succeed.
> Ah, got it now, thanks!
>
> Wouldn't a simpler solution be to just patch the kernel module to
> *always* put FUSE_RELEASE requests into the background queue, so that I
> don't have to manually keep fc->active_background elevated?
>
> I just can't seem to find the code that does this check... I would
> expect it in fuse_file_put(), but the condition in there does not seem to
> look at the number of background requests at all.

The decision is made at mount time: it's either a fuseblk mount or not.
If it's not a fuseblk mount, the kernel always puts FUSE_RELEASE into the
background queue, and vice versa.

Keeping active_background elevated may help us to win the race: you want
the unlink to be queued and processed before userspace reads FUSE_RELEASE
from the kernel.

>
>
> Best,
> -Nikolaus
>
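Putting Maxim's last two answers together, the intended race can be expressed as a toy model of the order in which the daemon reads requests. On a non-fuseblk mount the last close() puts RELEASE on the background queue, while unlink() is an ordinary synchronous request that, as I understand it, is not subject to the background limit; so while that limit is reached, the daemon reads UNLINK first. On a fuseblk mount RELEASE is sent synchronously and close() waits for it, so the race cannot happen. All names below are hypothetical, and the model ignores locking, interrupts and every request except these two.

```c
#include <stdbool.h>

enum { MAX_EVENTS = 8 };

/* Toy model: which requests the daemon reads, and in what order. */
struct race_model {
    bool fuseblk;                       /* fuseblk mount? */
    unsigned active_background;         /* background requests in flight */
    unsigned max_background;            /* background limit */
    bool release_on_bg_queue;           /* RELEASE parked on the bg queue */
    const char *read_order[MAX_EVENTS]; /* requests read by the daemon */
    unsigned n_read;
};

static void daemon_reads(struct race_model *m, const char *op)
{
    if (m->n_read < MAX_EVENTS)
        m->read_order[m->n_read++] = op;
}

/* Last close(). fuseblk: RELEASE is synchronous and close() waits for it.
 * Otherwise RELEASE goes to the background queue; it reaches the daemon
 * right away only if a background slot is free. */
static void last_close(struct race_model *m)
{
    if (m->fuseblk) {
        daemon_reads(m, "RELEASE");
    } else if (m->active_background < m->max_background) {
        m->active_background++;
        daemon_reads(m, "RELEASE");
    } else {
        m->release_on_bg_queue = true;
    }
}

/* unlink(): synchronous, not held back by the background limit. */
static void do_unlink(struct race_model *m)
{
    daemon_reads(m, "UNLINK");
}

/* The daemon completes one background request (e.g. a delayed write),
 * which lets a parked RELEASE through. */
static void background_done(struct race_model *m)
{
    if (m->active_background > 0)
        m->active_background--;
    if (m->release_on_bg_queue && m->active_background < m->max_background) {
        m->release_on_bg_queue = false;
        m->active_background++;
        daemon_reads(m, "RELEASE");
    }
}
```

Driving the model with the background limit already reached (last_close, do_unlink, then background_done) yields the read order UNLINK, RELEASE; the same sequence with fuseblk set yields RELEASE first.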
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 2401c5..3568a8 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -252,6 +252,9 @@ void fuse_release_common(struct file *file, int opcode)
 	if (unlikely(!ff))
 		return;
 
+	// Wait a little to force race condition in userspace
+	schedule_timeout_interruptible(1);
+
 	req = ff->reserved_req;
 	fuse_prepare_release(ff, file->f_flags, opcode);
 
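A likely reason this version of the patch showed no visible delay: schedule_timeout_interruptible() takes its timeout in jiffies, and HZ is the number of jiffies per second for the running kernel (commonly 100, 250, 300 or 1000, depending on CONFIG_HZ). An argument of 1 is a single tick, 1 to 10 ms for those values, whereas HZ*10 is ten seconds on any configuration. The function below is only a userspace illustration of that arithmetic; inside the kernel, jiffies_to_msecs() does the conversion.

```c
/* Userspace illustration: convert a jiffies count to milliseconds
 * for a kernel built with the given HZ (ticks per second). */
static unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    return jiffies * 1000UL / hz;
}
```

For example, jiffies_to_ms(1, 250) is 4, while jiffies_to_ms(250 * 10, 250) is 10000.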