[v2] fput: Allow calling __fput_sync() from !PF_KTHREAD thread.

Message ID 1596027885-4730-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived

Commit Message

Tetsuo Handa July 29, 2020, 1:04 p.m. UTC
__fput_sync() was introduced by commit 4a9d4b024a3102fc ("switch fput to
task_work_add") with BUG_ON(!(current->flags & PF_KTHREAD)) check, and
the only user of __fput_sync() was introduced by commit 17c0a5aaffa63da6
("make acct_kill() wait for file closing."). However, the latter commit is
effectively calling __fput_sync() from !PF_KTHREAD thread because of
schedule_work() call followed by immediate wait_for_completion() call.
That is, there is no need to defer close_work() to a WQ context. I guess
that the reason to defer was nothing but to bypass this BUG_ON() check.
While we need to remain careful about calling __fput_sync(), we can remove
bypassable BUG_ON() check from __fput_sync().

If this change is accepted, racy fput()+flush_delayed_fput() introduced
by commit e2dc9bf3f5275ca3 ("umd: Transform fork_usermode_blob into
fork_usermode_driver") will be replaced by this raceless __fput_sync().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 fs/file_table.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

Comments

Al Viro Sept. 10, 2020, 3:57 a.m. UTC | #1
On Wed, Jul 29, 2020 at 10:04:45PM +0900, Tetsuo Handa wrote:
> __fput_sync() was introduced by commit 4a9d4b024a3102fc ("switch fput to
> task_work_add") with BUG_ON(!(current->flags & PF_KTHREAD)) check, and
> the only user of __fput_sync() was introduced by commit 17c0a5aaffa63da6
> ("make acct_kill() wait for file closing."). However, the latter commit is
> effectively calling __fput_sync() from !PF_KTHREAD thread because of
> schedule_work() call followed by immediate wait_for_completion() call.
> That is, there is no need to defer close_work() to a WQ context. I guess
> that the reason to defer was nothing but to bypass this BUG_ON() check.
> While we need to remain careful about calling __fput_sync(), we can remove
> bypassable BUG_ON() check from __fput_sync().
> 
> If this change is accepted, racy fput()+flush_delayed_fput() introduced
> by commit e2dc9bf3f5275ca3 ("umd: Transform fork_usermode_blob into
> fork_usermode_driver") will be replaced by this raceless __fput_sync().

NAK.  The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
want that thing done on anything other than extremely shallow stack.
Incidentally, why is that thing ever done _not_ in a kernel thread context?

Tetsuo Handa Sept. 10, 2020, 5:26 a.m. UTC | #2
On 2020/09/10 12:57, Al Viro wrote:
> On Wed, Jul 29, 2020 at 10:04:45PM +0900, Tetsuo Handa wrote:
>> __fput_sync() was introduced by commit 4a9d4b024a3102fc ("switch fput to
>> task_work_add") with BUG_ON(!(current->flags & PF_KTHREAD)) check, and
>> the only user of __fput_sync() was introduced by commit 17c0a5aaffa63da6
>> ("make acct_kill() wait for file closing."). However, the latter commit is
>> effectively calling __fput_sync() from !PF_KTHREAD thread because of
>> schedule_work() call followed by immediate wait_for_completion() call.
>> That is, there is no need to defer close_work() to a WQ context. I guess
>> that the reason to defer was nothing but to bypass this BUG_ON() check.
>> While we need to remain careful about calling __fput_sync(), we can remove
>> bypassable BUG_ON() check from __fput_sync().
>>
>> If this change is accepted, racy fput()+flush_delayed_fput() introduced
>> by commit e2dc9bf3f5275ca3 ("umd: Transform fork_usermode_blob into
>> fork_usermode_driver") will be replaced by this raceless __fput_sync().

Thank you for responding. I'm also waiting for your response on
"[RFC PATCH] pipe: make pipe_release() deferrable." at 
https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp/
and "[PATCH] splice: fix premature end of input detection" at 
https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@i-love.sakura.ne.jp/ .

> 
> NAK.  The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
> want that thing done on anything other than extremely shallow stack.
> Incidentally, why is that thing ever done _not_ in a kernel thread context?

What does "that thing" refer to? acct_pin_kill() ? blob_to_mnt() ?
I don't know the reason because I'm not the author of these functions.

Al Viro Sept. 10, 2020, 11:25 a.m. UTC | #3
On Thu, Sep 10, 2020 at 02:26:46PM +0900, Tetsuo Handa wrote:
> Thank you for responding. I'm also waiting for your response on
> "[RFC PATCH] pipe: make pipe_release() deferrable." at 
> https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp/
> and "[PATCH] splice: fix premature end of input detection" at 
> https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@i-love.sakura.ne.jp/ .
> 
> > 
> > NAK.  The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
> > want that thing done on anything other than extremely shallow stack.
> > Incidentally, why is that thing ever done _not_ in a kernel thread context?
> 
> What does "that thing" refer to? acct_pin_kill() ? blob_to_mnt() ?
> I don't know the reason because I'm not the author of these functions.

	The latter.  What I mean, why not simply do that from inside of
fork_usermode_driver()?  umd_setup is stored in sub_info->init and
eventually called from call_usermodehelper_exec_async(), right before
the created kernel thread is about to call kernel_execve() and stop
being a kernel thread...

Eric W. Biederman Sept. 10, 2020, 8:06 p.m. UTC | #4
Al Viro <viro@zeniv.linux.org.uk> writes:

> On Thu, Sep 10, 2020 at 02:26:46PM +0900, Tetsuo Handa wrote:
>> Thank you for responding. I'm also waiting for your response on
>> "[RFC PATCH] pipe: make pipe_release() deferrable." at 
>> https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp/
>> and "[PATCH] splice: fix premature end of input detection" at 
>> https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@i-love.sakura.ne.jp/ .
>> 
>> > 
>> > NAK.  The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
>> > want that thing done on anything other than extremely shallow stack.
>> > Incidentally, why is that thing ever done _not_ in a kernel thread context?
>> 
>> What does "that thing" refer to? acct_pin_kill() ? blob_to_mnt() ?
>> I don't know the reason because I'm not the author of these functions.
>
> 	The latter.  What I mean, why not simply do that from inside of
> fork_usermode_driver()?

Because that is a stupid place to do the work.  The usermode driver is
currently allowed to die and be respawned by the kernel when needed,
which means there is not a 1 to 1 relationship between blob_to_mnt and
fork_usermode_driver.

As for the current code being racy, it is approximately as racy as the
current code to load files from an initrd.  That is, no one has ever
observed any problems in practice, but if you squint you can see where
maybe something could happen.

I think there is a stronger argument for finding a way to guarantee
that flush_delayed_fput waits until any scheduled delayed_fput_work
has completed, as that is the race Tetsuo is complaining about,
and it also appears to be present in populate_rootfs.


Flushing the fput is needed to ensure the writable struct file is
completely gone before an exec opens the file and calls
deny_write_access.

> umd_setup is stored in sub_info->init and
> eventually called from call_usermodehelper_exec_async(), right before
> the created kernel thread is about to call kernel_execve() and stop
> being a kernel thread...

I think you are suggesting calling __fput_sync in umd_setup instead
of calling fput from blob_to_mnt.

Having a special case that only applies the first time a function is
called is possible, but it is awkward and likely more error prone.



I moved all of the user mode driver code out of exec and out of the user
mode helper code as the user mode driver code is essentially unused at
present.  The bpf folks really want to try and make it work so I wrote
something that is not completely insane so they can have their chance to
try.  I really suspect it will go the way of all of the migration of
the early kernel init code to userspace with klibc, with the practical
details overwhelming things and making it not work or not worth it in
practice.  Time will tell.


I hope that is enough context to understand what is going on there.

Eric

Patch

diff --git a/fs/file_table.c b/fs/file_table.c
index 656647f..7c41251 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -359,20 +359,15 @@  void fput(struct file *file)
 }
 
 /*
- * synchronous analog of fput(); for kernel threads that might be needed
- * in some umount() (and thus can't use flush_delayed_fput() without
- * risking deadlocks), need to wait for completion of __fput() and know
- * for this specific struct file it won't involve anything that would
- * need them.  Use only if you really need it - at the very least,
- * don't blindly convert fput() by kernel thread to that.
+ * synchronous analog of fput(); for threads that need to wait for completion
+ * of __fput() and know for this specific struct file it won't involve anything
+ * that would need them.  Use only if you really need it - at the very least,
+ * don't blindly convert fput() to __fput_sync().
  */
 void __fput_sync(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count)) {
-		struct task_struct *task = current;
-		BUG_ON(!(task->flags & PF_KTHREAD));
+	if (atomic_long_dec_and_test(&file->f_count))
 		__fput(file);
-	}
 }
 
 EXPORT_SYMBOL(fput);