Message ID | 1596027885-4730-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] fput: Allow calling __fput_sync() from !PF_KTHREAD thread. |
On Wed, Jul 29, 2020 at 10:04:45PM +0900, Tetsuo Handa wrote:
> __fput_sync() was introduced by commit 4a9d4b024a3102fc ("switch fput to
> task_work_add") with BUG_ON(!(current->flags & PF_KTHREAD)) check, and
> the only user of __fput_sync() was introduced by commit 17c0a5aaffa63da6
> ("make acct_kill() wait for file closing."). However, the latter commit is
> effectively calling __fput_sync() from !PF_KTHREAD thread because of
> schedule_work() call followed by immediate wait_for_completion() call.
> That is, there is no need to defer close_work() to a WQ context. I guess
> that the reason to defer was nothing but to bypass this BUG_ON() check.
> While we need to remain careful about calling __fput_sync(), we can remove
> bypassable BUG_ON() check from __fput_sync().
>
> If this change is accepted, racy fput()+flush_delayed_fput() introduced
> by commit e2dc9bf3f5275ca3 ("umd: Transform fork_usermode_blob into
> fork_usermode_driver") will be replaced by this raceless __fput_sync().

NAK. The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
want that thing done on anything other than extremely shallow stack.
Incidentally, why is that thing ever done _not_ in a kernel thread context?
On 2020/09/10 12:57, Al Viro wrote:
> On Wed, Jul 29, 2020 at 10:04:45PM +0900, Tetsuo Handa wrote:
>> __fput_sync() was introduced by commit 4a9d4b024a3102fc ("switch fput to
>> task_work_add") with BUG_ON(!(current->flags & PF_KTHREAD)) check, and
>> the only user of __fput_sync() was introduced by commit 17c0a5aaffa63da6
>> ("make acct_kill() wait for file closing."). However, the latter commit is
>> effectively calling __fput_sync() from !PF_KTHREAD thread because of
>> schedule_work() call followed by immediate wait_for_completion() call.
>> That is, there is no need to defer close_work() to a WQ context. I guess
>> that the reason to defer was nothing but to bypass this BUG_ON() check.
>> While we need to remain careful about calling __fput_sync(), we can remove
>> bypassable BUG_ON() check from __fput_sync().
>>
>> If this change is accepted, racy fput()+flush_delayed_fput() introduced
>> by commit e2dc9bf3f5275ca3 ("umd: Transform fork_usermode_blob into
>> fork_usermode_driver") will be replaced by this raceless __fput_sync().

Thank you for responding. I'm also waiting for your response on
"[RFC PATCH] pipe: make pipe_release() deferrable." at
https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp/
and "[PATCH] splice: fix premature end of input detection" at
https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@i-love.sakura.ne.jp/ .

>
> NAK. The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
> want that thing done on anything other than extremely shallow stack.
> Incidentally, why is that thing ever done _not_ in a kernel thread context?

What does "that thing" refer to? acct_pin_kill() ? blob_to_mnt() ?
I don't know the reason because I'm not the author of these functions.
On Thu, Sep 10, 2020 at 02:26:46PM +0900, Tetsuo Handa wrote:
> Thank you for responding. I'm also waiting for your response on
> "[RFC PATCH] pipe: make pipe_release() deferrable." at
> https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp/
> and "[PATCH] splice: fix premature end of input detection" at
> https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@i-love.sakura.ne.jp/ .
>
> >
> > NAK. The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
> > want that thing done on anything other than extremely shallow stack.
> > Incidentally, why is that thing ever done _not_ in a kernel thread context?
>
> What does "that thing" refer to? acct_pin_kill() ? blob_to_mnt() ?
> I don't know the reason because I'm not the author of these functions.

The latter. What I mean, why not simply do that from inside of
fork_usermode_driver()? umd_setup is stored in sub_info->init and
eventually called from call_usermodehelper_exec_async(), right before
the created kernel thread is about to call kernel_execve() and stop
being a kernel thread...
Al Viro <viro@zeniv.linux.org.uk> writes:

> On Thu, Sep 10, 2020 at 02:26:46PM +0900, Tetsuo Handa wrote:
>> Thank you for responding. I'm also waiting for your response on
>> "[RFC PATCH] pipe: make pipe_release() deferrable." at
>> https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp/
>> and "[PATCH] splice: fix premature end of input detection" at
>> https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@i-love.sakura.ne.jp/ .
>>
>> >
>> > NAK. The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
>> > want that thing done on anything other than extremely shallow stack.
>> > Incidentally, why is that thing ever done _not_ in a kernel thread context?
>>
>> What does "that thing" refer to? acct_pin_kill() ? blob_to_mnt() ?
>> I don't know the reason because I'm not the author of these functions.
>
> The latter. What I mean, why not simply do that from inside of
> fork_usermode_driver()?

Because that is a stupid place to do the work. The usermode driver is
currently allowed to die and the kernel be respawned when needed.
Which means there is not a 1 to 1 relationship between blob_to_mnt and
fork_usermode_driver.

As for the current code being racy, it is approximately as racy as the
current code to load files in an initrd. AKA no one has ever observed
any problems in practice, but if you squint you can see where maybe
something could happen.

I think there is a stronger argument for finding a way to guarantee
that flush_delayed_fput will wait until any scheduled delayed_fput_work
has completed. That is the race Tetsuo is complaining about, and it
also appears to be present in populate_rootfs.

Flushing the fput is needed to ensure the writable struct file is
completely gone before an exec opens the file and calls
deny_write_access.
> umd_setup is stored in sub_info->init and
> eventually called from call_usermodehelper_exec_async(), right before
> the created kernel thread is about to call kernel_execve() and stop
> being a kernel thread...

I think you are suggesting calling __fput_sync in umd_setup, instead of
calling fput from blob_to_mnt.

To have a special case that only applies the first time a function is
called is possible but it is awkward, and likely more error prone.

I moved all of the user mode driver code out of exec and out of the
user mode helper code as the user mode driver code is essentially
unused at present. The bpf folks really want to try and make it work,
so I wrote something that is not completely insane so they can have
their chance to try.

I really suspect it will go the way of all of the migration of the
early kernel init code to userspace with klibc, with the practical
details overwhelming things and making it not work or not be worth it
in practice. Time will tell.

I hope that is enough context to understand what is going on there.

Eric
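The race discussed above can be replayed in a small userspace model. This is only a sketch under stated assumptions: it assumes the delayed work detaches the whole pending list before tearing files down, and that a flush only handles what is still queued at the moment it runs. All names (`race_file`, `model_fput`, `model_worker_detach`, `model_flush`, and so on) are hypothetical stand-ins, not kernel APIs, and the interleaving is replayed by hand on a single thread rather than with real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct file; not the kernel type. */
struct race_file {
	bool released;            /* set once the final teardown has run */
	struct race_file *next;   /* link on the delayed list */
};

/* Files whose final teardown has been deferred. */
static struct race_file *delayed_list;

/* Model of a deferred fput(): queue the file instead of tearing it down. */
static void model_fput(struct race_file *f)
{
	f->next = delayed_list;
	delayed_list = f;
}

/* Model of the worker's first step: detach the whole list in one go. */
static struct race_file *model_worker_detach(void)
{
	struct race_file *batch = delayed_list;

	delayed_list = NULL;
	return batch;
}

/* Tear down every file on a detached batch. */
static void model_run_batch(struct race_file *batch)
{
	while (batch) {
		struct race_file *next = batch->next;

		batch->released = true;
		batch = next;
	}
}

/* Model of a flush: handles only what is still queued right now. */
static void model_flush(void)
{
	model_run_batch(model_worker_detach());
}

/*
 * Replays the problematic interleaving: the worker detaches the batch,
 * then the flush runs before the worker has torn anything down.
 * Returns whether the file was released when the flush returned.
 */
static bool released_after_racy_flush(void)
{
	struct race_file f = { .released = false, .next = NULL };
	struct race_file *in_flight;
	bool seen;

	delayed_list = NULL;
	model_fput(&f);
	in_flight = model_worker_detach(); /* worker now owns the batch */
	model_flush();                     /* finds an empty list */
	seen = f.released;                 /* sampled right after the flush */
	model_run_batch(in_flight);        /* worker finishes later */
	assert(f.released);
	return seen;
}

/* Same sequence, but the worker has not started when the flush runs. */
static bool released_after_quiet_flush(void)
{
	struct race_file f = { .released = false, .next = NULL };

	delayed_list = NULL;
	model_fput(&f);
	model_flush();
	return f.released;
}
```

In this model `released_after_quiet_flush()` reports the file as released, while `released_after_racy_flush()` does not: the flush returned, but the teardown was still pending in the worker's detached batch. A flush that also waited for the in-flight batch would close that window in the model.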
diff --git a/fs/file_table.c b/fs/file_table.c
index 656647f..7c41251 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -359,20 +359,15 @@ void fput(struct file *file)
 }
 
 /*
- * synchronous analog of fput(); for kernel threads that might be needed
- * in some umount() (and thus can't use flush_delayed_fput() without
- * risking deadlocks), need to wait for completion of __fput() and know
- * for this specific struct file it won't involve anything that would
- * need them. Use only if you really need it - at the very least,
- * don't blindly convert fput() by kernel thread to that.
+ * synchronous analog of fput(); for threads that need to wait for completion
+ * of __fput() and know for this specific struct file it won't involve anything
+ * that would need them. Use only if you really need it - at the very least,
+ * don't blindly convert fput() to __fput_sync().
  */
 void __fput_sync(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count)) {
-		struct task_struct *task = current;
-		BUG_ON(!(task->flags & PF_KTHREAD));
+	if (atomic_long_dec_and_test(&file->f_count))
 		__fput(file);
-	}
 }
 
 EXPORT_SYMBOL(fput);
__fput_sync() was introduced by commit 4a9d4b024a3102fc ("switch fput to
task_work_add") with BUG_ON(!(current->flags & PF_KTHREAD)) check, and
the only user of __fput_sync() was introduced by commit 17c0a5aaffa63da6
("make acct_kill() wait for file closing."). However, the latter commit is
effectively calling __fput_sync() from !PF_KTHREAD thread because of
schedule_work() call followed by immediate wait_for_completion() call.
That is, there is no need to defer close_work() to a WQ context. I guess
that the reason to defer was nothing but to bypass this BUG_ON() check.
While we need to remain careful about calling __fput_sync(), we can remove
bypassable BUG_ON() check from __fput_sync().

If this change is accepted, racy fput()+flush_delayed_fput() introduced
by commit e2dc9bf3f5275ca3 ("umd: Transform fork_usermode_blob into
fork_usermode_driver") will be replaced by this raceless __fput_sync().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 fs/file_table.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
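For readers unfamiliar with the reference-count logic that the patched __fput_sync() keeps, here is a minimal userspace sketch of the "drop a reference, tear down on the last one" behaviour. It uses C11 atomics in place of atomic_long_dec_and_test(); the names `ref_file`, `model_final_put`, `model_fput_sync`, and `released_after` are hypothetical and exist only for this illustration. It says nothing about stack depth, which is the concern raised in the review.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct file; not the kernel type. */
struct ref_file {
	atomic_long f_count;   /* reference count, playing the role of f_count */
	bool released;         /* set by the final teardown */
};

/* Stand-in for __fput(): the final teardown of the file. */
static void model_final_put(struct ref_file *f)
{
	f->released = true;
}

/*
 * Model of the patched __fput_sync(): drop one reference and, if it
 * was the last, run the teardown in the caller's own context.
 * atomic_fetch_sub() returns the previous value, so "== 1" means the
 * count has just reached zero.
 */
static void model_fput_sync(struct ref_file *f)
{
	if (atomic_fetch_sub(&f->f_count, 1) == 1)
		model_final_put(f);
}

/* Returns true if the file is released after `puts` drops from `refs`. */
static bool released_after(long refs, int puts)
{
	struct ref_file f;

	atomic_init(&f.f_count, refs);
	f.released = false;
	for (int i = 0; i < puts; i++)
		model_fput_sync(&f);
	return f.released;
}
```

With two references held, one call leaves the file alive and the second call runs the teardown before returning, which is the synchronous property the commit message relies on.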