diff mbox series

[RFC] simpler way to get benefits of "vfs: shave work on failed file open"

Message ID 20231126020834.GC38156@ZenIV (mailing list archive)
State New, archived
Headers show
Series [RFC] simpler way to get benefits of "vfs: shave work on failed file open" | expand

Commit Message

Al Viro Nov. 26, 2023, 2:08 a.m. UTC
IMO 93faf426e3cc "vfs: shave work on failed file open" had gone overboard -
avoiding an RCU delay in that particular case is fine, but it's done on
the wrong level.  A file that has never gotten FMODE_OPENED will never
have RCU-accessed references, its final fput() is equivalent to file_free()
and if it doesn't have FMODE_BACKING either, it can be done from any context
and won't need task_work treatment.

However, all of that can be achieved easier - all it takes is fput()
recognizing that case and calling file_free() directly.
No need to introduce a special primitive for that - and things like
failing dentry_open() could benefit from that as well.

Objections?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

Comments

Linus Torvalds Nov. 26, 2023, 4:59 a.m. UTC | #1
On Sat, 25 Nov 2023 at 18:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> However, all of that can be achieved easier - all it takes is fput()
> recognizing that case and calling file_free() directly.
> No need to introduce a special primitive for that - and things like
> failing dentry_open() could benefit from that as well.
>
> Objections?

Ack, looks fine.

In fact, I did suggest something along the lines at the time:

   https://lore.kernel.org/all/CAHk-=whLadznjNKZPYUjxVzAyCH-rRhb24_KaGegKT9E6A86Kg@mail.gmail.com/

although yours is simpler, because I for some reason (probably looking
at Mateusz' original patch too much) re-implemented file_free() as
fput_immediate()..

              Linus
Al Viro Nov. 26, 2023, 5:08 a.m. UTC | #2
On Sat, Nov 25, 2023 at 08:59:54PM -0800, Linus Torvalds wrote:
> On Sat, 25 Nov 2023 at 18:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > However, all of that can be achieved easier - all it takes is fput()
> > recognizing that case and calling file_free() directly.
> > No need to introduce a special primitive for that - and things like
> > failing dentry_open() could benefit from that as well.
> >
> > Objections?
> 
> Ack, looks fine.
> 
> In fact, I did suggest something along the lines at the time:
> 
>    https://lore.kernel.org/all/CAHk-=whLadznjNKZPYUjxVzAyCH-rRhb24_KaGegKT9E6A86Kg@mail.gmail.com/
> 
> although yours is simpler, because I for some reason (probably looking
> at Mateusz' original patch too much) re-implemented file_free() as
> fput_immediate()..

file_free() was with RCU delay at that time, IIRC.  I don't think that
cost of one test and (rarely) branch on each final fput() is going to
be measurable.

Mateusz, do you still have the setup you used for the original patch?
Could you profile and compare the current tree and current tree + the
patch upthread?
Linus Torvalds Nov. 26, 2023, 5:17 a.m. UTC | #3
On Sat, 25 Nov 2023 at 21:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Nov 25, 2023 at 08:59:54PM -0800, Linus Torvalds wrote:
> >
> >       because I for some reason (probably looking
> > at Mateusz' original patch too much) re-implemented file_free() as
> > fput_immediate()..
>
> file_free() was with RCU delay at that time, IIRC.

Ahh, indeed. So it was the SLAB_TYPESAFE_BY_RCU changes that basically
made my fput_immediate() and file_free() be the same, and thus it all
simplifies to your nice version.

>  I don't think that
> cost of one test and (rarely) branch on each final fput() is going to
> be measurable.

Nope. Me likey.

             Linus
Christian Brauner Nov. 26, 2023, 9:21 a.m. UTC | #4
On Sat, Nov 25, 2023 at 09:17:36PM -0800, Linus Torvalds wrote:
> On Sat, 25 Nov 2023 at 21:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Sat, Nov 25, 2023 at 08:59:54PM -0800, Linus Torvalds wrote:
> > >
> > >       because I for some reason (probably looking
> > > at Mateusz' original patch too much) re-implemented file_free() as
> > > fput_immediate()..
> >
> > file_free() was with RCU delay at that time, IIRC.
> 
> Ahh, indeed. So it was the SLAB_TYPESAFE_BY_RCU changes that basically

Yes, special-casing this into file_free() wasn't looking very appealing.
Christian Brauner Nov. 26, 2023, 9:31 a.m. UTC | #5
On Sun, 26 Nov 2023 02:08:34 +0000, Al Viro wrote:
> IMO 93faf426e3cc "vfs: shave work on failed file open" had gone overboard -
> avoiding an RCU delay in that particular case is fine, but it's done on
> the wrong level.  A file that has never gotten FMODE_OPENED will never
> have RCU-accessed references, its final fput() is equivalent to file_free()
> and if it doesn't have FMODE_BACKING either, it can be done from any context
> and won't need task_work treatment.
> 
> [...]

Fwiw, you had a typo in their so I folded the fixup below into it and
tweaked the commit message. The cleanup is good.

diff --git a/fs/file_table.c b/fs/file_table.c
index 7bcfa169dd45..6deac386486d 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -433,8 +433,8 @@ void fput(struct file *file)
        if (atomic_long_dec_and_test(&file->f_count)) {
                struct task_struct *task = current;

-               if (unlikely(!(f->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
-                       file_free(f);
+               if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
+                       file_free(file);
                        return;
                }
                if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {

---

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] file: massage cleanup of files that failed to open
      https://git.kernel.org/vfs/vfs/c/4d6fdbf44ad8
Mateusz Guzik Nov. 26, 2023, 10:58 a.m. UTC | #6
On Sun, Nov 26, 2023 at 10:21:59AM +0100, Christian Brauner wrote:
> On Sat, Nov 25, 2023 at 09:17:36PM -0800, Linus Torvalds wrote:
> > On Sat, 25 Nov 2023 at 21:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Sat, Nov 25, 2023 at 08:59:54PM -0800, Linus Torvalds wrote:
> > > >
> > > >       because I for some reason (probably looking
> > > > at Mateusz' original patch too much) re-implemented file_free() as
> > > > fput_immediate()..
> > >
> > > file_free() was with RCU delay at that time, IIRC.
> > 
> > Ahh, indeed. So it was the SLAB_TYPESAFE_BY_RCU changes that basically
> 
> Yes, special-casing this into file_free() wasn't looking very appealing.

Right, if the SLAB_TYPESAFE_BY_RCU work was already there my v1 for
dodging task_work would have been much simpler (but would still have
fput_badopen).

While I support deduping code which comes with this patch I'm not fond
of special casing failed opens in fput.

A minor remark is that in the spot which ends up calling here on stock
kernel it is FMODE_OPENED which is the unlikely case, but with the patch
it ends up being handled with a branch marked the other way around.
I noted in my commit message failed opens are not some corner-case, they
are much common.

The thing which irks me on its principle is that special-casing for the
case which is guaranteed to not be true for majority of fput users is
avoidably rolled into the general routine.

For my taste the code below (untested) would be nicer, but I don't think
there are solid grounds for picking one approach over another. That is
to say if you insist on Al's variant then we are done here. :)

diff --git a/fs/file_table.c b/fs/file_table.c
index de4a2915bfd4..5e5613d80631 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -75,18 +75,6 @@ static inline void file_free(struct file *f)
 	}
 }
 
-void release_empty_file(struct file *f)
-{
-	WARN_ON_ONCE(f->f_mode & (FMODE_BACKING | FMODE_OPENED));
-	if (atomic_long_dec_and_test(&f->f_count)) {
-		security_file_free(f);
-		put_cred(f->f_cred);
-		if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
-			percpu_counter_dec(&nr_files);
-		kmem_cache_free(filp_cachep, f);
-	}
-}
-
 /*
  * Return the total number of open files in the system
  */
@@ -461,6 +449,18 @@ void fput(struct file *file)
 	}
 }
 
+void fput_badopen(struct file *f)
+{
+	if (unlikely(f->f_mode & (FMODE_BACKING | FMODE_OPENED))) {
+		fput(f);
+		return;
+	}
+
+	if (likely(atomic_long_dec_and_test(&f->f_count))) {
+		file_free(f);
+	}
+}
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use flush_delayed_fput() without
diff --git a/fs/internal.h b/fs/internal.h
index 58e43341aebf..3afe774ff7c6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -94,7 +94,7 @@ extern void chroot_fs_refs(const struct path *, const struct path *);
 struct file *alloc_empty_file(int flags, const struct cred *cred);
 struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
 struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
-void release_empty_file(struct file *f);
+void fput_badopen(struct file *f);
 
 static inline void file_put_write_access(struct file *file)
 {
diff --git a/fs/namei.c b/fs/namei.c
index 71c13b2990b4..e42e2c237a4c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3785,10 +3785,7 @@ static struct file *path_openat(struct nameidata *nd,
 		WARN_ON(1);
 		error = -EINVAL;
 	}
-	if (unlikely(file->f_mode & FMODE_OPENED))
-		fput(file);
-	else
-		release_empty_file(file);
+	fput_badopen(file);
 	if (error == -EOPENSTALE) {
 		if (flags & LOOKUP_RCU)
 			error = -ECHILD;
Christian Brauner Nov. 27, 2023, 3:03 p.m. UTC | #7
> to say if you insist on Al's variant then we are done here. :)

I think it's just simpler and having it in a central place is actually
nicer in this case.

I mostly try to avoid arguing about minutiae unless they do actually
have provable impact so if a patch comes that adheres to someones taste
more than my own then I'm not going to argue (quite often). All three
version are fine and functional.
diff mbox series

Patch

diff --git a/fs/file_table.c b/fs/file_table.c
index de4a2915bfd4..7bcfa169dd45 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -75,18 +75,6 @@  static inline void file_free(struct file *f)
 	}
 }
 
-void release_empty_file(struct file *f)
-{
-	WARN_ON_ONCE(f->f_mode & (FMODE_BACKING | FMODE_OPENED));
-	if (atomic_long_dec_and_test(&f->f_count)) {
-		security_file_free(f);
-		put_cred(f->f_cred);
-		if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
-			percpu_counter_dec(&nr_files);
-		kmem_cache_free(filp_cachep, f);
-	}
-}
-
 /*
  * Return the total number of open files in the system
  */
@@ -445,6 +433,10 @@  void fput(struct file *file)
 	if (atomic_long_dec_and_test(&file->f_count)) {
 		struct task_struct *task = current;
 
+		if (unlikely(!(f->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
+			file_free(f);
+			return;
+		}
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
 			init_task_work(&file->f_rcuhead, ____fput);
 			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
diff --git a/fs/internal.h b/fs/internal.h
index 58e43341aebf..273e6fd40d1b 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -94,7 +94,6 @@  extern void chroot_fs_refs(const struct path *, const struct path *);
 struct file *alloc_empty_file(int flags, const struct cred *cred);
 struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
 struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
-void release_empty_file(struct file *f);
 
 static inline void file_put_write_access(struct file *file)
 {
diff --git a/fs/namei.c b/fs/namei.c
index 22915c40e2bd..e7f641d3115f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3787,10 +3787,7 @@  static struct file *path_openat(struct nameidata *nd,
 		WARN_ON(1);
 		error = -EINVAL;
 	}
-	if (unlikely(file->f_mode & FMODE_OPENED))
-		fput(file);
-	else
-		release_empty_file(file);
+	fput(file);
 	if (error == -EOPENSTALE) {
 		if (flags & LOOKUP_RCU)
 			error = -ECHILD;