mbox series

[GIT,PULL] overlayfs updates for 6.13

Message ID 20241122095746.198762-1-amir73il@gmail.com
State New
Headers show
Series [GIT,PULL] overlayfs updates for 6.13 | expand

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs.git ovl-update-6.13

Message

Amir Goldstein Nov. 22, 2024, 9:57 a.m. UTC
Hi Linus,

Please pull overlayfs updates for 6.13.

This pull request has some changes in code outside of fs/overlayfs:

1. The backing_file API change touches fuse code - that was collaborated
   with Miklos who authored this API change

2. The additions of revert/override_creds_light() helpers in cred.{h,c}
   were collaborated with Christian who has suggested those helpers

There was also an overlayfs change in this cycle coming from Christian
(file descriptors based layer setup).  His changes do not conflict with
this branch and I have also tested his change along with the fs-next
community test branch.

Most of this branch has been sitting in linux-next for over a week except
for one syzbot issue fix that was added three days ago.

The code has gone through the usual overlayfs test routines.

The branch merges cleanly with master branch of the moment.

Thanks,
Amir.

----------------------------------------------------------------
The following changes since commit 2d5404caa8c7bb5c4e0435f94b28834ae5456623:

  Linux 6.12-rc7 (2024-11-10 14:19:35 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs.git ovl-update-6.13

for you to fetch changes up to c8b359dddb418c60df1a69beea01d1b3322bfe83:

  ovl: Filter invalid inodes with missing lookup function (2024-11-20 10:23:04 +0100)

----------------------------------------------------------------
overlayfs updates for 6.13

- Fix a syzbot reported NULL pointer deref with bfs lower layers

- Fix a copy up failure of large file from lower fuse fs

- Followup cleanup of backing_file API from Miklos

- Introduction and use of revert/override_creds_light() helpers, that were
  suggested by Christian as a mitigation to cache line bouncing and false
  sharing of fields in overlayfs creator_cred long lived struct cred copy.

- Store up to two backing file references (upper and lower) in an ovl_file
  container instead of storing a single backing file in file->private_data.

  This is used to avoid the practice of opening a short lived backing file
  for the duration of some file operations and to avoid the specialized use
  of FDPUT_FPUT in such occasions, that was getting in the way of Al's
  fd_file() conversions.

----------------------------------------------------------------
Amir Goldstein (6):
      ovl: pass an explicit reference of creators creds to callers
      ovl: do not open non-data lower file for fsync
      ovl: allocate a container struct ovl_file for ovl private context
      ovl: store upper real file in ovl_file struct
      ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
      ovl: convert ovl_real_fdget() callers to ovl_real_file()

Miklos Szeredi (1):
      backing-file: clean up the API

Oleksandr Tymoshenko (1):
      ovl: properly handle large files in ovl_security_fileattr

Vasiliy Kovalev (1):
      ovl: Filter invalid inodes with missing lookup function

Vinicius Costa Gomes (4):
      cred: Add a light version of override/revert_creds()
      fs/backing-file: Convert to revert/override_creds_light()
      ovl: use wrapper ovl_revert_creds()
      ovl: Optimize override/revert creds

 fs/backing-file.c            |  53 ++++---
 fs/fuse/passthrough.c        |  32 +++--
 fs/overlayfs/copy_up.c       |   2 +-
 fs/overlayfs/dir.c           |  68 ++++++---
 fs/overlayfs/file.c          | 327 +++++++++++++++++++++++++------------------
 fs/overlayfs/inode.c         |  27 ++--
 fs/overlayfs/namei.c         |  10 +-
 fs/overlayfs/overlayfs.h     |   4 +
 fs/overlayfs/readdir.c       |   8 +-
 fs/overlayfs/util.c          |  14 +-
 fs/overlayfs/xattrs.c        |   9 +-
 include/linux/backing-file.h |  11 +-
 include/linux/cred.h         |  18 +++
 kernel/cred.c                |   6 +-
 14 files changed, 352 insertions(+), 237 deletions(-)

Comments

Linus Torvalds Nov. 23, 2024, 5:21 a.m. UTC | #1
On Fri, 22 Nov 2024 at 01:57, Amir Goldstein <amir73il@gmail.com> wrote:
>
> - Introduction and use of revert/override_creds_light() helpers, that were
>   suggested by Christian as a mitigation to cache line bouncing and false
>   sharing of fields in overlayfs creator_cred long lived struct cred copy.

So I don't actively hate this, but I do wonder if this shouldn't have
been done differently.

In particular, I suspect *most* users of override_creds() actually
wants this "light" version, because they all already hold a ref to the
cred that they want to use as the override.

We did it that safe way with the extra refcount not because most
people would need it, but it was expected to not be a big deal.

Now you found that it *is* a big deal, and instead of just fixing the
old interface, you create a whole new interface and the mental burden
of having to know the difference between the two.

So may I ask that you look at perhaps just converting the (not very
many) users of the non-light cred override to the "light" version?

Because I suspect you will find that they basically *all* convert. I
wouldn't be surprised if some of them could convert automatically with
a coccinelle script.

Because we actually have several users that have a pattern line

        old_cred = override_creds(override_cred);

        /* override_cred() gets its own ref */
        put_cred(override_cred);

because it *didn't* want the new cred, because it's literally a
temporary cred that already had the single ref it needed, and the code
actually it wants it to go away when it does

        revert_creds(old_cred);

End result: I suspect what it *really* would have wanted is basically
to have 'override_creds()' not do the refcount at all, and at revert
time, it would want "revert_creds()" to return the creds that got
reverted, and then it would just do

        old_cred = override_creds(override_cred);
        ...
        put_cred(revert_creds(old_cred));

instead - which would not change the refcount on 'old_cred' at all at
any time (and does it for the override case only at the end when it
actually wants it free'd).

And the above is very annoyingly *almost* exactly what your "light"
interface does, except your interface is bad too: it doesn't return
the reverted creds.

So then users have to remember the override_creds *and* the old creds,
just to do their own cred refcounting outside of this all.

In other words, what I really dislike about this all is that

 (a) we had a flawed interface

 (b) you added *another* flawed interface for one special case you cared about

 (c) now we have *two* flawed interfaces instead of one better one

Hmm?

                  Linus
Linus Torvalds Nov. 23, 2024, 5:22 a.m. UTC | #2
On Fri, 22 Nov 2024 at 21:21, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I don't actively hate this, but I do wonder if this shouldn't have
> been done differently.

Just to clarify: because I understand *why* you wanted this, and
because I don't hate it with a passion, I have pulled your changes.

But I really think we could and should do better. Please?

               Linus
pr-tracker-bot@kernel.org Nov. 23, 2024, 5:57 a.m. UTC | #3
The pull request you sent on Fri, 22 Nov 2024 10:57:46 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs.git ovl-update-6.13

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e7675238b9bf4db0b872d5dbcd53efa31914c98f

Thank you!
Linus Torvalds Nov. 23, 2024, 6:09 a.m. UTC | #4
On Fri, 22 Nov 2024 at 21:21, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So may I ask that you look at perhaps just converting the (not very
> many) users of the non-light cred override to the "light" version?

I think you could do a completely automated conversion:

 (a) add a new "dup_cred()" helper

    /* Get the cred without clearing the 'non_rcu' flag */
    const struct cred *dup_cred(const struct cred *cred)
    { get_new_cred((struct cred *)cred); return cred; }

 (b) mindlessly convert:

    override_creds(cred) -> override_creds_light(dup_cred(cred))

    revert_creds(cred) -> put_cred(revert_creds_light(old));

 (c) rename away the "_light" again:

    override_creds_light -> override_creds
    revert_creds_light -> revert_creds

and then finally the only non-automated part would be

 (d) simplify any obvious and trivial dup_cred -> put_cred chains.

which might take some effort, but there should be at least a couple of
really obvious cases of "that's not necessary".

Because honestly, I think I'd rather see a few cases of

        old_creds = override_creds(dup_cred(cred));
        ...
        put_cred(revert_creds(old));

that look a bit more complicated, and couldn't be trivially simplified away.

That seems better than the current case of having two very different
forms of override_creds() / put_cred() where people have to know
deeply when to use one or the other.

No?

                Linus
Al Viro Nov. 23, 2024, 6:14 a.m. UTC | #5
On Fri, Nov 22, 2024 at 10:09:04PM -0800, Linus Torvalds wrote:

>  (a) add a new "dup_cred()" helper
> 
>     /* Get the cred without clearing the 'non_rcu' flag */
>     const struct cred *dup_cred(const struct cred *cred)
>     { get_new_cred((struct cred *)cred); return cred; }

Umm...  Something like hold_cred() might be better - dup usually
implies copying an object...  For grapping a reference we
normally go for something like hold/get/grab/pin...
Christian Brauner Nov. 23, 2024, 12:06 p.m. UTC | #6
On Fri, Nov 22, 2024 at 09:21:58PM -0800, Linus Torvalds wrote:
> On Fri, 22 Nov 2024 at 01:57, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > - Introduction and use of revert/override_creds_light() helpers, that were
> >   suggested by Christian as a mitigation to cache line bouncing and false
> >   sharing of fields in overlayfs creator_cred long lived struct cred copy.
> 
> So I don't actively hate this, but I do wonder if this shouldn't have
> been done differently.
> 
> In particular, I suspect *most* users of override_creds() actually
> wants this "light" version, because they all already hold a ref to the
> cred that they want to use as the override.
> 
> We did it that safe way with the extra refcount not because most
> people would need it, but it was expected to not be a big deal.
> 
> Now you found that it *is* a big deal, and instead of just fixing the
> old interface, you create a whole new interface and the mental burden
> of having to know the difference between the two.

> So may I ask that you look at perhaps just converting the (not very
> many) users of the non-light cred override to the "light" version?

I think that could be a good idea in general.

But I have to say I'm feeling a bit defensive after having read your
message even though I usually try not to. :) 

So just to clarify when that issue was brought up I realized that the
cred bump was a big deal for overlayfs but from a quick grep I didn't
think for any of the other cases it really mattered that much.

Realistically, overlayfs is the prime example where that override cred
matters big time because it's called everywhere and in all core
operations one can think of. But so far I at least haven't heard
complaints outside of that and so the immediate focus was to bring about
a solution for overlayfs.

The reason the revert_creds_light() variant doesn't return the old creds
is so that callers don't put_cred() them blindly.

Because for overlayfs (and from a quick glance io_uring and nfs) the
refcount for the temporary creds is kept completely independent of the
callsites.

The lifetime is bound to the superblock and so the final put on the
temporary creds has nothing to do with the callers at all.
Linus Torvalds Nov. 23, 2024, 5:18 p.m. UTC | #7
On Fri, 22 Nov 2024 at 22:14, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Nov 22, 2024 at 10:09:04PM -0800, Linus Torvalds wrote:
>
> >  (a) add a new "dup_cred()" helper
> >
> >     /* Get the cred without clearing the 'non_rcu' flag */
> >     const struct cred *dup_cred(const struct cred *cred)
> >     { get_new_cred((struct cred *)cred); return cred; }
>
> Umm...  Something like hold_cred() might be better - dup usually
> implies copying an object...

Ack. "dup" is clearly a horrible name, and I'm ashamed and properly chastised.

               Linus
Linus Torvalds Nov. 23, 2024, 5:29 p.m. UTC | #8
On Sat, 23 Nov 2024 at 04:06, Christian Brauner <brauner@kernel.org> wrote:
>
> So just to clarify when that issue was brought up I realized that the
> cred bump was a big deal for overlayfs but from a quick grep I didn't
> think for any of the other cases it really mattered that much.

Oh, I agree. It's probably not really a performance issue anywhere
else. I don't think this has really ever come up before.

So my "please convert everything to one single new model" is not
because I think that would help performance, but because I really hate
having two differently flawed models when I think one would do.

We have other situations where we really do have two or more different
interfaces for the "same" thing, with very special rules: things like
fget() vs fget_raw() vs fget_task() (and similar issues wrt fdget).

But I think those other situations have more _reason_ for them.

The whole "override_creds()" thing is _already_ such a special
operation, that I hate seeing two subtly different versions of the
interface, both with their own quirks.

Because the old interface really isn't some "perfectly tailored"
thing. Yes, the performance implications were a surprise to me and I
hadn't seen that before, but the "refcounting isn't wonderful" was
_not_ really a big surprise at all.

                        Linus
Christian Brauner Nov. 23, 2024, 6:47 p.m. UTC | #9
On Sat, Nov 23, 2024 at 01:06:14PM +0100, Christian Brauner wrote:
> On Fri, Nov 22, 2024 at 09:21:58PM -0800, Linus Torvalds wrote:
> > On Fri, 22 Nov 2024 at 01:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > - Introduction and use of revert/override_creds_light() helpers, that were
> > >   suggested by Christian as a mitigation to cache line bouncing and false
> > >   sharing of fields in overlayfs creator_cred long lived struct cred copy.
> > 
> > So I don't actively hate this, but I do wonder if this shouldn't have
> > been done differently.
> > 
> > In particular, I suspect *most* users of override_creds() actually
> > wants this "light" version, because they all already hold a ref to the
> > cred that they want to use as the override.
> > 
> > We did it that safe way with the extra refcount not because most
> > people would need it, but it was expected to not be a big deal.
> > 
> > Now you found that it *is* a big deal, and instead of just fixing the
> > old interface, you create a whole new interface and the mental burden
> > of having to know the difference between the two.
> 
> > So may I ask that you look at perhaps just converting the (not very
> > many) users of the non-light cred override to the "light" version?
> 
> I think that could be a good idea in general.
> 
> But I have to say I'm feeling a bit defensive after having read your
> message even though I usually try not to. :) 

It was just pointed out to me that this was written like I'm not reading
you messages - which is obviously not the case. What I means it that I
usually try to not be defensive when valid criticism is brought up. :)