diff mbox series

[1/4] ovl: do not open non-data lower file for fsync

Message ID 20241004102342.179434-2-amir73il@gmail.com (mailing list archive)
State New
Headers show
Series Stash overlay real upper file in backing_file | expand

Commit Message

Amir Goldstein Oct. 4, 2024, 10:23 a.m. UTC
ovl_fsync() with !datasync opens a backing file from the top most dentry
in the stack, checks if this dentry is non-upper and skips the fsync.

In case of an overlay dentry stack with lower data and lower metadata
above it, but without an upper metadata above it, the backing file is
opened from the top most lower metadata dentry and never used.

Fix the helper ovl_real_fdget_meta() to return an empty struct fd in
that case to avoid the unneeded backing file open.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Al Viro Oct. 4, 2024, 10:16 p.m. UTC | #1
On Fri, Oct 04, 2024 at 12:23:39PM +0200, Amir Goldstein wrote:
> ovl_fsync() with !datasync opens a backing file from the top most dentry
> in the stack, checks if this dentry is non-upper and skips the fsync.
> 
> In case of an overlay dentry stack with lower data and lower metadata
> above it, but without an upper metadata above it, the backing file is
> opened from the top most lower metadata dentry and never used.
> 
> Fix the helper ovl_real_fdget_meta() to return an empty struct fd in
> that case to avoid the unneeded backing file open.

Umm...  Won't that screw the callers of ovl_real_fd()?

I mean, here

> @@ -395,7 +397,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  		return ret;
>  
>  	ret = ovl_real_fdget_meta(file, &real, !datasync);
> -	if (ret)
> +	if (ret || fd_empty(real))
>  		return ret;
>  

you are taking account of that, but what of e.g.
        ret = ovl_real_fdget(file, &real);
        if (ret)
                return ret;

        /*
         * Overlay file f_pos is the master copy that is preserved
         * through copy up and modified on read/write, but only real
         * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
         * limitations that are more strict than ->s_maxbytes for specific
         * files, so we use the real file to perform seeks.
         */
        ovl_inode_lock(inode);
        fd_file(real)->f_pos = file->f_pos;
in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
have it return 0 with NULL in fd_file(real), and you've got an oops right
there, don't you?

At the very least it's a bisect hazard...
Al Viro Oct. 4, 2024, 10:28 p.m. UTC | #2
On Fri, Oct 04, 2024 at 11:16:25PM +0100, Al Viro wrote:
> On Fri, Oct 04, 2024 at 12:23:39PM +0200, Amir Goldstein wrote:
> > ovl_fsync() with !datasync opens a backing file from the top most dentry
> > in the stack, checks if this dentry is non-upper and skips the fsync.
> > 
> > In case of an overlay dentry stack with lower data and lower metadata
> > above it, but without an upper metadata above it, the backing file is
> > opened from the top most lower metadata dentry and never used.
> > 
> > Fix the helper ovl_real_fdget_meta() to return an empty struct fd in
> > that case to avoid the unneeded backing file open.
> 
> Umm...  Won't that screw the callers of ovl_real_fd()?
> 
> I mean, here
> 
> > @@ -395,7 +397,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >  		return ret;
> >  
> >  	ret = ovl_real_fdget_meta(file, &real, !datasync);
> > -	if (ret)
> > +	if (ret || fd_empty(real))
> >  		return ret;
> >  
> 
> you are taking account of that, but what of e.g.
>         ret = ovl_real_fdget(file, &real);
>         if (ret)
>                 return ret;
> 
>         /*
>          * Overlay file f_pos is the master copy that is preserved
>          * through copy up and modified on read/write, but only real
>          * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
>          * limitations that are more strict than ->s_maxbytes for specific
>          * files, so we use the real file to perform seeks.
>          */
>         ovl_inode_lock(inode);
>         fd_file(real)->f_pos = file->f_pos;
> in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
> have it return 0 with NULL in fd_file(real), and you've got an oops right
> there, don't you?

I see... so you rely upon that thing never happening when the last argument of
ovl_real_fdget_meta() is false, including the call from ovl_real_fdget().

I still don't like the calling conventions, TBH.  Let me think a bit...
Al Viro Oct. 5, 2024, 1:35 a.m. UTC | #3
On Fri, Oct 04, 2024 at 11:28:11PM +0100, Al Viro wrote:
> >         /*
> >          * Overlay file f_pos is the master copy that is preserved
> >          * through copy up and modified on read/write, but only real
> >          * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
> >          * limitations that are more strict than ->s_maxbytes for specific
> >          * files, so we use the real file to perform seeks.
> >          */
> >         ovl_inode_lock(inode);
> >         fd_file(real)->f_pos = file->f_pos;
> > in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
> > have it return 0 with NULL in fd_file(real), and you've got an oops right
> > there, don't you?
> 
> I see... so you rely upon that thing never happening when the last argument of
> ovl_real_fdget_meta() is false, including the call from ovl_real_fdget().
> 
> I still don't like the calling conventions, TBH.  Let me think a bit...

Sorry, I'm afraid I'll have to leave that until tomorrow - over 38C after the sodding
shingles shot really screws the ability to dig through the code ;-/  My apologies...
Amir Goldstein Oct. 5, 2024, 6:30 a.m. UTC | #4
On Sat, Oct 5, 2024 at 3:35 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Oct 04, 2024 at 11:28:11PM +0100, Al Viro wrote:
> > >         /*
> > >          * Overlay file f_pos is the master copy that is preserved
> > >          * through copy up and modified on read/write, but only real
> > >          * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
> > >          * limitations that are more strict than ->s_maxbytes for specific
> > >          * files, so we use the real file to perform seeks.
> > >          */
> > >         ovl_inode_lock(inode);
> > >         fd_file(real)->f_pos = file->f_pos;
> > > in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
> > > have it return 0 with NULL in fd_file(real), and you've got an oops right
> > > there, don't you?
> >
> > I see... so you rely upon that thing never happening when the last argument of
> > ovl_real_fdget_meta() is false, including the call from ovl_real_fdget().
> >

Correct. I had considered renaming the argument to allow_empty_upper_meta
but I don't think that will make the contract a lot better.
The thing is that ovl_fsync() caller is really different in two
different aspects:
1. It wants only upper and therefore fd_empty() is a possible outcome
2. It (may) want the metadata inode (when data is still in lower inode)

Overlayfs can have up to 3 different inodes in the stack for a regular file:
lower_data+lower_metadata+upper_metdata

There is currently no file operation that requires opening the lower_metadata
inode and therefore, staching one backing file in ->private_data and another
optional backing file chained from the first one is enough.

If there is ever a file operation that needs to open a realfile from
lower_metadata (only ioctl comes to mind), we may need to reevaluate.

> > I still don't like the calling conventions, TBH.  Let me think a bit...
>

I understand your concern, but honestly, I am not sure that returning
struct fderr is fundamentally different from checking IS_ERR_OR_NULL.

What I can do is refactor the helpers differently so that ovl_fsync() will
call ovl_file_upper() to clarify that it may return NULL, just like
ovl_{dentry,inode,path}_upper() and all the other callers will
call ovl_file_real() which cannot return NULL, because it returns
either lower or upper file, just like ovl_{inode,path}_real{,data}().

> Sorry, I'm afraid I'll have to leave that until tomorrow - over 38C after the sodding
> shingles shot really screws the ability to dig through the code ;-/  My apologies...

Ouch! feel well soon.

Thanks,
Amir.
Al Viro Oct. 5, 2024, 7:49 p.m. UTC | #5
On Sat, Oct 05, 2024 at 08:30:23AM +0200, Amir Goldstein wrote:

> I understand your concern, but honestly, I am not sure that returning
> struct fderr is fundamentally different from checking IS_ERR_OR_NULL.
> 
> What I can do is refactor the helpers differently so that ovl_fsync() will
> call ovl_file_upper() to clarify that it may return NULL, just like

ovl_dentry_upper(), you mean?

> ovl_{dentry,inode,path}_upper() and all the other callers will
> call ovl_file_real() which cannot return NULL, because it returns
> either lower or upper file, just like ovl_{inode,path}_real{,data}().

OK...  One thing I'm not happy about is the control (and data) flow in there:
stashed_upper:
        if (upperfile && file_inode(upperfile) == d_inode(realpath.dentry))
                realfile = upperfile;

        /*
         * If realfile is lower and has been copied up since we'd opened it,
         * open the real upper file and stash it in backing_file_private().
         */
        if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
                struct file *old;

                /* Stashed upperfile has a mismatched inode */
                if (unlikely(upperfile))
                        return ERR_PTR(-EIO);

                upperfile = ovl_open_realfile(file, &realpath);
                if (IS_ERR(upperfile))
                        return upperfile;

                old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
                                      upperfile);
                if (old) {
                        fput(upperfile);
                        upperfile = old;
                }

                goto stashed_upper;
        }
Unless I'm misreading that, the value of realfile seen inside the second
if is always the original; reassignment in the first if might as well had
been followed by goto past the second one.  What's more, if you win the
race in the second if, you'll have upperfile != NULL and its file_inode()
matching realpath.dentry->d_inode (you'd better, or you have a real problem
in backing_file_open()).  So that branch to stashed_upper in case old == NULL
might as well had been "realfile = upperfile;".  Correct?  In case old != NULL
we go there with upperfile != NULL.  If it (i.e. old) has the right file_inode(),
we are done; otherwise it's going to hit ERR_PTR(-EIO) in the second if.

So it seems to be equivalent to this:
        if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
	        /*
		 * If realfile is lower and has been copied up since we'd
		 * opened it, we need the real upper file opened.  Whoever gets
		 * there first stashes the result in backing_file_private().
		 */
		struct file *upperfile = backing_file_private(realfile);
		if (unlikely(!upperfile)) {
			struct file *old;

			upperfile = ovl_open_realfile(file, &realpath);
			if (IS_ERR(upperfile))
				return upperfile;

			old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
					      upperfile);
			if (old) {
				fput(upperfile);
				upperfile = old;
			}
		}
		// upperfile reference is owned by realfile at that point
		if (unlikely(file_inode(upperfile) != d_inode(realpath.dentry)))
			/* Stashed upperfile has a mismatched inode */
			return ERR_PTR(-EIO);
		realfile = upperfile;
	}
Or am I misreading it?  Seems to be more straightforward that way...
Amir Goldstein Oct. 6, 2024, 8:03 a.m. UTC | #6
On Sat, Oct 5, 2024 at 9:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Oct 05, 2024 at 08:30:23AM +0200, Amir Goldstein wrote:
>
> > I understand your concern, but honestly, I am not sure that returning
> > struct fderr is fundamentally different from checking IS_ERR_OR_NULL.
> >
> > What I can do is refactor the helpers differently so that ovl_fsync() will
> > call ovl_file_upper() to clarify that it may return NULL, just like
>
> ovl_dentry_upper(), you mean?

No, I meant I created a new helper ovl_upper_file() that only ovl_fsync()
uses and can return NULL.
All the rest of the callers are using the helper ovl_real_file() which cannot
return NULL.
I will post it.

>
> > ovl_{dentry,inode,path}_upper() and all the other callers will
> > call ovl_file_real() which cannot return NULL, because it returns
> > either lower or upper file, just like ovl_{inode,path}_real{,data}().
>
> OK...  One thing I'm not happy about is the control (and data) flow in there:
> stashed_upper:
>         if (upperfile && file_inode(upperfile) == d_inode(realpath.dentry))
>                 realfile = upperfile;
>
>         /*
>          * If realfile is lower and has been copied up since we'd opened it,
>          * open the real upper file and stash it in backing_file_private().
>          */
>         if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
>                 struct file *old;
>
>                 /* Stashed upperfile has a mismatched inode */
>                 if (unlikely(upperfile))
>                         return ERR_PTR(-EIO);
>
>                 upperfile = ovl_open_realfile(file, &realpath);
>                 if (IS_ERR(upperfile))
>                         return upperfile;
>
>                 old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
>                                       upperfile);
>                 if (old) {
>                         fput(upperfile);
>                         upperfile = old;
>                 }
>
>                 goto stashed_upper;
>         }
> Unless I'm misreading that, the value of realfile seen inside the second
> if is always the original; reassignment in the first if might as well had
> been followed by goto past the second one.  What's more, if you win the
> race in the second if, you'll have upperfile != NULL and its file_inode()
> matching realpath.dentry->d_inode (you'd better, or you have a real problem
> in backing_file_open()).  So that branch to stashed_upper in case old == NULL
> might as well had been "realfile = upperfile;".  Correct?  In case old != NULL

Correct.

> we go there with upperfile != NULL.  If it (i.e. old) has the right file_inode(),
> we are done; otherwise it's going to hit ERR_PTR(-EIO) in the second if.
>
> So it seems to be equivalent to this:
>         if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
>                 /*
>                  * If realfile is lower and has been copied up since we'd
>                  * opened it, we need the real upper file opened.  Whoever gets
>                  * there first stashes the result in backing_file_private().
>                  */
>                 struct file *upperfile = backing_file_private(realfile);
>                 if (unlikely(!upperfile)) {
>                         struct file *old;
>
>                         upperfile = ovl_open_realfile(file, &realpath);
>                         if (IS_ERR(upperfile))
>                                 return upperfile;
>
>                         old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
>                                               upperfile);
>                         if (old) {
>                                 fput(upperfile);
>                                 upperfile = old;
>                         }
>                 }
>                 // upperfile reference is owned by realfile at that point
>                 if (unlikely(file_inode(upperfile) != d_inode(realpath.dentry)))
>                         /* Stashed upperfile has a mismatched inode */
>                         return ERR_PTR(-EIO);
>                 realfile = upperfile;
>         }
> Or am I misreading it?  Seems to be more straightforward that way...

Yeh, that's a bit more clear without to goto, but I would not remove
the if (upperfile) assertion. Actually the first if has a bug.
It assumes that if the upperfile is stashed then it must be used, but
this is incorrect.

I have made the following change:

static bool ovl_is_real_file(const struct file *realfile,
                             const struct path *realpath)
{
        return file_inode(realfile) == d_inode(realpath->dentry);
}
...
        /*
         * Usually, if we operated on a stashed upperfile once, all following
         * operations will operate on the stashed upperfile, but there is one
         * exception - ovl_fsync(datasync = false) can populate the stashed
         * upperfile to perform fsync on upper metadata inode.  In this case,
         * following read/write operations will not use the stashed upperfile.
         */
        if (upperfile && likely(ovl_is_real_file(upperfile, realpath))) {
                realfile = upperfile;
                goto checkflags;
        }

        /*
         * If realfile is lower and has been copied up since we'd opened it,
         * open the real upper file and stash it in backing_file_private().
         */
        if (unlikely(!ovl_is_real_file(realfile, realpath))) {
                struct file *old;

                /* Either stashed realfile or upperfile must match realinode */
                if (WARN_ON_ONCE(upperfile))
                        return ERR_PTR(-EIO);
...
                /* Stashed upperfile that won the race must match realinode */
                if (WARN_ON_ONCE(!ovl_is_real_file(upperfile, realpath)))
                        return ERR_PTR(-EIO);
               realfile = upperfile;
        }

checkflags:
...

What happens is that we have two slots for stashing real files
and there are subtle assumptions in the code that
1. We will never be requested to open a real file for more than
    two inodes (lower and upper)
2. If we are requested to open two inodes, we will always be
    requested to open the lower inode first
3. IOW if we are requested to open the upper inode first, we
    will not be requested to open a different inode

Therefore, the names realfile/upperfile are a bit misleading.
If file is opened after copyup, then the realfile in ->private_data
is the upper file and stashed upperfile is NULL.

I will post the revised version.

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4504493b20be..3d64d00ef981 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -90,17 +90,19 @@  static int ovl_change_flags(struct file *file, unsigned int flags)
 }
 
 static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
-			       bool allow_meta)
+			       bool upper_meta)
 {
 	struct dentry *dentry = file_dentry(file);
 	struct file *realfile = file->private_data;
 	struct path realpath;
 	int err;
 
-	real->word = (unsigned long)realfile;
+	real->word = 0;
 
-	if (allow_meta) {
-		ovl_path_real(dentry, &realpath);
+	if (upper_meta) {
+		ovl_path_upper(dentry, &realpath);
+		if (!realpath.dentry)
+			return 0;
 	} else {
 		/* lazy lookup and verify of lowerdata */
 		err = ovl_verify_lowerdata(dentry);
@@ -395,7 +397,7 @@  static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 
 	ret = ovl_real_fdget_meta(file, &real, !datasync);
-	if (ret)
+	if (ret || fd_empty(real))
 		return ret;
 
 	/* Don't sync lower file for fear of receiving EROFS error */