Message ID | 20231007084433.1417887-4-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Reduce impact of overlayfs backing files fake path |
On Sat, Oct 07, 2023 at 11:44:33AM +0300, Amir Goldstein wrote:

> -		if (real_path->mnt)
> -			mnt_put_write_access(real_path->mnt);
> +		if (user_path->mnt)
> +			mnt_put_write_access(user_path->mnt);
>  	}
>  }

Again, how can the predicates be ever false here?  We should *not*
have struct path with NULL .mnt unless it's {NULL, NULL} pair.

For the record, struct path with NULL .dentry and non-NULL .mnt
*IS* possible, but only in a very narrow area - if, during
an attempt to fall back from rcu pathwalk to normal one we
have __legitimize_path() successfully validate (== grab) the
reference to mount, but fail to validate dentry.  In that
case we need to drop mount, but not dentry when we get to
cleanup (pretty much as soon as we drop rcu_read_lock()).
That gets indicated by clearing path->dentry, and only
while we are propagating the error back to the point where
references would be dropped.  No filesystem code should
ever see struct path instances in that state.

Please, don't make the things more confusing; "incomplete"
struct path like that are very much not normal (and this
variety is flat-out impossible).

> @@ -34,9 +34,18 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
>  	struct dentry *real = NULL, *lower;
>  	int err;
>  
> -	/* It's an overlay file */
> +	/*
> +	 * vfs is only expected to call d_real() with NULL from d_real_inode()
> +	 * and with overlay inode from file_dentry() on an overlay file.
> +	 *
> +	 * TODO: remove @inode argument from d_real() API, remove code in this
> +	 * function that deals with non-NULL @inode and remove d_real() call
> +	 * from file_dentry().
> +	 */
>  	if (inode && d_inode(dentry) == inode)
>  		return dentry;
> +	else
> +		WARN_ON_ONCE(inode);
>  
>  	if (!d_is_reg(dentry)) {
>  		if (!inode || inode == d_inode(dentry))
		    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BTW, that condition is confusing as hell (both before and
after this patch).  AFAICS, it's a pointlessly obfuscated
		if (!inode)
Look: we get to evaluating that test only if we hadn't buggered
off on
	if (inode && d_inode(dentry) == inode)
		return dentry;
above.  Which means that either inode is NULL (in which case the
evaluation yields true as soon as we see that !inode is true) or
it's neither NULL nor equal to d_inode(dentry).  In which case
we see that !inode is false and proceed yield false *after*
comparing inode with d_inode(dentry) and seeing that they
are not equal.

<checks history>
e8c985bace13 "ovl: deal with overlay files in ovl_d_real()"
had introduced the first check, and nobody noticed that the
older check below could've been simplified.  Oh, well...

> -static inline const struct path *file_real_path(struct file *f)
> +static inline const struct path *f_path(struct file *f)
>  {
> -	if (unlikely(f->f_mode & FMODE_BACKING))
> -		return backing_file_real_path(f);
>  	return &f->f_path;
>  }

Bad name, IMO - makes grepping harder and... what the hell do
we need it for, anyway?  You have only one caller, and no
obvious reason why it would be worse off as path = &file->f_path...
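The equivalence argued above can be checked mechanically. The following is a minimal userspace sketch, not kernel code: `struct inode` here is a hypothetical empty stand-in, and `early_return()`, `cond_as_written()`, `cond_simplified()`, `conditions_agree()` and `check_conditions()` are illustrative names that do not exist in the tree. It enumerates every combination of NULL and non-NULL pointers that survives the first check and compares the two forms of the second check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct inode. */
struct inode { int unused; };

/* First check in ovl_d_real(): true means "return dentry" was taken. */
static bool early_return(const struct inode *inode, const struct inode *d_ino)
{
	return inode && d_ino == inode;
}

/* Second check as it appears in the patch context. */
static bool cond_as_written(const struct inode *inode, const struct inode *d_ino)
{
	return !inode || inode == d_ino;
}

/* Simplified form suggested in the review. */
static bool cond_simplified(const struct inode *inode)
{
	return !inode;
}

/*
 * Returns true if both forms agree for every (inode, d_inode) pair
 * that was not already handled by the early return.
 */
static bool conditions_agree(void)
{
	struct inode a, b;
	const struct inode *cands[] = { NULL, &a, &b };
	size_t n = sizeof(cands) / sizeof(cands[0]);

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			if (early_return(cands[i], cands[j]))
				continue;	/* second check is never reached */
			if (cond_as_written(cands[i], cands[j]) !=
			    cond_simplified(cands[i]))
				return false;
		}
	}
	return true;
}

/* Convenience wrapper: aborts if the two forms ever disagree. */
static void check_conditions(void)
{
	assert(conditions_agree());
}
```

Calling `check_conditions()` from a test harness exercises all nine pointer combinations; the only pairs for which the two forms could differ (non-NULL `inode` equal to `d_inode`) are exactly the ones the early return removes.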
On Mon, Oct 9, 2023 at 10:48 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Oct 07, 2023 at 11:44:33AM +0300, Amir Goldstein wrote:
>
> > -		if (real_path->mnt)
> > -			mnt_put_write_access(real_path->mnt);
> > +		if (user_path->mnt)
> > +			mnt_put_write_access(user_path->mnt);
> >  	}
> >  }
>
> Again, how can the predicates be ever false here?  We should *not*
> have struct path with NULL .mnt unless it's {NULL, NULL} pair.
>
> For the record, struct path with NULL .dentry and non-NULL .mnt
> *IS* possible, but only in a very narrow area - if, during
> an attempt to fall back from rcu pathwalk to normal one we
> have __legitimize_path() successfully validate (== grab) the
> reference to mount, but fail to validate dentry.  In that
> case we need to drop mount, but not dentry when we get to
> cleanup (pretty much as soon as we drop rcu_read_lock()).
> That gets indicated by clearing path->dentry, and only
> while we are propagating the error back to the point where
> references would be dropped.  No filesystem code should
> ever see struct path instances in that state.
>
> Please, don't make the things more confusing; "incomplete"
> struct path like that are very much not normal (and this
> variety is flat-out impossible).
>

No problem. I will remove the conditional.

> > @@ -34,9 +34,18 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
> >  	struct dentry *real = NULL, *lower;
> >  	int err;
> >
> > -	/* It's an overlay file */
> > +	/*
> > +	 * vfs is only expected to call d_real() with NULL from d_real_inode()
> > +	 * and with overlay inode from file_dentry() on an overlay file.
> > +	 *
> > +	 * TODO: remove @inode argument from d_real() API, remove code in this
> > +	 * function that deals with non-NULL @inode and remove d_real() call
> > +	 * from file_dentry().
> > +	 */
> >  	if (inode && d_inode(dentry) == inode)
> >  		return dentry;
> > +	else
> > +		WARN_ON_ONCE(inode);
> >
> >  	if (!d_is_reg(dentry)) {
> >  		if (!inode || inode == d_inode(dentry))
>                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> BTW, that condition is confusing as hell (both before and
> after this patch).  AFAICS, it's a pointlessly obfuscated
>                 if (!inode)
> Look: we get to evaluating that test only if we hadn't buggered
> off on
>         if (inode && d_inode(dentry) == inode)
>                 return dentry;
> above.  Which means that either inode is NULL (in which case the
> evaluation yields true as soon as we see that !inode is true) or
> it's neither NULL nor equal to d_inode(dentry).  In which case
> we see that !inode is false and proceed yield false *after*
> comparing inode with d_inode(dentry) and seeing that they
> are not equal.
>
> <checks history>
> e8c985bace13 "ovl: deal with overlay files in ovl_d_real()"
> had introduced the first check, and nobody noticed that the
> older check below could've been simplified.  Oh, well...
>

Absolutely right. I can remove the pointless condition.

FWIW, the next step after dust from this patch set settles is to make
file_dentry(f) := ((f)->f_path.dentry)
and remove the non-NULL inode case from ->d_real() interface altogether,
so this confusing check was going to go away soon anyway.

> > -static inline const struct path *file_real_path(struct file *f)
> > +static inline const struct path *f_path(struct file *f)
> >  {
> > -	if (unlikely(f->f_mode & FMODE_BACKING))
> > -		return backing_file_real_path(f);
> >  	return &f->f_path;
> >  }
>
> Bad name, IMO - makes grepping harder and... what the hell do
> we need it for, anyway?  You have only one caller, and no
> obvious reason why it would be worse off as path = &file->f_path...

It's not important. I don't mind dropping it.

If you dislike that name f_path(), I guess you are not a fan of
d_inode() either...

FYI, I wanted to do a file_path() accessor to be consistent with
file_inode() and file_dentry(), alas file_path() is used for something
completely different.

I find it confusing that {file,dentry,d}_path() do not return a path
but a path string, but whatever.

Thanks,
Amir.
On Mon, Oct 09, 2023 at 11:25:38AM +0300, Amir Goldstein wrote:

> It's not important. I don't mind dropping it.
>
> If you dislike that name f_path(), I guess you are not a fan of
> d_inode() either...

In case of d_inode() there's an opposition between d_inode() and
d_inode_rcu(), and that bears useful information.  In case of
f_path()...

> FYI, I wanted to do a file_path() accessor to be consistent with
> file_inode() and file_dentry(), alas file_path() is used for something
> completely different.
>
> I find it confusing that {file,dentry,d}_path() do not return a path
> but a path string, but whatever.

*blink*

How would one possibly produce struct path (i.e. mount/dentry pair)
out of dentry?

Anyway, I admit that struct path hadn't been a great name to start
with; it's basically "location in namespace", and it clashes with
the use of the same word for "string interpreted to get a location
in namespace".

Originally it's been just in the pathname resolution internals;
TBH, I don't remember I had specific plans regarding converting
such pairs (a plenty of those in the tree at the time) back then.

<checks the historical tree>
<checks old mail archives>

Probably hopeless to reconstruct the details, I'm afraid - everything
else aside, the timestamps in that patch are in the first half of
the day on Apr 29 2002; (hopefully) tested and sent out at about 6pm.
Followed by sitting down for birthday celebration, of all things,
so the details of rationale for name choice would probably be hard
to recover on the next morning, nevermind 21 years later ;-)

Bringing it out of fs/namei.c (into include/linux/namei.h) had
happened in late 2006 (by Jeff Sipek) and after that it was too late
to change the name; I'm still not sure what name would be good for
that, TBH...
diff --git a/fs/file_table.c b/fs/file_table.c
index 08fd1dd6d863..fa92743ba6a9 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -44,10 +44,10 @@ static struct kmem_cache *filp_cachep __read_mostly;
 
 static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
-/* Container for backing file with optional real path */
+/* Container for backing file with optional user path */
 struct backing_file {
 	struct file file;
-	struct path real_path;
+	struct path user_path;
 };
 
 static inline struct backing_file *backing_file(struct file *f)
@@ -55,11 +55,11 @@ static inline struct backing_file *backing_file(struct file *f)
 	return container_of(f, struct backing_file, file);
 }
 
-struct path *backing_file_real_path(struct file *f)
+struct path *backing_file_user_path(struct file *f)
 {
-	return &backing_file(f)->real_path;
+	return &backing_file(f)->user_path;
 }
-EXPORT_SYMBOL_GPL(backing_file_real_path);
+EXPORT_SYMBOL_GPL(backing_file_user_path);
 
 static inline void file_free(struct file *f)
 {
@@ -68,7 +68,7 @@ static inline void file_free(struct file *f)
 		percpu_counter_dec(&nr_files);
 	put_cred(f->f_cred);
 	if (unlikely(f->f_mode & FMODE_BACKING)) {
-		path_put(backing_file_real_path(f));
+		path_put(backing_file_user_path(f));
 		kfree(backing_file(f));
 	} else {
 		kmem_cache_free(filp_cachep, f);
diff --git a/fs/internal.h b/fs/internal.h
index 846d5133dd9c..652a1703668e 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -101,10 +101,10 @@ static inline void file_put_write_access(struct file *file)
 	put_write_access(file->f_inode);
 	mnt_put_write_access(file->f_path.mnt);
 	if (unlikely(file->f_mode & FMODE_BACKING)) {
-		struct path *real_path = backing_file_real_path(file);
+		struct path *user_path = backing_file_user_path(file);
 
-		if (real_path->mnt)
-			mnt_put_write_access(real_path->mnt);
+		if (user_path->mnt)
+			mnt_put_write_access(user_path->mnt);
 	}
 }
 
diff --git a/fs/open.c b/fs/open.c
index 2f3e28512663..1bfedc314e49 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -881,10 +881,10 @@ static inline int file_get_write_access(struct file *f)
 	if (unlikely(error))
 		goto cleanup_inode;
 	if (unlikely(f->f_mode & FMODE_BACKING)) {
-		struct path *real_path = backing_file_real_path(f);
+		struct path *user_path = backing_file_user_path(f);
 
-		if (real_path->mnt)
-			error = mnt_get_write_access(real_path->mnt);
+		if (user_path->mnt)
+			error = mnt_get_write_access(user_path->mnt);
 		if (unlikely(error))
 			goto cleanup_mnt;
 	}
@@ -1185,20 +1185,19 @@ EXPORT_SYMBOL_GPL(kernel_file_open);
 
 /**
  * backing_file_open - open a backing file for kernel internal use
- * @path:	path of the file to open
+ * @user_path:	path that the user requested to open
  * @flags:	open flags
  * @real_path:	path of the backing file
  * @cred:	credentials for open
  *
  * Open a backing file for a stackable filesystem (e.g., overlayfs).
- * @path may be on the stackable filesystem and backing inode on the
- * underlying filesystem. In this case, we want to be able to return
- * the @real_path of the backing inode. This is done by embedding the
- * returned file into a container structure that also stores the path of
- * the backing inode on the underlying filesystem, which can be
- * retrieved using backing_file_real_path().
+ * @user_path may be on the stackable filesystem and @real_path on the
+ * underlying filesystem.  In this case, we want to be able to return the
+ * @user_path of the stackable filesystem. This is done by embedding the
+ * returned file into a container structure that also stores the stacked
+ * file's path, which can be retrieved using backing_file_user_path().
  */
-struct file *backing_file_open(const struct path *path, int flags,
+struct file *backing_file_open(const struct path *user_path, int flags,
 			       const struct path *real_path,
 			       const struct cred *cred)
 {
@@ -1209,9 +1208,9 @@ struct file *backing_file_open(const struct path *path, int flags,
 	if (IS_ERR(f))
 		return f;
 
-	f->f_path = *path;
-	path_get(real_path);
-	*backing_file_real_path(f) = *real_path;
+	path_get(user_path);
+	*backing_file_user_path(f) = *user_path;
+	f->f_path = *real_path;
 	error = do_dentry_open(f, d_inode(real_path->dentry), NULL);
 	if (error) {
 		fput(f);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 3fa2416264a4..0245db1a3e29 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -34,9 +34,18 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
 	struct dentry *real = NULL, *lower;
 	int err;
 
-	/* It's an overlay file */
+	/*
+	 * vfs is only expected to call d_real() with NULL from d_real_inode()
+	 * and with overlay inode from file_dentry() on an overlay file.
+	 *
+	 * TODO: remove @inode argument from d_real() API, remove code in this
+	 * function that deals with non-NULL @inode and remove d_real() call
+	 * from file_dentry().
+	 */
 	if (inode && d_inode(dentry) == inode)
 		return dentry;
+	else
+		WARN_ON_ONCE(inode);
 
 	if (!d_is_reg(dentry)) {
 		if (!inode || inode == d_inode(dentry))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a8e4e1cac48e..75073a9d0fab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2451,24 +2451,13 @@ struct file *dentry_open(const struct path *path, int flags,
 			 const struct cred *creds);
 struct file *dentry_create(const struct path *path, int flags, umode_t mode,
 			   const struct cred *cred);
-struct file *backing_file_open(const struct path *path, int flags,
+struct file *backing_file_open(const struct path *user_path, int flags,
 			       const struct path *real_path,
 			       const struct cred *cred);
-struct path *backing_file_real_path(struct file *f);
+struct path *backing_file_user_path(struct file *f);
 
-/*
- * file_real_path - get the path corresponding to f_inode
- *
- * When opening a backing file for a stackable filesystem (e.g.,
- * overlayfs) f_path may be on the stackable filesystem and f_inode on
- * the underlying filesystem.  When the path associated with f_inode is
- * needed, this helper should be used instead of accessing f_path
- * directly.
- */
-static inline const struct path *file_real_path(struct file *f)
+static inline const struct path *f_path(struct file *f)
 {
-	if (unlikely(f->f_mode & FMODE_BACKING))
-		return backing_file_real_path(f);
 	return &f->f_path;
 }
 
@@ -2483,6 +2472,8 @@ static inline const struct path *file_real_path(struct file *f)
  */
 static inline const struct path *file_user_path(struct file *f)
 {
+	if (unlikely(f->f_mode & FMODE_BACKING))
+		return backing_file_user_path(f);
 	return &f->f_path;
 }
 
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index ed48e4f1e755..001307e12a49 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -96,8 +96,7 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
 	if (file->f_mode & FMODE_NONOTIFY)
 		return 0;
 
-	/* Overlayfs internal files have fake f_path */
-	path = file_real_path(file);
+	path = f_path(file);
 	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
 }
 
A backing file struct stores two paths: one "real" path that refers to
f_inode, and one "fake" path, which should be displayed to users in
/proc/<pid>/maps.

There is a lot more potential code that needs to know the "real" path
than code that needs to know the "fake" path.

Instead of code having to request the "real" path with file_real_path(),
store the "real" path in f_path and require code that needs to know the
"fake" path to request it with file_user_path().
Replace the file_real_path() helper with a simple const accessor f_path().

After this change, file_dentry() is not expected to observe any files
with overlayfs f_path and real f_inode, so the call to ->d_real() should
not be needed.  Leave the ->d_real() call for now and add an assertion
in ovl_d_real() to catch if we made wrong assumptions.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegtt48eXhhjDFA1ojcHPNKj3Go6joryCPtEFAKpocyBsnw@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/file_table.c          | 12 ++++++------
 fs/internal.h            |  6 +++---
 fs/open.c                | 27 +++++++++++++--------------
 fs/overlayfs/super.c     | 11 ++++++++++-
 include/linux/fs.h       | 19 +++++--------------
 include/linux/fsnotify.h |  3 +--
 6 files changed, 38 insertions(+), 40 deletions(-)
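To make the layout described in the commit message concrete, here is a minimal userspace model of it. This is a sketch under stated assumptions, not the kernel implementation: `struct path` holds strings instead of mount and dentry pointers, the `FMODE_BACKING` value is made up, reference counting (`path_get()`/`path_put()`) is omitted, and `backing_file_init()` and `backing_file_selfcheck()` are hypothetical helpers that exist only in this example. The names `backing_file()`, `backing_file_user_path()` and `file_user_path()` mirror the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FMODE_BACKING 0x1u	/* illustrative value, not the kernel's */

/* Simplified model: strings stand in for vfsmount and dentry pointers. */
struct path { const char *mnt; const char *dentry; };

struct file { unsigned int f_mode; struct path f_path; };

/* Container that carries the user-visible path next to the file. */
struct backing_file { struct file file; struct path user_path; };

/* Open-coded container_of(): recover the container from its member. */
static struct backing_file *backing_file(struct file *f)
{
	return (struct backing_file *)((char *)f -
				       offsetof(struct backing_file, file));
}

static struct path *backing_file_user_path(struct file *f)
{
	return &backing_file(f)->user_path;
}

/* After the patch, only callers that want the "fake" path use this. */
static const struct path *file_user_path(struct file *f)
{
	if (f->f_mode & FMODE_BACKING)
		return backing_file_user_path(f);
	return &f->f_path;
}

/* Hypothetical helper modelling the assignments in backing_file_open(). */
static void backing_file_init(struct backing_file *bf,
			      const struct path *user_path,
			      const struct path *real_path)
{
	bf->file.f_mode = FMODE_BACKING;
	bf->user_path = *user_path;	/* what /proc/<pid>/maps should show */
	bf->file.f_path = *real_path;	/* matches f_inode */
}

/* Hypothetical self-check exercising both a backing and a plain file. */
static void backing_file_selfcheck(void)
{
	struct path user = { "overlay-mnt", "merged/a" };
	struct path real = { "lower-mnt", "lower/a" };
	struct backing_file bf;
	struct file plain;

	backing_file_init(&bf, &user, &real);
	plain.f_mode = 0;
	plain.f_path = real;

	assert(bf.file.f_path.dentry == real.dentry);
	assert(file_user_path(&bf.file)->dentry == user.dentry);
	assert(file_user_path(&plain) == &plain.f_path);
}
```

In this model, code that reads `f_path` directly gets the real path with no flag test, and only the rarer user-path lookup pays for the `FMODE_BACKING` check, which is the trade-off the commit message argues for.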