Message ID | 20240202110132.1584111-3-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Decomplicate file_dentry() | expand |
On Fri, 2 Feb 2024 at 12:01, Amir Goldstein <amir73il@gmail.com> wrote: > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst > index d5bf4b6b7509..453039a2e49b 100644 > --- a/Documentation/filesystems/locking.rst > +++ b/Documentation/filesystems/locking.rst > @@ -29,7 +29,7 @@ prototypes:: > char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); > struct vfsmount *(*d_automount)(struct path *path); > int (*d_manage)(const struct path *, bool); > - struct dentry *(*d_real)(struct dentry *, const struct inode *); > + struct dentry *(*d_real)(struct dentry *, int type); Why not use the specific enum type for the argument? Thanks, Miklos
On Fri, Feb 2, 2024 at 2:19 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Fri, 2 Feb 2024 at 12:01, Amir Goldstein <amir73il@gmail.com> wrote: > > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst > > index d5bf4b6b7509..453039a2e49b 100644 > > --- a/Documentation/filesystems/locking.rst > > +++ b/Documentation/filesystems/locking.rst > > @@ -29,7 +29,7 @@ prototypes:: > > char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); > > struct vfsmount *(*d_automount)(struct path *path); > > int (*d_manage)(const struct path *, bool); > > - struct dentry *(*d_real)(struct dentry *, const struct inode *); > > + struct dentry *(*d_real)(struct dentry *, int type); > > Why not use the specific enum type for the argument? No reason, we can do enum d_real_type. Thanks, Amir.
On Fri, Feb 02, 2024 at 02:41:16PM +0200, Amir Goldstein wrote: > On Fri, Feb 2, 2024 at 2:19 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > On Fri, 2 Feb 2024 at 12:01, Amir Goldstein <amir73il@gmail.com> wrote: > > > > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst > > > index d5bf4b6b7509..453039a2e49b 100644 > > > --- a/Documentation/filesystems/locking.rst > > > +++ b/Documentation/filesystems/locking.rst > > > @@ -29,7 +29,7 @@ prototypes:: > > > char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); > > > struct vfsmount *(*d_automount)(struct path *path); > > > int (*d_manage)(const struct path *, bool); > > > - struct dentry *(*d_real)(struct dentry *, const struct inode *); > > > + struct dentry *(*d_real)(struct dentry *, int type); > > > > Why not use the specific enum type for the argument? > > No reason, we can do enum d_real_type. Fwiw, I'm happy to just change this. No need to resend as far as I'm concerned.
On 2/2/24 06:01, Amir Goldstein wrote: > The only remaining user of ->d_real() method is d_real_inode(), which > passed NULL inode argument to get the real data dentry. > > There are no longer any users that call ->d_real() with a non-NULL > inode argument for getting a detry from a specific underlying layer. > > Remove the inode argument of the method and replace it with an integer > 'type' argument, to allow callers to request the real metadata dentry > instead of the real data dentry. > > All the current users of d_real_inode() (e.g. uprobe) continue to get > the real data inode. Caller that need to get the real metadata inode > (e.g. IMA/EVM) can use d_inode(d_real(dentry, D_REAL_METADATA)). > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Stefan Berger <stefanb@linux.ibm.com> > --- > Documentation/filesystems/locking.rst | 2 +- > Documentation/filesystems/vfs.rst | 16 ++++----- > fs/overlayfs/super.c | 52 ++++++++++++--------------- > include/linux/dcache.h | 18 ++++++---- > 4 files changed, 41 insertions(+), 47 deletions(-) > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst > index d5bf4b6b7509..453039a2e49b 100644 > --- a/Documentation/filesystems/locking.rst > +++ b/Documentation/filesystems/locking.rst > @@ -29,7 +29,7 @@ prototypes:: > char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); > struct vfsmount *(*d_automount)(struct path *path); > int (*d_manage)(const struct path *, bool); > - struct dentry *(*d_real)(struct dentry *, const struct inode *); > + struct dentry *(*d_real)(struct dentry *, int type); > > locking rules: > > diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst > index eebcc0f9e2bc..2a39e718fdf8 100644 > --- a/Documentation/filesystems/vfs.rst > +++ b/Documentation/filesystems/vfs.rst > @@ -1264,7 +1264,7 @@ defined: > char *(*d_dname)(struct dentry *, char *, int); > struct vfsmount *(*d_automount)(struct path *); > int (*d_manage)(const struct path *, bool); > - struct dentry *(*d_real)(struct dentry *, const struct inode *); > + struct dentry *(*d_real)(struct dentry *, int type); > }; > > ``d_revalidate`` > @@ -1419,16 +1419,14 @@ defined: > the dentry being transited from. > > ``d_real`` > - overlay/union type filesystems implement this method to return > - one of the underlying dentries hidden by the overlay. It is > - used in two different modes: > + overlay/union type filesystems implement this method to return one > + of the underlying dentries of a regular file hidden by the overlay. > > - Called from file_dentry() it returns the real dentry matching > - the inode argument. The real dentry may be from a lower layer > - already copied up, but still referenced from the file. This > - mode is selected with a non-NULL inode argument. > + The 'type' argument takes the values D_REAL_DATA or D_REAL_METADATA > + for returning the real underlying dentry that refers to the inode > + hosting the file's data or metadata respectively. > > - With NULL inode the topmost real underlying dentry is returned. > + For non-regular files, the 'dentry' argument is returned. > > Each dentry has a pointer to its parent dentry, as well as a hash list > of child dentries. Child dentries are basically like files in a > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index 2eef6c70b2ae..938852a0a92b 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -28,41 +28,38 @@ MODULE_LICENSE("GPL"); > > struct ovl_dir_cache; > > -static struct dentry *ovl_d_real(struct dentry *dentry, > - const struct inode *inode) > +static struct dentry *ovl_d_real(struct dentry *dentry, int type) > { > - struct dentry *real = NULL, *lower; > + struct dentry *upper, *lower; > int err; > > - /* > - * vfs is only expected to call d_real() with NULL from d_real_inode() > - * and with overlay inode from file_dentry() on an overlay file. > - * > - * TODO: remove @inode argument from d_real() API, remove code in this > - * function that deals with non-NULL @inode and remove d_real() call > - * from file_dentry(). > - */ > - if (inode && d_inode(dentry) == inode) > - return dentry; > - else if (inode) > + switch (type) { > + case D_REAL_DATA: > + case D_REAL_METADATA: > + break; > + default: > goto bug; > + } > > if (!d_is_reg(dentry)) { > /* d_real_inode() is only relevant for regular files */ > return dentry; > } > > - real = ovl_dentry_upper(dentry); > - if (real && (inode == d_inode(real))) > - return real; > + upper = ovl_dentry_upper(dentry); > + if (upper && (type == D_REAL_METADATA || > + ovl_has_upperdata(d_inode(dentry)))) > + return upper; > > - if (real && !inode && ovl_has_upperdata(d_inode(dentry))) > - return real; > + if (type == D_REAL_METADATA) { > + lower = ovl_dentry_lower(dentry); > + goto real_lower; > + } > > /* > - * Best effort lazy lookup of lowerdata for !inode case to return > + * Best effort lazy lookup of lowerdata for D_REAL_DATA case to return > * the real lowerdata dentry. The only current caller of d_real() with > - * NULL inode is d_real_inode() from trace_uprobe and this caller is > + * D_REAL_DATA is d_real_inode() from trace_uprobe and this caller is > * likely going to be followed reading from the file, before placing > * uprobes on offset within the file, so lowerdata should be available > * when setting the uprobe. > @@ -73,18 +70,13 @@ static struct dentry *ovl_d_real(struct dentry *dentry, > lower = ovl_dentry_lowerdata(dentry); > if (!lower) > goto bug; > - real = lower; > > - /* Handle recursion */ > - real = d_real(real, inode); > +real_lower: > + /* Handle recursion into stacked lower fs */ > + return d_real(lower, type); > > - if (!inode || inode == d_inode(real)) > - return real; > bug: > - WARN(1, "%s(%pd4, %s:%lu): real dentry (%p/%lu) not found\n", > - __func__, dentry, inode ? inode->i_sb->s_id : "NULL", > - inode ? inode->i_ino : 0, real, > - real && d_inode(real) ? d_inode(real)->i_ino : 0); > + WARN(1, "%s(%pd4, %d): real dentry not found\n", __func__, dentry, type); > return dentry; > } > > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > index 1666c387861f..019ad02f2b7e 100644 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -139,7 +139,7 @@ struct dentry_operations { > char *(*d_dname)(struct dentry *, char *, int); > struct vfsmount *(*d_automount)(struct path *); > int (*d_manage)(const struct path *, bool); > - struct dentry *(*d_real)(struct dentry *, const struct inode *); > + struct dentry *(*d_real)(struct dentry *, int type); > } ____cacheline_aligned; > > /* > @@ -543,27 +543,31 @@ static inline struct inode *d_backing_inode(const struct dentry *upper) > return inode; > } > > +enum d_real_type { > + D_REAL_DATA, > + D_REAL_METADATA, > +}; > + > /** > * d_real - Return the real dentry > * @dentry: the dentry to query > - * @inode: inode to select the dentry from multiple layers (can be NULL) > + * @type: the type of real dentry (data or metadata) > * > * If dentry is on a union/overlay, then return the underlying, real dentry. > * Otherwise return the dentry itself. > * > * See also: Documentation/filesystems/vfs.rst > */ > -static inline struct dentry *d_real(struct dentry *dentry, > - const struct inode *inode) > +static inline struct dentry *d_real(struct dentry *dentry, int type) > { > if (unlikely(dentry->d_flags & DCACHE_OP_REAL)) > - return dentry->d_op->d_real(dentry, inode); > + return dentry->d_op->d_real(dentry, type); > else > return dentry; > } > > /** > - * d_real_inode - Return the real inode > + * d_real_inode - Return the real inode hosting the data > * @dentry: The dentry to query > * > * If dentry is on a union/overlay, then return the underlying, real inode. > @@ -572,7 +576,7 @@ static inline struct dentry *d_real(struct dentry *dentry, > static inline struct inode *d_real_inode(const struct dentry *dentry) > { > /* This usage of d_real() results in const dentry */ > - return d_backing_inode(d_real((struct dentry *) dentry, NULL)); > + return d_inode(d_real((struct dentry *) dentry, D_REAL_DATA)); > } > > struct name_snapshot {
On Fri, Feb 02, 2024 at 01:01:32PM +0200, Amir Goldstein wrote: > The only remaining user of ->d_real() method is d_real_inode(), which > passed NULL inode argument to get the real data dentry. > > There are no longer any users that call ->d_real() with a non-NULL > inode argument for getting a detry from a specific underlying layer. > > Remove the inode argument of the method and replace it with an integer > 'type' argument, to allow callers to request the real metadata dentry > instead of the real data dentry. > > All the current users of d_real_inode() (e.g. uprobe) continue to get > the real data inode. Caller that need to get the real metadata inode > (e.g. IMA/EVM) can use d_inode(d_real(dentry, D_REAL_METADATA)). Hmm... Speaking of the callers, could somebody try explain to IMA folks that they _still_ have a blatant UAF in ima_collect_measurement()? I gave up after several attempts years ago... int ima_collect_measurement(struct integrity_iint_cache *iint, struct file *file, void *buf, loff_t size, enum hash_algo algo, struct modsig *modsig) { const char *audit_cause = "failed"; struct inode *inode = file_inode(file); struct inode *real_inode = d_real_inode(file_dentry(file)); const char *filename = file->f_path.dentry->d_name.name; The name is longer than 40 characters, and thus separately allocated. ... Somebody renames the file, now the name is short and ->d_name.name points to embedded array. The reference to external name is dropped and it's freed after an RCU delay. ... tmpbuf = krealloc(iint->ima_hash, length, GFP_NOFS); We block, RCU delay expires and filename points to freed memory object. ... integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, filename, "collect_data", audit_cause, result, 0); Which calls integrity_audit_message(), where we hit audit_log_untrustedstring(ab, fname); with fname being our dangling pointer. Use After Free. Really. And "untrusted" in the function name does not refer to "it might be pointing to unmapped page" - it's just "don't expect anything from the characters you might find there, including the presence of NUL".
On Fri, Feb 02, 2024 at 04:05:09PM +0000, Al Viro wrote: > Use After Free. Really. And "untrusted" in the function name does not > refer to "it might be pointing to unmapped page" - it's just "don't > expect anything from the characters you might find there, including > the presence of NUL". Argh... s/including/beyond the/ - sorry. Messed up rewriting the sentence. "Untrusted" refers to the lack of whitespaces, control characters, '"', etc. What audit_log_untrustedstring(ab, string) expects is * string pointing to readable memory object * the object remaining unchanged through the call * NUL existing somewhere in that object. All of those assertions can be violated once the object string used to point to has been passed to kmem_cache_free(). Which is what can very well happen to filename pointer in this case.
On 2/2/24 11:16, Al Viro wrote: > On Fri, Feb 02, 2024 at 04:05:09PM +0000, Al Viro wrote: > >> Use After Free. Really. And "untrusted" in the function name does not >> refer to "it might be pointing to unmapped page" - it's just "don't >> expect anything from the characters you might find there, including >> the presence of NUL". > > Argh... s/including/beyond the/ - sorry. Messed up rewriting the > sentence. > > "Untrusted" refers to the lack of whitespaces, control characters, '"', > etc. What audit_log_untrustedstring(ab, string) expects is > * string pointing to readable memory object > * the object remaining unchanged through the call > * NUL existing somewhere in that object. > > All of those assertions can be violated once the object string > used to point to has been passed to kmem_cache_free(). Which is what > can very well happen to filename pointer in this case. I suppose this would provide a stable name? diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c index 597ea0c4d72f..48ae6911139b 100644 --- a/security/integrity/ima/ima_api.c +++ b/security/integrity/ima/ima_api.c @@ -244,7 +244,6 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, const char *audit_cause = "failed"; struct inode *inode = file_inode(file); struct inode *real_inode = d_real_inode(file_dentry(file)); - const char *filename = file->f_path.dentry->d_name.name; struct ima_max_digest_data hash; struct kstat stat; int result = 0; @@ -313,11 +312,17 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, iint->flags |= IMA_COLLECTED; out: if (result) { + struct qstr *qstr = &file->f_path.dentry->d_name; + char buf[NAME_MAX + 1]; + if (file->f_flags & O_DIRECT) audit_cause = "failed(directio)"; + memcpy(buf, qstr->name, qstr->len); + buf[qstr->len] = 0; + integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, - filename, "collect_data", audit_cause, + buf, "collect_data", audit_cause, result, 0); } return result; Regards, Stefan
On Fri, Feb 02, 2024 at 12:34:15PM -0500, Stefan Berger wrote: > I suppose this would provide a stable name? > > diff --git a/security/integrity/ima/ima_api.c > b/security/integrity/ima/ima_api.c > index 597ea0c4d72f..48ae6911139b 100644 > --- a/security/integrity/ima/ima_api.c > +++ b/security/integrity/ima/ima_api.c > @@ -244,7 +244,6 @@ int ima_collect_measurement(struct integrity_iint_cache > *iint, > const char *audit_cause = "failed"; > struct inode *inode = file_inode(file); > struct inode *real_inode = d_real_inode(file_dentry(file)); > - const char *filename = file->f_path.dentry->d_name.name; > struct ima_max_digest_data hash; > struct kstat stat; > int result = 0; > @@ -313,11 +312,17 @@ int ima_collect_measurement(struct > integrity_iint_cache *iint, > iint->flags |= IMA_COLLECTED; > out: > if (result) { > + struct qstr *qstr = &file->f_path.dentry->d_name; > + char buf[NAME_MAX + 1]; > + > if (file->f_flags & O_DIRECT) > audit_cause = "failed(directio)"; > > + memcpy(buf, qstr->name, qstr->len); > + buf[qstr->len] = 0; Think what happens if you fetch ->len in state prior to rename and ->name - after. memcpy() from one memory object with length that matches another, UAF right there. If you want to take a snapshot of the name, just use take_dentry_name_snapshot() - that will copy name if it's short or grab a reference to external name if the length is >= 40, all of that under ->d_lock to stabilize things. Paired with release_dentry_name_snapshot(), to drop the reference to external name if one had been taken. No need to copy in long case (external names are never rewritten) and it's kinder on the stack footprint that way (56 bytes vs. 256). Something like this (completely untested): fix a UAF in ima_collect_measurement() ->d_name.name can change on rename and the earlier value can get freed; there are conditions sufficient to stabilize it (->d_lock on denty, ->d_lock on its parent, ->i_rwsem exclusive on the parent's inode, rename_lock), but none of those are met in ima_collect_measurement(). Take a stable snapshot of name instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c index 597ea0c4d72f..d8be2280d97b 100644 --- a/security/integrity/ima/ima_api.c +++ b/security/integrity/ima/ima_api.c @@ -244,7 +244,6 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, const char *audit_cause = "failed"; struct inode *inode = file_inode(file); struct inode *real_inode = d_real_inode(file_dentry(file)); - const char *filename = file->f_path.dentry->d_name.name; struct ima_max_digest_data hash; struct kstat stat; int result = 0; @@ -313,12 +312,16 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, iint->flags |= IMA_COLLECTED; out: if (result) { + struct name_snapshot filename; + + take_dentry_name_snapshot(&filename, file->f_path.dentry); if (file->f_flags & O_DIRECT) audit_cause = "failed(directio)"; integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, - filename, "collect_data", audit_cause, - result, 0); + filename.name.name, "collect_data", + audit_cause, result, 0); + release_dentry_name_snapshot(&filename); } return result; }
On 2/2/24 13:27, Al Viro wrote: > On Fri, Feb 02, 2024 at 12:34:15PM -0500, Stefan Berger wrote: > >> I suppose this would provide a stable name? >> >> diff --git a/security/integrity/ima/ima_api.c >> b/security/integrity/ima/ima_api.c >> index 597ea0c4d72f..48ae6911139b 100644 >> --- a/security/integrity/ima/ima_api.c >> +++ b/security/integrity/ima/ima_api.c >> @@ -244,7 +244,6 @@ int ima_collect_measurement(struct integrity_iint_cache >> *iint, >> const char *audit_cause = "failed"; >> struct inode *inode = file_inode(file); >> struct inode *real_inode = d_real_inode(file_dentry(file)); >> - const char *filename = file->f_path.dentry->d_name.name; >> struct ima_max_digest_data hash; >> struct kstat stat; >> int result = 0; >> @@ -313,11 +312,17 @@ int ima_collect_measurement(struct >> integrity_iint_cache *iint, >> iint->flags |= IMA_COLLECTED; >> out: >> if (result) { >> + struct qstr *qstr = &file->f_path.dentry->d_name; >> + char buf[NAME_MAX + 1]; >> + >> if (file->f_flags & O_DIRECT) >> audit_cause = "failed(directio)"; >> >> + memcpy(buf, qstr->name, qstr->len); >> + buf[qstr->len] = 0; > > Think what happens if you fetch ->len in state prior to > rename and ->name - after. memcpy() from one memory object > with length that matches another, UAF right there. > > If you want to take a snapshot of the name, just use > take_dentry_name_snapshot() - that will copy name if it's > short or grab a reference to external name if the length is >> = 40, all of that under ->d_lock to stabilize things. I had exactly what you show below (inspired by overlayfs) but then looked around whether there's another way of doing it and I saw copies being made. Thanks. > > Paired with release_dentry_name_snapshot(), to > drop the reference to external name if one had been taken. > No need to copy in long case (external names are never > rewritten) and it's kinder on the stack footprint that > way (56 bytes vs. 256). > > Something like this (completely untested): > > fix a UAF in ima_collect_measurement() > > ->d_name.name can change on rename and the earlier value can get freed; > there are conditions sufficient to stabilize it (->d_lock on denty, > ->d_lock on its parent, ->i_rwsem exclusive on the parent's inode, > rename_lock), but none of those are met in ima_collect_measurement(). > Take a stable snapshot of name instead. > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c > index 597ea0c4d72f..d8be2280d97b 100644 > --- a/security/integrity/ima/ima_api.c > +++ b/security/integrity/ima/ima_api.c > @@ -244,7 +244,6 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, > const char *audit_cause = "failed"; > struct inode *inode = file_inode(file); > struct inode *real_inode = d_real_inode(file_dentry(file)); > - const char *filename = file->f_path.dentry->d_name.name; > struct ima_max_digest_data hash; > struct kstat stat; > int result = 0; > @@ -313,12 +312,16 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, > iint->flags |= IMA_COLLECTED; > out: > if (result) { > + struct name_snapshot filename; > + > + take_dentry_name_snapshot(&filename, file->f_path.dentry); > if (file->f_flags & O_DIRECT) > audit_cause = "failed(directio)"; > > integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, > - filename, "collect_data", audit_cause, > - result, 0); > + filename.name.name, "collect_data", > + audit_cause, result, 0); > + release_dentry_name_snapshot(&filename); > } > return result; > } I can take it from here unless you want to formally post it. Stefan
On Fri, Feb 02, 2024 at 06:27:32PM +0000, Al Viro wrote: > Think what happens if you fetch ->len in state prior to > rename and ->name - after. memcpy() from one memory object > with length that matches another, UAF right there. s/UAF/fairly easy oops/ - you can end up fetching past the end of page that hosts kmalloc'ed object, and there's no promise that anything will be mapped there. I really need more coffee...
On Fri, Feb 02, 2024 at 01:32:39PM -0500, Stefan Berger wrote:
> I can take it from here unless you want to formally post it.
Less headache for me that way, but you want to test it - all
I've checked is that the damn thing compiles, and while it's
hard to fuck up and I don't see any brainos in there...
I'm way too low on caffeine at the moment. It should be
safe from races, just verifying that it prints the right thing
when you hit that codepath would be enough.
On Fri, Feb 2, 2024 at 3:55 PM Christian Brauner <brauner@kernel.org> wrote: > > On Fri, Feb 02, 2024 at 02:41:16PM +0200, Amir Goldstein wrote: > > On Fri, Feb 2, 2024 at 2:19 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > > > On Fri, 2 Feb 2024 at 12:01, Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst > > > > index d5bf4b6b7509..453039a2e49b 100644 > > > > --- a/Documentation/filesystems/locking.rst > > > > +++ b/Documentation/filesystems/locking.rst > > > > @@ -29,7 +29,7 @@ prototypes:: > > > > char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); > > > > struct vfsmount *(*d_automount)(struct path *path); > > > > int (*d_manage)(const struct path *, bool); > > > > - struct dentry *(*d_real)(struct dentry *, const struct inode *); > > > > + struct dentry *(*d_real)(struct dentry *, int type); > > > > > > Why not use the specific enum type for the argument? > > > > No reason, we can do enum d_real_type. > > Fwiw, I'm happy to just change this. No need to resend as far as I'm concerned. FWIW, I'd be happy if you do that :) Note to move enum d_real_type definition above dentry_operations. Please add Stefan's Tested-by to both patches. Thanks, Amir.
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index d5bf4b6b7509..453039a2e49b 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -29,7 +29,7 @@ prototypes:: char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); struct vfsmount *(*d_automount)(struct path *path); int (*d_manage)(const struct path *, bool); - struct dentry *(*d_real)(struct dentry *, const struct inode *); + struct dentry *(*d_real)(struct dentry *, int type); locking rules: diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index eebcc0f9e2bc..2a39e718fdf8 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -1264,7 +1264,7 @@ defined: char *(*d_dname)(struct dentry *, char *, int); struct vfsmount *(*d_automount)(struct path *); int (*d_manage)(const struct path *, bool); - struct dentry *(*d_real)(struct dentry *, const struct inode *); + struct dentry *(*d_real)(struct dentry *, int type); }; ``d_revalidate`` @@ -1419,16 +1419,14 @@ defined: the dentry being transited from. ``d_real`` - overlay/union type filesystems implement this method to return - one of the underlying dentries hidden by the overlay. It is - used in two different modes: + overlay/union type filesystems implement this method to return one + of the underlying dentries of a regular file hidden by the overlay. - Called from file_dentry() it returns the real dentry matching - the inode argument. The real dentry may be from a lower layer - already copied up, but still referenced from the file. This - mode is selected with a non-NULL inode argument. + The 'type' argument takes the values D_REAL_DATA or D_REAL_METADATA + for returning the real underlying dentry that refers to the inode + hosting the file's data or metadata respectively. - With NULL inode the topmost real underlying dentry is returned. + For non-regular files, the 'dentry' argument is returned. Each dentry has a pointer to its parent dentry, as well as a hash list of child dentries. Child dentries are basically like files in a diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 2eef6c70b2ae..938852a0a92b 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -28,41 +28,38 @@ MODULE_LICENSE("GPL"); struct ovl_dir_cache; -static struct dentry *ovl_d_real(struct dentry *dentry, - const struct inode *inode) +static struct dentry *ovl_d_real(struct dentry *dentry, int type) { - struct dentry *real = NULL, *lower; + struct dentry *upper, *lower; int err; - /* - * vfs is only expected to call d_real() with NULL from d_real_inode() - * and with overlay inode from file_dentry() on an overlay file. - * - * TODO: remove @inode argument from d_real() API, remove code in this - * function that deals with non-NULL @inode and remove d_real() call - * from file_dentry(). - */ - if (inode && d_inode(dentry) == inode) - return dentry; - else if (inode) + switch (type) { + case D_REAL_DATA: + case D_REAL_METADATA: + break; + default: goto bug; + } if (!d_is_reg(dentry)) { /* d_real_inode() is only relevant for regular files */ return dentry; } - real = ovl_dentry_upper(dentry); - if (real && (inode == d_inode(real))) - return real; + upper = ovl_dentry_upper(dentry); + if (upper && (type == D_REAL_METADATA || + ovl_has_upperdata(d_inode(dentry)))) + return upper; - if (real && !inode && ovl_has_upperdata(d_inode(dentry))) - return real; + if (type == D_REAL_METADATA) { + lower = ovl_dentry_lower(dentry); + goto real_lower; + } /* - * Best effort lazy lookup of lowerdata for !inode case to return + * Best effort lazy lookup of lowerdata for D_REAL_DATA case to return * the real lowerdata dentry. The only current caller of d_real() with - * NULL inode is d_real_inode() from trace_uprobe and this caller is + * D_REAL_DATA is d_real_inode() from trace_uprobe and this caller is * likely going to be followed reading from the file, before placing * uprobes on offset within the file, so lowerdata should be available * when setting the uprobe. @@ -73,18 +70,13 @@ static struct dentry *ovl_d_real(struct dentry *dentry, lower = ovl_dentry_lowerdata(dentry); if (!lower) goto bug; - real = lower; - /* Handle recursion */ - real = d_real(real, inode); +real_lower: + /* Handle recursion into stacked lower fs */ + return d_real(lower, type); - if (!inode || inode == d_inode(real)) - return real; bug: - WARN(1, "%s(%pd4, %s:%lu): real dentry (%p/%lu) not found\n", - __func__, dentry, inode ? inode->i_sb->s_id : "NULL", - inode ? inode->i_ino : 0, real, - real && d_inode(real) ? d_inode(real)->i_ino : 0); + WARN(1, "%s(%pd4, %d): real dentry not found\n", __func__, dentry, type); return dentry; } diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 1666c387861f..019ad02f2b7e 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -139,7 +139,7 @@ struct dentry_operations { char *(*d_dname)(struct dentry *, char *, int); struct vfsmount *(*d_automount)(struct path *); int (*d_manage)(const struct path *, bool); - struct dentry *(*d_real)(struct dentry *, const struct inode *); + struct dentry *(*d_real)(struct dentry *, int type); } ____cacheline_aligned; /* @@ -543,27 +543,31 @@ static inline struct inode *d_backing_inode(const struct dentry *upper) return inode; } +enum d_real_type { + D_REAL_DATA, + D_REAL_METADATA, +}; + /** * d_real - Return the real dentry * @dentry: the dentry to query - * @inode: inode to select the dentry from multiple layers (can be NULL) + * @type: the type of real dentry (data or metadata) * * If dentry is on a union/overlay, then return the underlying, real dentry. * Otherwise return the dentry itself. * * See also: Documentation/filesystems/vfs.rst */ -static inline struct dentry *d_real(struct dentry *dentry, - const struct inode *inode) +static inline struct dentry *d_real(struct dentry *dentry, int type) { if (unlikely(dentry->d_flags & DCACHE_OP_REAL)) - return dentry->d_op->d_real(dentry, inode); + return dentry->d_op->d_real(dentry, type); else return dentry; } /** - * d_real_inode - Return the real inode + * d_real_inode - Return the real inode hosting the data * @dentry: The dentry to query * * If dentry is on a union/overlay, then return the underlying, real inode. @@ -572,7 +576,7 @@ static inline struct dentry *d_real(struct dentry *dentry, static inline struct inode *d_real_inode(const struct dentry *dentry) { /* This usage of d_real() results in const dentry */ - return d_backing_inode(d_real((struct dentry *) dentry, NULL)); + return d_inode(d_real((struct dentry *) dentry, D_REAL_DATA)); } struct name_snapshot {
The only remaining user of ->d_real() method is d_real_inode(), which passed NULL inode argument to get the real data dentry. There are no longer any users that call ->d_real() with a non-NULL inode argument for getting a detry from a specific underlying layer. Remove the inode argument of the method and replace it with an integer 'type' argument, to allow callers to request the real metadata dentry instead of the real data dentry. All the current users of d_real_inode() (e.g. uprobe) continue to get the real data inode. Caller that need to get the real metadata inode (e.g. IMA/EVM) can use d_inode(d_real(dentry, D_REAL_METADATA)). Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- Documentation/filesystems/locking.rst | 2 +- Documentation/filesystems/vfs.rst | 16 ++++----- fs/overlayfs/super.c | 52 ++++++++++++--------------- include/linux/dcache.h | 18 ++++++---- 4 files changed, 41 insertions(+), 47 deletions(-)