[1/6] VFS: improve interface for lookup_one functions

Message ID	20250319031545.2999807-2-neil@brown.name (mailing list archive)
State	Handled Elsewhere
Headers	show Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F242520DD65; Wed, 19 Mar 2025 03:16:06 +0000 (UTC) From: NeilBrown <neil@brown.name> To: Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>, David Howells <dhowells@redhat.com>, Chuck Lever <chuck.lever@oracle.com>, Jeff Layton <jlayton@kernel.org> Cc: linux-nfs@vger.kernel.org, netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org Subject: [PATCH 1/6] VFS: improve interface for lookup_one functions Date: Wed, 19 Mar 2025 14:01:32 +1100 Message-ID: <20250319031545.2999807-2-neil@brown.name> In-Reply-To: <20250319031545.2999807-1-neil@brown.name> References: <20250319031545.2999807-1-neil@brown.name> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	tidy up various VFS lookup functions \| expand [0/6,RFC,v2] tidy up various VFS lookup functions [1/6] VFS: improve interface for lookup_one functions [2/6] nfsd: Use lookup_one() rather than lookup_one_len() [3/6] cachefiles: Use lookup_one() rather than lookup_one_len() [4/6] VFS: rename lookup_one_len family to lookup_noperm and remove permission check [5/6] Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS [6/6] VFS: change lookup_one_common and lookup_noperm_common to take a qstr

NeilBrown March 19, 2025, 3:01 a.m. UTC

The family of functions:
  lookup_one()
  lookup_one_unlocked()
  lookup_one_positive_unlocked()

appear designed to be used by external clients of the filesystem rather
than by filesystems acting on themselves as the lookup_one_len family
are used.

They are used by:
   btrfs/ioctl - which is a user-space interface rather than an internal
     activity
   exportfs - i.e. from nfsd or the open_by_handle_at interface
   overlayfs - at access the underlying filesystems
   smb/server - for file service

They should be used by nfsd (more than just the exportfs path) and
cachefs but aren't.

It would help if the documentation didn't claim they should "not be
called by generic code".

Also the path component name is passed as "name" and "len" which are
(confusingly?) separate by the "base".  In some cases the len in simply
"strlen" and so passing a qstr using QSTR() would make the calling
clearer.
Other callers do pass separate name and len which are stored in a
struct.  Sometimes these are already stored in a qstr, other times it
easily could be.

So this patch changes these three functions to receive a 'struct qstr',
and improves the documentation.

QSTR_LEN() is added to make it easy to pass a QSTR containing a known
len.

Signed-off-by: NeilBrown <neil@brown.name>
---
 Documentation/filesystems/porting.rst |  9 +++++
 fs/btrfs/ioctl.c                      |  9 ++---
 fs/exportfs/expfs.c                   |  5 ++-
 fs/namei.c                            | 51 ++++++++++++---------------
 fs/overlayfs/namei.c                  | 10 +++---
 fs/overlayfs/overlayfs.h              |  2 +-
 fs/overlayfs/readdir.c                |  9 ++---
 fs/smb/server/smb2pdu.c               |  7 ++--
 include/linux/dcache.h                |  3 +-
 include/linux/namei.h                 |  9 +++--
 10 files changed, 57 insertions(+), 57 deletions(-)

Christian Brauner March 19, 2025, 8:40 a.m. UTC | #1

On Wed, 19 Mar 2025 14:01:32 +1100, NeilBrown wrote:
> The family of functions:
>   lookup_one()
>   lookup_one_unlocked()
>   lookup_one_positive_unlocked()
> 
> appear designed to be used by external clients of the filesystem rather
> than by filesystems acting on themselves as the lookup_one_len family
> are used.
> 
> [...]

Applied to the vfs-6.16.async.dir branch of the vfs/vfs.git tree.
Patches in the vfs-6.16.async.dir branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.16.async.dir

[1/6] VFS: improve interface for lookup_one functions
      https://git.kernel.org/vfs/vfs/c/d01d332dbf28
[2/6] nfsd: Use lookup_one() rather than lookup_one_len()
      https://git.kernel.org/vfs/vfs/c/4c664a5962b7
[3/6] cachefiles: Use lookup_one() rather than lookup_one_len()
      https://git.kernel.org/vfs/vfs/c/ebc0dcbf5ba2
[4/6] VFS: rename lookup_one_len family to lookup_noperm and remove permission check
      https://git.kernel.org/vfs/vfs/c/12597aa5fea6
[5/6] Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
      https://git.kernel.org/vfs/vfs/c/d2bacbb4495a
[6/6] VFS: change lookup_one_common and lookup_noperm_common to take a qstr
      https://git.kernel.org/vfs/vfs/c/b496b0712e63

Jeff Layton March 20, 2025, 10:17 a.m. UTC | #2

On Wed, 2025-03-19 at 14:01 +1100, NeilBrown wrote:
> The family of functions:
>   lookup_one()
>   lookup_one_unlocked()
>   lookup_one_positive_unlocked()
> 
> appear designed to be used by external clients of the filesystem rather
> than by filesystems acting on themselves as the lookup_one_len family
> are used.
> 
> They are used by:
>    btrfs/ioctl - which is a user-space interface rather than an internal
>      activity
>    exportfs - i.e. from nfsd or the open_by_handle_at interface
>    overlayfs - at access the underlying filesystems
>    smb/server - for file service
> 
> They should be used by nfsd (more than just the exportfs path) and
> cachefs but aren't.
> 
> It would help if the documentation didn't claim they should "not be
> called by generic code".
> 

I read that as generic VFS code, since this is really intended to used
outside that, but I agree the term "generic" is vague here.

> Also the path component name is passed as "name" and "len" which are
> (confusingly?) separate by the "base".  In some cases the len in simply
> "strlen" and so passing a qstr using QSTR() would make the calling
> clearer.
> Other callers do pass separate name and len which are stored in a
> struct.  Sometimes these are already stored in a qstr, other times it
> easily could be.
> 
> So this patch changes these three functions to receive a 'struct qstr',
> and improves the documentation.
> 
> QSTR_LEN() is added to make it easy to pass a QSTR containing a known
> len.
> 
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  Documentation/filesystems/porting.rst |  9 +++++
>  fs/btrfs/ioctl.c                      |  9 ++---
>  fs/exportfs/expfs.c                   |  5 ++-
>  fs/namei.c                            | 51 ++++++++++++---------------
>  fs/overlayfs/namei.c                  | 10 +++---
>  fs/overlayfs/overlayfs.h              |  2 +-
>  fs/overlayfs/readdir.c                |  9 ++---
>  fs/smb/server/smb2pdu.c               |  7 ++--
>  include/linux/dcache.h                |  3 +-
>  include/linux/namei.h                 |  9 +++--
>  10 files changed, 57 insertions(+), 57 deletions(-)
> 
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index 6817614e0820..06296ffd1e81 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -1196,3 +1196,12 @@ should use d_drop();d_splice_alias() and return the result of the latter.
>  If a positive dentry cannot be returned for some reason, in-kernel
>  clients such as cachefiles, nfsd, smb/server may not perform ideally but
>  will fail-safe.
> +
> +---
> +
> +** mandatory**
> +
> +lookup_one(), lookup_one_unlocked(), lookup_one_positive_unlocked() now
> +take a qstr instead of a name and len.  These, not the "one_len"
> +versions, should be used whenever accessing a filesystem from outside
> +that filesysmtem, through a mount point - which will have a mnt_idmap.
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 6c18bad53cd3..f94b638f9478 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -911,7 +911,7 @@ static noinline int btrfs_mksubvol(const struct path *parent,
>  	if (error == -EINTR)
>  		return error;
>  
> -	dentry = lookup_one(idmap, name, parent->dentry, namelen);
> +	dentry = lookup_one(idmap, QSTR_LEN(name, namelen), parent->dentry);
>  	error = PTR_ERR(dentry);
>  	if (IS_ERR(dentry))
>  		goto out_unlock;
> @@ -2289,7 +2289,6 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
>  	struct btrfs_ioctl_vol_args_v2 *vol_args2 = NULL;
>  	struct mnt_idmap *idmap = file_mnt_idmap(file);
>  	char *subvol_name, *subvol_name_ptr = NULL;
> -	int subvol_namelen;
>  	int ret = 0;
>  	bool destroy_parent = false;
>  
> @@ -2412,10 +2411,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
>  			goto out;
>  	}
>  
> -	subvol_namelen = strlen(subvol_name);
> -
>  	if (strchr(subvol_name, '/') ||
> -	    strncmp(subvol_name, "..", subvol_namelen) == 0) {
> +	    strcmp(subvol_name, "..") == 0) {
>  		ret = -EINVAL;
>  		goto free_subvol_name;
>  	}
> @@ -2428,7 +2425,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
>  	ret = down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT);
>  	if (ret == -EINTR)
>  		goto free_subvol_name;
> -	dentry = lookup_one(idmap, subvol_name, parent, subvol_namelen);
> +	dentry = lookup_one(idmap, QSTR(subvol_name), parent);
>  	if (IS_ERR(dentry)) {
>  		ret = PTR_ERR(dentry);
>  		goto out_unlock_dir;
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index 0c899cfba578..974b432087aa 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
>  	if (err)
>  		goto out_err;
>  	dprintk("%s: found name: %s\n", __func__, nbuf);
> -	tmp = lookup_one_unlocked(mnt_idmap(mnt), nbuf, parent, strlen(nbuf));
> +	tmp = lookup_one_unlocked(mnt_idmap(mnt), QSTR(nbuf), parent);
>  	if (IS_ERR(tmp)) {
>  		dprintk("lookup failed: %ld\n", PTR_ERR(tmp));
>  		err = PTR_ERR(tmp);
> @@ -551,8 +551,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  		}
>  
>  		inode_lock(target_dir->d_inode);
> -		nresult = lookup_one(mnt_idmap(mnt), nbuf,
> -				     target_dir, strlen(nbuf));
> +		nresult = lookup_one(mnt_idmap(mnt), QSTR(nbuf), target_dir);
>  		if (!IS_ERR(nresult)) {
>  			if (unlikely(nresult->d_inode != result->d_inode)) {
>  				dput(nresult);
> diff --git a/fs/namei.c b/fs/namei.c
> index b5abf456c5f4..75816fa80028 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2922,19 +2922,17 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
>  EXPORT_SYMBOL(lookup_one_len);
>  
>  /**
> - * lookup_one - filesystem helper to lookup single pathname component
> + * lookup_one - lookup single pathname component
>   * @idmap:	idmap of the mount the lookup is performed from
> - * @name:	pathname component to lookup
> + * @name:	qstr holding pathname component to lookup
>   * @base:	base directory to lookup from
> - * @len:	maximum length @len should be interpreted to
>   *
> - * Note that this routine is purely a helper for filesystem usage and should
> - * not be called by generic code.
> + * This can be used for in-kernel filesystem clients such as file servers.
>   *
>   * The caller must hold base->i_mutex.
>   */
> -struct dentry *lookup_one(struct mnt_idmap *idmap, const char *name,
> -			  struct dentry *base, int len)
> +struct dentry *lookup_one(struct mnt_idmap *idmap, struct qstr name,
> +			  struct dentry *base)
>  {
>  	struct dentry *dentry;
>  	struct qstr this;
> @@ -2942,7 +2940,7 @@ struct dentry *lookup_one(struct mnt_idmap *idmap, const char *name,
>  
>  	WARN_ON_ONCE(!inode_is_locked(base->d_inode));
>  
> -	err = lookup_one_common(idmap, name, base, len, &this);
> +	err = lookup_one_common(idmap, name.name, base, name.len, &this);
>  	if (err)
>  		return ERR_PTR(err);
>  
> @@ -2952,27 +2950,24 @@ struct dentry *lookup_one(struct mnt_idmap *idmap, const char *name,
>  EXPORT_SYMBOL(lookup_one);
>  
>  /**
> - * lookup_one_unlocked - filesystem helper to lookup single pathname component
> + * lookup_one_unlocked - lookup single pathname component
>   * @idmap:	idmap of the mount the lookup is performed from
> - * @name:	pathname component to lookup
> + * @name:	qstr olding pathname component to lookup
>   * @base:	base directory to lookup from
> - * @len:	maximum length @len should be interpreted to
>   *
> - * Note that this routine is purely a helper for filesystem usage and should
> - * not be called by generic code.
> + * This can be used for in-kernel filesystem clients such as file servers.
>   *
>   * Unlike lookup_one_len, it should be called without the parent
> - * i_mutex held, and will take the i_mutex itself if necessary.
> + * i_rwsem held, and will take the i_rwsem itself if necessary.
>   */
>  struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
> -				   const char *name, struct dentry *base,
> -				   int len)
> +				   struct qstr name, struct dentry *base)
>  {
>  	struct qstr this;
>  	int err;
>  	struct dentry *ret;
>  
> -	err = lookup_one_common(idmap, name, base, len, &this);
> +	err = lookup_one_common(idmap, name.name, base, name.len, &this);
>  	if (err)
>  		return ERR_PTR(err);
>  
> @@ -2984,12 +2979,10 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
>  EXPORT_SYMBOL(lookup_one_unlocked);
>  
>  /**
> - * lookup_one_positive_unlocked - filesystem helper to lookup single
> - *				  pathname component
> + * lookup_one_positive_unlocked - lookup single pathname component
>   * @idmap:	idmap of the mount the lookup is performed from
> - * @name:	pathname component to lookup
> + * @name:	qstr holding pathname component to lookup
>   * @base:	base directory to lookup from
> - * @len:	maximum length @len should be interpreted to
>   *
>   * This helper will yield ERR_PTR(-ENOENT) on negatives. The helper returns
>   * known positive or ERR_PTR(). This is what most of the users want.
> @@ -2998,16 +2991,15 @@ EXPORT_SYMBOL(lookup_one_unlocked);
>   * time, so callers of lookup_one_unlocked() need to be very careful; pinned
>   * positives have >d_inode stable, so this one avoids such problems.
>   *
> - * Note that this routine is purely a helper for filesystem usage and should
> - * not be called by generic code.
> + * This can be used for in-kernel filesystem clients such as file servers.
>   *
> - * The helper should be called without i_mutex held.
> + * The helper should be called without i_rwsem held.
>   */
>  struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
> -					    const char *name,
> -					    struct dentry *base, int len)
> +					    struct qstr name,
> +					    struct dentry *base)
>  {
> -	struct dentry *ret = lookup_one_unlocked(idmap, name, base, len);
> +	struct dentry *ret = lookup_one_unlocked(idmap, name, base);
>  
>  	if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
>  		dput(ret);
> @@ -3032,7 +3024,7 @@ EXPORT_SYMBOL(lookup_one_positive_unlocked);
>  struct dentry *lookup_one_len_unlocked(const char *name,
>  				       struct dentry *base, int len)
>  {
> -	return lookup_one_unlocked(&nop_mnt_idmap, name, base, len);
> +	return lookup_one_unlocked(&nop_mnt_idmap, QSTR_LEN(name, len), base);
>  }
>  EXPORT_SYMBOL(lookup_one_len_unlocked);
>  
> @@ -3047,7 +3039,8 @@ EXPORT_SYMBOL(lookup_one_len_unlocked);
>  struct dentry *lookup_positive_unlocked(const char *name,
>  				       struct dentry *base, int len)
>  {
> -	return lookup_one_positive_unlocked(&nop_mnt_idmap, name, base, len);
> +	return lookup_one_positive_unlocked(&nop_mnt_idmap, QSTR_LEN(name, len),
> +					    base);
>  }
>  EXPORT_SYMBOL(lookup_positive_unlocked);
>  
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index be5c65d6f848..6a6301e4bba5 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -205,8 +205,8 @@ static struct dentry *ovl_lookup_positive_unlocked(struct ovl_lookup_data *d,
>  						   struct dentry *base, int len,
>  						   bool drop_negative)
>  {
> -	struct dentry *ret = lookup_one_unlocked(mnt_idmap(d->layer->mnt), name,
> -						 base, len);
> +	struct dentry *ret = lookup_one_unlocked(mnt_idmap(d->layer->mnt),
> +						 QSTR_LEN(name, len), base);
>  
>  	if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
>  		if (drop_negative && ret->d_lockref.count == 1) {
> @@ -789,8 +789,8 @@ struct dentry *ovl_lookup_index(struct ovl_fs *ofs, struct dentry *upper,
>  	if (err)
>  		return ERR_PTR(err);
>  
> -	index = lookup_one_positive_unlocked(ovl_upper_mnt_idmap(ofs), name.name,
> -					     ofs->workdir, name.len);
> +	index = lookup_one_positive_unlocked(ovl_upper_mnt_idmap(ofs), name,
> +					     ofs->workdir);
>  	if (IS_ERR(index)) {
>  		err = PTR_ERR(index);
>  		if (err == -ENOENT) {
> @@ -1396,7 +1396,7 @@ bool ovl_lower_positive(struct dentry *dentry)
>  
>  		this = lookup_one_positive_unlocked(
>  				mnt_idmap(parentpath->layer->mnt),
> -				name->name, parentpath->dentry, name->len);
> +				*name, parentpath->dentry);
>  		if (IS_ERR(this)) {
>  			switch (PTR_ERR(this)) {
>  			case -ENOENT:
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 6f2f8f4cfbbc..ceaf4eb199c7 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -402,7 +402,7 @@ static inline struct dentry *ovl_lookup_upper(struct ovl_fs *ofs,
>  					      const char *name,
>  					      struct dentry *base, int len)
>  {
> -	return lookup_one(ovl_upper_mnt_idmap(ofs), name, base, len);
> +	return lookup_one(ovl_upper_mnt_idmap(ofs), QSTR_LEN(name, len), base);
>  }
>  
>  static inline bool ovl_open_flags_need_copy_up(int flags)
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 881ec5592da5..68df61f4bba7 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -271,7 +271,6 @@ static bool ovl_fill_merge(struct dir_context *ctx, const char *name,
>  static int ovl_check_whiteouts(const struct path *path, struct ovl_readdir_data *rdd)
>  {
>  	int err;
> -	struct ovl_cache_entry *p;
>  	struct dentry *dentry, *dir = path->dentry;
>  	const struct cred *old_cred;
>  
> @@ -280,9 +279,11 @@ static int ovl_check_whiteouts(const struct path *path, struct ovl_readdir_data
>  	err = down_write_killable(&dir->d_inode->i_rwsem);
>  	if (!err) {
>  		while (rdd->first_maybe_whiteout) {
> -			p = rdd->first_maybe_whiteout;
> +			struct ovl_cache_entry *p =
> +				rdd->first_maybe_whiteout;
>  			rdd->first_maybe_whiteout = p->next_maybe_whiteout;
> -			dentry = lookup_one(mnt_idmap(path->mnt), p->name, dir, p->len);
> +			dentry = lookup_one(mnt_idmap(path->mnt),
> +					    QSTR_LEN(p->name, p->len), dir);
>  			if (!IS_ERR(dentry)) {
>  				p->is_whiteout = ovl_is_whiteout(dentry);
>  				dput(dentry);
> @@ -492,7 +493,7 @@ static int ovl_cache_update(const struct path *path, struct ovl_cache_entry *p,
>  		}
>  	}
>  	/* This checks also for xwhiteouts */
> -	this = lookup_one(mnt_idmap(path->mnt), p->name, dir, p->len);
> +	this = lookup_one(mnt_idmap(path->mnt), QSTR_LEN(p->name, p->len), dir);
>  	if (IS_ERR_OR_NULL(this) || !this->d_inode) {
>  		/* Mark a stale entry */
>  		p->is_whiteout = true;
> diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
> index f1efcd027475..c862e3bd4531 100644
> --- a/fs/smb/server/smb2pdu.c
> +++ b/fs/smb/server/smb2pdu.c
> @@ -4091,9 +4091,10 @@ static int process_query_dir_entries(struct smb2_query_dir_private *priv)
>  			return -EINVAL;
>  
>  		lock_dir(priv->dir_fp);
> -		dent = lookup_one(idmap, priv->d_info->name,
> -				  priv->dir_fp->filp->f_path.dentry,
> -				  priv->d_info->name_len);
> +		dent = lookup_one(idmap,
> +				  QSTR_LEN(priv->d_info->name,
> +					   priv->d_info->name_len),
> +				  priv->dir_fp->filp->f_path.dentry);
>  		unlock_dir(priv->dir_fp);
>  
>  		if (IS_ERR(dent)) {
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 45bff10d3773..1f01f4e734c5 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -57,7 +57,8 @@ struct qstr {
>  };
>  
>  #define QSTR_INIT(n,l) { { { .len = l } }, .name = n }
> -#define QSTR(n) (struct qstr)QSTR_INIT(n, strlen(n))
> +#define QSTR_LEN(n,l) (struct qstr)QSTR_INIT(n,l)
> +#define QSTR(n) QSTR_LEN(n, strlen(n))
>  
>  extern const struct qstr empty_name;
>  extern const struct qstr slash_name;
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index e3042176cdf4..508dae67e3c5 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -73,13 +73,12 @@ extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);
>  extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
>  extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int);
>  extern struct dentry *lookup_positive_unlocked(const char *, struct dentry *, int);
> -struct dentry *lookup_one(struct mnt_idmap *, const char *, struct dentry *, int);
> +struct dentry *lookup_one(struct mnt_idmap *, struct qstr, struct dentry *);
>  struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
> -				   const char *name, struct dentry *base,
> -				   int len);
> +				   struct qstr name, struct dentry *base);
>  struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
> -					    const char *name,
> -					    struct dentry *base, int len);
> +					    struct qstr name,
> +					    struct dentry *base);
>  
>  extern int follow_down_one(struct path *);
>  extern int follow_down(struct path *path, unsigned int flags);

Reviewed-by: Jeff Layton <jlayton@kernel.org>

David Howells March 20, 2025, 2:04 p.m. UTC | #3

NeilBrown <neil@brown.name> wrote:

> Also the path component name is passed as "name" and "len" which are
> (confusingly?) separate by the "base".  In some cases the len in simply
> "strlen" and so passing a qstr using QSTR() would make the calling
> clearer.
> Other callers do pass separate name and len which are stored in a
> struct.  Sometimes these are already stored in a qstr, other times it
> easily could be.
> 
> So this patch changes these three functions to receive a 'struct qstr',
> and improves the documentation.

You did want 'struct qstr' not 'struct qstr *' right?  I think there are
arches where this will cause the compiler to skip a register argument or two
if it's the second argument or third argument - i386 for example.  Plus you
have an 8-byte alignment requirement because of the u64 in it that may suck if
passed through several layers of function.

David

Al Viro March 22, 2025, 12:27 a.m. UTC | #4

On Wed, Mar 19, 2025 at 02:01:32PM +1100, NeilBrown wrote:

> -struct dentry *lookup_one(struct mnt_idmap *idmap, const char *name,
> -			  struct dentry *base, int len)
> +struct dentry *lookup_one(struct mnt_idmap *idmap, struct qstr name,
> +			  struct dentry *base)

>  {
>  	struct dentry *dentry;
>  	struct qstr this;
> @@ -2942,7 +2940,7 @@ struct dentry *lookup_one(struct mnt_idmap *idmap, const char *name,
>  
>  	WARN_ON_ONCE(!inode_is_locked(base->d_inode));
>  
> -	err = lookup_one_common(idmap, name, base, len, &this);
> +	err = lookup_one_common(idmap, name.name, base, name.len, &this);

No.  Just look at what lookup_one_common() is doing as the first step.

        this->name = name;
	this->len = len;

You copy your argument's fields to corresponding fields of *&this.  It might make
sense to pass a qstr, but not like that - just pass a _pointer_ to struct qstr instead.

Have lookup_one_common() do this:

static int lookup_one_common(struct mnt_idmap *idmap,
                             struct qstr *this, struct dentry *base)
{
	const unsigned char *name = this->name;
	int len = this->len;
        if (!len)
                return -EACCES;

        this->hash = full_name_hash(base, name, len);
        if (is_dot_dotdot(name, len))
                return -EACCES;

        while (len--) {
                unsigned int c = *name++;
                if (c == '/' || c == '\0')
                        return -EACCES;
        }
        /*
         * See if the low-level filesystem might want
         * to use its own hash..
         */
        if (base->d_flags & DCACHE_OP_HASH) {
                int err = base->d_op->d_hash(base, this);
                if (err < 0)
                        return err;
        }

        return inode_permission(idmap, base->d_inode, MAY_EXEC);
}

and adjust the callers; e.g.
struct dentry *lookup_one(struct mnt_idmap *idmap, struct qstr *this,
			  struct dentry *base)
{
        struct dentry *dentry;
        int err;

        WARN_ON_ONCE(!inode_is_locked(base->d_inode));

        err = lookup_one_common(idmap, this, base);
        if (err)
                return ERR_PTR(err);

        dentry = lookup_dcache(this, base, 0);
        return dentry ? dentry : __lookup_slow(this, base, 0);
}

with callers passing idmap, &QSTR_LEN(name, len), base instead of
idmap, name, base, len.  lookup_one_common() looks at the fields
separately; its callers do not.

Al Viro March 22, 2025, 12:29 a.m. UTC | #5

On Thu, Mar 20, 2025 at 02:04:49PM +0000, David Howells wrote:
> NeilBrown <neil@brown.name> wrote:
> 
> > Also the path component name is passed as "name" and "len" which are
> > (confusingly?) separate by the "base".  In some cases the len in simply
> > "strlen" and so passing a qstr using QSTR() would make the calling
> > clearer.
> > Other callers do pass separate name and len which are stored in a
> > struct.  Sometimes these are already stored in a qstr, other times it
> > easily could be.
> > 
> > So this patch changes these three functions to receive a 'struct qstr',
> > and improves the documentation.
> 
> You did want 'struct qstr' not 'struct qstr *' right?  I think there are
> arches where this will cause the compiler to skip a register argument or two
> if it's the second argument or third argument - i386 for example.  Plus you
> have an 8-byte alignment requirement because of the u64 in it that may suck if
> passed through several layers of function.

Not just that - you end up with *two* struct qstr instances for no good reason,
copying from the one passed by value field-by-field into the other one, then
passing the *address* of the copy to the functions that do actual work.

NeilBrown March 28, 2025, 1:14 a.m. UTC | #6

On Sat, 22 Mar 2025, Al Viro wrote:
> On Wed, Mar 19, 2025 at 02:01:32PM +1100, NeilBrown wrote:
> 
> > -struct dentry *lookup_one(struct mnt_idmap *idmap, const char *name,
> > -			  struct dentry *base, int len)
> > +struct dentry *lookup_one(struct mnt_idmap *idmap, struct qstr name,
> > +			  struct dentry *base)
> 
> >  {
> >  	struct dentry *dentry;
> >  	struct qstr this;
> > @@ -2942,7 +2940,7 @@ struct dentry *lookup_one(struct mnt_idmap *idmap, const char *name,
> >  
> >  	WARN_ON_ONCE(!inode_is_locked(base->d_inode));
> >  
> > -	err = lookup_one_common(idmap, name, base, len, &this);
> > +	err = lookup_one_common(idmap, name.name, base, name.len, &this);
> 
> No.  Just look at what lookup_one_common() is doing as the first step.
> 
>         this->name = name;
> 	this->len = len;

This code is cleaned up in a later patch. lookup_one_common receives the
address of just one qstr which is initialised with qstr that is passed
in.
So on x86_64, the original qstr is passed in as 2 registers.  These are
stored in the stack and the address is passed to lookup_noperm_common(),
as lookup_one_common() gets inlined.

We have to put the two values onto the stack at some point, either in
the original callers, or in the lookup_one family of functions.  I think
it is cleaner in lookup_one as we don't need to put a & in front of all
the QSTR calls.  But we can change it to pass the pointer if you really
think that is better.

Thanks,
NeilBrown


> 
> You copy your argument's fields to corresponding fields of *&this.  It might make
> sense to pass a qstr, but not like that - just pass a _pointer_ to struct qstr instead.
> 
> Have lookup_one_common() do this:
> 
> static int lookup_one_common(struct mnt_idmap *idmap,
>                              struct qstr *this, struct dentry *base)
> {
> 	const unsigned char *name = this->name;
> 	int len = this->len;
>         if (!len)
>                 return -EACCES;
> 
>         this->hash = full_name_hash(base, name, len);
>         if (is_dot_dotdot(name, len))
>                 return -EACCES;
> 
>         while (len--) {
>                 unsigned int c = *name++;
>                 if (c == '/' || c == '\0')
>                         return -EACCES;
>         }
>         /*
>          * See if the low-level filesystem might want
>          * to use its own hash..
>          */
>         if (base->d_flags & DCACHE_OP_HASH) {
>                 int err = base->d_op->d_hash(base, this);
>                 if (err < 0)
>                         return err;
>         }
> 
>         return inode_permission(idmap, base->d_inode, MAY_EXEC);
> }
> 
> and adjust the callers; e.g.
> struct dentry *lookup_one(struct mnt_idmap *idmap, struct qstr *this,
> 			  struct dentry *base)
> {
>         struct dentry *dentry;
>         int err;
> 
>         WARN_ON_ONCE(!inode_is_locked(base->d_inode));
> 
>         err = lookup_one_common(idmap, this, base);
>         if (err)
>                 return ERR_PTR(err);
> 
>         dentry = lookup_dcache(this, base, 0);
>         return dentry ? dentry : __lookup_slow(this, base, 0);
> }
> 
> with callers passing idmap, &QSTR_LEN(name, len), base instead of
> idmap, name, base, len.  lookup_one_common() looks at the fields
> separately; its callers do not.
>

NeilBrown March 28, 2025, 1:18 a.m. UTC | #7

On Fri, 21 Mar 2025, David Howells wrote:
> NeilBrown <neil@brown.name> wrote:
> 
> > Also the path component name is passed as "name" and "len" which are
> > (confusingly?) separate by the "base".  In some cases the len in simply
> > "strlen" and so passing a qstr using QSTR() would make the calling
> > clearer.
> > Other callers do pass separate name and len which are stored in a
> > struct.  Sometimes these are already stored in a qstr, other times it
> > easily could be.
> > 
> > So this patch changes these three functions to receive a 'struct qstr',
> > and improves the documentation.
> 
> You did want 'struct qstr' not 'struct qstr *' right?  I think there are
> arches where this will cause the compiler to skip a register argument or two
> if it's the second argument or third argument - i386 for example.  Plus you
> have an 8-byte alignment requirement because of the u64 in it that may suck if
> passed through several layers of function.

I don't think it is passed through several layers - except where the
intermediate are inlined.
And gcc enforces 16 byte alignment of the stack on function calls for
i386, so I don't think alignment will be an issue.

I thought 'struct qstr' would result in slightly cleaner calling.  But I
cannot make a strong argument in favour of it so I'm willing to change
if there are concerns.

Thanks,
NeilBrown

Christian Brauner April 4, 2025, 1:41 p.m. UTC | #8

On Fri, Mar 28, 2025 at 12:18:16PM +1100, NeilBrown wrote:
> On Fri, 21 Mar 2025, David Howells wrote:
> > NeilBrown <neil@brown.name> wrote:
> > 
> > > Also the path component name is passed as "name" and "len" which are
> > > (confusingly?) separate by the "base".  In some cases the len in simply
> > > "strlen" and so passing a qstr using QSTR() would make the calling
> > > clearer.
> > > Other callers do pass separate name and len which are stored in a
> > > struct.  Sometimes these are already stored in a qstr, other times it
> > > easily could be.
> > > 
> > > So this patch changes these three functions to receive a 'struct qstr',
> > > and improves the documentation.
> > 
> > You did want 'struct qstr' not 'struct qstr *' right?  I think there are
> > arches where this will cause the compiler to skip a register argument or two
> > if it's the second argument or third argument - i386 for example.  Plus you
> > have an 8-byte alignment requirement because of the u64 in it that may suck if
> > passed through several layers of function.
> 
> I don't think it is passed through several layers - except where the
> intermediate are inlined.
> And gcc enforces 16 byte alignment of the stack on function calls for
> i386, so I don't think alignment will be an issue.
> 
> I thought 'struct qstr' would result in slightly cleaner calling.  But I
> cannot make a strong argument in favour of it so I'm willing to change
> if there are concerns.

Fwiw, I massaged the whole series to pass struct qstr * instead of
struct qstr. I just forgot to finish that rebase and push.
/me doing so now.

Christian Brauner April 4, 2025, 1:46 p.m. UTC | #9

On Fri, Apr 04, 2025 at 03:41:01PM +0200, Christian Brauner wrote:
> On Fri, Mar 28, 2025 at 12:18:16PM +1100, NeilBrown wrote:
> > On Fri, 21 Mar 2025, David Howells wrote:
> > > NeilBrown <neil@brown.name> wrote:
> > > 
> > > > Also the path component name is passed as "name" and "len" which are
> > > > (confusingly?) separate by the "base".  In some cases the len in simply
> > > > "strlen" and so passing a qstr using QSTR() would make the calling
> > > > clearer.
> > > > Other callers do pass separate name and len which are stored in a
> > > > struct.  Sometimes these are already stored in a qstr, other times it
> > > > easily could be.
> > > > 
> > > > So this patch changes these three functions to receive a 'struct qstr',
> > > > and improves the documentation.
> > > 
> > > You did want 'struct qstr' not 'struct qstr *' right?  I think there are
> > > arches where this will cause the compiler to skip a register argument or two
> > > if it's the second argument or third argument - i386 for example.  Plus you
> > > have an 8-byte alignment requirement because of the u64 in it that may suck if
> > > passed through several layers of function.
> > 
> > I don't think it is passed through several layers - except where the
> > intermediate are inlined.
> > And gcc enforces 16 byte alignment of the stack on function calls for
> > i386, so I don't think alignment will be an issue.
> > 
> > I thought 'struct qstr' would result in slightly cleaner calling.  But I
> > cannot make a strong argument in favour of it so I'm willing to change
> > if there are concerns.
> 
> Fwiw, I massaged the whole series to pass struct qstr * instead of
> struct qstr. I just forgot to finish that rebase and push.
> /me doing so now.

Fwiw, there were a bunch of build failures for me when I built the
individual commits that I fixed up. I generally do:

git rebase -i HEAD~XXXXX -x "make -j512"

with an allmodconfig to make sure that it cleanly builds at each commit.

NeilBrown April 4, 2025, 11 p.m. UTC | #10

On Sat, 05 Apr 2025, Christian Brauner wrote:
> On Fri, Apr 04, 2025 at 03:41:01PM +0200, Christian Brauner wrote:
> > On Fri, Mar 28, 2025 at 12:18:16PM +1100, NeilBrown wrote:
> > > On Fri, 21 Mar 2025, David Howells wrote:
> > > > NeilBrown <neil@brown.name> wrote:
> > > > 
> > > > > Also the path component name is passed as "name" and "len" which are
> > > > > (confusingly?) separate by the "base".  In some cases the len in simply
> > > > > "strlen" and so passing a qstr using QSTR() would make the calling
> > > > > clearer.
> > > > > Other callers do pass separate name and len which are stored in a
> > > > > struct.  Sometimes these are already stored in a qstr, other times it
> > > > > easily could be.
> > > > > 
> > > > > So this patch changes these three functions to receive a 'struct qstr',
> > > > > and improves the documentation.
> > > > 
> > > > You did want 'struct qstr' not 'struct qstr *' right?  I think there are
> > > > arches where this will cause the compiler to skip a register argument or two
> > > > if it's the second argument or third argument - i386 for example.  Plus you
> > > > have an 8-byte alignment requirement because of the u64 in it that may suck if
> > > > passed through several layers of function.
> > > 
> > > I don't think it is passed through several layers - except where the
> > > intermediate are inlined.
> > > And gcc enforces 16 byte alignment of the stack on function calls for
> > > i386, so I don't think alignment will be an issue.
> > > 
> > > I thought 'struct qstr' would result in slightly cleaner calling.  But I
> > > cannot make a strong argument in favour of it so I'm willing to change
> > > if there are concerns.
> > 
> > Fwiw, I massaged the whole series to pass struct qstr * instead of
> > struct qstr. I just forgot to finish that rebase and push.
> > /me doing so now.
> 
> Fwiw, there were a bunch of build failures for me when I built the
> individual commits that I fixed up. I generally do:
> 
> git rebase -i HEAD~XXXXX -x "make -j512"

Thanks for the fix-up! I'll have a look and compare with what I have.
I generally do something similar to the above - though my hardware
wouldn't handle -j512.

> 
> with an allmodconfig to make sure that it cleanly builds at each commit.
> 
Part of the code I'm changing is in arch/s390.  I'd need to get a
cross-compiler to build that.... maybe I should.

Thanks again.  I have a few more small cleanup (probably post on Monday)
and then a large batch which changes all the code which locks
directories for create/remove/rename.

One awkwardness which you might have an opinion on:
 In the final state we will be locking just the dentry, not the
 directory.  So the unlock only needs to be given the dentry.
 Something like "dentry_unlock(de);".

 Today we can implement that as "inode_unlock(de->d_parent->d_inode)"
 because d_parent is stable until the unlock happens.  After the switch
 it will involves clearing a bit and a possible wakeup.

 However:  vfs_mkdir() consumes the dentry on failure so there won't be
 any dentry to unlock - unless the caller keeps a copy, which would be
 clumsy.
 In the case where vfs_mkdir() returns a different dentry it will have
 to transfer the lock to that dentry (I don't see a problem there) so it
 will need to know about locking.  So I'm planing to have vfs_mkdir()
 drop the lock as well as put the dentry on failure.

 This ends up being quite an intrusive change.  Several other functions
 (in nfsd and particularly overlayfs) need to be changed to also drop
 the lock on failure.

 Do you think this is an acceptable API or should I explore other
 approaches?

 I've included my current draft patch so you can see the extent of the
 changes.  overlayfs is particularly interesting - it sometimes holds a
 rename lock across vfs_mkdir(), so we need to unlock the "other"
 directory after failure.  I think it will get cleaner later but I don't
 have code to prove it yet.

Thanks,
NeilBrown




Author: NeilBrown <neil@brown.name>
Date:   Fri Apr 4 16:43:25 2025 +1100

    Change vfs_mkdir() to unlock on failure.
    
    Proposed changes to directory-op locking will lock the dentry rather
    than the whole directory.  So the dentry will need to be unlocked.
    
    vfs_mkdir() consumes the dentry on error, so there will be no dentry to
    be unlocked.
    
    So change vfs_mkdir() to unlock on error.  This requires various other
    functions in various callers to also unlock on error.
    
    At present this results in some clumsy code.  Once the transition to
    dentry locking is complete the clumsiness will be gone.
    
    overlays looks particularly clumsy as in some cases a double-directory
    rename lock is taken, and a mkdir is then performed in one of the
    directories.  If that fails the other directory must be unlocked.
    
    Signed-off-by: NeilBrown <neil@brown.name>

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 14d0cc894000..1a5905c313f1 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -131,8 +131,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 		ret = cachefiles_inject_write_error();
 		if (ret == 0)
 			subdir = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), subdir, 0700);
-		else
+		else {
+			/* vfs_mkdir() unlocks on failure */
+			inode_unlock(d_inode(dir));
 			subdir = ERR_PTR(ret);
+		}
 		if (IS_ERR(subdir)) {
 			trace_cachefiles_vfs_error(NULL, d_inode(dir), ret,
 						   cachefiles_trace_mkdir_error);
@@ -196,9 +199,10 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 	return ERR_PTR(-EBUSY);
 
 mkdir_error:
-	inode_unlock(d_inode(dir));
-	if (!IS_ERR(subdir))
+	if (!IS_ERR(subdir)) {
+		inode_unlock(d_inode(dir));
 		dput(subdir);
+	}
 	pr_err("mkdir %s failed with error %d\n", dirname, ret);
 	return ERR_PTR(ret);
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 51a5c54eb740..31e8abfff490 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -518,7 +518,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 				 lower_dentry, mode);
 	rc = PTR_ERR(lower_dentry);
 	if (IS_ERR(lower_dentry))
-		goto out;
+		goto out_unlocked;
 	rc = 0;
 	if (d_unhashed(lower_dentry))
 		goto out;
@@ -530,6 +530,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 	set_nlink(dir, lower_dir->i_nlink);
 out:
 	inode_unlock(lower_dir);
+out_unlocked:
 	if (d_really_is_negative(dentry))
 		d_drop(dentry);
 	return ERR_PTR(rc);
diff --git a/fs/namei.c b/fs/namei.c
index 360a86ca1f02..ba36883dd70a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4130,9 +4130,10 @@ EXPORT_SYMBOL(kern_path_create);
 
 void done_path_create(struct path *path, struct dentry *dentry)
 {
-	if (!IS_ERR(dentry))
+	if (!IS_ERR(dentry)) {
 		dput(dentry);
-	inode_unlock(path->dentry->d_inode);
+		inode_unlock(path->dentry->d_inode);
+	}
 	mnt_drop_write(path->mnt);
 	path_put(path);
 }
@@ -4295,7 +4296,8 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d
  * negative or unhashes it and possibly splices a different one returning it,
  * the original dentry is dput() and the alternate is returned.
  *
- * In case of an error the dentry is dput() and an ERR_PTR() is returned.
+ * In case of an error the dentry is dput(), the parent is unlocked, and
+ * an ERR_PTR() is returned.
  */
 struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 			 struct dentry *dentry, umode_t mode)
@@ -4333,6 +4335,8 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 	return dentry;
 
 err:
+	/* Caller only needs to unlock if dentry is not an error */
+	inode_unlock(dir);
 	dput(dentry);
 	return ERR_PTR(error);
 }
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index c1d9bd07285f..8907967135c6 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -221,7 +221,8 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 	dentry = lookup_one_len(dname, dir, HEXDIR_LEN-1);
 	if (IS_ERR(dentry)) {
 		status = PTR_ERR(dentry);
-		goto out_unlock;
+		inode_unlock(d_inode(dir));
+		goto out;
 	}
 	if (d_really_is_positive(dentry))
 		/*
@@ -234,13 +235,14 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 		 */
 		goto out_put;
 	dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
-	if (IS_ERR(dentry))
+	if (IS_ERR(dentry)) {
 		status = PTR_ERR(dentry);
+		goto out;
+	}
 out_put:
-	if (!status)
-		dput(dentry);
-out_unlock:
+	dput(dentry);
 	inode_unlock(d_inode(dir));
+out:
 	if (status == 0) {
 		if (nn->in_grace)
 			__nfsd4_create_reclaim_record_grace(clp, dname,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9abdc4b75813..0e935be8c2c1 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1450,7 +1450,9 @@ nfsd_check_ignore_resizing(struct iattr *iap)
 		iap->ia_valid &= ~ATTR_SIZE;
 }
 
-/* The parent directory should already be locked: */
+/* The parent directory should already be locked.  The lock
+ * will be dropped on error.
+ */
 __be32
 nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		   struct nfsd_attrs *attrs,
@@ -1516,8 +1518,11 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
 
 out:
-	if (!IS_ERR(dchild))
+	if (!IS_ERR(dchild)) {
+		if (err)
+			inode_unlock(dirp);
 		dput(dchild);
+	}
 	return err;
 
 out_nfserr:
@@ -1572,6 +1577,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err != nfs_ok)
 		goto out_unlock;
 	err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
+	if (err)
+		/* lock will have been dropped */
+		return err;
 	fh_fill_post_attrs(fhp);
 out_unlock:
 	inode_unlock(dentry->d_inode);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index d7310fcf3888..324429d02569 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -518,7 +518,7 @@ static int ovl_set_upper_fh(struct ovl_fs *ofs, struct dentry *upper,
 /*
  * Create and install index entry.
  *
- * Caller must hold i_mutex on indexdir.
+ * Caller must hold i_mutex on indexdir.  It will be unlocked on error.
  */
 static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
 			    struct dentry *upper)
@@ -539,16 +539,22 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
 	 * TODO: implement create index for non-dir, so we can call it when
 	 * encoding file handle for non-dir in case index does not exist.
 	 */
-	if (WARN_ON(!d_is_dir(dentry)))
+	if (WARN_ON(!d_is_dir(dentry))) {
+		inode_unlock(dir);
 		return -EIO;
+	}
 
 	/* Directory not expected to be indexed before copy up */
-	if (WARN_ON(ovl_test_flag(OVL_INDEX, d_inode(dentry))))
+	if (WARN_ON(ovl_test_flag(OVL_INDEX, d_inode(dentry)))) {
+		inode_unlock(dir);
 		return -EIO;
+	}
 
 	err = ovl_get_index_name_fh(fh, &name);
-	if (err)
+	if (err) {
+		inode_unlock(dir);
 		return err;
+	}
 
 	temp = ovl_create_temp(ofs, indexdir, OVL_CATTR(S_IFDIR | 0));
 	err = PTR_ERR(temp);
@@ -567,8 +573,10 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
 		dput(index);
 	}
 out:
-	if (err)
+	if (err) {
 		ovl_cleanup(ofs, dir, temp);
+		inode_unlock(dir);
+	}
 	dput(temp);
 free_name:
 	kfree(name.name);
@@ -781,7 +789,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
 	ovl_start_write(c->dentry);
 	inode_lock(wdir);
 	temp = ovl_create_temp(ofs, c->workdir, &cattr);
-	inode_unlock(wdir);
+	if (!IS_ERR(wdir))
+		inode_unlock(wdir);
 	ovl_end_write(c->dentry);
 	ovl_revert_cu_creds(&cc);
 
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index fe493f3ed6b6..b4d92b51b63f 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -138,13 +138,16 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct inode *dir,
 	goto out;
 }
 
+/* dir will be unlocked on failure */
 struct dentry *ovl_create_real(struct ovl_fs *ofs, struct inode *dir,
 			       struct dentry *newdentry, struct ovl_cattr *attr)
 {
 	int err;
 
-	if (IS_ERR(newdentry))
+	if (IS_ERR(newdentry)) {
+		inode_unlock(dir);
 		return newdentry;
+	}
 
 	err = -ESTALE;
 	if (newdentry->d_inode)
@@ -189,13 +192,16 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, struct inode *dir,
 	}
 out:
 	if (err) {
-		if (!IS_ERR(newdentry))
+		if (!IS_ERR(newdentry)) {
+			inode_unlock(dir);
 			dput(newdentry);
+		}
 		return ERR_PTR(err);
 	}
 	return newdentry;
 }
 
+/* Note workdir will be unlocked on failure */
 struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
 			       struct ovl_cattr *attr)
 {
@@ -309,7 +315,7 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
 				    attr);
 	err = PTR_ERR(newdentry);
 	if (IS_ERR(newdentry))
-		goto out_unlock;
+		goto out;
 
 	if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
 	    !ovl_allow_offline_changes(ofs)) {
@@ -323,6 +329,7 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
 		goto out_cleanup;
 out_unlock:
 	inode_unlock(udir);
+out:
 	return err;
 
 out_cleanup:
@@ -367,9 +374,12 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
 
 	opaquedir = ovl_create_temp(ofs, workdir, OVL_CATTR(stat.mode));
 	err = PTR_ERR(opaquedir);
-	if (IS_ERR(opaquedir))
-		goto out_unlock;
-
+	if (IS_ERR(opaquedir)) {
+		/* workdir was unlocked, no upperdir */
+		inode_unlock(upperdir->d_inode);
+		mutex_unlock(&upperdir->d_sb->s_vfs_rename_mutex);
+		return ERR_PTR(err);
+	}
 	err = ovl_copy_xattr(dentry->d_sb, &upperpath, opaquedir);
 	if (err)
 		goto out_cleanup;
@@ -455,8 +465,13 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 
 	newdentry = ovl_create_temp(ofs, workdir, cattr);
 	err = PTR_ERR(newdentry);
-	if (IS_ERR(newdentry))
-		goto out_dput;
+	if (IS_ERR(newdentry)) {
+		/* workdir was unlocked, not upperdir */
+		inode_unlock(upperdir->d_inode);
+		mutex_unlock(&upperdir->d_sb->s_vfs_rename_mutex);
+		dput(upper);
+		goto out;
+	}
 
 	/*
 	 * mode could have been mutilated due to umask (e.g. sgid directory)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6f2f8f4cfbbc..f7f7c3d1ad63 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -248,6 +248,7 @@ static inline struct dentry *ovl_do_mkdir(struct ovl_fs *ofs,
 {
 	dentry = vfs_mkdir(ovl_upper_mnt_idmap(ofs), dir, dentry, mode);
 	pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, PTR_ERR_OR_ZERO(dentry));
+	/* Note: dir will have been unlocked on failure */
 	return dentry;
 }
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b63474d1b064..56055c70f200 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -366,14 +366,17 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 			goto out_dput;
 	} else {
 		err = PTR_ERR(work);
+		inode_unlock(dir);
 		goto out_err;
 	}
 out_unlock:
-	inode_unlock(dir);
+	if (work && !IS_ERR(work))
+		inode_unlock(dir);
 	return work;
 
 out_dput:
 	dput(work);
+	inode_unlock(dir);
 out_err:
 	pr_warn("failed to create directory %s/%s (errno: %i); mounting read-only\n",
 		ofs->config.workdir, name, -err);
@@ -569,7 +572,7 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
 	temp = ovl_create_temp(ofs, workdir, OVL_CATTR(S_IFREG | 0));
 	err = PTR_ERR(temp);
 	if (IS_ERR(temp))
-		goto out_unlock;
+		return err;
 
 	dest = ovl_lookup_temp(ofs, workdir);
 	err = PTR_ERR(dest);
@@ -620,10 +623,13 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
 
 	inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
 	child = ovl_lookup_upper(ofs, name, parent, len);
-	if (!IS_ERR(child) && !child->d_inode)
+	if (!IS_ERR(child) && !child->d_inode) {
 		child = ovl_create_real(ofs, parent->d_inode, child,
 					OVL_CATTR(mode));
-	inode_unlock(parent->d_inode);
+		if (!IS_ERR(child))
+			inode_unlock(parent->d_inode);
+	} else
+		inode_unlock(parent->d_inode);
 	dput(parent);
 
 	return child;
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index 3537f3cca6d5..af59183374ae 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -171,7 +171,7 @@ xrep_orphanage_create(
 					     orphanage_dentry, 0750);
 		error = PTR_ERR(orphanage_dentry);
 		if (IS_ERR(orphanage_dentry))
-			goto out_unlock_root;
+			goto out_dput_root;
 	}
 
 	/* Not a directory? Bail out. */

[1/6] VFS: improve interface for lookup_one functions

Commit Message

Comments

Patch