Message ID | 20241002014017.3801899-3-david@fromorbit.com (mailing list archive) |
---|---|
State | New |
Series | vfs: improving inode cache iteration scalability |
On Wed, Oct 02, 2024 at 11:33:19AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Add a new superblock method for iterating all cached inodes in the
> inode cache.

The method is added later, this just adds an abstraction.

> +/**
> + * super_iter_inodes - iterate all the cached inodes on a superblock
> + * @sb: superblock to iterate
> + * @iter_fn: callback to run on every inode found.
> + *
> + * This function iterates all cached inodes on a superblock that are not in
> + * the process of being initialised or torn down. It will run @iter_fn() with
> + * a valid, referenced inode, so it is safe for the caller to do anything
> + * it wants with the inode except drop the reference the iterator holds.
> + *
> + */

Spurious empty comment line above.

> +void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
> +		void *private_data)
> +{
> +	struct inode *inode;
> +	int ret;
> +
> +	rcu_read_lock();
> +	spin_lock(&sb->s_inode_list_lock);
> +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		ret = iter_fn(inode, private_data);
> +		if (ret == INO_ITER_ABORT)
> +			break;
> +	}

Looking at the entire series, splitting the helpers for the unsafe
vs safe iteration feels a bit of an odd API design given that
INO_ITER_REFERENCED can be passed to super_iter_inodes, but is an
internal flag passed here to the file system method.

Not sure what the best way to do it, but maybe just make
super_iter_inodes a wrapper that calls into the method if available,
or a generic_iter_inodes_unsafe if the unsafe flag is set, else
a plain generic_iter_inodes?

> +/* Inode iteration callback return values */
> +#define INO_ITER_DONE		0
> +#define INO_ITER_ABORT		1
> +
> +/* Inode iteration control flags */
> +#define INO_ITER_REFERENCED	(1U << 0)
> +#define INO_ITER_UNSAFE	(1U << 1)

Please adjust the naming a bit to make clear these are different
namespaces, e.g. INO_ITER_RET_ and INO_ITER_F_.
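[Editor's note: for readers following along, the dispatch Christoph describes might look roughly like the sketch below. It is not code from the series; the ->iter_vfs_inodes method name and the generic_iter_inodes*() helper bodies are placeholders for whatever the series actually spells them as.]

/*
 * Sketch only: super_iter_inodes() as a wrapper that prefers a filesystem
 * supplied iterator and otherwise falls back to the generic list walks.
 */
int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
		      void *private_data, int flags)
{
	/* Let the filesystem iterate its own inode cache if it can. */
	if (sb->s_op->iter_vfs_inodes)
		return sb->s_op->iter_vfs_inodes(sb, iter_fn,
						 private_data, flags);

	/* Otherwise fall back to the generic s_inodes list walks. */
	if (flags & INO_ITER_UNSAFE) {
		generic_iter_inodes_unsafe(sb, iter_fn, private_data);
		return 0;
	}
	return generic_iter_inodes(sb, iter_fn, private_data, flags);
}

Either way, the constant namespaces would still want the INO_ITER_RET_*/INO_ITER_F_* split suggested at the end of the review.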
On Thu, Oct 03, 2024 at 12:12:29AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 02, 2024 at 11:33:19AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Add a new superblock method for iterating all cached inodes in the
> > inode cache.
>
> The method is added later, this just adds an abstraction.

Ah, I forgot to remove that from the commit message when I split the
patch up....

> > +/**
> > + * super_iter_inodes - iterate all the cached inodes on a superblock
> > + * @sb: superblock to iterate
> > + * @iter_fn: callback to run on every inode found.
> > + *
> > + * This function iterates all cached inodes on a superblock that are not in
> > + * the process of being initialised or torn down. It will run @iter_fn() with
> > + * a valid, referenced inode, so it is safe for the caller to do anything
> > + * it wants with the inode except drop the reference the iterator holds.
> > + *
> > + */
>
> Spurious empty comment line above.
>
> > +void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
> > +		void *private_data)
> > +{
> > +	struct inode *inode;
> > +	int ret;
> > +
> > +	rcu_read_lock();
> > +	spin_lock(&sb->s_inode_list_lock);
> > +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > +		ret = iter_fn(inode, private_data);
> > +		if (ret == INO_ITER_ABORT)
> > +			break;
> > +	}
>
> Looking at the entire series, splitting the helpers for the unsafe
> vs safe iteration feels a bit of an odd API design given that
> INO_ITER_REFERENCED can be passed to super_iter_inodes, but is an
> internal flag passed here to the file system method.

The INO_ITER_REFERENCED flag is a hack to support the whacky fsnotify
and landlock iterators that are run after evict_inodes() (which you
already noticed...). i.e. there should not be any unreferenced inodes
at this point, so if any are found they should be skipped.

I think it might be better to remove that flag and replace the
iterator implementation with some kind of SB flag and WARN_ON_ONCE
that fires if a referenced inode is found. With that, the flags field
for super_iter_inodes() can go away...

> Not sure what the best way to do it, but maybe just make
> super_iter_inodes a wrapper that calls into the method if available,
> or a generic_iter_inodes_unsafe if the unsafe flag is set, else
> a plain generic_iter_inodes?

Perhaps. I'll look into it.

> > +/* Inode iteration callback return values */
> > +#define INO_ITER_DONE		0
> > +#define INO_ITER_ABORT		1
> > +
> > +/* Inode iteration control flags */
> > +#define INO_ITER_REFERENCED	(1U << 0)
> > +#define INO_ITER_UNSAFE	(1U << 1)
>
> Please adjust the naming a bit to make clear these are different
> namespaces, e.g. INO_ITER_RET_ and INO_ITER_F_.

Will do.

-Dave.
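[Editor's note: a minimal sketch of the alternative Dave floats above, replacing the INO_ITER_REFERENCED check inside the safe iterator loop with a warning. The thread does not settle which superblock flag would be tested or which polarity the warning should have; the sketch warns when a zero-refcount inode turns up after eviction, which matches the statement that none should exist at that point. sb_past_eviction() is a made-up placeholder, not a real helper.]

		if (!atomic_read(&inode->i_count)) {
			/*
			 * Hypothetical replacement for the flag: callers
			 * that run after evict_inodes() should never see an
			 * unreferenced inode here, so warn instead of taking
			 * an iteration control flag from the caller.
			 */
			WARN_ON_ONCE(sb_past_eviction(sb));
			spin_unlock(&inode->i_lock);
			continue;
		}

With something like this in place, the flags argument to super_iter_inodes() could indeed go away, as Dave notes.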
Hi Dave,

kernel test robot noticed the following build warnings:

[auto build test WARNING on brauner-vfs/vfs.all]
[also build test WARNING on xfs-linux/for-next axboe-block/for-next linus/master v6.12-rc1 next-20241004]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/vfs-replace-invalidate_inodes-with-evict_inodes/20241002-094254
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link:    https://lore.kernel.org/r/20241002014017.3801899-3-david%40fromorbit.com
patch subject: [PATCH 2/7] vfs: add inode iteration superblock method
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20241004/202410041724.REiCiIEQ-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241004/202410041724.REiCiIEQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410041724.REiCiIEQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> fs/super.c:183: warning: Function parameter or struct member 'private_data' not described in 'super_iter_inodes'
>> fs/super.c:183: warning: Function parameter or struct member 'flags' not described in 'super_iter_inodes'
>> fs/super.c:241: warning: bad line:
>> fs/super.c:260: warning: Function parameter or struct member 'private_data' not described in 'super_iter_inodes_unsafe'

vim +183 fs/super.c

   169	
   170	/**
   171	 * super_iter_inodes - iterate all the cached inodes on a superblock
   172	 * @sb: superblock to iterate
   173	 * @iter_fn: callback to run on every inode found.
   174	 *
   175	 * This function iterates all cached inodes on a superblock that are not in
   176	 * the process of being initialised or torn down. It will run @iter_fn() with
   177	 * a valid, referenced inode, so it is safe for the caller to do anything
   178	 * it wants with the inode except drop the reference the iterator holds.
   179	 *
   180	 */
   181	int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
   182			void *private_data, int flags)
 > 183	{
   184		struct inode *inode, *old_inode = NULL;
   185		int ret = 0;
   186	
   187		spin_lock(&sb->s_inode_list_lock);
   188		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
   189			spin_lock(&inode->i_lock);
   190			if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
   191				spin_unlock(&inode->i_lock);
   192				continue;
   193			}
   194	
   195			/*
   196			 * Skip over zero refcount inode if the caller only wants
   197			 * referenced inodes to be iterated.
   198			 */
   199			if ((flags & INO_ITER_REFERENCED) &&
   200			    !atomic_read(&inode->i_count)) {
   201				spin_unlock(&inode->i_lock);
   202				continue;
   203			}
   204	
   205			__iget(inode);
   206			spin_unlock(&inode->i_lock);
   207			spin_unlock(&sb->s_inode_list_lock);
   208			iput(old_inode);
   209	
   210			ret = iter_fn(inode, private_data);
   211	
   212			old_inode = inode;
   213			if (ret == INO_ITER_ABORT) {
   214				ret = 0;
   215				break;
   216			}
   217			if (ret < 0)
   218				break;
   219	
   220			cond_resched();
   221			spin_lock(&sb->s_inode_list_lock);
   222		}
   223		spin_unlock(&sb->s_inode_list_lock);
   224		iput(old_inode);
   225		return ret;
   226	}
   227	
   228	/**
   229	 * super_iter_inodes_unsafe - unsafely iterate all the inodes on a superblock
   230	 * @sb: superblock to iterate
   231	 * @iter_fn: callback to run on every inode found.
   232	 *
   233	 * This is almost certainly not the function you want. It is for internal VFS
   234	 * operations only. Please use super_iter_inodes() instead. If you must use
   235	 * this function, please add a comment explaining why it is necessary and the
   236	 * locking that makes it safe to use this function.
   237	 *
   238	 * This function iterates all cached inodes on a superblock that are attached to
   239	 * the superblock. It will pass each inode to @iter_fn unlocked and without
   240	 * having performed any existences checks on it.
 > 241	
   242	 * @iter_fn must perform all necessary state checks on the inode itself to
   243	 * ensure safe operation. super_iter_inodes_unsafe() only guarantees that the
   244	 * inode exists and won't be freed whilst the callback is running.
   245	 *
   246	 * @iter_fn must not block. It is run in an atomic context that is not allowed
   247	 * to sleep to provide the inode existence guarantees. If the callback needs to
   248	 * do blocking operations it needs to track the inode itself and defer those
   249	 * operations until after the iteration completes.
   250	 *
   251	 * @iter_fn must provide conditional reschedule checks itself. If rescheduling
   252	 * or deferred processing is needed, it must return INO_ITER_ABORT to return to
   253	 * the high level function to perform those operations. It can then restart the
   254	 * iteration again. The high level code must provide forwards progress
   255	 * guarantees if they are necessary.
   256	 *
   257	 */
   258	void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
   259			void *private_data)
 > 260	{
   261		struct inode *inode;
   262		int ret;
   263	
   264		rcu_read_lock();
   265		spin_lock(&sb->s_inode_list_lock);
   266		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
   267			ret = iter_fn(inode, private_data);
   268			if (ret == INO_ITER_ABORT)
   269				break;
   270		}
   271		spin_unlock(&sb->s_inode_list_lock);
   272		rcu_read_unlock();
   273	}
   274	
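[Editor's note: the kerneldoc warnings above are mechanical: @private_data and @flags are not described in either comment block, and line 241 lost its leading " *". A fixed-up header for super_iter_inodes() could look like the sketch below; the wording of the two new parameter descriptions is the editor's guess, not text from a posted revision. The same fix also drops the spurious trailing comment line Christoph flagged.]

/**
 * super_iter_inodes - iterate all the cached inodes on a superblock
 * @sb: superblock to iterate
 * @iter_fn: callback to run on every inode found.
 * @private_data: opaque caller context passed to every @iter_fn invocation
 * @flags: INO_ITER_* flags controlling which inodes are visited
 *
 * This function iterates all cached inodes on a superblock that are not in
 * the process of being initialised or torn down. It will run @iter_fn() with
 * a valid, referenced inode, so it is safe for the caller to do anything
 * it wants with the inode except drop the reference the iterator holds.
 */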
diff --git a/fs/internal.h b/fs/internal.h
index 37749b429e80..7039d13980c6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -127,6 +127,8 @@ struct super_block *user_get_super(dev_t, bool excl);
 void put_super(struct super_block *sb);
 extern bool mount_capable(struct fs_context *);
 int sb_init_dio_done_wq(struct super_block *sb);
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data);
 
 /*
  * Prepare superblock for changing its read-only state (i.e., either remount
diff --git a/fs/super.c b/fs/super.c
index a16e6a6342e0..20a9446d943a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -167,6 +167,111 @@ static void super_wake(struct super_block *sb, unsigned int flag)
 	wake_up_var(&sb->s_flags);
 }
 
+/**
+ * super_iter_inodes - iterate all the cached inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This function iterates all cached inodes on a superblock that are not in
+ * the process of being initialised or torn down. It will run @iter_fn() with
+ * a valid, referenced inode, so it is safe for the caller to do anything
+ * it wants with the inode except drop the reference the iterator holds.
+ *
+ */
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags)
+{
+	struct inode *inode, *old_inode = NULL;
+	int ret = 0;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		/*
+		 * Skip over zero refcount inode if the caller only wants
+		 * referenced inodes to be iterated.
+		 */
+		if ((flags & INO_ITER_REFERENCED) &&
+		    !atomic_read(&inode->i_count)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+		iput(old_inode);
+
+		ret = iter_fn(inode, private_data);
+
+		old_inode = inode;
+		if (ret == INO_ITER_ABORT) {
+			ret = 0;
+			break;
+		}
+		if (ret < 0)
+			break;
+
+		cond_resched();
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(old_inode);
+	return ret;
+}
+
+/**
+ * super_iter_inodes_unsafe - unsafely iterate all the inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This is almost certainly not the function you want. It is for internal VFS
+ * operations only. Please use super_iter_inodes() instead. If you must use
+ * this function, please add a comment explaining why it is necessary and the
+ * locking that makes it safe to use this function.
+ *
+ * This function iterates all cached inodes on a superblock that are attached to
+ * the superblock. It will pass each inode to @iter_fn unlocked and without
+ * having performed any existences checks on it.
+
+ * @iter_fn must perform all necessary state checks on the inode itself to
+ * ensure safe operation. super_iter_inodes_unsafe() only guarantees that the
+ * inode exists and won't be freed whilst the callback is running.
+ *
+ * @iter_fn must not block. It is run in an atomic context that is not allowed
+ * to sleep to provide the inode existence guarantees. If the callback needs to
+ * do blocking operations it needs to track the inode itself and defer those
+ * operations until after the iteration completes.
+ *
+ * @iter_fn must provide conditional reschedule checks itself. If rescheduling
+ * or deferred processing is needed, it must return INO_ITER_ABORT to return to
+ * the high level function to perform those operations. It can then restart the
+ * iteration again. The high level code must provide forwards progress
+ * guarantees if they are necessary.
+ *
+ */
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data)
+{
+	struct inode *inode;
+	int ret;
+
+	rcu_read_lock();
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		ret = iter_fn(inode, private_data);
+		if (ret == INO_ITER_ABORT)
+			break;
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	rcu_read_unlock();
+}
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eae5b67e4a15..0a6a462c45ab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2213,6 +2213,18 @@ enum freeze_holder {
 	FREEZE_MAY_NEST	= (1U << 2),
 };
 
+/* Inode iteration callback return values */
+#define INO_ITER_DONE		0
+#define INO_ITER_ABORT		1
+
+/* Inode iteration control flags */
+#define INO_ITER_REFERENCED	(1U << 0)
+#define INO_ITER_UNSAFE	(1U << 1)
+
+typedef int (*ino_iter_fn)(struct inode *inode, void *priv);
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags);
+
 struct super_operations {
 	struct inode *(*alloc_inode)(struct super_block *sb);
 	void (*destroy_inode)(struct inode *);
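[Editor's note: to make the calling convention above concrete, here is an illustrative caller, not code from the series. The callback is handed each referenced inode together with the opaque private_data pointer; it returns INO_ITER_DONE to continue, INO_ITER_ABORT to stop early, or a negative errno to abort the walk with an error. The dirty_count structure and both function names are invented for this example.]

/* Illustrative only: count cached inodes that are currently dirty. */
struct dirty_count {
	unsigned long	nr_dirty;
	unsigned long	max_to_scan;
};

static int count_dirty_inode(struct inode *inode, void *priv)
{
	struct dirty_count *dc = priv;

	/* The safe iterator holds a reference, so i_state can be sampled. */
	spin_lock(&inode->i_lock);
	if (inode->i_state & I_DIRTY)
		dc->nr_dirty++;
	spin_unlock(&inode->i_lock);

	/* Stop early once the scan budget is used up. */
	if (dc->max_to_scan && !--dc->max_to_scan)
		return INO_ITER_ABORT;
	return INO_ITER_DONE;
}

static unsigned long sb_count_dirty(struct super_block *sb)
{
	struct dirty_count dc = { .max_to_scan = 1024 };

	/* Walk only referenced inodes on @sb. */
	super_iter_inodes(sb, count_dirty_inode, &dc, INO_ITER_REFERENCED);
	return dc.nr_dirty;
}

Because the safe iterator drops all locks and holds a reference across each callback, a callback like this is free to take i_lock or even block, which is exactly the property super_iter_inodes_unsafe() does not provide.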