Message ID | 20241002014017.3801899-3-david@fromorbit.com (mailing list archive) |
---|---|
State | New |
Series | vfs: improving inode cache iteration scalability |
On Wed, Oct 02, 2024 at 11:33:19AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Add a new superblock method for iterating all cached inodes in the
> inode cache.

The method is added later, this just adds an abstraction.

> +/**
> + * super_iter_inodes - iterate all the cached inodes on a superblock
> + * @sb: superblock to iterate
> + * @iter_fn: callback to run on every inode found.
> + *
> + * This function iterates all cached inodes on a superblock that are not in
> + * the process of being initialised or torn down. It will run @iter_fn() with
> + * a valid, referenced inode, so it is safe for the caller to do anything
> + * it wants with the inode except drop the reference the iterator holds.
> + *
> + */

Spurious empty comment line above.

> +void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
> +		void *private_data)
> +{
> +	struct inode *inode;
> +	int ret;
> +
> +	rcu_read_lock();
> +	spin_lock(&sb->s_inode_list_lock);
> +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		ret = iter_fn(inode, private_data);
> +		if (ret == INO_ITER_ABORT)
> +			break;
> +	}

Looking at the entire series, splitting the helpers for the unsafe
vs safe iteration feels a bit of an odd API design given that
INO_ITER_REFERENCED can be passed to super_iter_inodes, but is an
internal flag passed here to the file system method.

Not sure what the best way to do it, but maybe just make
super_iter_inodes a wrapper that calls into the method if available,
or a generic_iter_inodes_unsafe if the unsafe flag is set, else
a plain generic_iter_inodes?

> +/* Inode iteration callback return values */
> +#define INO_ITER_DONE		0
> +#define INO_ITER_ABORT		1
> +
> +/* Inode iteration control flags */
> +#define INO_ITER_REFERENCED	(1U << 0)
> +#define INO_ITER_UNSAFE	(1U << 1)

Please adjust the naming a bit to make clear these are different
namespaces, e.g. INO_ITER_RET_ and INO_ITER_F_.
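[Editor's note: for readers following along, the dispatch Christoph describes might look roughly like the sketch below. It is not code from the series; the ->iter_vfs_inodes method name and the generic_iter_inodes*() helper bodies are placeholders for whatever the series actually spells them as.]

/*
 * Sketch only: super_iter_inodes() as a wrapper that prefers a filesystem
 * supplied iterator and otherwise falls back to the generic list walks.
 */
int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
		      void *private_data, int flags)
{
	/* Let the filesystem iterate its own inode cache if it can. */
	if (sb->s_op->iter_vfs_inodes)
		return sb->s_op->iter_vfs_inodes(sb, iter_fn,
						 private_data, flags);

	/* Otherwise fall back to the generic s_inodes list walks. */
	if (flags & INO_ITER_UNSAFE) {
		generic_iter_inodes_unsafe(sb, iter_fn, private_data);
		return 0;
	}
	return generic_iter_inodes(sb, iter_fn, private_data, flags);
}

Either way, the constant namespaces would still want the INO_ITER_RET_*/INO_ITER_F_* split suggested at the end of the review.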
On Thu, Oct 03, 2024 at 12:12:29AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 02, 2024 at 11:33:19AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Add a new superblock method for iterating all cached inodes in the
> > inode cache.
>
> The method is added later, this just adds an abstraction.

Ah, I forgot to remove that from the commit message when I split the
patch up....

> > +/**
> > + * super_iter_inodes - iterate all the cached inodes on a superblock
> > + * @sb: superblock to iterate
> > + * @iter_fn: callback to run on every inode found.
> > + *
> > + * This function iterates all cached inodes on a superblock that are not in
> > + * the process of being initialised or torn down. It will run @iter_fn() with
> > + * a valid, referenced inode, so it is safe for the caller to do anything
> > + * it wants with the inode except drop the reference the iterator holds.
> > + *
> > + */
>
> Spurious empty comment line above.
>
> > +void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
> > +		void *private_data)
> > +{
> > +	struct inode *inode;
> > +	int ret;
> > +
> > +	rcu_read_lock();
> > +	spin_lock(&sb->s_inode_list_lock);
> > +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > +		ret = iter_fn(inode, private_data);
> > +		if (ret == INO_ITER_ABORT)
> > +			break;
> > +	}
>
> Looking at the entire series, splitting the helpers for the unsafe
> vs safe iteration feels a bit of an odd API design given that
> INO_ITER_REFERENCED can be passed to super_iter_inodes, but is an
> internal flag passed here to the file system method.

The INO_ITER_REFERENCED flag is a hack to support the whacky fsnotify
and landlock iterators that are run after evict_inodes() (which you
already noticed...). i.e. there should not be any unreferenced inodes
at this point, so if any are found they should be skipped.

I think it might be better to remove that flag and replace the
iterator implementation with some kind of SB flag and WARN_ON_ONCE
that fires if a referenced inode is found. With that, the flags field
for super_iter_inodes() can go away...

> Not sure what the best way to do it, but maybe just make
> super_iter_inodes a wrapper that calls into the method if available,
> or a generic_iter_inodes_unsafe if the unsafe flag is set, else
> a plain generic_iter_inodes?

Perhaps. I'll look into it.

> > +/* Inode iteration callback return values */
> > +#define INO_ITER_DONE		0
> > +#define INO_ITER_ABORT		1
> > +
> > +/* Inode iteration control flags */
> > +#define INO_ITER_REFERENCED	(1U << 0)
> > +#define INO_ITER_UNSAFE	(1U << 1)
>
> Please adjust the naming a bit to make clear these are different
> namespaces, e.g. INO_ITER_RET_ and INO_ITER_F_.

Will do.

-Dave.
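[Editor's note: a minimal sketch of the alternative Dave floats above, replacing the INO_ITER_REFERENCED check inside the safe iterator loop with a warning. The thread does not settle which superblock flag would be tested or which polarity the warning should have; the sketch warns when a zero-refcount inode turns up after eviction, which matches the statement that none should exist at that point. sb_past_eviction() is a made-up placeholder, not a real helper.]

		if (!atomic_read(&inode->i_count)) {
			/*
			 * Hypothetical replacement for the flag: callers
			 * that run after evict_inodes() should never see an
			 * unreferenced inode here, so warn instead of taking
			 * an iteration control flag from the caller.
			 */
			WARN_ON_ONCE(sb_past_eviction(sb));
			spin_unlock(&inode->i_lock);
			continue;
		}

With something like this in place, the flags argument to super_iter_inodes() could indeed go away, as Dave notes.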
Hi Dave,

kernel test robot noticed the following build warnings:

[auto build test WARNING on brauner-vfs/vfs.all]
[also build test WARNING on xfs-linux/for-next axboe-block/for-next linus/master v6.12-rc1 next-20241004]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/vfs-replace-invalidate_inodes-with-evict_inodes/20241002-094254
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link:    https://lore.kernel.org/r/20241002014017.3801899-3-david%40fromorbit.com
patch subject: [PATCH 2/7] vfs: add inode iteration superblock method
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20241004/202410041724.REiCiIEQ-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241004/202410041724.REiCiIEQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410041724.REiCiIEQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> fs/super.c:183: warning: Function parameter or struct member 'private_data' not described in 'super_iter_inodes'
>> fs/super.c:183: warning: Function parameter or struct member 'flags' not described in 'super_iter_inodes'
>> fs/super.c:241: warning: bad line:
>> fs/super.c:260: warning: Function parameter or struct member 'private_data' not described in 'super_iter_inodes_unsafe'

vim +183 fs/super.c

   169	
   170	/**
   171	 * super_iter_inodes - iterate all the cached inodes on a superblock
   172	 * @sb: superblock to iterate
   173	 * @iter_fn: callback to run on every inode found.
   174	 *
   175	 * This function iterates all cached inodes on a superblock that are not in
   176	 * the process of being initialised or torn down. It will run @iter_fn() with
   177	 * a valid, referenced inode, so it is safe for the caller to do anything
   178	 * it wants with the inode except drop the reference the iterator holds.
   179	 *
   180	 */
   181	int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
   182			void *private_data, int flags)
 > 183	{
   184		struct inode *inode, *old_inode = NULL;
   185		int ret = 0;
   186	
   187		spin_lock(&sb->s_inode_list_lock);
   188		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
   189			spin_lock(&inode->i_lock);
   190			if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
   191				spin_unlock(&inode->i_lock);
   192				continue;
   193			}
   194	
   195			/*
   196			 * Skip over zero refcount inode if the caller only wants
   197			 * referenced inodes to be iterated.
   198			 */
   199			if ((flags & INO_ITER_REFERENCED) &&
   200			    !atomic_read(&inode->i_count)) {
   201				spin_unlock(&inode->i_lock);
   202				continue;
   203			}
   204	
   205			__iget(inode);
   206			spin_unlock(&inode->i_lock);
   207			spin_unlock(&sb->s_inode_list_lock);
   208			iput(old_inode);
   209	
   210			ret = iter_fn(inode, private_data);
   211	
   212			old_inode = inode;
   213			if (ret == INO_ITER_ABORT) {
   214				ret = 0;
   215				break;
   216			}
   217			if (ret < 0)
   218				break;
   219	
   220			cond_resched();
   221			spin_lock(&sb->s_inode_list_lock);
   222		}
   223		spin_unlock(&sb->s_inode_list_lock);
   224		iput(old_inode);
   225		return ret;
   226	}
   227	
   228	/**
   229	 * super_iter_inodes_unsafe - unsafely iterate all the inodes on a superblock
   230	 * @sb: superblock to iterate
   231	 * @iter_fn: callback to run on every inode found.
   232	 *
   233	 * This is almost certainly not the function you want. It is for internal VFS
   234	 * operations only. Please use super_iter_inodes() instead. If you must use
   235	 * this function, please add a comment explaining why it is necessary and the
   236	 * locking that makes it safe to use this function.
   237	 *
   238	 * This function iterates all cached inodes on a superblock that are attached to
   239	 * the superblock. It will pass each inode to @iter_fn unlocked and without
   240	 * having performed any existences checks on it.
 > 241	
   242	 * @iter_fn must perform all necessary state checks on the inode itself to
   243	 * ensure safe operation. super_iter_inodes_unsafe() only guarantees that the
   244	 * inode exists and won't be freed whilst the callback is running.
   245	 *
   246	 * @iter_fn must not block. It is run in an atomic context that is not allowed
   247	 * to sleep to provide the inode existence guarantees. If the callback needs to
   248	 * do blocking operations it needs to track the inode itself and defer those
   249	 * operations until after the iteration completes.
   250	 *
   251	 * @iter_fn must provide conditional reschedule checks itself. If rescheduling
   252	 * or deferred processing is needed, it must return INO_ITER_ABORT to return to
   253	 * the high level function to perform those operations. It can then restart the
   254	 * iteration again. The high level code must provide forwards progress
   255	 * guarantees if they are necessary.
   256	 *
   257	 */
   258	void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
   259			void *private_data)
 > 260	{
   261		struct inode *inode;
   262		int ret;
   263	
   264		rcu_read_lock();
   265		spin_lock(&sb->s_inode_list_lock);
   266		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
   267			ret = iter_fn(inode, private_data);
   268			if (ret == INO_ITER_ABORT)
   269				break;
   270		}
   271		spin_unlock(&sb->s_inode_list_lock);
   272		rcu_read_unlock();
   273	}
   274	
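[Editor's note: the kerneldoc warnings above are mechanical: @private_data and @flags are not described in either comment block, and line 241 lost its leading " *". A fixed-up header for super_iter_inodes() could look like the sketch below; the wording of the two new parameter descriptions is the editor's guess, not text from a posted revision. The same fix also drops the spurious trailing comment line Christoph flagged.]

/**
 * super_iter_inodes - iterate all the cached inodes on a superblock
 * @sb: superblock to iterate
 * @iter_fn: callback to run on every inode found.
 * @private_data: opaque caller context passed to every @iter_fn invocation
 * @flags: INO_ITER_* flags controlling which inodes are visited
 *
 * This function iterates all cached inodes on a superblock that are not in
 * the process of being initialised or torn down. It will run @iter_fn() with
 * a valid, referenced inode, so it is safe for the caller to do anything
 * it wants with the inode except drop the reference the iterator holds.
 */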
diff --git a/fs/internal.h b/fs/internal.h
index 37749b429e80..7039d13980c6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -127,6 +127,8 @@ struct super_block *user_get_super(dev_t, bool excl);
 void put_super(struct super_block *sb);
 extern bool mount_capable(struct fs_context *);
 int sb_init_dio_done_wq(struct super_block *sb);
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data);
 
 /*
  * Prepare superblock for changing its read-only state (i.e., either remount
diff --git a/fs/super.c b/fs/super.c
index a16e6a6342e0..20a9446d943a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -167,6 +167,111 @@ static void super_wake(struct super_block *sb, unsigned int flag)
 	wake_up_var(&sb->s_flags);
 }
 
+/**
+ * super_iter_inodes - iterate all the cached inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This function iterates all cached inodes on a superblock that are not in
+ * the process of being initialised or torn down. It will run @iter_fn() with
+ * a valid, referenced inode, so it is safe for the caller to do anything
+ * it wants with the inode except drop the reference the iterator holds.
+ *
+ */
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags)
+{
+	struct inode *inode, *old_inode = NULL;
+	int ret = 0;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		/*
+		 * Skip over zero refcount inode if the caller only wants
+		 * referenced inodes to be iterated.
+		 */
+		if ((flags & INO_ITER_REFERENCED) &&
+		    !atomic_read(&inode->i_count)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+		iput(old_inode);
+
+		ret = iter_fn(inode, private_data);
+
+		old_inode = inode;
+		if (ret == INO_ITER_ABORT) {
+			ret = 0;
+			break;
+		}
+		if (ret < 0)
+			break;
+
+		cond_resched();
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(old_inode);
+	return ret;
+}
+
+/**
+ * super_iter_inodes_unsafe - unsafely iterate all the inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This is almost certainly not the function you want. It is for internal VFS
+ * operations only. Please use super_iter_inodes() instead. If you must use
+ * this function, please add a comment explaining why it is necessary and the
+ * locking that makes it safe to use this function.
+ *
+ * This function iterates all cached inodes on a superblock that are attached to
+ * the superblock. It will pass each inode to @iter_fn unlocked and without
+ * having performed any existences checks on it.
+
+ * @iter_fn must perform all necessary state checks on the inode itself to
+ * ensure safe operation. super_iter_inodes_unsafe() only guarantees that the
+ * inode exists and won't be freed whilst the callback is running.
+ *
+ * @iter_fn must not block. It is run in an atomic context that is not allowed
+ * to sleep to provide the inode existence guarantees. If the callback needs to
+ * do blocking operations it needs to track the inode itself and defer those
+ * operations until after the iteration completes.
+ *
+ * @iter_fn must provide conditional reschedule checks itself. If rescheduling
+ * or deferred processing is needed, it must return INO_ITER_ABORT to return to
+ * the high level function to perform those operations. It can then restart the
+ * iteration again. The high level code must provide forwards progress
+ * guarantees if they are necessary.
+ *
+ */
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data)
+{
+	struct inode *inode;
+	int ret;
+
+	rcu_read_lock();
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		ret = iter_fn(inode, private_data);
+		if (ret == INO_ITER_ABORT)
+			break;
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	rcu_read_unlock();
+}
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eae5b67e4a15..0a6a462c45ab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2213,6 +2213,18 @@ enum freeze_holder {
 	FREEZE_MAY_NEST	= (1U << 2),
 };
 
+/* Inode iteration callback return values */
+#define INO_ITER_DONE		0
+#define INO_ITER_ABORT		1
+
+/* Inode iteration control flags */
+#define INO_ITER_REFERENCED	(1U << 0)
+#define INO_ITER_UNSAFE	(1U << 1)
+
+typedef int (*ino_iter_fn)(struct inode *inode, void *priv);
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags);
+
 struct super_operations {
 	struct inode *(*alloc_inode)(struct super_block *sb);
 	void (*destroy_inode)(struct inode *);
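[Editor's note: to make the calling convention above concrete, here is an illustrative caller, not code from the series. The callback is handed each referenced inode together with the opaque private_data pointer; it returns INO_ITER_DONE to continue, INO_ITER_ABORT to stop early, or a negative errno to abort the walk with an error. The dirty_count structure and both function names are invented for this example.]

/* Illustrative only: count cached inodes that are currently dirty. */
struct dirty_count {
	unsigned long	nr_dirty;
	unsigned long	max_to_scan;
};

static int count_dirty_inode(struct inode *inode, void *priv)
{
	struct dirty_count *dc = priv;

	/* The safe iterator holds a reference, so i_state can be sampled. */
	spin_lock(&inode->i_lock);
	if (inode->i_state & I_DIRTY)
		dc->nr_dirty++;
	spin_unlock(&inode->i_lock);

	/* Stop early once the scan budget is used up. */
	if (dc->max_to_scan && !--dc->max_to_scan)
		return INO_ITER_ABORT;
	return INO_ITER_DONE;
}

static unsigned long sb_count_dirty(struct super_block *sb)
{
	struct dirty_count dc = { .max_to_scan = 1024 };

	/* Walk only referenced inodes on @sb. */
	super_iter_inodes(sb, count_dirty_inode, &dc, INO_ITER_REFERENCED);
	return dc.nr_dirty;
}

Because the safe iterator drops all locks and holds a reference across each callback, a callback like this is free to take i_lock or even block, which is exactly the property super_iter_inodes_unsafe() does not provide.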