Message ID | 20250320004643.1903287-1-mjguzik@gmail.com (mailing list archive) |
---|---|
State | New |
Series | [RFC] fs: call inode_sb_list_add() outside of inode hash lock |
On Thu 20-03-25 01:46:43, Mateusz Guzik wrote:
> As both locks are highly contended during significant inode churn,
> holding the inode hash lock while waiting for the sb list lock
> exacerbates the problem.
>
> Why moving it out is safe: the inode at hand still has I_NEW set and
> anyone who finds it through legitimate means waits for the bit to clear,
> by which time inode_sb_list_add() is guaranteed to have finished.
>
> This significantly drops hash lock contention for me when stating 20
> separate trees in parallel, each with 1000 directories * 1000 files.
>
> However, no speed up was observed as contention increased on the other
> locks, notably the dentry LRU.
>
> Even so, removal of the lock ordering will help make this faster later.
>
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>

I agree that I_NEW protects the inode from being looked up through the
hash while we drop inode_hash_lock, so this is safe. I'm usually a bit
reluctant about performance "improvements" that don't bring a measurable
benefit, but this patch looks like a no-brainer (I'm not worried about
the added complexity :)) and at least it reduces contention on
inode_hash_lock, so I agree the potential is there. So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

                                                                Honza

> ---
>
> There were ideas about using bitlocks, which ran into trouble because of
> spinlocks being taken *after* them and the RT kernel not liking that.
>
> I'm thinking that, thanks to RCU, this very much can be hacked around to
> reverse the hash-specific lock -> inode lock ordering: you find the inode
> you are looking for, lock it and only then take the bitlock and validate
> that it remained in place.
>
> The above patch removes an impediment -- the only other lock possibly
> taken together with the hash lock.
>
> Even if the above idea does not pan out, the patch has some value in
> decoupling these.
>
> I am however not going to strongly argue for it given the lack of results.
>
>  fs/inode.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index e188bb1eb07a..8efd38c27321 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1307,8 +1307,8 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>          }
>
>          if (set && unlikely(set(inode, data))) {
> -                inode = NULL;
> -                goto unlock;
> +                spin_unlock(&inode_hash_lock);
> +                return NULL;
>          }
>
>          /*
> @@ -1320,14 +1320,14 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>          hlist_add_head_rcu(&inode->i_hash, head);
>          spin_unlock(&inode->i_lock);
>
> +        spin_unlock(&inode_hash_lock);
> +
>          /*
>           * Add inode to the sb list if it's not already. It has I_NEW at this
>           * point, so it should be safe to test i_sb_list locklessly.
>           */
>          if (list_empty(&inode->i_sb_list))
>                  inode_sb_list_add(inode);
> -unlock:
> -        spin_unlock(&inode_hash_lock);
>
>          return inode;
>  }
> @@ -1456,8 +1456,8 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>                  inode->i_state = I_NEW;
>                  hlist_add_head_rcu(&inode->i_hash, head);
>                  spin_unlock(&inode->i_lock);
> -                inode_sb_list_add(inode);
>                  spin_unlock(&inode_hash_lock);
> +                inode_sb_list_add(inode);
>
>                  /* Return the locked inode with I_NEW set, the
>                   * caller is responsible for filling in the contents
> --
> 2.43.0
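The argument Jan confirms rests on the I_NEW handshake: the creating thread clears I_NEW (via unlock_new_inode()) only after inode_sb_list_add() has returned, and any lookup that finds the inode in the hash waits for the bit to clear before touching it. The following userspace sketch models that ordering with a mutex/condvar pair standing in for the I_NEW waitqueue; it is an illustration only, not kernel code, and every name in it is made up.

/*
 * Userspace model of the I_NEW handshake discussed above.  A "creator"
 * thread publishes an object in a hash with new_flag set, drops the hash
 * lock, does the (possibly slow) sb-list insertion, and only then clears
 * new_flag.  A "lookup" thread that finds the object must wait for
 * new_flag to clear, so it can never observe the object before the
 * sb-list insertion has finished.  All names are illustrative.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
        bool new_flag;          /* models I_NEW */
        bool on_sb_list;        /* models membership of sb->s_inodes */
};

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;    /* "inode_hash_lock" */
static pthread_mutex_t sb_list_lock = PTHREAD_MUTEX_INITIALIZER; /* "s_inode_list_lock" */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;   /* "inode->i_lock" */
static pthread_cond_t state_cv = PTHREAD_COND_INITIALIZER;       /* "I_NEW waitqueue" */

static struct obj *hashed;      /* one-slot "hash table" keeps the demo short */
static struct obj the_obj;

static void *creator(void *arg)
{
        (void)arg;

        pthread_mutex_lock(&hash_lock);
        pthread_mutex_lock(&state_lock);
        the_obj.new_flag = true;                /* "i_state |= I_NEW" */
        hashed = &the_obj;                      /* "hlist_add_head_rcu()" */
        pthread_mutex_unlock(&state_lock);
        pthread_mutex_unlock(&hash_lock);       /* hash lock dropped early... */

        pthread_mutex_lock(&sb_list_lock);      /* ...before the sb-list add */
        the_obj.on_sb_list = true;
        pthread_mutex_unlock(&sb_list_lock);

        pthread_mutex_lock(&state_lock);        /* "unlock_new_inode()" */
        the_obj.new_flag = false;
        pthread_cond_broadcast(&state_cv);
        pthread_mutex_unlock(&state_lock);
        return NULL;
}

static void *lookup(void *arg)
{
        struct obj *o;

        (void)arg;
        for (;;) {                              /* crude spin until hashed; fine for a demo */
                pthread_mutex_lock(&hash_lock);
                o = hashed;
                pthread_mutex_unlock(&hash_lock);
                if (o)
                        break;
        }

        pthread_mutex_lock(&state_lock);        /* "wait_on_inode()": wait for I_NEW */
        while (o->new_flag)
                pthread_cond_wait(&state_cv, &state_lock);
        pthread_mutex_unlock(&state_lock);

        /* By the time "I_NEW" is clear, the sb-list insertion must be done. */
        assert(o->on_sb_list);
        printf("lookup saw a fully set up object\n");
        return NULL;
}

int main(void)
{
        pthread_t c, l;

        pthread_create(&l, NULL, lookup, NULL);
        pthread_create(&c, NULL, creator, NULL);
        pthread_join(l, NULL);
        pthread_join(c, NULL);
        return 0;
}

The assertion in the lookup thread holds for the same reason the patch is safe: the flag is cleared only after the list insertion has completed, so no waiter can observe the object any earlier.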
On Thu, 20 Mar 2025 01:46:43 +0100, Mateusz Guzik wrote:
> As both locks are highly contended during significant inode churn,
> holding the inode hash lock while waiting for the sb list lock
> exacerbates the problem.
>
> Why moving it out is safe: the inode at hand still has I_NEW set and
> anyone who finds it through legitimate means waits for the bit to clear,
> by which time inode_sb_list_add() is guaranteed to have finished.
>
> [...]

Applied to the vfs-6.15.misc branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.misc

[1/1] fs: call inode_sb_list_add() outside of inode hash lock
      https://git.kernel.org/vfs/vfs/c/c918f15420e3
diff --git a/fs/inode.c b/fs/inode.c
index e188bb1eb07a..8efd38c27321 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1307,8 +1307,8 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
         }

         if (set && unlikely(set(inode, data))) {
-                inode = NULL;
-                goto unlock;
+                spin_unlock(&inode_hash_lock);
+                return NULL;
         }

         /*
@@ -1320,14 +1320,14 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
         hlist_add_head_rcu(&inode->i_hash, head);
         spin_unlock(&inode->i_lock);

+        spin_unlock(&inode_hash_lock);
+
         /*
          * Add inode to the sb list if it's not already. It has I_NEW at this
          * point, so it should be safe to test i_sb_list locklessly.
          */
         if (list_empty(&inode->i_sb_list))
                 inode_sb_list_add(inode);
-unlock:
-        spin_unlock(&inode_hash_lock);

         return inode;
 }
@@ -1456,8 +1456,8 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
                 inode->i_state = I_NEW;
                 hlist_add_head_rcu(&inode->i_hash, head);
                 spin_unlock(&inode->i_lock);
-                inode_sb_list_add(inode);
                 spin_unlock(&inode_hash_lock);
+                inode_sb_list_add(inode);

                 /* Return the locked inode with I_NEW set, the
                  * caller is responsible for filling in the contents
As both locks are highly contended during significant inode churn,
holding the inode hash lock while waiting for the sb list lock
exacerbates the problem.

Why moving it out is safe: the inode at hand still has I_NEW set and
anyone who finds it through legitimate means waits for the bit to clear,
by which time inode_sb_list_add() is guaranteed to have finished.

This significantly drops hash lock contention for me when stating 20
separate trees in parallel, each with 1000 directories * 1000 files.

However, no speed up was observed as contention increased on the other
locks, notably the dentry LRU.

Even so, removal of the lock ordering will help make this faster later.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
There were ideas about using bitlocks, which ran into trouble because of
spinlocks being taken *after* them and the RT kernel not liking that.

I'm thinking that, thanks to RCU, this very much can be hacked around to
reverse the hash-specific lock -> inode lock ordering: you find the inode
you are looking for, lock it and only then take the bitlock and validate
that it remained in place.

The above patch removes an impediment -- the only other lock possibly
taken together with the hash lock.

Even if the above idea does not pan out, the patch has some value in
decoupling these.

I am however not going to strongly argue for it given the lack of results.

 fs/inode.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
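The notes after the "---" separator sketch a possible follow-up: find the inode without the bucket lock (relying on RCU for the existence guarantee), lock the inode itself, and only then take the per-bucket bitlock to validate that the inode is still hashed. Below is a userspace model of that lock-then-revalidate pattern, with mutexes standing in for i_lock and the bit-spinlock; it is only an illustration of the idea, not kernel code, and all names are hypothetical.

/*
 * Userspace model of the "reverse the lock order" idea: look the object
 * up without the bucket lock (the object is simply never freed here,
 * where the kernel would rely on RCU), lock the object, then take the
 * bucket lock and re-check that the object is still hashed before
 * trusting the lookup.  All names are illustrative.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
        pthread_mutex_t lock;   /* stands in for "inode->i_lock" */
        bool hashed;            /* currently present in the bucket? */
        int key;
};

struct bucket {
        pthread_mutex_t lock;   /* stands in for a per-bucket bit lock */
        struct obj *slot;       /* one-entry "bucket" keeps the demo short */
};

static struct bucket b = { .lock = PTHREAD_MUTEX_INITIALIZER };
static struct obj o = { .lock = PTHREAD_MUTEX_INITIALIZER, .key = 42 };

/* Lookup: object lock first, bucket lock second, then revalidate. */
static struct obj *lookup(int key)
{
        for (;;) {
                struct obj *cand = __atomic_load_n(&b.slot, __ATOMIC_ACQUIRE);

                if (!cand || cand->key != key)
                        return NULL;

                pthread_mutex_lock(&cand->lock);        /* object lock first */
                pthread_mutex_lock(&b.lock);            /* then the bucket lock */
                if (cand->hashed && b.slot == cand) {
                        pthread_mutex_unlock(&b.lock);
                        return cand;                    /* returned locked */
                }
                /* Raced with an unhash: drop both locks and retry. */
                pthread_mutex_unlock(&b.lock);
                pthread_mutex_unlock(&cand->lock);
        }
}

/* Hash/unhash take the locks in the same object-then-bucket order. */
static void hash_obj(struct obj *obj)
{
        pthread_mutex_lock(&obj->lock);
        pthread_mutex_lock(&b.lock);
        obj->hashed = true;
        __atomic_store_n(&b.slot, obj, __ATOMIC_RELEASE);
        pthread_mutex_unlock(&b.lock);
        pthread_mutex_unlock(&obj->lock);
}

static void unhash_obj(struct obj *obj)
{
        pthread_mutex_lock(&obj->lock);
        pthread_mutex_lock(&b.lock);
        obj->hashed = false;
        __atomic_store_n(&b.slot, NULL, __ATOMIC_RELEASE);
        pthread_mutex_unlock(&b.lock);
        pthread_mutex_unlock(&obj->lock);
}

static void *churn(void *arg)
{
        (void)arg;
        for (int i = 0; i < 100000; i++) {
                hash_obj(&o);
                unhash_obj(&o);
        }
        return NULL;
}

static void *finder(void *arg)
{
        (void)arg;
        for (int i = 0; i < 100000; i++) {
                struct obj *found = lookup(42);

                if (found) {
                        assert(found->hashed);  /* revalidation held */
                        pthread_mutex_unlock(&found->lock);
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, churn, NULL);
        pthread_create(&t2, NULL, finder, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("lookups only ever returned a validated, hashed object\n");
        return 0;
}

Since every path takes the object lock before the bucket lock, the ordering stays consistent, and the revalidation step catches any unhash that raced with the unlocked walk. Removing the sb list lock from under the hash lock, as this patch does, clears one obstacle to attempting such a reversal in the kernel.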