diff mbox series

[v3] xfs: fix incorrect i_nlink caused by inode racing

Message ID 20221117133854.GA525799@ceph-admin (mailing list archive)
State Accepted, archived
Headers show
Series [v3] xfs: fix incorrect i_nlink caused by inode racing | expand

Commit Message

Long Li Nov. 17, 2022, 1:38 p.m. UTC
The following error occurred during the fsstress test:

XFS: Assertion failed: VFS_I(ip)->i_nlink >= 2, file: fs/xfs/xfs_inode.c, line: 2452

The problem was that inode race condition causes incorrect i_nlink to be
written to disk, and then it is read into memory. Consider the following
call graph, inodes that are marked as both XFS_IFLUSHING and
XFS_IRECLAIMABLE, i_nlink will be reset to 1 and then restored to original
value in xfs_reinit_inode(). Therefore, the i_nlink of directory on disk
may be set to 1.

  xfsaild
      xfs_inode_item_push
          xfs_iflush_cluster
              xfs_iflush
                  xfs_inode_to_disk

  xfs_iget
      xfs_iget_cache_hit
          xfs_iget_recycle
              xfs_reinit_inode
  	          inode_init_always

xfs_reinit_inode() needs to hold the ILOCK_EXCL as it is changing internal
inode state and can race with other RCU protected inode lookups. On the
read side, xfs_iflush_cluster() grabs the ILOCK_SHARED while under rcu +
ip->i_flags_lock, and so xfs_iflush/xfs_inode_to_disk() are protected from
racing inode updates (during transactions) by that lock.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
v2:
- Modify the assertion error code line number
- Use ILOCK_EXCL to prevent inode racing 
v3:
- Put ilock and iunlock in the same function

 fs/xfs/xfs_icache.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Darrick J. Wong Nov. 17, 2022, 7:14 p.m. UTC | #1
On Thu, Nov 17, 2022 at 09:38:54PM +0800, Long Li wrote:
> The following error occurred during the fsstress test:
> 
> XFS: Assertion failed: VFS_I(ip)->i_nlink >= 2, file: fs/xfs/xfs_inode.c, line: 2452
> 
> The problem was that inode race condition causes incorrect i_nlink to be
> written to disk, and then it is read into memory. Consider the following
> call graph, inodes that are marked as both XFS_IFLUSHING and
> XFS_IRECLAIMABLE, i_nlink will be reset to 1 and then restored to original
> value in xfs_reinit_inode(). Therefore, the i_nlink of directory on disk
> may be set to 1.
> 
>   xfsaild
>       xfs_inode_item_push
>           xfs_iflush_cluster
>               xfs_iflush
>                   xfs_inode_to_disk
> 
>   xfs_iget
>       xfs_iget_cache_hit
>           xfs_iget_recycle
>               xfs_reinit_inode
>   	          inode_init_always
> 
> xfs_reinit_inode() needs to hold the ILOCK_EXCL as it is changing internal
> inode state and can race with other RCU protected inode lookups. On the
> read side, xfs_iflush_cluster() grabs the ILOCK_SHARED while under rcu +
> ip->i_flags_lock, and so xfs_iflush/xfs_inode_to_disk() are protected from
> racing inode updates (during transactions) by that lock.
> 
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
> v2:
> - Modify the assertion error code line number
> - Use ILOCK_EXCL to prevent inode racing 
> v3:
> - Put ilock and iunlock in the same function

Thanks for fixing that, I'll run this through testing overnight.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
>  fs/xfs/xfs_icache.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index eae7427062cf..f35e2cee5265 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -342,6 +342,9 @@ xfs_iget_recycle(
>  
>  	trace_xfs_iget_recycle(ip);
>  
> +	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
> +		return -EAGAIN;
> +
>  	/*
>  	 * We need to make it look like the inode is being reclaimed to prevent
>  	 * the actual reclaim workers from stomping over us while we recycle
> @@ -355,6 +358,7 @@ xfs_iget_recycle(
>  
>  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
>  	error = xfs_reinit_inode(mp, inode);
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	if (error) {
>  		/*
>  		 * Re-initializing the inode failed, and we are in deep
> @@ -518,6 +522,8 @@ xfs_iget_cache_hit(
>  	if (ip->i_flags & XFS_IRECLAIMABLE) {
>  		/* Drops i_flags_lock and RCU read lock. */
>  		error = xfs_iget_recycle(pag, ip);
> +		if (error == -EAGAIN)
> +			goto out_skip;
>  		if (error)
>  			return error;
>  	} else {
> -- 
> 2.31.1
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index eae7427062cf..f35e2cee5265 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -342,6 +342,9 @@  xfs_iget_recycle(
 
 	trace_xfs_iget_recycle(ip);
 
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		return -EAGAIN;
+
 	/*
 	 * We need to make it look like the inode is being reclaimed to prevent
 	 * the actual reclaim workers from stomping over us while we recycle
@@ -355,6 +358,7 @@  xfs_iget_recycle(
 
 	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 	error = xfs_reinit_inode(mp, inode);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error) {
 		/*
 		 * Re-initializing the inode failed, and we are in deep
@@ -518,6 +522,8 @@  xfs_iget_cache_hit(
 	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		/* Drops i_flags_lock and RCU read lock. */
 		error = xfs_iget_recycle(pag, ip);
+		if (error == -EAGAIN)
+			goto out_skip;
 		if (error)
 			return error;
 	} else {