Message ID | 20221117133854.GA525799@ceph-admin |
---|---|
State | Accepted, archived |
Series | [v3] xfs: fix incorrect i_nlink caused by inode racing |
```
On Thu, Nov 17, 2022 at 09:38:54PM +0800, Long Li wrote:
> The following error occurred during the fsstress test:
>
> XFS: Assertion failed: VFS_I(ip)->i_nlink >= 2, file: fs/xfs/xfs_inode.c, line: 2452
>
> The problem was that inode race condition causes incorrect i_nlink to be
> written to disk, and then it is read into memory. Consider the following
> call graph, inodes that are marked as both XFS_IFLUSHING and
> XFS_IRECLAIMABLE, i_nlink will be reset to 1 and then restored to original
> value in xfs_reinit_inode(). Therefore, the i_nlink of directory on disk
> may be set to 1.
>
>   xfsaild
>       xfs_inode_item_push
>           xfs_iflush_cluster
>               xfs_iflush
>                   xfs_inode_to_disk
>
>   xfs_iget
>       xfs_iget_cache_hit
>           xfs_iget_recycle
>               xfs_reinit_inode
>                   inode_init_always
>
> xfs_reinit_inode() needs to hold the ILOCK_EXCL as it is changing internal
> inode state and can race with other RCU protected inode lookups. On the
> read side, xfs_iflush_cluster() grabs the ILOCK_SHARED while under rcu +
> ip->i_flags_lock, and so xfs_iflush/xfs_inode_to_disk() are protected from
> racing inode updates (during transactions) by that lock.
>
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
> v2:
> - Modify the assertion error code line number
> - Use ILOCK_EXCL to prevent inode racing
> v3:
> - Put ilock and iunlock in the same function

Thanks for fixing that, I'll run this through testing overnight.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>
>  fs/xfs/xfs_icache.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index eae7427062cf..f35e2cee5265 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -342,6 +342,9 @@ xfs_iget_recycle(
>
>  	trace_xfs_iget_recycle(ip);
>
> +	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
> +		return -EAGAIN;
> +
>  	/*
>  	 * We need to make it look like the inode is being reclaimed to prevent
>  	 * the actual reclaim workers from stomping over us while we recycle
> @@ -355,6 +358,7 @@ xfs_iget_recycle(
>
>  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
>  	error = xfs_reinit_inode(mp, inode);
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	if (error) {
>  		/*
>  		 * Re-initializing the inode failed, and we are in deep
> @@ -518,6 +522,8 @@ xfs_iget_cache_hit(
>  	if (ip->i_flags & XFS_IRECLAIMABLE) {
>  		/* Drops i_flags_lock and RCU read lock. */
>  		error = xfs_iget_recycle(pag, ip);
> +		if (error == -EAGAIN)
> +			goto out_skip;
>  		if (error)
>  			return error;
>  	} else {
> --
> 2.31.1
>
```
```diff
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index eae7427062cf..f35e2cee5265 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -342,6 +342,9 @@ xfs_iget_recycle(
 
 	trace_xfs_iget_recycle(ip);
 
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		return -EAGAIN;
+
 	/*
 	 * We need to make it look like the inode is being reclaimed to prevent
 	 * the actual reclaim workers from stomping over us while we recycle
@@ -355,6 +358,7 @@ xfs_iget_recycle(
 
 	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 	error = xfs_reinit_inode(mp, inode);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error) {
 		/*
 		 * Re-initializing the inode failed, and we are in deep
@@ -518,6 +522,8 @@ xfs_iget_cache_hit(
 	if (ip->i_flags & XFS_IRECLAIMABLE) {
 		/* Drops i_flags_lock and RCU read lock. */
 		error = xfs_iget_recycle(pag, ip);
+		if (error == -EAGAIN)
+			goto out_skip;
 		if (error)
 			return error;
 	} else {
```
The following error occurred during the fsstress test:

```
XFS: Assertion failed: VFS_I(ip)->i_nlink >= 2, file: fs/xfs/xfs_inode.c, line: 2452
```

The problem is that an inode race condition causes an incorrect i_nlink to be written to disk, from where it is later read back into memory. Consider the call graphs below: for an inode marked as both XFS_IFLUSHING and XFS_IRECLAIMABLE, xfs_reinit_inode() resets i_nlink to 1 (via inode_init_always()) and then restores the original value. If xfs_inode_to_disk() runs inside that window, the i_nlink of a directory may be written to disk as 1.

```
xfsaild
    xfs_inode_item_push
        xfs_iflush_cluster
            xfs_iflush
                xfs_inode_to_disk

xfs_iget
    xfs_iget_cache_hit
        xfs_iget_recycle
            xfs_reinit_inode
                inode_init_always
```

xfs_reinit_inode() needs to hold ILOCK_EXCL because it changes internal inode state and can race with other RCU-protected inode lookups. On the read side, xfs_iflush_cluster() takes ILOCK_SHARED while under RCU and ip->i_flags_lock, so xfs_iflush()/xfs_inode_to_disk() are protected from racing inode updates (during transactions) by that lock.

```
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
v2:
- Modify the assertion error code line number
- Use ILOCK_EXCL to prevent inode racing
v3:
- Put ilock and iunlock in the same function

 fs/xfs/xfs_icache.c | 6 ++++++
 1 file changed, 6 insertions(+)
```
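The race window and the effect of the fix can be replayed deterministically in userspace. The C sketch below is a single-threaded model, not kernel code: toy_ilock, model_inode, model_flush(), model_reinit() and model_recycle() are hypothetical names, the toy lock is deliberately not thread-safe, and the "concurrent" flush is simply called from inside the reinit window to force the bad interleaving. It also assumes, as a modelling choice, that the flush side uses a non-blocking shared acquire and skips an inode whose lock is busy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy shared/exclusive lock standing in for the XFS ILOCK (hypothetical,
 * NOT thread-safe: it only records holders so an interleaving can be replayed). */
struct toy_ilock {
	int readers;   /* number of shared holders */
	bool writer;   /* true while held exclusively */
};

static bool toy_trylock_shared(struct toy_ilock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static void toy_unlock_shared(struct toy_ilock *l)
{
	l->readers--;
}

static bool toy_trylock_excl(struct toy_ilock *l)
{
	if (l->writer || l->readers > 0)
		return false;
	l->writer = true;
	return true;
}

static void toy_unlock_excl(struct toy_ilock *l)
{
	l->writer = false;
}

/* Hypothetical model inode: in-core link count plus the last flushed copy. */
struct model_inode {
	struct toy_ilock ilock;
	unsigned int nlink;       /* models the in-core i_nlink */
	unsigned int disk_nlink;  /* what the last flush wrote to "disk" */
};

/* Models the flush side: skip the inode if the shared lock is unavailable. */
static void model_flush(struct model_inode *ip)
{
	if (!toy_trylock_shared(&ip->ilock))
		return;                  /* busy: a later push will retry */
	ip->disk_nlink = ip->nlink;  /* models xfs_inode_to_disk() */
	toy_unlock_shared(&ip->ilock);
}

/* Models xfs_reinit_inode(): reset the count to 1, then restore it. The
 * racing flush is called inside the window to force the bad interleaving. */
static void model_reinit(struct model_inode *ip)
{
	unsigned int saved = ip->nlink;

	ip->nlink = 1;      /* inode_init_always() resets the link count */
	model_flush(ip);    /* the concurrent flush lands here */
	ip->nlink = saved;  /* the original value is restored */
}

/* take_ilock=false models the old code, true models the patched code. */
static int model_recycle(struct model_inode *ip, bool take_ilock)
{
	if (take_ilock && !toy_trylock_excl(&ip->ilock))
		return -EAGAIN;  /* someone holds the lock: skip and retry */
	model_reinit(ip);
	if (take_ilock)
		toy_unlock_excl(&ip->ilock);
	return 0;
}
```

Starting from a directory-like inode with nlink and disk_nlink both 2, model_recycle(ip, false) leaves the in-core count at 2 but disk_nlink at 1, which is exactly the value that later trips the i_nlink >= 2 assertion once read back. With take_ilock set to true, the flush inside the window fails to get the shared lock and is skipped, so disk_nlink stays 2. If a flusher already holds the lock shared, the patched path returns -EAGAIN instead of reinitializing.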