[v2,2/3] xfs: don't assert perag when free perag

Message ID 20231209122107.2422441-2-leo.lilong@huawei.com (mailing list archive)
State Deferred, archived
Series [v2,1/3] xfs: add lock protection when remove perag from radix tree

Commit Message

Long Li Dec. 9, 2023, 12:21 p.m. UTC
When releasing the perag in xfs_free_perag(), the assertion that the
perag is in the radix tree holds in most cases. However, there is one
corner case where it does not. During log recovery, the new AGs become
visible (that is, they are included in mp->m_sb.sb_agcount) first, and
only then are their perags initialized. If perag initialization fails,
the assertion is triggered. Worse yet, a NULL pointer dereference can
occur.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_ag.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Dec. 11, 2023, 9:42 p.m. UTC | #1
On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> When releasing the perag in xfs_free_perag(), the assertion that the
> perag is in the radix tree holds in most cases. However, there is one
> corner case where it does not. During log recovery, the new AGs become
> visible (that is, they are included in mp->m_sb.sb_agcount) first, and
> only then are their perags initialized. If perag initialization fails,
> the assertion is triggered. Worse yet, a NULL pointer dereference can
> occur.
> 
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index cc10a3ca052f..11ed048c350c 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -258,7 +258,8 @@ xfs_free_perag(
>  		spin_lock(&mp->m_perag_lock);
>  		pag = radix_tree_delete(&mp->m_perag_tree, agno);
>  		spin_unlock(&mp->m_perag_lock);
> -		ASSERT(pag);
> +		if (!pag)
> +			break;

Why wouldn't you continue to the next agnumber?

--D

>  		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
>  		xfs_defer_drain_free(&pag->pag_intents_drain);
>  
> -- 
> 2.31.1
> 
>
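
To make the break-versus-continue question concrete, here is a minimal userspace sketch. It is not the kernel code: a flat array stands in for mp->m_perag_tree, and every model_* name is hypothetical. It only models the loop shape in xfs_free_perag() when some AG numbers below sb_agcount have no perag. Whether holes in the middle of the range can occur in the kernel is not established in this thread, so the "hole" case is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_AGS 8

/* Hypothetical stand-in for struct xfs_perag. */
struct model_perag {
	unsigned int agno;
};

/* Hypothetical stand-in for struct xfs_mount. */
struct model_mount {
	unsigned int agcount;                     /* models mp->m_sb.sb_agcount */
	struct model_perag *perag[MODEL_MAX_AGS]; /* models mp->m_perag_tree */
};

/* Make agcount AGs visible, but only initialize perags for the first ninit. */
static void model_setup(struct model_mount *mp, unsigned int agcount,
			unsigned int ninit)
{
	assert(agcount <= MODEL_MAX_AGS && ninit <= agcount);
	mp->agcount = agcount;
	for (unsigned int agno = 0; agno < MODEL_MAX_AGS; agno++) {
		mp->perag[agno] = NULL;
		if (agno < ninit) {
			mp->perag[agno] = malloc(sizeof(struct model_perag));
			assert(mp->perag[agno]);
			mp->perag[agno]->agno = agno;
		}
	}
}

/* Models radix_tree_delete(): remove and return the entry, or NULL. */
static struct model_perag *model_delete(struct model_mount *mp,
					unsigned int agno)
{
	struct model_perag *pag = mp->perag[agno];

	mp->perag[agno] = NULL;
	return pag;
}

/*
 * Models the free loop. A nonzero stop_on_missing behaves like "break",
 * zero behaves like "continue". Returns the number of perags freed.
 */
static unsigned int model_free_perag(struct model_mount *mp,
				     int stop_on_missing)
{
	unsigned int freed = 0;

	for (unsigned int agno = 0; agno < mp->agcount; agno++) {
		struct model_perag *pag = model_delete(mp, agno);

		if (!pag) {
			if (stop_on_missing)
				break;
			continue;
		}
		free(pag);
		freed++;
	}
	return freed;
}
```

When only the trailing AGs lack a perag (for example model_setup(&mp, 6, 4)), both variants free the same four entries. If an entry in the middle is missing, the "break" variant stops early and leaves the later perags allocated, while "continue" still reaches them.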
Dave Chinner Dec. 11, 2023, 10 p.m. UTC | #2
On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> When releasing the perag in xfs_free_perag(), the assertion that the
> perag is in the radix tree holds in most cases. However, there is one
> corner case where it does not. During log recovery, the new AGs become
> visible (that is, they are included in mp->m_sb.sb_agcount) first, and
> only then are their perags initialized. If perag initialization fails,
> the assertion is triggered. Worse yet, a NULL pointer dereference can
> occur.

I'm going to assume that you are talking about xlog_do_recover()
because the commit message doesn't actually tell us how this
situation occurs.

That code re-reads the superblock, then copies it to mp->m_sb,
then calls xfs_initialize_perag() with the values from mp->m_sb.

If log recovery replayed a growfs transaction, the mp->m_sb has a
larger sb_agcount and so then xfs_initialize_perag() is called
and if that fails we end up back in xfs_mountfs and the error
stack calls xfs_free_perag().

Is that correct?

If so, then the fix is to change how xlog_do_recover() works. It
needs to initialise the new perags before it updates the in-memory
superblock. If xfs_initialize_perag() fails, it undoes all the
changes it has made, so if we haven't updated the in-memory
superblock when the init of the new perags fails then the error
unwinding code works exactly as it should right now.

i.e. the bug is that xlog_do_recover() is leaving the in-memory
state inconsistent on init failure, and we need to fix that rather
than remove the assert that is telling us that in-memory state is
inconsistent....

-Dave.
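
The ordering argument above can be sketched in userspace C. This is a model under stated assumptions, not the xlog_do_recover() implementation: all sketch_* names are hypothetical, a flat array replaces the radix tree, and fail_at is an injected failure point. sketch_init_perags() models the property described above, namely that a failed initialization undoes what it created.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_AGS 8

/* Hypothetical stand-in for struct xfs_mount. */
struct sketch_mount {
	unsigned int agcount;        /* models mp->m_sb.sb_agcount */
	void *perag[SKETCH_MAX_AGS]; /* models mp->m_perag_tree */
};

/*
 * Create perags for AGs [old_agcount, new_agcount). fail_at injects a
 * failure at that AG number (-1 for none). On failure, undo only what
 * this call created.
 */
static int sketch_init_perags(struct sketch_mount *mp, unsigned int old_agcount,
			      unsigned int new_agcount, int fail_at)
{
	unsigned int agno;

	if (new_agcount > SKETCH_MAX_AGS)
		return -1;
	for (agno = old_agcount; agno < new_agcount; agno++) {
		if (fail_at >= 0 && agno == (unsigned int)fail_at)
			goto unwind;
		mp->perag[agno] = malloc(1);
		if (!mp->perag[agno])
			goto unwind;
	}
	return 0;
unwind:
	while (agno > old_agcount) {
		agno--;
		free(mp->perag[agno]);
		mp->perag[agno] = NULL;
	}
	return -1;
}

/* Start with agcount fully initialized AGs. */
static int sketch_mount_init(struct sketch_mount *mp, unsigned int agcount)
{
	for (unsigned int agno = 0; agno < SKETCH_MAX_AGS; agno++)
		mp->perag[agno] = NULL;
	mp->agcount = 0;
	if (sketch_init_perags(mp, 0, agcount, -1))
		return -1;
	mp->agcount = agcount;
	return 0;
}

/* Order described as buggy: publish the new agcount, then init perags. */
static int sketch_recover_current(struct sketch_mount *mp,
				  unsigned int new_agcount, int fail_at)
{
	unsigned int old_agcount = mp->agcount;

	mp->agcount = new_agcount;
	return sketch_init_perags(mp, old_agcount, new_agcount, fail_at);
}

/* Proposed order: init perags first, publish agcount only on success. */
static int sketch_recover_proposed(struct sketch_mount *mp,
				   unsigned int new_agcount, int fail_at)
{
	int error = sketch_init_perags(mp, mp->agcount, new_agcount, fail_at);

	if (error)
		return error;
	mp->agcount = new_agcount;
	return 0;
}

/* The invariant ASSERT(pag) relies on: every visible AG has a perag. */
static bool sketch_consistent(const struct sketch_mount *mp)
{
	if (mp->agcount > SKETCH_MAX_AGS)
		return false;
	for (unsigned int agno = 0; agno < mp->agcount; agno++)
		if (!mp->perag[agno])
			return false;
	return true;
}
```

Growing from 2 to 4 AGs with a failure injected at AG 3 leaves sketch_recover_current() with agcount == 4 but no perags for AGs 2 and 3, so sketch_consistent() is false. sketch_recover_proposed() fails the same way but keeps agcount == 2, so the invariant still holds and a free loop over [0, agcount) never sees a missing perag.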
Long Li Dec. 12, 2023, 1:28 p.m. UTC | #3
On Tue, Dec 12, 2023 at 09:00:50AM +1100, Dave Chinner wrote:
> On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> > When releasing the perag in xfs_free_perag(), the assertion that the
> > perag is in the radix tree holds in most cases. However, there is one
> > corner case where it does not. During log recovery, the new AGs become
> > visible (that is, they are included in mp->m_sb.sb_agcount) first, and
> > only then are their perags initialized. If perag initialization fails,
> > the assertion is triggered. Worse yet, a NULL pointer dereference can
> > occur.
> 
> I'm going to assume that you are talking about xlog_do_recover()
> because the commit message doesn't actually tell us how this
> situation occurs.
> 
> That code re-reads the superblock, then copies it to mp->m_sb,
> then calls xfs_initialize_perag() with the values from mp->m_sb.
> 
> If log recovery replayed a growfs transaction, the mp->m_sb has a
> larger sb_agcount and so then xfs_initialize_perag() is called
> and if that fails we end up back in xfs_mountfs and the error
> stack calls xfs_free_perag().
> 
> Is that correct?

Yes, you are right. When I tried to fix the perag leak issue in patch 3,
I found this problem.

> 
> If so, then the fix is to change how xlog_do_recover() works. It
> needs to initialise the new perags before it updates the in-memory
> superblock. If xfs_initialize_perag() fails, it undoes all the
> changes it has made, so if we haven't updated the in-memory
> superblock when the init of the new perags fails then the error
> unwinding code works exactly as it should right now.
> 
> i.e. the bug is that xlog_do_recover() is leaving the in-memory
> state inconsistent on init failure, and we need to fix that rather
> than remove the assert that is telling us that in-memory state is
> inconsistent....
> 

Yes, I agree. I used to think that removing the assertion would solve
the problem, but that now seems a bit lazy; the problem should be
solved at the source. I haven't yet figured out how to fix it
comprehensively, so I'll fix the perag leak issue first.

Thanks,
Long Li

Patch

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index cc10a3ca052f..11ed048c350c 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -258,7 +258,8 @@  xfs_free_perag(
 		spin_lock(&mp->m_perag_lock);
 		pag = radix_tree_delete(&mp->m_perag_tree, agno);
 		spin_unlock(&mp->m_perag_lock);
-		ASSERT(pag);
+		if (!pag)
+			break;
 		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
 		xfs_defer_drain_free(&pag->pag_intents_drain);