Message ID | 20231209122107.2422441-2-leo.lilong@huawei.com (mailing list archive) |
---|---|
State | Deferred, archived |
Series | [v2,1/3] xfs: add lock protection when remove perag from radix tree |
On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> When releasing the perag in xfs_free_perag(), the assertion that the
> perag is in the radix tree is correct in most cases. However, there is one
> corner case where the assertion is not true. During log recovery, the
> AGs become visible (that is, included in mp->m_sb.sb_agcount) first, and
> then the perag is initialized. If the initialization of the perag fails,
> the assertion will be triggered. Worse yet, null pointer dereferencing
> can occur.
>
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index cc10a3ca052f..11ed048c350c 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -258,7 +258,8 @@ xfs_free_perag(
>  		spin_lock(&mp->m_perag_lock);
>  		pag = radix_tree_delete(&mp->m_perag_tree, agno);
>  		spin_unlock(&mp->m_perag_lock);
> -		ASSERT(pag);
> +		if (!pag)
> +			break;

Why wouldn't you continue to the next agnumber?

--D

>  		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
>  		xfs_defer_drain_free(&pag->pag_intents_drain);
>
> --
> 2.31.1
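The difference between `break` and `continue` that the question above points at can be seen in a small userspace model. This is only a sketch, not kernel code: `model_mount`, `model_perag` and every `model_*` helper are hypothetical names invented here, and a plain array stands in for the radix tree. The hole in the middle of the array is contrived for illustration; the thread does not say whether such a state can actually occur.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_AGCOUNT 4

/* Hypothetical stand-in for a perag; not the kernel's struct xfs_perag. */
struct model_perag {
	unsigned int agno;
};

/* Hypothetical stand-in for the perag tree: slot[agno] is NULL when absent. */
struct model_mount {
	struct model_perag *slot[MODEL_AGCOUNT];
};

/* Remove and return the entry for agno, or NULL if there is none. */
static struct model_perag *model_delete(struct model_mount *mp,
					unsigned int agno)
{
	struct model_perag *pag;

	assert(agno < MODEL_AGCOUNT);
	pag = mp->slot[agno];
	mp->slot[agno] = NULL;
	return pag;
}

/* Teardown that stops at the first missing entry, as in the posted diff. */
static unsigned int model_free_break(struct model_mount *mp)
{
	unsigned int agno, freed = 0;

	for (agno = 0; agno < MODEL_AGCOUNT; agno++) {
		struct model_perag *pag = model_delete(mp, agno);

		if (!pag)
			break;
		free(pag);
		freed++;
	}
	return freed;
}

/* Teardown that skips a missing entry and moves on to the next agno. */
static unsigned int model_free_continue(struct model_mount *mp)
{
	unsigned int agno, freed = 0;

	for (agno = 0; agno < MODEL_AGCOUNT; agno++) {
		struct model_perag *pag = model_delete(mp, agno);

		if (!pag)
			continue;
		free(pag);
		freed++;
	}
	return freed;
}

/* Populate every slot except `hole`. Returns 0, or -1 if malloc fails. */
static int model_fill(struct model_mount *mp, unsigned int hole)
{
	unsigned int agno;

	for (agno = 0; agno < MODEL_AGCOUNT; agno++) {
		if (agno == hole)
			continue;
		mp->slot[agno] = malloc(sizeof(*mp->slot[agno]));
		if (!mp->slot[agno])
			return -1;
		mp->slot[agno]->agno = agno;
	}
	return 0;
}

/* Count the entries still present, i.e. what a teardown left behind. */
static unsigned int model_count(const struct model_mount *mp)
{
	unsigned int agno, n = 0;

	for (agno = 0; agno < MODEL_AGCOUNT; agno++)
		if (mp->slot[agno])
			n++;
	return n;
}
```

In this model, a missing entry at agno 1 makes the `break` variant leave agno 2 and 3 allocated, while the `continue` variant releases them. If the missing entries can only ever be the trailing ones, the two variants free the same set.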
On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> When releasing the perag in xfs_free_perag(), the assertion that the
> perag is in the radix tree is correct in most cases. However, there is one
> corner case where the assertion is not true. During log recovery, the
> AGs become visible (that is, included in mp->m_sb.sb_agcount) first, and
> then the perag is initialized. If the initialization of the perag fails,
> the assertion will be triggered. Worse yet, null pointer dereferencing
> can occur.

I'm going to assume that you are talking about xlog_do_recover()
because the commit message doesn't actually tell us how this
situation occurs.

That code re-reads the superblock, then copies it to mp->m_sb,
then calls xfs_initialize_perag() with the values from mp->m_sb.

If log recovery replayed a growfs transaction, the mp->m_sb has a
larger sb_agcount and so then xfs_initialize_perag() is called
and if that fails we end up back in xfs_mountfs and the error
stack calls xfs_free_perag().

Is that correct?

If so, then the fix is to change how xlog_do_recover() works. It
needs to initialise the new perags before it updates the in-memory
superblock. If xfs_initialize_perag() fails, it undoes all the
changes it has made, so if we haven't updated the in-memory
superblock when the init of the new perags fails then the error
unwinding code works exactly as it should right now.

i.e. the bug is that xlog_do_recover() is leaving the in-memory
state inconsistent on init failure, and we need to fix that rather
than remove the assert that is telling us that in-memory state is
inconsistent....

-Dave.
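The ordering argument above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `model_mount` and all `model_*` functions are hypothetical names, `agcount` stands in for mp->m_sb.sb_agcount, a boolean array stands in for the perag tree, and `fail_at` is an invented knob that forces initialisation to fail at a chosen AG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_AGS 8

/* Hypothetical model of the in-memory mount state. */
struct model_mount {
	unsigned int agcount;			/* models mp->m_sb.sb_agcount */
	bool perag_present[MODEL_MAX_AGS];	/* models perag tree membership */
};

/*
 * Set up perags for [old_count, new_count). If fail_at falls inside that
 * range, undo everything this call did and return -1.
 */
static int model_init_perags(struct model_mount *mp, unsigned int old_count,
			     unsigned int new_count, unsigned int fail_at)
{
	unsigned int agno;

	assert(new_count <= MODEL_MAX_AGS);
	for (agno = old_count; agno < new_count; agno++) {
		if (agno == fail_at) {
			/* Unwind only the entries added by this call. */
			while (agno > old_count)
				mp->perag_present[--agno] = false;
			return -1;
		}
		mp->perag_present[agno] = true;
	}
	return 0;
}

/* Ordering described in the thread: publish the new count, then initialise. */
static int model_recover_sb_first(struct model_mount *mp,
				  unsigned int new_count, unsigned int fail_at)
{
	unsigned int old_count = mp->agcount;

	mp->agcount = new_count;
	return model_init_perags(mp, old_count, new_count, fail_at);
}

/* Suggested ordering: initialise first, publish the count only on success. */
static int model_recover_init_first(struct model_mount *mp,
				    unsigned int new_count,
				    unsigned int fail_at)
{
	int error = model_init_perags(mp, mp->agcount, new_count, fail_at);

	if (error)
		return error;
	mp->agcount = new_count;
	return 0;
}

/* True when every AG below agcount has a perag, which teardown relies on. */
static bool model_consistent(const struct model_mount *mp)
{
	unsigned int agno;

	for (agno = 0; agno < mp->agcount; agno++)
		if (!mp->perag_present[agno])
			return false;
	return true;
}
```

Starting from two AGs and growing to four with a failure at agno 3, the first variant ends with `agcount == 4` but only two perags present, which is the state that trips the assertion during teardown. The second variant ends with `agcount == 2` and the model still consistent, so the existing teardown loop needs no change.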
On Tue, Dec 12, 2023 at 09:00:50AM +1100, Dave Chinner wrote:
> On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> > When releasing the perag in xfs_free_perag(), the assertion that the
> > perag is in the radix tree is correct in most cases. However, there is one
> > corner case where the assertion is not true. During log recovery, the
> > AGs become visible (that is, included in mp->m_sb.sb_agcount) first, and
> > then the perag is initialized. If the initialization of the perag fails,
> > the assertion will be triggered. Worse yet, null pointer dereferencing
> > can occur.
>
> I'm going to assume that you are talking about xlog_do_recover()
> because the commit message doesn't actually tell us how this
> situation occurs.
>
> That code re-reads the superblock, then copies it to mp->m_sb,
> then calls xfs_initialize_perag() with the values from mp->m_sb.
>
> If log recovery replayed a growfs transaction, the mp->m_sb has a
> larger sb_agcount and so then xfs_initialize_perag() is called
> and if that fails we end up back in xfs_mountfs and the error
> stack calls xfs_free_perag().
>
> Is that correct?

Yes, you are right. When I tried to fix the perag leak issue in patch 3,
I found this problem.

>
> If so, then the fix is to change how xlog_do_recover() works. It
> needs to initialise the new perags before it updates the in-memory
> superblock. If xfs_initialize_perag() fails, it undoes all the
> changes it has made, so if we haven't updated the in-memory
> superblock when the init of the new perags fails then the error
> unwinding code works exactly as it should right now.
>
> i.e. the bug is that xlog_do_recover() is leaving the in-memory
> state inconsistent on init failure, and we need to fix that rather
> than remove the assert that is telling us that in-memory state is
> inconsistent....
>

Yes, I agree with you. I used to think that removing the assertion would
solve the problem, but now that seems a bit lazy; the problem should be
solved at the source.

Right now, I haven't figured out how to fix this problem comprehensively,
so I'll fix the perag leak issue first.

Thanks,
Long Li
When releasing the perag in xfs_free_perag(), the assertion that the
perag is in the radix tree is correct in most cases. However, there is one
corner case where the assertion is not true. During log recovery, the
AGs become visible (that is, included in mp->m_sb.sb_agcount) first, and
then the perag is initialized. If the initialization of the perag fails,
the assertion will be triggered. Worse yet, null pointer dereferencing
can occur.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/libxfs/xfs_ag.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index cc10a3ca052f..11ed048c350c 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -258,7 +258,8 @@ xfs_free_perag(
 		spin_lock(&mp->m_perag_lock);
 		pag = radix_tree_delete(&mp->m_perag_tree, agno);
 		spin_unlock(&mp->m_perag_lock);
-		ASSERT(pag);
+		if (!pag)
+			break;
 		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
 		xfs_defer_drain_free(&pag->pag_intents_drain);