diff mbox series

xfs_repair: -EFSBADCRC needs action when read verifier detects it.

Message ID 20250226173335.558221-1-bodonnel@redhat.com (mailing list archive)
State New
Headers show
Series xfs_repair: -EFSBADCRC needs action when read verifier detects it. | expand

Commit Message

Bill O'Donnell Feb. 26, 2025, 5:32 p.m. UTC
From: Bill O'Donnell <bodonnel@redhat.com>

For xfs_repair, there is a case when -EFSBADCRC is encountered but not
acted on. Modify da_read_buf to check for and repair. The current
implementation fails for the case:

$ xfs_repair xfs_metadump_hosting.dmp.image
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata CRC error detected at 0x46cde8, xfs_dir3_block block 0xd3c50/0x1000
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
corrupt directory block 0 for inode 867467
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x46cc88, xfs_dir3_block block 0xd3c50/0x1000
libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0xd3c50/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.


With the patch applied:
$ xfs_repair xfs_metadump_hosting.dmp.image
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
rebuilding directory inode 867467
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
---
 repair/da_util.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Darrick J. Wong Feb. 26, 2025, 6:20 p.m. UTC | #1
On Wed, Feb 26, 2025 at 11:32:22AM -0600, bodonnel@redhat.com wrote:
> From: Bill O'Donnell <bodonnel@redhat.com>
> 
> For xfs_repair, there is a case when -EFSBADCRC is encountered but not
> acted on. Modify da_read_buf to check for and repair. The current
> implementation fails for the case:
> 
> $ xfs_repair xfs_metadump_hosting.dmp.image
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> Metadata CRC error detected at 0x46cde8, xfs_dir3_block block 0xd3c50/0x1000
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> corrupt directory block 0 for inode 867467

Curious -- this corrupt directory block fails the magic checks but
process_dir2_data returns 0 because it didn't find any corruption.
So it looks like we release the directory buffer (without dirtying it to
reset the checksum)...

>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 3
>         - agno = 2
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467

...and then it shows up here again...

> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
> bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233

...and again here.  Now we reset the magic and dirty the buffer...

>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> Metadata corruption detected at 0x46cc88, xfs_dir3_block block 0xd3c50/0x1000

...but I guess we haven't fixed anything in the buffer, so the verifier
trips.  What code does 0x46cc88 map to in the dir3 block verifier
function?  That might reflect some missing code in process_dir2_data.

> libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0xd3c50/0x8
> xfs_repair: Releasing dirty buffer to free list!
> xfs_repair: Refusing to write a corrupt buffer to the data device!
> xfs_repair: Lost a write to the data device!
> 
> fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
> 
> 
> With the patch applied:
> $ xfs_repair xfs_metadump_hosting.dmp.image
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> rebuilding directory inode 867467
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> 
> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> ---
>  repair/da_util.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/repair/da_util.c b/repair/da_util.c
> index 7f94f4012062..0a4785e6f69b 100644
> --- a/repair/da_util.c
> +++ b/repair/da_util.c
> @@ -66,6 +66,9 @@ da_read_buf(
>  	}
>  	libxfs_buf_read_map(mp->m_dev, map, nex, LIBXFS_READBUF_SALVAGE,
>  			&bp, ops);
> +	if (bp->b_error == -EFSBADCRC) {
> +		libxfs_buf_relse(bp);

This introduces a use-after-free on the buffer pointer.

--D

> +	}
>  	if (map != map_array)
>  		free(map);
>  	return bp;
> -- 
> 2.48.1
> 
>
Bill O'Donnell Feb. 28, 2025, 3:27 p.m. UTC | #2
On Wed, Feb 26, 2025 at 10:20:02AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 26, 2025 at 11:32:22AM -0600, bodonnel@redhat.com wrote:
> > From: Bill O'Donnell <bodonnel@redhat.com>
> > 
> > For xfs_repair, there is a case when -EFSBADCRC is encountered but not
> > acted on. Modify da_read_buf to check for and repair. The current
> > implementation fails for the case:
> > 
> > $ xfs_repair xfs_metadump_hosting.dmp.image
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> >         - zero log...
> >         - scan filesystem freespace and inode maps...
> >         - found root inode chunk
> > Phase 3 - for each AG...
> >         - scan and clear agi unlinked lists...
> >         - process known inodes and perform inode discovery...
> >         - agno = 0
> > Metadata CRC error detected at 0x46cde8, xfs_dir3_block block 0xd3c50/0x1000
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > corrupt directory block 0 for inode 867467
> 
> Curious -- this corrupt directory block fails the magic checks but
> process_dir2_data returns 0 because it didn't find any corruption.
> So it looks like we release the directory buffer (without dirtying it to
> reset the checksum)...
> 
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> >         - process newly discovered inodes...
> > Phase 4 - check for duplicate blocks...
> >         - setting up duplicate extent list...
> >         - check for inodes claiming duplicate blocks...
> >         - agno = 0
> >         - agno = 1
> >         - agno = 3
> >         - agno = 2
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> 
> ...and then it shows up here again...
> 
> > Phase 5 - rebuild AG headers and trees...
> >         - reset superblock...
> > Phase 6 - check inode connectivity...
> >         - resetting contents of realtime bitmap and summary inodes
> >         - traversing filesystem ...
> > bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> 
> ...and again here.  Now we reset the magic and dirty the buffer...
> 
> >         - traversal finished ...
> >         - moving disconnected inodes to lost+found ...
> > Phase 7 - verify and correct link counts...
> > Metadata corruption detected at 0x46cc88, xfs_dir3_block block 0xd3c50/0x1000
> 
> ...but I guess we haven't fixed anything in the buffer, so the verifier
> trips.  What code does 0x46cc88 map to in the dir3 block verifier
> function?  That might reflect some missing code in process_dir2_data.

0x46cc88 maps to xfs_dir3_block_verify (bp=bp@entry=0x7fffbc103610) at xfs_dir2_block.c:56

> 
> > libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0xd3c50/0x8
> > xfs_repair: Releasing dirty buffer to free list!
> > xfs_repair: Refusing to write a corrupt buffer to the data device!
> > xfs_repair: Lost a write to the data device!
> > 
> > fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
> > 
> > 
> > With the patch applied:
> > $ xfs_repair xfs_metadump_hosting.dmp.image
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> >         - zero log...
> >         - scan filesystem freespace and inode maps...
> >         - found root inode chunk
> > Phase 3 - for each AG...
> >         - scan and clear agi unlinked lists...
> >         - process known inodes and perform inode discovery...
> >         - agno = 0
> > Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> >         - process newly discovered inodes...
> > Phase 4 - check for duplicate blocks...
> >         - setting up duplicate extent list...
> >         - check for inodes claiming duplicate blocks...
> >         - agno = 0
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > Phase 5 - rebuild AG headers and trees...
> >         - reset superblock...
> > Phase 6 - check inode connectivity...
> >         - resetting contents of realtime bitmap and summary inodes
> >         - traversing filesystem ...
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > rebuilding directory inode 867467
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> >         - traversal finished ...
> >         - moving disconnected inodes to lost+found ...
> > Phase 7 - verify and correct link counts...
> > done
> > 
> > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> > ---
> >  repair/da_util.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/repair/da_util.c b/repair/da_util.c
> > index 7f94f4012062..0a4785e6f69b 100644
> > --- a/repair/da_util.c
> > +++ b/repair/da_util.c
> > @@ -66,6 +66,9 @@ da_read_buf(
> >  	}
> >  	libxfs_buf_read_map(mp->m_dev, map, nex, LIBXFS_READBUF_SALVAGE,
> >  			&bp, ops);
> > +	if (bp->b_error == -EFSBADCRC) {
> > +		libxfs_buf_relse(bp);
> 
> This introduces a use-after-free on the buffer pointer.
> 
> --D
> 
> > +	}
> >  	if (map != map_array)
> >  		free(map);
> >  	return bp;
> > -- 
> > 2.48.1
> > 
> > 
>
Darrick J. Wong Feb. 28, 2025, 5:35 p.m. UTC | #3
On Fri, Feb 28, 2025 at 09:27:31AM -0600, Bill O'Donnell wrote:
> On Wed, Feb 26, 2025 at 10:20:02AM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 26, 2025 at 11:32:22AM -0600, bodonnel@redhat.com wrote:
> > > From: Bill O'Donnell <bodonnel@redhat.com>
> > > 
> > > For xfs_repair, there is a case when -EFSBADCRC is encountered but not
> > > acted on. Modify da_read_buf to check for and repair. The current
> > > implementation fails for the case:
> > > 
> > > $ xfs_repair xfs_metadump_hosting.dmp.image
> > > Phase 1 - find and verify superblock...
> > > Phase 2 - using internal log
> > >         - zero log...
> > >         - scan filesystem freespace and inode maps...
> > >         - found root inode chunk
> > > Phase 3 - for each AG...
> > >         - scan and clear agi unlinked lists...
> > >         - process known inodes and perform inode discovery...
> > >         - agno = 0
> > > Metadata CRC error detected at 0x46cde8, xfs_dir3_block block 0xd3c50/0x1000
> > > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > > corrupt directory block 0 for inode 867467
> > 
> > Curious -- this corrupt directory block fails the magic checks but
> > process_dir2_data returns 0 because it didn't find any corruption.
> > So it looks like we release the directory buffer (without dirtying it to
> > reset the checksum)...
> > 
> > >         - agno = 1
> > >         - agno = 2
> > >         - agno = 3
> > >         - process newly discovered inodes...
> > > Phase 4 - check for duplicate blocks...
> > >         - setting up duplicate extent list...
> > >         - check for inodes claiming duplicate blocks...
> > >         - agno = 0
> > >         - agno = 1
> > >         - agno = 3
> > >         - agno = 2
> > > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > 
> > ...and then it shows up here again...
> > 
> > > Phase 5 - rebuild AG headers and trees...
> > >         - reset superblock...
> > > Phase 6 - check inode connectivity...
> > >         - resetting contents of realtime bitmap and summary inodes
> > >         - traversing filesystem ...
> > > bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> > 
> > ...and again here.  Now we reset the magic and dirty the buffer...
> > 
> > >         - traversal finished ...
> > >         - moving disconnected inodes to lost+found ...
> > > Phase 7 - verify and correct link counts...
> > > Metadata corruption detected at 0x46cc88, xfs_dir3_block block 0xd3c50/0x1000
> > 
> > ...but I guess we haven't fixed anything in the buffer, so the verifier
> > trips.  What code does 0x46cc88 map to in the dir3 block verifier
> > function?  That might reflect some missing code in process_dir2_data.
> 
> 0x46cc88 maps to xfs_dir3_block_verify (bp=bp@entry=0x7fffbc103610) at xfs_dir2_block.c:56

On my git tree that maps to this code:

	struct xfs_mount	*mp = bp->b_mount;
	struct xfs_dir3_blk_hdr	*hdr3 = bp->b_addr;

	if (!xfs_verify_magic(bp, hdr3->magic))
		return __this_address;

	if (xfs_has_crc(mp)) {
		if (!uuid_equal(&hdr3->uuid, &mp->m_sb.sb_meta_uuid))
			return __this_address;

So I guess the UUID is broken too.  If you let process_dir2_data step
through the block, does it actually manage to parse whatever's there
without complaints?  And does it recover any entries from that mess?

I'm wondering if xfs_repair should junk the block if it fails the
verifier and there aren't any actual directory entries in it?

--D

> > 
> > > libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0xd3c50/0x8
> > > xfs_repair: Releasing dirty buffer to free list!
> > > xfs_repair: Refusing to write a corrupt buffer to the data device!
> > > xfs_repair: Lost a write to the data device!
> > > 
> > > fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
> > > 
> > > 
> > > With the patch applied:
> > > $ xfs_repair xfs_metadump_hosting.dmp.image
> > > Phase 1 - find and verify superblock...
> > > Phase 2 - using internal log
> > >         - zero log...
> > >         - scan filesystem freespace and inode maps...
> > >         - found root inode chunk
> > > Phase 3 - for each AG...
> > >         - scan and clear agi unlinked lists...
> > >         - process known inodes and perform inode discovery...
> > >         - agno = 0
> > > Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> > > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > >         - agno = 1
> > >         - agno = 2
> > >         - agno = 3
> > >         - process newly discovered inodes...
> > > Phase 4 - check for duplicate blocks...
> > >         - setting up duplicate extent list...
> > >         - check for inodes claiming duplicate blocks...
> > >         - agno = 0
> > >         - agno = 1
> > >         - agno = 2
> > >         - agno = 3
> > > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > Phase 5 - rebuild AG headers and trees...
> > >         - reset superblock...
> > > Phase 6 - check inode connectivity...
> > >         - resetting contents of realtime bitmap and summary inodes
> > >         - traversing filesystem ...
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > rebuilding directory inode 867467
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > >         - traversal finished ...
> > >         - moving disconnected inodes to lost+found ...
> > > Phase 7 - verify and correct link counts...
> > > done
> > > 
> > > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> > > ---
> > >  repair/da_util.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/repair/da_util.c b/repair/da_util.c
> > > index 7f94f4012062..0a4785e6f69b 100644
> > > --- a/repair/da_util.c
> > > +++ b/repair/da_util.c
> > > @@ -66,6 +66,9 @@ da_read_buf(
> > >  	}
> > >  	libxfs_buf_read_map(mp->m_dev, map, nex, LIBXFS_READBUF_SALVAGE,
> > >  			&bp, ops);
> > > +	if (bp->b_error == -EFSBADCRC) {
> > > +		libxfs_buf_relse(bp);
> > 
> > This introduces a use-after-free on the buffer pointer.
> > 
> > --D
> > 
> > > +	}
> > >  	if (map != map_array)
> > >  		free(map);
> > >  	return bp;
> > > -- 
> > > 2.48.1
> > > 
> > > 
> > 
> 
>
diff mbox series

Patch

diff --git a/repair/da_util.c b/repair/da_util.c
index 7f94f4012062..0a4785e6f69b 100644
--- a/repair/da_util.c
+++ b/repair/da_util.c
@@ -66,6 +66,9 @@  da_read_buf(
 	}
 	libxfs_buf_read_map(mp->m_dev, map, nex, LIBXFS_READBUF_SALVAGE,
 			&bp, ops);
+	if (bp->b_error == -EFSBADCRC) {
+		libxfs_buf_relse(bp);
+	}
 	if (map != map_array)
 		free(map);
 	return bp;