diff mbox series

xfs_logprint: Fix super block buffer interpretation issue

Message ID 20241011030810.1083636-1-chizhiling@163.com (mailing list archive)
State Not Applicable, archived
Headers show
Series xfs_logprint: Fix super block buffer interpretation issue | expand

Commit Message

Chi Zhiling Oct. 11, 2024, 3:08 a.m. UTC
From: chizhiling <chizhiling@kylinos.cn>

When using xfs_logprint to interpret the buffer of the super block, the
icount will always be 6360863066640355328 (0x5846534200001000). This is
because the offset of icount is incorrect, causing xfs_logprint to
misinterpret the MAGIC number as icount.
This patch fixes the offset value of the SB counters in xfs_logprint.

Before this patch:
icount: 6360863066640355328  ifree: 5242880  fdblks: 0  frext: 0

After this patch:
icount: 10240  ifree: 4906  fdblks: 37  frext: 0

Signed-off-by: chizhiling <chizhiling@kylinos.cn>
---
 logprint/log_misc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Darrick J. Wong Oct. 11, 2024, 3:24 a.m. UTC | #1
On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote:
> From: chizhiling <chizhiling@kylinos.cn>
> 
> When using xfs_logprint to interpret the buffer of the super block, the
> icount will always be 6360863066640355328 (0x5846534200001000). This is
> because the offset of icount is incorrect, causing xfs_logprint to
> misinterpret the MAGIC number as icount.
> This patch fixes the offset value of the SB counters in xfs_logprint.
> 
> Before this patch:
> icount: 6360863066640355328  ifree: 5242880  fdblks: 0  frext: 0
> 
> After this patch:
> icount: 10240  ifree: 4906  fdblks: 37  frext: 0
> 
> Signed-off-by: chizhiling <chizhiling@kylinos.cn>
> ---
>  logprint/log_misc.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/logprint/log_misc.c b/logprint/log_misc.c
> index 8e86ac34..21da5b8b 100644
> --- a/logprint/log_misc.c
> +++ b/logprint/log_misc.c
> @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops)
>  			/*
>  			 * memmove because *ptr may not be 8-byte aligned
>  			 */
> -			memmove(&a, *ptr, sizeof(__be64));
> -			memmove(&b, *ptr+8, sizeof(__be64));

How did this ever work??  This even looks wrong in "Release_1.0.0".

> +			memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64));
> +			memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64));

Why not do:

			struct xfs_dsb *dsb = *ptr;

			memcpy(&a, &dsb->sb_icount, sizeof(a));

or better yet, skip the indirection and do

			printf(_("icount: %llu  ifree: %llu  "),
					(unsigned long long)be64_to_cpu(dsb->sb_icount),
					(unsigned long long)be64_to_cpu(dsb->sb_ifree));

Hm?

--D

>  			printf(_("icount: %llu  ifree: %llu  "),
>  			       (unsigned long long) be64_to_cpu(a),
>  			       (unsigned long long) be64_to_cpu(b));
> -			memmove(&a, *ptr+16, sizeof(__be64));
> -			memmove(&b, *ptr+24, sizeof(__be64));
> +			memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_fdblocks), sizeof(__be64));
> +			memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_frextents), sizeof(__be64));
>  			printf(_("fdblks: %llu  frext: %llu\n"),
>  			       (unsigned long long) be64_to_cpu(a),
>  			       (unsigned long long) be64_to_cpu(b));
> -- 
> 2.43.0
>
Chi Zhiling Oct. 11, 2024, 3:54 a.m. UTC | #2
On 2024/10/11 11:24, Darrick J. Wong wrote:
> On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote:
>> From: chizhiling <chizhiling@kylinos.cn>
>>
>> When using xfs_logprint to interpret the buffer of the super block, the
>> icount will always be 6360863066640355328 (0x5846534200001000). This is
>> because the offset of icount is incorrect, causing xfs_logprint to
>> misinterpret the MAGIC number as icount.
>> This patch fixes the offset value of the SB counters in xfs_logprint.
>>
>> Before this patch:
>> icount: 6360863066640355328  ifree: 5242880  fdblks: 0  frext: 0
>>
>> After this patch:
>> icount: 10240  ifree: 4906  fdblks: 37  frext: 0
>>
>> Signed-off-by: chizhiling <chizhiling@kylinos.cn>
>> ---
>>   logprint/log_misc.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/logprint/log_misc.c b/logprint/log_misc.c
>> index 8e86ac34..21da5b8b 100644
>> --- a/logprint/log_misc.c
>> +++ b/logprint/log_misc.c
>> @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops)
>>   			/*
>>   			 * memmove because *ptr may not be 8-byte aligned
>>   			 */
>> -			memmove(&a, *ptr, sizeof(__be64));
>> -			memmove(&b, *ptr+8, sizeof(__be64));
> How did this ever work??  This even looks wrong in "Release_1.0.0".
>
Yes, I was surprised when I find this issue
>> +			memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64));
>> +			memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64));
> Why not do:
>
> 			struct xfs_dsb *dsb = *ptr;
>
> 			memcpy(&a, &dsb->sb_icount, sizeof(a));
>
> or better yet, skip the indirection and do
>
> 			printf(_("icount: %llu  ifree: %llu  "),
> 					(unsigned long long)be64_to_cpu(dsb->sb_icount),
> 					(unsigned long long)be64_to_cpu(dsb->sb_ifree));
>
> Hm?

Yes, of course we can do it this way, I just want the fix patch to look 
smaller :)

I think both ok.


chi

>
> --D
>
>>   			printf(_("icount: %llu  ifree: %llu  "),
>>   			       (unsigned long long) be64_to_cpu(a),
>>   			       (unsigned long long) be64_to_cpu(b));
>> -			memmove(&a, *ptr+16, sizeof(__be64));
>> -			memmove(&b, *ptr+24, sizeof(__be64));
>> +			memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_fdblocks), sizeof(__be64));
>> +			memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_frextents), sizeof(__be64));
>>   			printf(_("fdblks: %llu  frext: %llu\n"),
>>   			       (unsigned long long) be64_to_cpu(a),
>>   			       (unsigned long long) be64_to_cpu(b));
>> -- 
>> 2.43.0
>>
Dave Chinner Oct. 12, 2024, 10:10 p.m. UTC | #3
On Fri, Oct 11, 2024 at 11:54:08AM +0800, Chi Zhiling wrote:
> 
> On 2024/10/11 11:24, Darrick J. Wong wrote:
> > On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote:
> > > From: chizhiling <chizhiling@kylinos.cn>
> > > 
> > > When using xfs_logprint to interpret the buffer of the super block, the
> > > icount will always be 6360863066640355328 (0x5846534200001000). This is
> > > because the offset of icount is incorrect, causing xfs_logprint to
> > > misinterpret the MAGIC number as icount.
> > > This patch fixes the offset value of the SB counters in xfs_logprint.
> > > 
> > > Before this patch:
> > > icount: 6360863066640355328  ifree: 5242880  fdblks: 0  frext: 0
> > > 
> > > After this patch:
> > > icount: 10240  ifree: 4906  fdblks: 37  frext: 0
> > > 
> > > Signed-off-by: chizhiling <chizhiling@kylinos.cn>
> > > ---
> > >   logprint/log_misc.c | 8 ++++----
> > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/logprint/log_misc.c b/logprint/log_misc.c
> > > index 8e86ac34..21da5b8b 100644
> > > --- a/logprint/log_misc.c
> > > +++ b/logprint/log_misc.c
> > > @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops)
> > >   			/*
> > >   			 * memmove because *ptr may not be 8-byte aligned
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is important. I'll come back to it.

> > >   			 */
> > > -			memmove(&a, *ptr, sizeof(__be64));
> > > -			memmove(&b, *ptr+8, sizeof(__be64));
> > How did this ever work??  This even looks wrong in "Release_1.0.0".
> > 
> Yes, I was surprised when I find this issue

I"ve never cared about these values when doing diagnosis because
lazy-count means they aren't guaranteed to be correct except at
unmount. At which point, the correct values are generally found
in the superblock. IOWs, the values are largely meaningless whether
they are correct or not, so nobody has really cared enough about
this to bother fixing it...

> > > +			memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64));
> > > +			memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64));
> > Why not do:
> > 
> > 			struct xfs_dsb *dsb = *ptr;
> > 
> > 			memcpy(&a, &dsb->sb_icount, sizeof(a));
> > 
> > or better yet, skip the indirection and do
> > 
> > 			printf(_("icount: %llu  ifree: %llu  "),
> > 					(unsigned long long)be64_to_cpu(dsb->sb_icount),
> > 					(unsigned long long)be64_to_cpu(dsb->sb_ifree));
> > 
> > Hm?
> 
> Yes, of course we can do it this way, I just want the fix patch to look
> smaller :)

This won't work on platforms where we can't do 4-byte aligned 64 bit
accesses (various risc archs), hence the comment above about using
memmove.

It should work if we use get_unaligned_be64() rather than
be64_to_cpu() - I would suggest that all such structure decoding in
logprint would need to do the same thing...

-Dave.
Dave Chinner Oct. 13, 2024, 10:50 p.m. UTC | #4
On Sun, Oct 13, 2024 at 12:00:22PM +0800, Chi Zhiling wrote:
> On 2024/10/13 06:10, Dave Chinner wrote:
> > On Fri, Oct 11, 2024 at 11:54:08AM +0800, Chi Zhiling wrote:
> > > On 2024/10/11 11:24, Darrick J. Wong wrote:
> > > > On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote:
> > > > > From: chizhiling<chizhiling@kylinos.cn>
> > > > > 
> > > > > When using xfs_logprint to interpret the buffer of the super block, the
> > > > > icount will always be 6360863066640355328 (0x5846534200001000). This is
> > > > > because the offset of icount is incorrect, causing xfs_logprint to
> > > > > misinterpret the MAGIC number as icount.
> > > > > This patch fixes the offset value of the SB counters in xfs_logprint.
> > > > > 
> > > > > Before this patch:
> > > > > icount: 6360863066640355328  ifree: 5242880  fdblks: 0  frext: 0
> > > > > 
> > > > > After this patch:
> > > > > icount: 10240  ifree: 4906  fdblks: 37  frext: 0
> > > > > 
> > > > > Signed-off-by: chizhiling<chizhiling@kylinos.cn>
> > > > > ---
> > > > >    logprint/log_misc.c | 8 ++++----
> > > > >    1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/logprint/log_misc.c b/logprint/log_misc.c
> > > > > index 8e86ac34..21da5b8b 100644
> > > > > --- a/logprint/log_misc.c
> > > > > +++ b/logprint/log_misc.c
> > > > > @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops)
> > > > >    			/*
> > > > >    			 * memmove because *ptr may not be 8-byte aligned
> >                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > This is important. I'll come back to it.
> > 
> > > > >    			 */
> > > > > -			memmove(&a, *ptr, sizeof(__be64));
> > > > > -			memmove(&b, *ptr+8, sizeof(__be64));
> > > > How did this ever work??  This even looks wrong in "Release_1.0.0".
> > > > 
> > > Yes, I was surprised when I find this issue
> > I"ve never cared about these values when doing diagnosis because
> > lazy-count means they aren't guaranteed to be correct except at
> > unmount. At which point, the correct values are generally found
> > in the superblock. IOWs, the values are largely meaningless whether
> > they are correct or not, so nobody has really cared enough about
> > this to bother fixing it...
> 
> Because I got a log which shows that the fdblocks was (-8),   it caused
> the filesystem to fail mounting again. 'SB summary counter sanity check failed'

What kernel? Because AFAIK, that was fixed in commit 58f880711f2b
("xfs: make sure sb_fdblocks is non-negative") in 6.10...

-Dave.
Chi Zhiling Oct. 14, 2024, 5:51 a.m. UTC | #5
On 2024/10/14 06:50, Dave Chinner wrote:
> On Sun, Oct 13, 2024 at 12:00:22PM +0800, Chi Zhiling wrote:
>> On 2024/10/13 06:10, Dave Chinner wrote:
>>> On Fri, Oct 11, 2024 at 11:54:08AM +0800, Chi Zhiling wrote:
>>>> On 2024/10/11 11:24, Darrick J. Wong wrote:
>>>>> On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote:
>>>>>> From: chizhiling<chizhiling@kylinos.cn>
>>>>>>
>>>>>> When using xfs_logprint to interpret the buffer of the super block, the
>>>>>> icount will always be 6360863066640355328 (0x5846534200001000). This is
>>>>>> because the offset of icount is incorrect, causing xfs_logprint to
>>>>>> misinterpret the MAGIC number as icount.
>>>>>> This patch fixes the offset value of the SB counters in xfs_logprint.
>>>>>>
>>>>>> Before this patch:
>>>>>> icount: 6360863066640355328  ifree: 5242880  fdblks: 0  frext: 0
>>>>>>
>>>>>> After this patch:
>>>>>> icount: 10240  ifree: 4906  fdblks: 37  frext: 0
>>>>>>
>>>>>> Signed-off-by: chizhiling<chizhiling@kylinos.cn>
>>>>>> ---
>>>>>>     logprint/log_misc.c | 8 ++++----
>>>>>>     1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/logprint/log_misc.c b/logprint/log_misc.c
>>>>>> index 8e86ac34..21da5b8b 100644
>>>>>> --- a/logprint/log_misc.c
>>>>>> +++ b/logprint/log_misc.c
>>>>>> @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops)
>>>>>>     			/*
>>>>>>     			 * memmove because *ptr may not be 8-byte aligned
>>>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> This is important. I'll come back to it.
>>>
>>>>>>     			 */
>>>>>> -			memmove(&a, *ptr, sizeof(__be64));
>>>>>> -			memmove(&b, *ptr+8, sizeof(__be64));
>>>>> How did this ever work??  This even looks wrong in "Release_1.0.0".
>>>>>
>>>> Yes, I was surprised when I find this issue
>>> I"ve never cared about these values when doing diagnosis because
>>> lazy-count means they aren't guaranteed to be correct except at
>>> unmount. At which point, the correct values are generally found
>>> in the superblock. IOWs, the values are largely meaningless whether
>>> they are correct or not, so nobody has really cared enough about
>>> this to bother fixing it...
>> Because I got a log which shows that the fdblocks was (-8),   it caused
>> the filesystem to fail mounting again. 'SB summary counter sanity check failed'
> What kernel? Because AFAIK, that was fixed in commit 58f880711f2b
> ("xfs: make sure sb_fdblocks is non-negative") in 6.10...

It's a 4.19 kernel. As you said, the fdblocks is meaningless, I think 
that patch (commit 58f880711f2b) is enough to fix the issue.

Thank you for your reminding.


chi
diff mbox series

Patch

diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 8e86ac34..21da5b8b 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -288,13 +288,13 @@  xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops)
 			/*
 			 * memmove because *ptr may not be 8-byte aligned
 			 */
-			memmove(&a, *ptr, sizeof(__be64));
-			memmove(&b, *ptr+8, sizeof(__be64));
+			memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64));
+			memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64));
 			printf(_("icount: %llu  ifree: %llu  "),
 			       (unsigned long long) be64_to_cpu(a),
 			       (unsigned long long) be64_to_cpu(b));
-			memmove(&a, *ptr+16, sizeof(__be64));
-			memmove(&b, *ptr+24, sizeof(__be64));
+			memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_fdblocks), sizeof(__be64));
+			memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_frextents), sizeof(__be64));
 			printf(_("fdblks: %llu  frext: %llu\n"),
 			       (unsigned long long) be64_to_cpu(a),
 			       (unsigned long long) be64_to_cpu(b));