Message ID | 20241011030810.1083636-1-chizhiling@163.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | xfs_logprint: Fix super block buffer interpretation issue | expand |
On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote: > From: chizhiling <chizhiling@kylinos.cn> > > When using xfs_logprint to interpret the buffer of the super block, the > icount will always be 6360863066640355328 (0x5846534200001000). This is > because the offset of icount is incorrect, causing xfs_logprint to > misinterpret the MAGIC number as icount. > This patch fixes the offset value of the SB counters in xfs_logprint. > > Before this patch: > icount: 6360863066640355328 ifree: 5242880 fdblks: 0 frext: 0 > > After this patch: > icount: 10240 ifree: 4906 fdblks: 37 frext: 0 > > Signed-off-by: chizhiling <chizhiling@kylinos.cn> > --- > logprint/log_misc.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/logprint/log_misc.c b/logprint/log_misc.c > index 8e86ac34..21da5b8b 100644 > --- a/logprint/log_misc.c > +++ b/logprint/log_misc.c > @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops) > /* > * memmove because *ptr may not be 8-byte aligned > */ > - memmove(&a, *ptr, sizeof(__be64)); > - memmove(&b, *ptr+8, sizeof(__be64)); How did this ever work?? This even looks wrong in "Release_1.0.0". > + memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64)); > + memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64)); Why not do: struct xfs_dsb *dsb = *ptr; memcpy(&a, &dsb->sb_icount, sizeof(a)); or better yet, skip the indirection and do printf(_("icount: %llu ifree: %llu "), (unsigned long long)be64_to_cpu(dsb->sb_icount), (unsigned long long)be64_to_cpu(dsb->sb_ifree)); Hm? --D > printf(_("icount: %llu ifree: %llu "), > (unsigned long long) be64_to_cpu(a), > (unsigned long long) be64_to_cpu(b)); > - memmove(&a, *ptr+16, sizeof(__be64)); > - memmove(&b, *ptr+24, sizeof(__be64)); > + memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_fdblocks), sizeof(__be64)); > + memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_frextents), sizeof(__be64)); > printf(_("fdblks: %llu frext: %llu\n"), > (unsigned long long) be64_to_cpu(a), > (unsigned long long) be64_to_cpu(b)); > -- > 2.43.0 >
On 2024/10/11 11:24, Darrick J. Wong wrote: > On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote: >> From: chizhiling <chizhiling@kylinos.cn> >> >> When using xfs_logprint to interpret the buffer of the super block, the >> icount will always be 6360863066640355328 (0x5846534200001000). This is >> because the offset of icount is incorrect, causing xfs_logprint to >> misinterpret the MAGIC number as icount. >> This patch fixes the offset value of the SB counters in xfs_logprint. >> >> Before this patch: >> icount: 6360863066640355328 ifree: 5242880 fdblks: 0 frext: 0 >> >> After this patch: >> icount: 10240 ifree: 4906 fdblks: 37 frext: 0 >> >> Signed-off-by: chizhiling <chizhiling@kylinos.cn> >> --- >> logprint/log_misc.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/logprint/log_misc.c b/logprint/log_misc.c >> index 8e86ac34..21da5b8b 100644 >> --- a/logprint/log_misc.c >> +++ b/logprint/log_misc.c >> @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops) >> /* >> * memmove because *ptr may not be 8-byte aligned >> */ >> - memmove(&a, *ptr, sizeof(__be64)); >> - memmove(&b, *ptr+8, sizeof(__be64)); > How did this ever work?? This even looks wrong in "Release_1.0.0". > Yes, I was surprised when I find this issue >> + memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64)); >> + memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64)); > Why not do: > > struct xfs_dsb *dsb = *ptr; > > memcpy(&a, &dsb->sb_icount, sizeof(a)); > > or better yet, skip the indirection and do > > printf(_("icount: %llu ifree: %llu "), > (unsigned long long)be64_to_cpu(dsb->sb_icount), > (unsigned long long)be64_to_cpu(dsb->sb_ifree)); > > Hm? Yes, of course we can do it this way, I just want the fix patch to look smaller :) I think both ok. chi > > --D > >> printf(_("icount: %llu ifree: %llu "), >> (unsigned long long) be64_to_cpu(a), >> (unsigned long long) be64_to_cpu(b)); >> - memmove(&a, *ptr+16, sizeof(__be64)); >> - memmove(&b, *ptr+24, sizeof(__be64)); >> + memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_fdblocks), sizeof(__be64)); >> + memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_frextents), sizeof(__be64)); >> printf(_("fdblks: %llu frext: %llu\n"), >> (unsigned long long) be64_to_cpu(a), >> (unsigned long long) be64_to_cpu(b)); >> -- >> 2.43.0 >>
On Fri, Oct 11, 2024 at 11:54:08AM +0800, Chi Zhiling wrote: > > On 2024/10/11 11:24, Darrick J. Wong wrote: > > On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote: > > > From: chizhiling <chizhiling@kylinos.cn> > > > > > > When using xfs_logprint to interpret the buffer of the super block, the > > > icount will always be 6360863066640355328 (0x5846534200001000). This is > > > because the offset of icount is incorrect, causing xfs_logprint to > > > misinterpret the MAGIC number as icount. > > > This patch fixes the offset value of the SB counters in xfs_logprint. > > > > > > Before this patch: > > > icount: 6360863066640355328 ifree: 5242880 fdblks: 0 frext: 0 > > > > > > After this patch: > > > icount: 10240 ifree: 4906 fdblks: 37 frext: 0 > > > > > > Signed-off-by: chizhiling <chizhiling@kylinos.cn> > > > --- > > > logprint/log_misc.c | 8 ++++---- > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > diff --git a/logprint/log_misc.c b/logprint/log_misc.c > > > index 8e86ac34..21da5b8b 100644 > > > --- a/logprint/log_misc.c > > > +++ b/logprint/log_misc.c > > > @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops) > > > /* > > > * memmove because *ptr may not be 8-byte aligned ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is important. I'll come back to it. > > > */ > > > - memmove(&a, *ptr, sizeof(__be64)); > > > - memmove(&b, *ptr+8, sizeof(__be64)); > > How did this ever work?? This even looks wrong in "Release_1.0.0". > > > Yes, I was surprised when I find this issue I"ve never cared about these values when doing diagnosis because lazy-count means they aren't guaranteed to be correct except at unmount. At which point, the correct values are generally found in the superblock. IOWs, the values are largely meaningless whether they are correct or not, so nobody has really cared enough about this to bother fixing it... > > > + memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64)); > > > + memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64)); > > Why not do: > > > > struct xfs_dsb *dsb = *ptr; > > > > memcpy(&a, &dsb->sb_icount, sizeof(a)); > > > > or better yet, skip the indirection and do > > > > printf(_("icount: %llu ifree: %llu "), > > (unsigned long long)be64_to_cpu(dsb->sb_icount), > > (unsigned long long)be64_to_cpu(dsb->sb_ifree)); > > > > Hm? > > Yes, of course we can do it this way, I just want the fix patch to look > smaller :) This won't work on platforms where we can't do 4-byte aligned 64 bit accesses (various risc archs), hence the comment above about using memmove. It should work if we use get_unaligned_be64() rather than be64_to_cpu() - I would suggest that all such structure decoding in logprint would need to do the same thing... -Dave.
On Sun, Oct 13, 2024 at 12:00:22PM +0800, Chi Zhiling wrote: > On 2024/10/13 06:10, Dave Chinner wrote: > > On Fri, Oct 11, 2024 at 11:54:08AM +0800, Chi Zhiling wrote: > > > On 2024/10/11 11:24, Darrick J. Wong wrote: > > > > On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote: > > > > > From: chizhiling<chizhiling@kylinos.cn> > > > > > > > > > > When using xfs_logprint to interpret the buffer of the super block, the > > > > > icount will always be 6360863066640355328 (0x5846534200001000). This is > > > > > because the offset of icount is incorrect, causing xfs_logprint to > > > > > misinterpret the MAGIC number as icount. > > > > > This patch fixes the offset value of the SB counters in xfs_logprint. > > > > > > > > > > Before this patch: > > > > > icount: 6360863066640355328 ifree: 5242880 fdblks: 0 frext: 0 > > > > > > > > > > After this patch: > > > > > icount: 10240 ifree: 4906 fdblks: 37 frext: 0 > > > > > > > > > > Signed-off-by: chizhiling<chizhiling@kylinos.cn> > > > > > --- > > > > > logprint/log_misc.c | 8 ++++---- > > > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/logprint/log_misc.c b/logprint/log_misc.c > > > > > index 8e86ac34..21da5b8b 100644 > > > > > --- a/logprint/log_misc.c > > > > > +++ b/logprint/log_misc.c > > > > > @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops) > > > > > /* > > > > > * memmove because *ptr may not be 8-byte aligned > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > This is important. I'll come back to it. > > > > > > > */ > > > > > - memmove(&a, *ptr, sizeof(__be64)); > > > > > - memmove(&b, *ptr+8, sizeof(__be64)); > > > > How did this ever work?? This even looks wrong in "Release_1.0.0". > > > > > > > Yes, I was surprised when I find this issue > > I"ve never cared about these values when doing diagnosis because > > lazy-count means they aren't guaranteed to be correct except at > > unmount. At which point, the correct values are generally found > > in the superblock. IOWs, the values are largely meaningless whether > > they are correct or not, so nobody has really cared enough about > > this to bother fixing it... > > Because I got a log which shows that the fdblocks was (-8), it caused > the filesystem to fail mounting again. 'SB summary counter sanity check failed' What kernel? Because AFAIK, that was fixed in commit 58f880711f2b ("xfs: make sure sb_fdblocks is non-negative") in 6.10... -Dave.
On 2024/10/14 06:50, Dave Chinner wrote: > On Sun, Oct 13, 2024 at 12:00:22PM +0800, Chi Zhiling wrote: >> On 2024/10/13 06:10, Dave Chinner wrote: >>> On Fri, Oct 11, 2024 at 11:54:08AM +0800, Chi Zhiling wrote: >>>> On 2024/10/11 11:24, Darrick J. Wong wrote: >>>>> On Fri, Oct 11, 2024 at 11:08:10AM +0800, Chi Zhiling wrote: >>>>>> From: chizhiling<chizhiling@kylinos.cn> >>>>>> >>>>>> When using xfs_logprint to interpret the buffer of the super block, the >>>>>> icount will always be 6360863066640355328 (0x5846534200001000). This is >>>>>> because the offset of icount is incorrect, causing xfs_logprint to >>>>>> misinterpret the MAGIC number as icount. >>>>>> This patch fixes the offset value of the SB counters in xfs_logprint. >>>>>> >>>>>> Before this patch: >>>>>> icount: 6360863066640355328 ifree: 5242880 fdblks: 0 frext: 0 >>>>>> >>>>>> After this patch: >>>>>> icount: 10240 ifree: 4906 fdblks: 37 frext: 0 >>>>>> >>>>>> Signed-off-by: chizhiling<chizhiling@kylinos.cn> >>>>>> --- >>>>>> logprint/log_misc.c | 8 ++++---- >>>>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/logprint/log_misc.c b/logprint/log_misc.c >>>>>> index 8e86ac34..21da5b8b 100644 >>>>>> --- a/logprint/log_misc.c >>>>>> +++ b/logprint/log_misc.c >>>>>> @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops) >>>>>> /* >>>>>> * memmove because *ptr may not be 8-byte aligned >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >>> >>> This is important. I'll come back to it. >>> >>>>>> */ >>>>>> - memmove(&a, *ptr, sizeof(__be64)); >>>>>> - memmove(&b, *ptr+8, sizeof(__be64)); >>>>> How did this ever work?? This even looks wrong in "Release_1.0.0". >>>>> >>>> Yes, I was surprised when I find this issue >>> I"ve never cared about these values when doing diagnosis because >>> lazy-count means they aren't guaranteed to be correct except at >>> unmount. At which point, the correct values are generally found >>> in the superblock. IOWs, the values are largely meaningless whether >>> they are correct or not, so nobody has really cared enough about >>> this to bother fixing it... >> Because I got a log which shows that the fdblocks was (-8), it caused >> the filesystem to fail mounting again. 'SB summary counter sanity check failed' > What kernel? Because AFAIK, that was fixed in commit 58f880711f2b > ("xfs: make sure sb_fdblocks is non-negative") in 6.10... It's a 4.19 kernel. As you said, the fdblocks is meaningless, I think that patch (commit 58f880711f2b) is enough to fix the issue. Thank you for your reminding. chi
diff --git a/logprint/log_misc.c b/logprint/log_misc.c index 8e86ac34..21da5b8b 100644 --- a/logprint/log_misc.c +++ b/logprint/log_misc.c @@ -288,13 +288,13 @@ xlog_print_trans_buffer(char **ptr, int len, int *i, int num_ops) /* * memmove because *ptr may not be 8-byte aligned */ - memmove(&a, *ptr, sizeof(__be64)); - memmove(&b, *ptr+8, sizeof(__be64)); + memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_icount), sizeof(__be64)); + memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_ifree), sizeof(__be64)); printf(_("icount: %llu ifree: %llu "), (unsigned long long) be64_to_cpu(a), (unsigned long long) be64_to_cpu(b)); - memmove(&a, *ptr+16, sizeof(__be64)); - memmove(&b, *ptr+24, sizeof(__be64)); + memmove(&a, *ptr + offsetof(struct xfs_dsb, sb_fdblocks), sizeof(__be64)); + memmove(&b, *ptr + offsetof(struct xfs_dsb, sb_frextents), sizeof(__be64)); printf(_("fdblks: %llu frext: %llu\n"), (unsigned long long) be64_to_cpu(a), (unsigned long long) be64_to_cpu(b));