Old bugs in xfsprogs?

Message ID 7e185931-8830-5f31-7abb-5419bb255991@suse.com (mailing list archive)
State Not Applicable, archived

Commit Message

Jeff Mahoney Aug. 2, 2016, 1:20 p.m. UTC
Hi all -

While investigating a weird report on an internal list I found a few old
commits that don't look quite right and that may be very old bugs.  I
know it's hard to go back nearly 15 years, especially in the days where
very short commit messages were still acceptable, and try to remember
why certain changes happened.  In this case, a weird corner case[1]
would've been caught, xfs_repair would've bailed, and a file system may
have survived (despite obvious user error).

1) Commit 5000d01d212f (white space cleanup)

This one may well have been intended as a repair operation or perhaps it
was accidentally duplicated from another chunk in the patch.  At any
rate, it hides a mismatched UUID between the log and the superblock from
the rest of xfs_repair.  The user sees the "error" message but it
carries on anyway.

2) Commit d321ceac8da (add libxlog directory.)

I believe this was supposed to be as simple as pushing some
functionality from logprint into a new libxlog library, but the result
is that things that used to return an error no longer did.  The
print_exit global that was initialized to 1 in logprint is initialized
to 0 in libxlog and never set.  So we always print an error message but
then carry on.  So even if the header_check_uuid() call above would fail
properly, the error is printed and then the error condition ignored.
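
To make the failure mode in 2) concrete, here is a minimal sketch of the pattern. Only the name print_exit comes from the description above; report_error() and its return convention are hypothetical stand-ins and are not the actual libxlog code.

```c
#include <assert.h>
#include <stdio.h>

/* As described above: the flag starts at 0 in the library and
 * nothing ever sets it afterwards. */
int print_exit = 0;

/* Hypothetical helper (not the real libxlog function): it always
 * prints the message, but only reports failure to the caller when
 * print_exit is nonzero. */
int report_error(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "* ERROR: %s\n", msg);
    return print_exit ? 1 : 0;  /* 0 lets the caller carry on */
}
```

With the flag stuck at 0, every caller sees success even though the message was printed, which matches the "prints an error but carries on" behaviour.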

-Jeff

[1] The details are still murky, but what I got was that the user ran
xfs_repair -L (yup, I know) on an image file that contained partitions.
It found a valid XFS superblock and then "repaired" the file system to
an empty state since everything it found was "corrupt."  I suspect that
there was an XFS file system on the raw image file, which was then
partitioned without clearing the MBR, and the expected XFS file system
was created on the first partition.  xfs_repair was pointed at the whole
image file, discovered the old superblock, and remade the fs in its own
image since nothing was at the proper locations.

Comments

Eric Sandeen Aug. 2, 2016, 3 p.m. UTC | #1
On 8/2/16 8:20 AM, Jeff Mahoney wrote:
> Hi all -
> 
> While investigating a weird report on an internal list I found a few old
> commits that don't look quite right and that may be very old bugs.  I
> know it's hard to go back nearly 15 years, especially in the days where
> very short commit messages were still acceptable, and try to remember
> why certain changes happened.  In this case, a weird corner case[1]
> would've been caught, xfs_repair would've bailed, and a file system may
> have survived (despite obvious user error).
> 
> 1/ Commit 5000d01d212f (white space cleanup)

Meh, not a white space cleanup, is it!

> diff --git a/libxlog/util.c b/libxlog/util.c
> index 7aca165..aa3093d 100644
> --- a/libxlog/util.c
> +++ b/libxlog/util.c
> @@ -49,8 +49,10 @@ header_check_uuid(xfs_mount_t *mp, xlog_rec_header_t
> *head)
>      printf("* ERROR: mismatched uuid in log\n"
>             "*            SB : %s\n*            log: %s\n",
>              uu_sb, uu_log);
> +
> +    memcpy(&mp->m_sb.sb_uuid, head->h_fs_uuid, sizeof(uuid_t));
> 
> -    return 1;
> +    return 0;
>  }

However, after seeing the mismatch, it "fixes" it by copying the log
header's uuid over the superblock uuid in the mount structure
(mp->m_sb.sb_uuid).

But that doesn't seem like the right approach at all, and it renders all
the callers who check the return value of header_check_uuid pointless.
So yeah, doesn't look good to me.
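
For comparison, a check that only reports the mismatch might look roughly like the sketch below. The types and the function name (fake_sb, fake_log_header, check_log_uuid) are hypothetical stand-ins rather than the xfsprogs definitions. The point is that the superblock uuid is left untouched and the caller gets a nonzero return to act on.

```c
#include <assert.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical, simplified stand-ins for the superblock and the
 * log record header; only the uuid fields matter here. */
struct fake_sb         { unsigned char sb_uuid[UUID_LEN]; };
struct fake_log_header { unsigned char h_fs_uuid[UUID_LEN]; };

/* Returns 0 when the uuids match and 1 on a mismatch.  The
 * superblock is const, so it cannot be "repaired" behind the
 * caller's back. */
int check_log_uuid(const struct fake_sb *sb,
                   const struct fake_log_header *head)
{
    assert(sb != NULL && head != NULL);
    return memcmp(sb->sb_uuid, head->h_fs_uuid, UUID_LEN) != 0;
}
```

Whether to bail out, or to continue because the user forced it, then becomes a decision for the caller instead of being hidden inside the check.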

> This one may well have been intended as a repair operation or perhaps it
> was accidentally duplicated from another chunk in the patch.  At any
> rate, it hides a mismatched UUID between the log and the superblock from
> the rest of xfs_repair.  The user sees the "error" message but it
> carries on anyway.
> 
> 2) Commit d321ceac8da (add libxlog directory.)
> 
> I believe this was supposed to be as simple as pushing some
> functionality from logprint into a new libxlog library, but the result
> is that things that used to return an error no longer did.  The
> print_exit global that was initialized to 1 in logprint is initialized
> to 0 in libxlog and never set.  So we always print an error message but
> then carry on.  So even if the header_check_uuid() call above would fail
> properly, the error is printed and then the error condition ignored.

Sigh, yeah, the old commits are wild west.  :(

In this case header_check_uuid returned 0 anyway, so print_exit would
not have helped, but I think you're right.

A perfect storm of derp.  ;)

> -Jeff
> 
> [1] The details are still murky, but what I got was that the user ran
> xfs_repair -L (yup, i know) on an image file that contained partitions.
> It found a valid XFS superblock and then "repaired" the file system to
> an empty state since everything it found was "corrupt."  I suspect that
> there was an XFS file system on the raw image file, which was then
> partitioned without clearing the MBR, and the expected XFS file system
> was created on the first partition.  xfs_repair was pointed at the whole
> image file, discovered the old superblock, and remade the fs in its own
> image since nothing was at the proper locations.

Oh, so it found 4 matching, old, valid, superblocks?  Ugh.  I don't know
how to protect against that, although <handwave> it probably should have
found a few non-matching supers along the way as well.  I wonder if we
should be more cautious in that case.

I could imagine that maybe for each candidate super we find, we should
look at its geometry, and spot-check the other locations that it indicates
should contain a superblock.  If we get enough semi-valid but conflicting
"sets," maybe we should bail out and ask.  It's quite a corner case, tho.
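
A rough sketch of that spot-check, working on an in-memory image for simplicity. It assumes each allocation group starts at agno * agblocks * blocksize bytes and that a superblock begins with the on-disk magic "XFSB". The struct and function names are hypothetical and this is not code from xfs_repair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Geometry fields read from one candidate superblock
 * (hypothetical struct, not the xfsprogs xfs_sb_t). */
struct sb_geometry {
    uint32_t blocksize;  /* bytes per filesystem block */
    uint32_t agblocks;   /* blocks per allocation group */
    uint32_t agcount;    /* number of allocation groups */
};

/* Count how many of the AG start offsets implied by geo carry the
 * "XFSB" magic inside the image. */
uint32_t count_matching_supers(const unsigned char *img, size_t len,
                               const struct sb_geometry *geo)
{
    uint64_t agsize;
    uint32_t ag, found = 0;

    assert(img != NULL && geo != NULL);
    agsize = (uint64_t)geo->blocksize * geo->agblocks;
    if (agsize == 0 || len < 4)
        return 0;

    for (ag = 0; ag < geo->agcount; ag++) {
        uint64_t off;

        /* stop before agsize * ag could overflow or leave the image */
        if (ag > 0 && agsize > (uint64_t)len / ag)
            break;
        off = agsize * ag;
        if (off > (uint64_t)len - 4)
            break;
        if (memcmp(img + off, "XFSB", 4) == 0)
            found++;
    }
    return found;
}
```

A caller could compare the count against agcount for each candidate and stop to ask when several candidates each describe a partially consistent but mutually conflicting set.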

Any chance you have full xfs_repair output?

-Eric

Jeff Mahoney Aug. 2, 2016, 6:51 p.m. UTC | #2
On 8/2/16 11:00 AM, Eric Sandeen wrote:
> On 8/2/16 8:20 AM, Jeff Mahoney wrote:
>> Hi all -
>>
>> While investigating a weird report on an internal list I found a few old
>> commits that don't look quite right and that may be very old bugs.  I
>> know it's hard to go back nearly 15 years, especially in the days where
>> very short commit messages were still acceptable, and try to remember
>> why certain changes happened.  In this case, a weird corner case[1]
>> would've been caught, xfs_repair would've bailed, and a file system may
>> have survived (despite obvious user error).
>>
>> 1/ Commit 5000d01d212f (white space cleanup)
> 
> Meh, not a white space cleanup, is it!
> 
>> diff --git a/libxlog/util.c b/libxlog/util.c
>> index 7aca165..aa3093d 100644
>> --- a/libxlog/util.c
>> +++ b/libxlog/util.c
>> @@ -49,8 +49,10 @@ header_check_uuid(xfs_mount_t *mp, xlog_rec_header_t
>> *head)
>>      printf("* ERROR: mismatched uuid in log\n"
>>             "*            SB : %s\n*            log: %s\n",
>>              uu_sb, uu_log);
>> +
>> +    memcpy(&mp->m_sb.sb_uuid, head->h_fs_uuid, sizeof(uuid_t));
>>
>> -    return 1;
>> +    return 0;
>>  }
> 
> However, after seeing the mismatch, it "fixes" it by copying the header
> uuid into the mount point uuid.
> 
> But that doesn't seem like the right approach at all, and it renders all
> the callers who check the return value of header_check_uuid pointless.
> So yeah, doesn't look good to me.
> 
>> This one may well have been intended as a repair operation or perhaps it
>> was accidentally duplicated from another chunk in the patch.  At any
>> rate, it hides a mismatched UUID between the log and the superblock from
>> the rest of xfs_repair.  The user sees the "error" message but it
>> carries on anyway.
>>
>> 2) Commit d321ceac8da (add libxlog directory.)
>>
>> I believe this was supposed to be as simple as pushing some
>> functionality from logprint into a new libxlog library, but the result
>> is that things that used to return an error no longer did.  The
>> print_exit global that was initialized to 1 in logprint is initialized
>> to 0 in libxlog and never set.  So we always print an error message but
>> then carry on.  So even if the header_check_uuid() call above would fail
>> properly, the error is printed and then the error condition ignored.
> 
> Sigh, yeah, the old commits are wild west.  :(
> 
> In this case header_check_uuid returned 0 anyway, so print_exit would
> not have helped, but I think you're right.
> 
> A perfect storm of derp.  ;)

Haha, yep.

>> -Jeff
>>
>> [1] The details are still murky, but what I got was that the user ran
>> xfs_repair -L (yup, i know) on an image file that contained partitions.
>> It found a valid XFS superblock and then "repaired" the file system to
>> an empty state since everything it found was "corrupt."  I suspect that
>> there was an XFS file system on the raw image file, which was then
>> partitioned without clearing the MBR, and the expected XFS file system
>> was created on the first partition.  xfs_repair was pointed at the whole
>> image file, discovered the old superblock, and remade the fs in its own
>> image since nothing was at the proper locations.
> 
> Oh, so it found 4 matching, old, valid, superblocks?  Ugh.  I don't know
> how to protect against that, although <handwave> it probably should have
> found a few non-matching supers along the way as well.  I wonder if we
> should be more cautious in that case.

Well, it at least found the non-matching log but then ran into the
trouble above.

> I could imagine that maybe for each candidate super we find, we should
> look at its geometry, and spot-check the other locations that it indicates
> should contain a superblock.  If we get enough semi-valid but conflicting
> "sets," maybe we should bail out and ask.  It's quite a corner case, tho.

I'm not sure a geometry check would've helped here.  The superblock
geometry still would've covered the whole, unpartitioned device.  Since
we're already linking with blkid, maybe a check to see if there's a
partition table on the device and bail if it sees one, unless forced?
The part that I'm still trying to explain is how it managed to get a
good magic from the log and then got the wrong uuid.
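
To illustrate the kind of check meant here, the sketch below looks at sector 0 by hand rather than going through blkid, purely to stay self-contained. A real implementation would presumably use libblkid's partition probing instead. It relies on the classic MBR layout (0x55 0xAA signature at bytes 510-511, four 16-byte entries starting at byte 446, type byte at offset 4 of each entry). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE      512
#define MBR_SIG_OFFSET   510
#define MBR_TABLE_OFFSET 446
#define MBR_ENTRY_SIZE   16
#define MBR_ENTRIES      4

/* Nonzero if sector 0 ends in the 0x55 0xAA signature and at least
 * one of the four primary entries has a nonzero partition type. */
int has_mbr_partition_table(const unsigned char *sector0, size_t len)
{
    int i;

    assert(sector0 != NULL);
    if (len < SECTOR_SIZE)
        return 0;
    if (sector0[MBR_SIG_OFFSET] != 0x55 ||
        sector0[MBR_SIG_OFFSET + 1] != 0xAA)
        return 0;
    for (i = 0; i < MBR_ENTRIES; i++) {
        if (sector0[MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE + 4] != 0)
            return 1;
    }
    return 0;
}

/* Nonzero if the same sector carries both the XFS superblock magic
 * and a populated MBR partition table, i.e. the mkfs -> fdisk case. */
int looks_like_stale_xfs_under_mbr(const unsigned char *sector0,
                                   size_t len)
{
    assert(sector0 != NULL);
    return len >= SECTOR_SIZE &&
           memcmp(sector0, "XFSB", 4) == 0 &&
           has_mbr_partition_table(sector0, len);
}
```

When the second function returns nonzero, the tool could refuse to continue unless the user forces it.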

> Any chance you have full xfs_repair output?

Sure, below.  I heard back from the reporter and confirmed that my
hypothesis of mkfs -> fdisk -> mkfs was what happened.  He's on SLE12
SP1, so that means xfsprogs 3.2.1.

-Jeff

---

labadmin:/ssd # xfs_repair postgre.raw -L
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with
calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent
with calculated value 129
resetting superblock realtime bitmap ino pointer to 129
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent
with calculated value 130
resetting superblock realtime summary ino pointer to 130
Phase 2 - using internal log
        - zero log...
* ERROR: mismatched uuid in log
*            SB : ae384026-e3ac-4350-90b8-6dc1ac91595d
*            log: 4cb79a9a-4b85-40fd-aed5-c7f2de36a3f5
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata corruption detected at block 0x1/0x200
Metadata corruption detected at block 0x2/0x200
bad magic # 0x0 for agf 0
bad version # 0 for agf 0
bad length 0 for agf 0, should be 491520
bad magic # 0x0 for agi 0
bad version # 0 for agi 0
bad length # 0 for agi 0, should be 491520
reset bad agf for ag 0
reset bad agi for ag 0
bad agbno 0 for btbno root, agno 0
bad agbno 0 for btbcnt root, agno 0
bad agbno 0 for inobt root, agno 0
agi unlinked bucket 0 is 0 in ag 0 (inode=0)
agi unlinked bucket 1 is 0 in ag 0 (inode=0)
agi unlinked bucket 2 is 0 in ag 0 (inode=0)
agi unlinked bucket 3 is 0 in ag 0 (inode=0)
agi unlinked bucket 4 is 0 in ag 0 (inode=0)
agi unlinked bucket 5 is 0 in ag 0 (inode=0)
agi unlinked bucket 6 is 0 in ag 0 (inode=0)
agi unlinked bucket 7 is 0 in ag 0 (inode=0)
agi unlinked bucket 8 is 0 in ag 0 (inode=0)
agi unlinked bucket 9 is 0 in ag 0 (inode=0)
agi unlinked bucket 10 is 0 in ag 0 (inode=0)
agi unlinked bucket 11 is 0 in ag 0 (inode=0)
agi unlinked bucket 12 is 0 in ag 0 (inode=0)
agi unlinked bucket 13 is 0 in ag 0 (inode=0)
agi unlinked bucket 14 is 0 in ag 0 (inode=0)
agi unlinked bucket 15 is 0 in ag 0 (inode=0)
agi unlinked bucket 16 is 0 in ag 0 (inode=0)
agi unlinked bucket 17 is 0 in ag 0 (inode=0)
agi unlinked bucket 18 is 0 in ag 0 (inode=0)
agi unlinked bucket 19 is 0 in ag 0 (inode=0)
agi unlinked bucket 20 is 0 in ag 0 (inode=0)
agi unlinked bucket 21 is 0 in ag 0 (inode=0)
agi unlinked bucket 22 is 0 in ag 0 (inode=0)
agi unlinked bucket 23 is 0 in ag 0 (inode=0)
agi unlinked bucket 24 is 0 in ag 0 (inode=0)
agi unlinked bucket 25 is 0 in ag 0 (inode=0)
agi unlinked bucket 26 is 0 in ag 0 (inode=0)
agi unlinked bucket 27 is 0 in ag 0 (inode=0)
agi unlinked bucket 28 is 0 in ag 0 (inode=0)
agi unlinked bucket 29 is 0 in ag 0 (inode=0)
agi unlinked bucket 30 is 0 in ag 0 (inode=0)
agi unlinked bucket 31 is 0 in ag 0 (inode=0)
agi unlinked bucket 32 is 0 in ag 0 (inode=0)
agi unlinked bucket 33 is 0 in ag 0 (inode=0)
agi unlinked bucket 34 is 0 in ag 0 (inode=0)
agi unlinked bucket 35 is 0 in ag 0 (inode=0)
agi unlinked bucket 36 is 0 in ag 0 (inode=0)
agi unlinked bucket 37 is 0 in ag 0 (inode=0)
agi unlinked bucket 38 is 0 in ag 0 (inode=0)
agi unlinked bucket 39 is 0 in ag 0 (inode=0)
agi unlinked bucket 40 is 0 in ag 0 (inode=0)
agi unlinked bucket 41 is 0 in ag 0 (inode=0)
agi unlinked bucket 42 is 0 in ag 0 (inode=0)
agi unlinked bucket 43 is 0 in ag 0 (inode=0)
agi unlinked bucket 44 is 0 in ag 0 (inode=0)
agi unlinked bucket 45 is 0 in ag 0 (inode=0)
agi unlinked bucket 46 is 0 in ag 0 (inode=0)
agi unlinked bucket 47 is 0 in ag 0 (inode=0)
agi unlinked bucket 48 is 0 in ag 0 (inode=0)
agi unlinked bucket 49 is 0 in ag 0 (inode=0)
agi unlinked bucket 50 is 0 in ag 0 (inode=0)
agi unlinked bucket 51 is 0 in ag 0 (inode=0)
agi unlinked bucket 52 is 0 in ag 0 (inode=0)
agi unlinked bucket 53 is 0 in ag 0 (inode=0)
agi unlinked bucket 54 is 0 in ag 0 (inode=0)
agi unlinked bucket 55 is 0 in ag 0 (inode=0)
agi unlinked bucket 56 is 0 in ag 0 (inode=0)
agi unlinked bucket 57 is 0 in ag 0 (inode=0)
agi unlinked bucket 58 is 0 in ag 0 (inode=0)
agi unlinked bucket 59 is 0 in ag 0 (inode=0)
agi unlinked bucket 60 is 0 in ag 0 (inode=0)
agi unlinked bucket 61 is 0 in ag 0 (inode=0)
agi unlinked bucket 62 is 0 in ag 0 (inode=0)
agi unlinked bucket 63 is 0 in ag 0 (inode=0)
sb_fdblocks 7860416, counted 7368900
        - 10:29:06: scanning filesystem freespace - 16 of 16 allocation
groups done
root inode chunk not found
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 10:29:06: scanning agi unlinked lists - 16 of 16 allocation
groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 15
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x40/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
Metadata corruption detected at block 0x50/0x2000
bad magic number 0x0 on inode 128
bad version number 0x0 on inode 128
bad magic number 0x0 on inode 129
bad version number 0x0 on inode 129
bad magic number 0x0 on inode 130
bad version number 0x0 on inode 130
bad magic number 0x0 on inode 131
bad version number 0x0 on inode 131
bad magic number 0x0 on inode 132
bad version number 0x0 on inode 132
bad magic number 0x0 on inode 133
bad version number 0x0 on inode 133
bad magic number 0x0 on inode 134
bad version number 0x0 on inode 134
bad magic number 0x0 on inode 135
bad version number 0x0 on inode 135
bad magic number 0x0 on inode 136
bad version number 0x0 on inode 136
bad magic number 0x0 on inode 137
bad version number 0x0 on inode 137
bad magic number 0x0 on inode 138
bad version number 0x0 on inode 138
bad magic number 0x0 on inode 139
bad version number 0x0 on inode 139
bad magic number 0x0 on inode 140
bad version number 0x0 on inode 140
bad magic number 0x0 on inode 141
bad version number 0x0 on inode 141
bad magic number 0x0 on inode 142
bad version number 0x0 on inode 142
bad magic number 0x0 on inode 143
bad version number 0x0 on inode 143
bad magic number 0x0 on inode 144
bad version number 0x0 on inode 144
bad magic number 0x0 on inode 145
bad version number 0x0 on inode 145
bad magic number 0x0 on inode 146
bad version number 0x0 on inode 146
bad magic number 0x0 on inode 147
bad version number 0x0 on inode 147
bad magic number 0x0 on inode 148
bad version number 0x0 on inode 148
bad magic number 0x0 on inode 149
bad version number 0x0 on inode 149
bad magic number 0x0 on inode 150
bad version number 0x0 on inode 150
bad magic number 0x0 on inode 151
bad version number 0x0 on inode 151
bad magic number 0x0 on inode 152
bad version number 0x0 on inode 152
bad magic number 0x0 on inode 153
bad version number 0x0 on inode 153
bad magic number 0x0 on inode 154
bad version number 0x0 on inode 154
bad magic number 0x0 on inode 155
bad version number 0x0 on inode 155
bad magic number 0x0 on inode 156
bad version number 0x0 on inode 156
bad magic number 0x0 on inode 157
bad version number 0x0 on inode 157
bad magic number 0x0 on inode 158
bad version number 0x0 on inode 158
bad magic number 0x0 on inode 159
bad version number 0x0 on inode 159
bad magic number 0x0 on inode 160
bad version number 0x0 on inode 160
bad magic number 0x0 on inode 161
bad version number 0x0 on inode 161
bad magic number 0x0 on inode 162
bad version number 0x0 on inode 162
bad magic number 0x0 on inode 163
bad version number 0x0 on inode 163
bad magic number 0x0 on inode 164
bad version number 0x0 on inode 164
bad magic number 0x0 on inode 165
bad version number 0x0 on inode 165
bad magic number 0x0 on inode 166
bad version number 0x0 on inode 166
bad magic number 0x0 on inode 167
bad version number 0x0 on inode 167
bad magic number 0x0 on inode 168
bad version number 0x0 on inode 168
bad magic number 0x0 on inode 169
bad version number 0x0 on inode 169
bad magic number 0x0 on inode 170
bad version number 0x0 on inode 170
bad magic number 0x0 on inode 171
bad version number 0x0 on inode 171
bad magic number 0x0 on inode 172
bad version number 0x0 on inode 172
bad magic number 0x0 on inode 173
bad version number 0x0 on inode 173
bad magic number 0x0 on inode 174
bad version number 0x0 on inode 174
bad magic number 0x0 on inode 175
bad version number 0x0 on inode 175
bad magic number 0x0 on inode 176
bad version number 0x0 on inode 176
bad magic number 0x0 on inode 177
bad version number 0x0 on inode 177
bad magic number 0x0 on inode 178
bad version number 0x0 on inode 178
bad magic number 0x0 on inode 179
bad version number 0x0 on inode 179
bad magic number 0x0 on inode 180
bad version number 0x0 on inode 180
bad magic number 0x0 on inode 181
bad version number 0x0 on inode 181
bad magic number 0x0 on inode 182
bad version number 0x0 on inode 182
bad magic number 0x0 on inode 183
bad version number 0x0 on inode 183
bad magic number 0x0 on inode 184
bad version number 0x0 on inode 184
bad magic number 0x0 on inode 185
bad version number 0x0 on inode 185
bad magic number 0x0 on inode 186
bad version number 0x0 on inode 186
bad magic number 0x0 on inode 187
bad version number 0x0 on inode 187
bad magic number 0x0 on inode 188
bad version number 0x0 on inode 188
bad magic number 0x0 on inode 189
bad version number 0x0 on inode 189
bad magic number 0x0 on inode 190
bad version number 0x0 on inode 190
bad magic number 0x0 on inode 191
bad version number 0x0 on inode 191
bad magic number 0x0 on inode 128, resetting magic number
bad version number 0x0 on inode 128, resetting version number
imap claims a free inode 128 is in use, correcting imap and clearing inode
cleared root inode 128
bad magic number 0x0 on inode 129, resetting magic number
bad version number 0x0 on inode 129, resetting version number
imap claims a free inode 129 is in use, correcting imap and clearing inode
cleared realtime bitmap inode 129
bad magic number 0x0 on inode 130, resetting magic number
bad version number 0x0 on inode 130, resetting version number
imap claims a free inode 130 is in use, correcting imap and clearing inode
cleared realtime summary inode 130
bad magic number 0x0 on inode 131, resetting magic number
bad version number 0x0 on inode 131, resetting version number
bad magic number 0x0 on inode 132, resetting magic number
bad version number 0x0 on inode 132, resetting version number
bad magic number 0x0 on inode 133, resetting magic number
bad version number 0x0 on inode 133, resetting version number
bad magic number 0x0 on inode 134, resetting magic number
bad version number 0x0 on inode 134, resetting version number
bad magic number 0x0 on inode 135, resetting magic number
bad version number 0x0 on inode 135, resetting version number
bad magic number 0x0 on inode 136, resetting magic number
bad version number 0x0 on inode 136, resetting version number
bad magic number 0x0 on inode 137, resetting magic number
bad version number 0x0 on inode 137, resetting version number
bad magic number 0x0 on inode 138, resetting magic number
bad version number 0x0 on inode 138, resetting version number
bad magic number 0x0 on inode 139, resetting magic number
bad version number 0x0 on inode 139, resetting version number
bad magic number 0x0 on inode 140, resetting magic number
bad version number 0x0 on inode 140, resetting version number
bad magic number 0x0 on inode 141, resetting magic number
bad version number 0x0 on inode 141, resetting version number
bad magic number 0x0 on inode 142, resetting magic number
bad version number 0x0 on inode 142, resetting version number
bad magic number 0x0 on inode 143, resetting magic number
bad version number 0x0 on inode 143, resetting version number
bad magic number 0x0 on inode 144, resetting magic number
bad version number 0x0 on inode 144, resetting version number
bad magic number 0x0 on inode 145, resetting magic number
bad version number 0x0 on inode 145, resetting version number
bad magic number 0x0 on inode 146, resetting magic number
bad version number 0x0 on inode 146, resetting version number
bad magic number 0x0 on inode 147, resetting magic number
bad version number 0x0 on inode 147, resetting version number
bad magic number 0x0 on inode 148, resetting magic number
bad version number 0x0 on inode 148, resetting version number
bad magic number 0x0 on inode 149, resetting magic number
bad version number 0x0 on inode 149, resetting version number
bad magic number 0x0 on inode 150, resetting magic number
bad version number 0x0 on inode 150, resetting version number
bad magic number 0x0 on inode 151, resetting magic number
bad version number 0x0 on inode 151, resetting version number
bad magic number 0x0 on inode 152, resetting magic number
bad version number 0x0 on inode 152, resetting version number
bad magic number 0x0 on inode 153, resetting magic number
bad version number 0x0 on inode 153, resetting version number
bad magic number 0x0 on inode 154, resetting magic number
bad version number 0x0 on inode 154, resetting version number
bad magic number 0x0 on inode 155, resetting magic number
bad version number 0x0 on inode 155, resetting version number
bad magic number 0x0 on inode 156, resetting magic number
bad version number 0x0 on inode 156, resetting version number
bad magic number 0x0 on inode 157, resetting magic number
bad version number 0x0 on inode 157, resetting version number
bad magic number 0x0 on inode 158, resetting magic number
bad version number 0x0 on inode 158, resetting version number
bad magic number 0x0 on inode 159, resetting magic number
bad version number 0x0 on inode 159, resetting version number
bad magic number 0x0 on inode 160, resetting magic number
bad version number 0x0 on inode 160, resetting version number
bad magic number 0x0 on inode 161, resetting magic number
bad version number 0x0 on inode 161, resetting version number
bad magic number 0x0 on inode 162, resetting magic number
bad version number 0x0 on inode 162, resetting version number
bad magic number 0x0 on inode 163, resetting magic number
bad version number 0x0 on inode 163, resetting version number
bad magic number 0x0 on inode 164, resetting magic number
bad version number 0x0 on inode 164, resetting version number
bad magic number 0x0 on inode 165, resetting magic number
bad version number 0x0 on inode 165, resetting version number
bad magic number 0x0 on inode 166, resetting magic number
bad version number 0x0 on inode 166, resetting version number
bad magic number 0x0 on inode 167, resetting magic number
bad version number 0x0 on inode 167, resetting version number
bad magic number 0x0 on inode 168, resetting magic number
bad version number 0x0 on inode 168, resetting version number
bad magic number 0x0 on inode 169, resetting magic number
bad version number 0x0 on inode 169, resetting version number
bad magic number 0x0 on inode 170, resetting magic number
bad version number 0x0 on inode 170, resetting version number
bad magic number 0x0 on inode 171, resetting magic number
bad version number 0x0 on inode 171, resetting version number
bad magic number 0x0 on inode 172, resetting magic number
bad version number 0x0 on inode 172, resetting version number
bad magic number 0x0 on inode 173, resetting magic number
bad version number 0x0 on inode 173, resetting version number
bad magic number 0x0 on inode 174, resetting magic number
bad version number 0x0 on inode 174, resetting version number
bad magic number 0x0 on inode 175, resetting magic number
bad version number 0x0 on inode 175, resetting version number
bad magic number 0x0 on inode 176, resetting magic number
bad version number 0x0 on inode 176, resetting version number
bad magic number 0x0 on inode 177, resetting magic number
bad version number 0x0 on inode 177, resetting version number
bad magic number 0x0 on inode 178, resetting magic number
bad version number 0x0 on inode 178, resetting version number
bad magic number 0x0 on inode 179, resetting magic number
bad version number 0x0 on inode 179, resetting version number
bad magic number 0x0 on inode 180, resetting magic number
bad version number 0x0 on inode 180, resetting version number
bad magic number 0x0 on inode 181, resetting magic number
bad version number 0x0 on inode 181, resetting version number
bad magic number 0x0 on inode 182, resetting magic number
bad version number 0x0 on inode 182, resetting version number
bad magic number 0x0 on inode 183, resetting magic number
bad version number 0x0 on inode 183, resetting version number
bad magic number 0x0 on inode 184, resetting magic number
bad version number 0x0 on inode 184, resetting version number
bad magic number 0x0 on inode 185, resetting magic number
bad version number 0x0 on inode 185, resetting version number
bad magic number 0x0 on inode 186, resetting magic number
bad version number 0x0 on inode 186, resetting version number
bad magic number 0x0 on inode 187, resetting magic number
bad version number 0x0 on inode 187, resetting version number
bad magic number 0x0 on inode 188, resetting magic number
bad version number 0x0 on inode 188, resetting version number
bad magic number 0x0 on inode 189, resetting magic number
bad version number 0x0 on inode 189, resetting version number
bad magic number 0x0 on inode 190, resetting magic number
bad version number 0x0 on inode 190, resetting version number
bad magic number 0x0 on inode 191, resetting magic number
bad version number 0x0 on inode 191, resetting version number
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - 10:29:06: process known inodes and inode discovery - 64 of 0
inodes done
        - process newly discovered inodes...
        - 10:29:06: process newly discovered inodes - 16 of 16
allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
root inode lost
        - 10:29:06: setting up duplicate extent list - 16 of 16
allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 10
        - agno = 13
        - agno = 6
        - agno = 5
        - agno = 8
        - agno = 1
        - agno = 7
        - agno = 9
        - agno = 11
        - agno = 12
        - agno = 2
        - agno = 15
        - agno = 14
        - agno = 4
        - 10:29:06: check for inodes claiming duplicate blocks - 64 of 0
inodes done
Phase 5 - rebuild AG headers and trees...
        - 10:29:06: rebuild AG headers and trees - 16 of 16 allocation
groups done
        - reset superblock...
Phase 6 - check inode connectivity...
reinitializing root directory
reinitializing realtime bitmap inode
reinitializing realtime summary inode
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
resetting inode 128 nlinks from 1 to 2
done

Eric Sandeen Aug. 2, 2016, 7:20 p.m. UTC | #3
On 8/2/16 1:51 PM, Jeff Mahoney wrote:
> On 8/2/16 11:00 AM, Eric Sandeen wrote:

...

>> I could imagine that maybe for each candidate super we find, we should
>> look at its geometry, and spot-check the other locations that it indicates
>> should contain a superblock.  If we get enough semi-valid but conflicting
>> "sets," maybe we should bail out and ask.  It's quite a corner case, tho.
> 
> I'm not sure a geometry check would've helped here.  The superblock
> geometry still would've covered the whole, unpartitioned device.  Since
> we're already linking with blkid, maybe a check to see if there's a
> partition table on the device and bail if it sees one, unless forced?
> The part that I'm still trying to explain is how it managed to get a
> good magic from the log and then got the wrong uuid.

Hm, yeah, maybe right off the bat, if the primary super looks bad do
a blkid check, and a (sigh) "are you sure?" just like we do for mkfs.

>> Any chance you have full xfs_repair output?
> 
> Sure, below.  I heard back from the reporter and confirmed that my
> hypothesis of mkfs -> fdisk -> mkfs was what happened.  He's on SLE12
> SP1, so that means xfsprogs 3.2.1.
> 
> -Jeff
> 
> ---
> 
> labadmin:/ssd # xfs_repair postgre.raw -L
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
> sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with
> calculated value 128
> resetting superblock root inode pointer to 128
> sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent
> with calculated value 129
> resetting superblock realtime bitmap ino pointer to 129
> sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent
> with calculated value 130
> resetting superblock realtime summary ino pointer to 130

huh.  Ok, so I guess the primary in block zero was mostly-ok, and all
of the backup supers were still more or less intact, and it didn't have
to go searching...

-Eric

Patch

diff --git a/libxlog/util.c b/libxlog/util.c
index 7aca165..aa3093d 100644
--- a/libxlog/util.c
+++ b/libxlog/util.c
@@ -49,8 +49,10 @@ header_check_uuid(xfs_mount_t *mp, xlog_rec_header_t *head)
     printf("* ERROR: mismatched uuid in log\n"
            "*            SB : %s\n*            log: %s\n",
             uu_sb, uu_log);
+
+    memcpy(&mp->m_sb.sb_uuid, head->h_fs_uuid, sizeof(uuid_t));

-    return 1;
+    return 0;
 }