Message ID: CA+55aFzukw-6m0vtm6jDn4kXyfcUQbDFSwt1OOoa9nwG6toycQ@mail.gmail.com (mailing list archive)
State: New, archived
On 8/23/17 3:37 PM, Linus Torvalds wrote:
> On Wed, Aug 23, 2017 at 12:15 PM, Doug Nazar <nazard@nazar.ca> wrote:
>> The following commits cause short reads of block devices, however writes are
>> still allowed.
>>
>> c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
>> d05c5f7ba164 ("vfs,mm: fix return value of read() at s_maxbytes")
>>
>> When e2fsck sees this, it thinks it's a bad sector and tries to write a
>> block of nulls, which overwrites the valid data.
>
> Hmm. Block devices shouldn't have issues with s_maxbytes, and I'm
> surprised that nobody has seen that before.
>
>> Device is LVM over 2 x RAID-5 on an old 32-bit desktop.
>>
>> RO RA SSZ BSZ StartSec Size Device
>> rw 4096 512 4096 0 9748044840960 /dev/Storage/Main
>
> .. and the problem may be as simple as just a missing initialization
> of s_maxbytes for blockdev_superblock.
>
> Does the attached trivial one-liner fix things for you?
>
> Al, if it really is this simple, how come nobody even noticed?
>
> Also, I do wonder if that check in do_generic_file_read() should just
> unconditionally use MAX_LFS_FILESIZE, since the whole point there is
> really about the index wrap-around, not about any underlying
> filesystem limits per se.
>
> And that's exactly what MAX_LFS_FILESIZE is - the maximum size that
> fits in the page index.

It's compiling now, but I think it's already set to MAX_LFS_FILESIZE.

[ 169.095127] ppos=80180006000, s_maxbytes=7ffffffffff, magic=0x62646576, type=bdev

Doug
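For context, the failure mode in miniature. This is a standalone sketch, not kernel source; it only reuses the two values from Doug's printk above to show why the read comes back short:

    #include <stdio.h>

    int main(void)
    {
        long long s_maxbytes = 0x7ffffffffffLL; /* old 32-bit MAX_LFS_FILESIZE */
        long long ppos = 0x80180006000LL;       /* offset from Doug's printk */

        /* the commits above make read() return 0 bytes once the file
         * position reaches s_maxbytes; the caller can only interpret
         * that as end-of-file, and e2fsck treats it as a bad sector */
        if (ppos >= s_maxbytes)
            printf("short read at %#llx: looks like EOF\n", ppos);
        return 0;
    }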
On Wed, Aug 23, 2017 at 12:53 PM, Doug Nazar <nazard@nazar.ca> wrote:
>
> It's compiling now, but I think it's already set to MAX_LFS_FILESIZE.
>
> [ 169.095127] ppos=80180006000, s_maxbytes=7ffffffffff, magic=0x62646576,
> type=bdev

Oh, right you are - I'm much too used to 64-bit, where
MAX_LFS_FILESIZE is basically infinite, and was just assuming that it
was something like the UFS bug we had not that long ago that was due
to the 32-bit limit.

But yes, on 32-bit, we are limited by the 32-bit index into the page
cache, and we limit the index to 31 bits too, so we have
(PAGE_SIZE << 31) - 1, which is that 7ffffffffff.

And that also explains why people haven't seen it. You need both:

 (a) a 32-bit environment

 (b) a disk larger than that 8TB

The *hard* limit for the page cache on a 32-bit environment should
actually be (PAGE_SIZE << 32) - PAGE_SIZE (that final PAGE_SIZE
subtraction is to make sure we don't generate a page cache page with
index -1), so a disk that is 16TB or larger is not going to work, but
your disk is right in that 8TB-16TB hole that used to work and was
broken by that check.

Anyway, that makes me feel better. I should have looked at your disk
size more closely; now I at least understand why nobody noticed before.

So just throw away my patch. That's wrong, and garbage.

The *right* patch is likely just this instead:

-#define MAX_LFS_FILESIZE	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
+#define MAX_LFS_FILESIZE	(((loff_t)PAGE_SIZE << BITS_PER_LONG)-PAGE_SIZE)

which should make MAX_LFS_FILESIZE be 0xffffffff000 and your disk size
should be ok.

                 Linus
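The arithmetic is easy to check in userspace. A standalone sketch, not from the thread; it assumes 4 KiB pages and takes the device size from Doug's report:

    #include <stdio.h>

    int main(void)
    {
        long long page_size = 4096;                          /* assumed 4 KiB pages */
        long long old_max   = (page_size << 31) - 1;         /* 0x7ffffffffff, ~8 TB */
        long long hard_max  = (page_size << 32) - page_size; /* 0xffffffff000, ~16 TB */
        long long dev_size  = 9748044840960LL;               /* Doug's device */

        printf("old limit  = %#llx\n", old_max);
        printf("hard limit = %#llx\n", hard_max);
        /* prints 1: the device sits in the 8TB-16TB hole */
        printf("in hole    = %d\n",
               dev_size > old_max && dev_size <= hard_max);
        return 0;
    }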
On Aug 23, 2017, at 2:13 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Aug 23, 2017 at 12:53 PM, Doug Nazar <nazard@nazar.ca> wrote:
>>
>> It's compiling now, but I think it's already set to MAX_LFS_FILESIZE.
>>
>> [ 169.095127] ppos=80180006000, s_maxbytes=7ffffffffff, magic=0x62646576,
>> type=bdev
>
> Oh, right you are - I'm much too used to 64-bit, where
> MAX_LFS_FILESIZE is basically infinite, and was just assuming that it
> was something like the UFS bug we had not that long ago that was due
> to the 32-bit limit.
>
> But yes, on 32-bit, we are limited by the 32-bit index into the page
> cache, and we limit the index to 31 bits too, so we have
> (PAGE_SIZE << 31) - 1, which is that 7ffffffffff.
>
> And that also explains why people haven't seen it. You need both:
>
>  (a) a 32-bit environment
>
>  (b) a disk larger than that 8TB
>
> The *hard* limit for the page cache on a 32-bit environment should
> actually be (PAGE_SIZE << 32) - PAGE_SIZE (that final PAGE_SIZE
> subtraction is to make sure we don't generate a page cache page with
> index -1), so a disk that is 16TB or larger is not going to work, but
> your disk is right in that 8TB-16TB hole that used to work and was
> broken by that check.
>
> Anyway, that makes me feel better. I should have looked at your disk
> size more closely; now I at least understand why nobody noticed before.
>
> So just throw away my patch. That's wrong, and garbage.
>
> The *right* patch is likely just this instead:
>
> -#define MAX_LFS_FILESIZE	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
> +#define MAX_LFS_FILESIZE	(((loff_t)PAGE_SIZE << BITS_PER_LONG)-PAGE_SIZE)
>
> which should make MAX_LFS_FILESIZE be 0xffffffff000 and your disk size
> should be ok.

Doug,
I noticed while checking for other implications of changing MAX_LFS_FILESIZE
that fs/jfs/super.c is also working around this limit. If you are going
to submit a patch for this, it also makes sense to fix jfs_fill_super() to
use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:

	/* logical blocks are represented by 40 bits in pxd_t, etc.,
	 * and page cache is indexed by long. */
	sb->s_maxbytes = min(((u64)sb->s_blocksize) << 40,
			     MAX_LFS_FILESIZE);

It also looks like ocfs2_max_file_offset() is trying to avoid overflowing
the old 31-bit limit, and isn't using MAX_LFS_FILESIZE directly, so it will
now be wrong. It looks like it could use "bitshift = 32; trim = bytes;",
but Joel or Mark should confirm.

Finally, there is a check in fs/super.c::mount_fs() that verifies
s_maxbytes is not set too large, but this has been present since 2.6.32
and should probably be removed at this point, or changed to a BUG_ON()
(see commit 42cb56ae2ab for details).

Cheers, Andreas
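As an aside, the kernel's min() macro is type-checked, and MAX_LFS_FILESIZE is a signed loff_t while the shifted block size is u64, so a version of that assignment that compiles cleanly would likely need min_t(). A sketch only; jfs may prefer a different spelling:

	/* logical blocks are represented by 40 bits in pxd_t, etc.,
	 * and the page cache is indexed by long */
	sb->s_maxbytes = min_t(u64, (u64)sb->s_blocksize << 40,
			       MAX_LFS_FILESIZE);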
On 8/23/17 4:13 PM, Linus Torvalds wrote:
> Oh, right you are - I'm much too used to 64-bit, where
> MAX_LFS_FILESIZE is basically infinite, and was just assuming that it
> was something like the UFS bug we had not that long ago that was due
> to the 32-bit limit.
>
> But yes, on 32-bit, we are limited by the 32-bit index into the page
> cache, and we limit the index to 31 bits too, so we have
> (PAGE_SIZE << 31) - 1, which is that 7ffffffffff.

Yeah, it's an old, organically grown storage server (install images,
old VMs, etc.) that I can't convince myself is worth upgrading.

> The *right* patch is likely just this instead:
>
> -#define MAX_LFS_FILESIZE	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
> +#define MAX_LFS_FILESIZE	(((loff_t)PAGE_SIZE << BITS_PER_LONG)-PAGE_SIZE)
>
> which should make MAX_LFS_FILESIZE be 0xffffffff000 and your disk size
> should be ok.

That solves my issue. I'm curious whether that check should also be in
the write path. If the check had been there too, I wouldn't have ended
up with any corruption. I'm not sure if anything over 16TB will
create/assemble (or if anybody else is crazy like me).

Doug
On 8/23/17 5:01 PM, Andreas Dilger wrote:
> Doug,
> I noticed while checking for other implications of changing MAX_LFS_FILESIZE
> that fs/jfs/super.c is also working around this limit. If you are going
> to submit a patch for this, it also makes sense to fix jfs_fill_super() to
> use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:
>
>	/* logical blocks are represented by 40 bits in pxd_t, etc.,
>	 * and page cache is indexed by long. */
>	sb->s_maxbytes = min(((u64)sb->s_blocksize) << 40,
>			     MAX_LFS_FILESIZE);
>
> It also looks like ocfs2_max_file_offset() is trying to avoid overflowing
> the old 31-bit limit, and isn't using MAX_LFS_FILESIZE directly, so it will
> now be wrong. It looks like it could use "bitshift = 32; trim = bytes;",
> but Joel or Mark should confirm.
>
> Finally, there is a check in fs/super.c::mount_fs() that verifies
> s_maxbytes is not set too large, but this has been present since 2.6.32
> and should probably be removed at this point, or changed to a BUG_ON()
> (see commit 42cb56ae2ab for details).

I don't have any issue trying to write patches for those, but I have no
domain knowledge in the area and no way to test them.

From a quick glance, jfs is locked to PSIZE (4096) so should be ok.
OCFS2 looks a little complex, and since it's a shared fs, I'm a little
hesitant.

The check in fs/super.c, maybe that should be:

	sb->s_maxbytes > MAX_LFS_FILESIZE

Actually, I'm a little confused: the comment says unsigned, but loff_t
looks like it's long long. Maybe cast to u64 and check greater than?

Doug
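A sketch of what Doug's suggestion could look like in mount_fs(). This is hypothetical code, not what was merged; the WARN() wording and use of type->name are assumed from the existing check, which appears to fire only when a filesystem sets a negative s_maxbytes:

	/* hypothetical: compare against the page-cache limit after
	 * casting away the sign, instead of only catching negatives */
	WARN((u64)sb->s_maxbytes > MAX_LFS_FILESIZE,
	     "%s set sb->s_maxbytes to a bogus value (%lld)\n",
	     type->name, sb->s_maxbytes);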
On 08/24/2017 05:20 AM, Doug Nazar wrote:
> On 8/23/17 5:01 PM, Andreas Dilger wrote:
>> Doug,
>> I noticed while checking for other implications of changing
>> MAX_LFS_FILESIZE
>> that fs/jfs/super.c is also working around this limit. If you are going
>> to submit a patch for this, it also makes sense to fix
>> jfs_fill_super() to
>> use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:
>>
>>	/* logical blocks are represented by 40 bits in pxd_t, etc.,
>>	 * and page cache is indexed by long. */
>>	sb->s_maxbytes = min(((u64)sb->s_blocksize) << 40,
>>			     MAX_LFS_FILESIZE);
>>
>> It also looks like ocfs2_max_file_offset() is trying to avoid overflowing
>> the old 31-bit limit, and isn't using MAX_LFS_FILESIZE directly, so it
>> will
>> now be wrong. It looks like it could use "bitshift = 32; trim = bytes;",
>> but Joel or Mark should confirm.
>>
>> Finally, there is a check in fs/super.c::mount_fs() that verifies
>> s_maxbytes is not set too large, but this has been present since 2.6.32
>> and should probably be removed at this point, or changed to a BUG_ON()
>> (see commit 42cb56ae2ab for details).
>
> I don't have any issue trying to write patches for those, but I have no
> domain knowledge in the area and no way to test them.

If you want to wrap the jfs change into this, I will be happy to test it
for you, or I could take care of jfs with a separate patch if you'd
prefer.

> From a quick glance, jfs is locked to PSIZE (4096) so should be ok.
> OCFS2 looks a little complex, and since it's a shared fs, I'm a little
> hesitant.
>
> The check in fs/super.c, maybe that should be:
>
>	sb->s_maxbytes > MAX_LFS_FILESIZE
>
> Actually, I'm a little confused: the comment says unsigned, but loff_t
> looks like it's long long. Maybe cast to u64 and check greater than?
>
> Doug
On Wed, Aug 23, 2017 at 2:01 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>
> Doug,
> I noticed while checking for other implications of changing MAX_LFS_FILESIZE
> that fs/jfs/super.c is also working around this limit.

Note to people: I just committed the patch to update MAX_LFS_FILESIZE.

I made it use the simpler (and clearer) calculation of

    ((loff_t)ULONG_MAX << PAGE_SHIFT)

for the 32-bit case, and I did *not* change any other users.

The jfs comment was a bit confusing, and talks about "wraps around" at
8TB, when that actually happens at 16TB. Yes, if you use a signed
number for the index, it does wrap at 8TB, but you really shouldn't
(and the code the jfs comment points to doesn't).

So I didn't touch that. Nor did I touch:

> it also makes sense to fix jfs_fill_super() to
> use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:
>
>	/* logical blocks are represented by 40 bits in pxd_t, etc.,
>	 * and page cache is indexed by long. */
>	sb->s_maxbytes = min(((u64)sb->s_blocksize) << 40,
>			     MAX_LFS_FILESIZE);

which I agree should be modified. The new MAX_LFS_FILESIZE should be
the right size, but the difference now is only one page less one byte.

                 Linus
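For the curious, the committed spelling and the earlier patch agree on 32-bit, since ULONG_MAX there is 2^32 - 1. A standalone sketch, not from the thread, assuming 4 KiB pages:

    #include <stdio.h>

    int main(void)
    {
        long long ulong_max32 = 0xffffffffLL; /* ULONG_MAX on 32-bit */
        int page_shift = 12;                  /* assumed 4 KiB pages */
        long long page_size = 1LL << page_shift;

        long long a = ulong_max32 << page_shift;     /* committed form */
        long long b = (page_size << 32) - page_size; /* earlier patch  */

        /* prints: 0xffffffff000 0xffffffff000 1 */
        printf("%#llx %#llx %d\n", a, b, a == b);
        return 0;
    }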
On 08/27/2017 02:47 PM, Linus Torvalds wrote:
> On Wed, Aug 23, 2017 at 2:01 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>
>> Doug,
>> I noticed while checking for other implications of changing MAX_LFS_FILESIZE
>> that fs/jfs/super.c is also working around this limit.
>
> Note to people: I just committed the patch to update MAX_LFS_FILESIZE.
>
> I made it use the simpler (and clearer) calculation of
>
>     ((loff_t)ULONG_MAX << PAGE_SHIFT)
>
> for the 32-bit case, and I did *not* change any other users.
>
> The jfs comment was a bit confusing, and talks about "wraps around" at
> 8TB, when that actually happens at 16TB. Yes, if you use a signed
> number for the index, it does wrap at 8TB, but you really shouldn't
> (and the code the jfs comment points to doesn't).
>
> So I didn't touch that. Nor did I touch:
>
>> it also makes sense to fix jfs_fill_super() to
>> use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:
>>
>>	/* logical blocks are represented by 40 bits in pxd_t, etc.,
>>	 * and page cache is indexed by long. */
>>	sb->s_maxbytes = min(((u64)sb->s_blocksize) << 40,
>>			     MAX_LFS_FILESIZE);
>
> which I agree should be modified. The new MAX_LFS_FILESIZE should be
> the right size, but the difference now is only one page less one byte.

I'll submit a patch to clean up jfs.

Thanks,
Shaggy
On 8/27/17 3:47 PM, Linus Torvalds wrote:
> Note to people: I just committed the patch to update MAX_LFS_FILESIZE.
>
> I made it use the simpler (and clearer) calculation of
>
>     ((loff_t)ULONG_MAX << PAGE_SHIFT)
>
> for the 32-bit case, and I did *not* change any other users.

Not a problem... I got caught trying to find out why 4.13 hangs on a
warm reset. It has really slowed down my testing, having to walk a
couple of floors for every reboot.

Doug
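For reference, the attachment from Linus's first message follows: the trivial one-liner initializing s_maxbytes for the bdev pseudo-fs, which he later withdrew in favor of the MAX_LFS_FILESIZE change.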
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9941dc8342df..4c3867c5298a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -830,6 +830,7 @@ void __init bdev_cache_init(void)
 	if (IS_ERR(bd_mnt))
 		panic("Cannot create bdev pseudo-fs");
 	blockdev_superblock = bd_mnt->mnt_sb;	/* For writeback */
+	blockdev_superblock->s_maxbytes = MAX_LFS_FILESIZE;
 }
 
 /*