[17/18] xfs: remove check for block sizes smaller than PAGE_SIZE

Message ID	20230918110510.66470-18-hare@suse.de (mailing list archive)
State	New, archived
Headers
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
From: Hannes Reinecke <hare@suse.de>
To: Matthew Wilcox <willy@infradead.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>, Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>, Pankaj Raghav <p.raghav@samsung.com>, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, Hannes Reinecke <hare@suse.de>
Subject: [PATCH 17/18] xfs: remove check for block sizes smaller than PAGE_SIZE
Date: Mon, 18 Sep 2023 13:05:09 +0200
Message-Id: <20230918110510.66470-18-hare@suse.de>
In-Reply-To: <20230918110510.66470-1-hare@suse.de>
References: <20230918110510.66470-1-hare@suse.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series	block: update buffer_head for Large-block I/O
[RFC,00/18] block: update buffer_head for Large-block I/O
[01/18] mm/readahead: rework loop in page_cache_ra_unbounded()
[02/18] fs/mpage: use blocks_per_folio instead of blocks_per_page
[03/18] block/buffer_head: introduce block_{index_to_sector,sector_to_index}
[04/18] fs/buffer.c: use accessor function to translate page index to sectors
[05/18] fs/mpage: use accessor function to translate page index to sectors
[06/18] fs: Allow fine-grained control of folio sizes
[07/18] mm/filemap: allocate folios with mapping order preference
[08/18] mm/readahead: allocate folios with mapping order preference
[09/18] fs/buffer: use mapping order in grow_dev_page()
[10/18] block/bdev: lift restrictions on supported blocksize
[11/18] block/bdev: enable large folio support for large logical block sizes
[12/18] brd: convert to folios
[13/18] brd: abstract page_size conventions
[14/18] brd: use memcpy_{to,from}_folio()
[15/18] brd: make sector size configurable
[16/18] brd: make logical sector size configurable
[17/18] xfs: remove check for block sizes smaller than PAGE_SIZE
[18/18] nvme: enable logical block size > PAGE_SIZE

On Mon, Sep 18, 2023 at 01:05:09PM +0200, Hannes Reinecke wrote:
> We now support block sizes larger than PAGE_SIZE, so this
> check is pointless.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  fs/xfs/xfs_super.c | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 1f77014c6e1a..67dcdd4dcf2d 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1651,18 +1651,6 @@ xfs_fs_fill_super(
>  		goto out_free_sb;
>  	}
>
> -	/*
> -	 * Until this is fixed only page-sized or smaller data blocks work.
> -	 */
> -	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
> -		xfs_warn(mp,
> -		"File system with blocksize %d bytes. "
> -		"Only pagesize (%ld) or less will currently work.",
> -				mp->m_sb.sb_blocksize, PAGE_SIZE);
> -		error = -ENOSYS;
> -		goto out_free_sb;
> -	}

This really needs to be replaced with an EXPERIMENTAL warning -
we're not going to support these LBS configurations until we are
sure it doesn't eat data.

Anyway, smoke tests....

# mkfs.xfs -f -b size=64k /dev/pmem0
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=65536  blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1
log      =internal log           bsize=65536  blocks=1024, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
# mount /dev/pmem0 /mnt/test

Message from syslogd@test3 at Sep 20 11:23:32 ...
 kernel:[ 73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134

Message from syslogd@test3 at Sep 20 11:23:32 ...
 kernel:[ 73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134
Segmentation fault
#

Looks like this hasn't been tested with CONFIG_XFS_DEBUG=y. If
that's the case, I expect that none of this actually works... :/

I've attached a patch at the end of the email that allows XFS
filesystems to mount with debug enabled.

Next problem, filesystem created with a 32kB sector size:

# mkfs.xfs -f -b size=64k -s size=32k /dev/pmem0
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=32768 attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=65536  blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1
log      =internal log           bsize=65536  blocks=2709, version=2
         =                       sectsz=32768 sunit=1 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
#

and then running xfs_db on it to change the UUID:

# xfs_admin -U generate /dev/pmem0

Results in a kernel panic:

[ 132.151886] XFS (pmem0): Mounting V5 Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a
[ 132.161673] XFS (pmem0): Ending clean mount
[ 175.824015] XFS (pmem0): Unmounting Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a
[ 185.759251] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0
[ 185.766632] CPU: 1 PID: 4383 Comm: xfs_db Not tainted 6.6.0-rc2-dgc+ #1903
[ 185.771882] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 185.778827] Call Trace:
[ 185.780706]  <TASK>
[ 185.782318]  dump_stack_lvl+0x37/0x50
[ 185.785069]  dump_stack+0x10/0x20
[ 185.787557]  panic+0x15b/0x300
[ 185.789763]  ? do_mpage_readpage+0x7e5/0x7f0
[ 185.792705]  __stack_chk_fail+0x14/0x20
[ 185.795183]  do_mpage_readpage+0x7e5/0x7f0
[ 185.797826]  ? blkdev_write_begin+0x30/0x30
[ 185.800500]  ? blkdev_readahead+0x15/0x20
[ 185.802894]  ? read_pages+0x5c/0x230
[ 185.805023]  ? page_cache_ra_order+0x2ae/0x310
[ 185.807538]  ? ondemand_readahead+0x1f1/0x3a0
[ 185.809899]  ? page_cache_async_ra+0x26/0x30
[ 185.812175]  ? filemap_get_pages+0x540/0x6d0
[ 185.814327]  ? _copy_to_iter+0x65/0x4c0
[ 185.816283]  ? filemap_read+0xfc/0x3a0
[ 185.818086]  ? __fsnotify_parent+0x107/0x340
[ 185.820142]  ? __might_sleep+0x42/0x70
[ 185.821854]  ? blkdev_read_iter+0x6d/0x150
[ 185.823697]  ? vfs_read+0x1b1/0x300
[ 185.825307]  ? __x64_sys_pread64+0x8f/0xc0
[ 185.827159]  ? irqentry_exit+0x33/0x40
[ 185.828858]  ? do_syscall_64+0x35/0x80
[ 185.830485]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 185.832677]  </TASK>
[ 185.834068] Kernel Offset: disabled
[ 185.835510] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0 ]---

Probably related to the block device having a 32kB block size set on
the pmem device by xfs_db when it only really has a 4kB sector
size....

Anyway, back to just using 64k fsb, 4k sector. generic/001 ASSERT
fails immediately with:

[ 111.785796] run fstests generic/001 at 2023-09-20 11:50:19
[ 113.346797] XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1392
[ 113.352512] ------------[ cut here ]------------
[ 113.354793] kernel BUG at fs/xfs/xfs_message.c:102!
[ 113.358444] invalid opcode: 0000 [#1] PREEMPT SMP
[ 113.360769] CPU: 8 PID: 7581 Comm: cp Not tainted 6.6.0-rc2-dgc+ #1903
[ 113.364183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 113.369784] RIP: 0010:assfail+0x35/0x40
[ 113.372038] Code: c9 48 c7 c2 00 d8 6c 82 48 89 e5 48 89 f1 48 89 fe 48 c7 c7 d8 d3 60 82 e8 a8 fd ff ff 80 3d d1 36 ec 02 00 75 04 0f 0b 5d c3 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 63 f6 49 89
[ 113.384178] RSP: 0018:ffffc9000962bca0 EFLAGS: 00010202
[ 113.387467] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000007fffffff
[ 113.392326] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8260d3d8
[ 113.397235] RBP: ffffc9000962bca0 R08: 0000000000000000 R09: 000000000000000a
[ 113.401449] R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000000
[ 113.406312] R13: 00000000ffffff8b R14: 0000000000000000 R15: ffff88810de00f00
[ 113.410562] FS:  00007f4f6219b500(0000) GS:ffff8885fec00000(0000) knlGS:0000000000000000
[ 113.415506] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 113.418688] CR2: 0000564951bfcf50 CR3: 00000005cd6b8005 CR4: 0000000000060ee0
[ 113.422825] Call Trace:
[ 113.424302]  <TASK>
[ 113.425566]  ? show_regs+0x61/0x70
[ 113.427444]  ? die+0x37/0x90
[ 113.429175]  ? do_trap+0xec/0x100
[ 113.431016]  ? do_error_trap+0x6c/0x90
[ 113.432951]  ? assfail+0x35/0x40
[ 113.434677]  ? exc_invalid_op+0x52/0x70
[ 113.436755]  ? assfail+0x35/0x40
[ 113.438659]  ? asm_exc_invalid_op+0x1b/0x20
[ 113.440864]  ? assfail+0x35/0x40
[ 113.442671]  xfs_reflink_remap_blocks+0x197/0x350
[ 113.445278]  xfs_file_remap_range+0xf3/0x320
[ 113.447504]  do_clone_file_range+0xfe/0x2b0
[ 113.449689]  vfs_clone_file_range+0x3f/0x150
[ 113.452080]  ioctl_file_clone+0x52/0xa0
[ 113.453600]  do_vfs_ioctl+0x485/0x8d0
[ 113.455054]  ? selinux_file_ioctl+0x96/0x120
[ 113.456637]  ? selinux_file_ioctl+0x96/0x120
[ 113.458213]  __x64_sys_ioctl+0x73/0xd0
[ 113.459598]  do_syscall_64+0x35/0x80
[ 113.460840]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Problems with unexpected extent types in the reflink remap code.
This is due to the reflink operation finding a delalloc extent where
it should be finding a real extent. This implies the
xfs_flush_unmap_range() call in xfs_reflink_remap_prep() didn't
flush the full data range it was supposed to.

xfs_flush_unmap_range() is supposed to round the range out to:

	rounding = max_t(xfs_off_t, mp->m_sb.sb_blocksize, PAGE_SIZE);

the block size for these cases, so it is LBS aware. So maybe there's
a problem with filemap_write_and_wait_range() and/or
truncate_pagecache_range() when dealing with LBS enabled page cache?

So, yeah, debug XFS builds tell us straight away that important
stuff does not appear to be working correctly. Debug really needs to
be enabled, otherwise silent data corruption situations like this
can go unnoticed by fstests tests...

-Dave
