Message ID | 20230621083823.1724337-4-p.raghav@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | minimum folio order support in filemap | expand |
On 6/21/23 10:38, Pankaj Raghav wrote: > From: Luis Chamberlain <mcgrof@kernel.org> > > Automatically set the minimum mapping order for block devices in > set_init_blocksize(). The mapping order will be set only when the block > device uses iomap based aops. > > Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > --- > block/bdev.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/block/bdev.c b/block/bdev.c > index 9bb54d9d02a6..db8cede8a320 100644 > --- a/block/bdev.c > +++ b/block/bdev.c > @@ -126,6 +126,7 @@ static void set_init_blocksize(struct block_device *bdev) > { > unsigned int bsize = bdev_logical_block_size(bdev); > loff_t size = i_size_read(bdev->bd_inode); > + int order, folio_order; > > while (bsize < PAGE_SIZE) { > if (size & bsize) > @@ -133,6 +134,14 @@ static void set_init_blocksize(struct block_device *bdev) > bsize <<= 1; > } > bdev->bd_inode->i_blkbits = blksize_bits(bsize); > + order = bdev->bd_inode->i_blkbits - PAGE_SHIFT; > + folio_order = mapping_min_folio_order(bdev->bd_inode->i_mapping); > + > + if (!IS_ENABLED(CONFIG_BUFFER_HEAD)) { > + /* Do not allow changing the folio order after it is set */ > + WARN_ON_ONCE(folio_order && (folio_order != order)); > + mapping_set_folio_orders(bdev->bd_inode->i_mapping, order, 31); > + } > } > > int set_blocksize(struct block_device *bdev, int size) This really has nothing to do with buffer heads. In fact, I've got a patchset to make it work _with_ buffer heads. So please, don't make it conditional on CONFIG_BUFFER_HEAD. And we should be calling into 'mapping_set_folio_order()' only if the 'order' argument is larger than PAGE_ORDER, otherwise we end up enabling large folio support for _every_ block device. Which I doubt we want. Cheers, Hannes
>> bdev->bd_inode->i_blkbits = blksize_bits(bsize); >> + order = bdev->bd_inode->i_blkbits - PAGE_SHIFT; >> + folio_order = mapping_min_folio_order(bdev->bd_inode->i_mapping); >> + >> + if (!IS_ENABLED(CONFIG_BUFFER_HEAD)) { >> + /* Do not allow changing the folio order after it is set */ >> + WARN_ON_ONCE(folio_order && (folio_order != order)); >> + mapping_set_folio_orders(bdev->bd_inode->i_mapping, order, 31); >> + } >> } >> int set_blocksize(struct block_device *bdev, int size) > This really has nothing to do with buffer heads. > > In fact, I've got a patchset to make it work _with_ buffer heads. > > So please, don't make it conditional on CONFIG_BUFFER_HEAD. > > And we should be calling into 'mapping_set_folio_order()' only if the 'order' argument is larger > than PAGE_ORDER, otherwise we end up enabling > large folio support for _every_ block device. > Which I doubt we want. > Hmm, which aops are you using for the block device? If you are using the old aops, then we will be using helpers from buffer.c and mpage.c which do not support large folios. I am getting a BUG_ON when I don't use iomap based aops for the block device: [ 11.596239] kernel BUG at fs/buffer.c:2384! [ 11.596609] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 11.597064] CPU: 3 PID: 10 Comm: kworker/u8:0 Not tainted 6.4.0-rc7-next-20230621-00010-g87171074c649-dirty #183 [ 11.597934] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 11.598882] Workqueue: nvme-wq nvme_scan_work [nvme_core] [ 11.599370] RIP: 0010:block_read_full_folio+0x70d/0x8f0 Let me know what you think! > Cheers, > > Hannes > >
On 6/21/23 12:42, Pankaj Raghav wrote: >>> bdev->bd_inode->i_blkbits = blksize_bits(bsize); >>> + order = bdev->bd_inode->i_blkbits - PAGE_SHIFT; >>> + folio_order = mapping_min_folio_order(bdev->bd_inode->i_mapping); >>> + >>> + if (!IS_ENABLED(CONFIG_BUFFER_HEAD)) { >>> + /* Do not allow changing the folio order after it is set */ >>> + WARN_ON_ONCE(folio_order && (folio_order != order)); >>> + mapping_set_folio_orders(bdev->bd_inode->i_mapping, order, 31); >>> + } >>> } >>> int set_blocksize(struct block_device *bdev, int size) >> This really has nothing to do with buffer heads. >> >> In fact, I've got a patchset to make it work _with_ buffer heads. >> >> So please, don't make it conditional on CONFIG_BUFFER_HEAD. >> >> And we should be calling into 'mapping_set_folio_order()' only if the 'order' argument is larger >> than PAGE_ORDER, otherwise we end up enabling >> large folio support for _every_ block device. >> Which I doubt we want. >> > > Hmm, which aops are you using for the block device? If you are using the old aops, then we will be > using helpers from buffer.c and mpage.c which do not support large folios. I am getting a BUG_ON > when I don't use iomap based aops for the block device: > I know. I haven't said that mpage.c / buffer.c support large folios _now_. All I'm saying is that I have a patchset enabling it to support large folios :-) Cheers, Hannes
>> >> Hmm, which aops are you using for the block device? If you are using the old aops, then we will be >> using helpers from buffer.c and mpage.c which do not support large folios. I am getting a BUG_ON >> when I don't use iomap based aops for the block device: >> > I know. I haven't said that mpage.c / buffer.c support large folios _now_. All I'm saying is that I > have a patchset enabling it to support large folios :-) > Ah ok! I thought we are not going that route based on the discussion we had in LSF. > Cheers, > > Hannes >
Hello, kernel test robot noticed "WARNING:at_block/bdev.c:#set_init_blocksize" on: commit: 01671335e600baacb98c70cdb010d539e5f615c2 ("[RFC 3/4] block: set mapping order for the block cache in set_init_blocksize") url: https://github.com/intel-lab-lkp/linux/commits/Pankaj-Raghav/fs-Allow-fine-grained-control-of-folio-sizes/20230621-172322 base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/all/20230621083823.1724337-4-p.raghav@samsung.com/ patch subject: [RFC 3/4] block: set mapping order for the block cache in set_init_blocksize in testcase: boot compiler: gcc-12 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202306241656.fa99b0e0-oliver.sang@intel.com [ 20.303844][ T130] ------------[ cut here ]------------ [ 20.304791][ T130] WARNING: CPU: 1 PID: 130 at block/bdev.c:142 set_init_blocksize (block/bdev.c:142 (discriminator 1)) [ 20.306150][ T130] Modules linked in: sr_mod cdrom sg ata_generic bochs intel_rapl_msr ata_piix intel_rapl_common drm_vram_helper crct10dif_pclmul crc32_pclmul ppdev crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_kms_helper rapl libata syscopyarea sysfillrect sysimgblt drm_ttm_helper ttm joydev ipmi_devintf parport_pc ipmi_msghandler serio_raw i2c_piix4 parport fuse drm ip_tables [ 20.310835][ T130] CPU: 1 PID: 130 Comm: systemd-udevd Not tainted 6.4.0-rc4-00504-g01671335e600 #1 [ 20.312212][ T130] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 20.313760][ T130] RIP: 0010:set_init_blocksize (block/bdev.c:142 (discriminator 1)) [ 20.314621][ T130] Code: c1 e2 08 48 09 d0 48 0d 00 e0 03 00 48 89 86 98 00 00 00 c3 cc cc cc cc 3d ff 0f 00 00 0f 86 7a ff ff ff c1 e8 09 89 c6 eb 91 <0f> 0b eb cc ba 09 00 00 00 eb 96 66 66 2e 0f 1f 84 00 00 00 00 00 All code ======== 0: c1 e2 08 shl $0x8,%edx 3: 48 09 d0 or %rdx,%rax 6: 48 0d 00 e0 03 00 or $0x3e000,%rax c: 48 89 86 98 00 00 00 mov %rax,0x98(%rsi) 13: c3 retq 14: cc int3 15: cc int3 16: cc int3 17: cc int3 18: 3d ff 0f 00 00 cmp $0xfff,%eax 1d: 0f 86 7a ff ff ff jbe 0xffffffffffffff9d 23: c1 e8 09 shr $0x9,%eax 26: 89 c6 mov %eax,%esi 28: eb 91 jmp 0xffffffffffffffbb 2a:* 0f 0b ud2 <-- trapping instruction 2c: eb cc jmp 0xfffffffffffffffa 2e: ba 09 00 00 00 mov $0x9,%edx 33: eb 96 jmp 0xffffffffffffffcb 35: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) 3c: 00 00 00 00 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: eb cc jmp 0xffffffffffffffd0 4: ba 09 00 00 00 mov $0x9,%edx 9: eb 96 jmp 0xffffffffffffffa1 b: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) 12: 00 00 00 00 [ 20.317420][ T130] RSP: 0000:ffffadd780567c48 EFLAGS: 00010282 [ 20.318345][ T130] RAX: 00000000fffffd00 RBX: ffff8d79b1670c00 RCX: 000000000000001d [ 20.319432][ T130] RDX: 00000000fffffffd RSI: ffff8d79b16710f8 RDI: ffff8d79b1670c00 [ 20.320641][ T130] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000584843 [ 20.321827][ T130] R10: 0000000000000000 R11: 0000000000586c43 R12: ffff8d7998831400 [ 20.322912][ T130] R13: ffff8d7998831400 R14: ffff8d7a7aca2800 R15: ffff8d79988315f8 [ 20.324947][ T130] FS: 0000000000000000(0000) GS:ffff8d7cafd00000(0063) knlGS:00000000f7929b00 [ 20.326981][ T130] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 20.328939][ T130] CR2: 0000000059b91a8c CR3: 0000000117ac4000 CR4: 00000000000406e0 [ 20.330843][ T130] Call Trace: [ 20.332322][ T130] <TASK> [ 20.333553][ T130] ? set_init_blocksize (block/bdev.c:142 (discriminator 1)) [ 20.335087][ T130] ? __warn (kernel/panic.c:673) [ 20.336555][ T130] ? set_init_blocksize (block/bdev.c:142 (discriminator 1)) [ 20.337984][ T130] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 20.339449][ T130] ? handle_bug (arch/x86/kernel/traps.c:324) [ 20.340945][ T130] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1)) [ 20.342386][ T130] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) [ 20.344022][ T130] ? set_init_blocksize (block/bdev.c:142 (discriminator 1)) [ 20.345530][ T130] blkdev_get_whole (block/bdev.c:626) [ 20.346950][ T130] blkdev_get_by_dev (block/bdev.c:837) [ 20.348464][ T130] ? __pfx_blkdev_open (block/fops.c:474) [ 20.349941][ T130] blkdev_open (block/fops.c:494) [ 20.351309][ T130] do_dentry_open (fs/open.c:920) [ 20.352846][ T130] do_open (fs/namei.c:3639) [ 20.354190][ T130] ? open_last_lookups (fs/namei.c:3583) [ 20.355564][ T130] path_openat (fs/namei.c:3792) [ 20.356950][ T130] do_filp_open (fs/namei.c:3818) [ 20.358276][ T130] do_sys_openat2 (fs/open.c:1356) [ 20.359601][ T130] __ia32_compat_sys_openat (fs/open.c:1430) [ 20.361103][ T130] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178) [ 20.362534][ T130] do_fast_syscall_32 (arch/x86/entry/common.c:203) [ 20.363986][ T130] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:122) [ 20.365697][ T130] RIP: 0023:0xf7f90589 [ 20.366940][ T130] Code: 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 All code ======== 0: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi ... 20: 00 51 52 add %dl,0x52(%rcx) 23: 55 push %rbp 24:* 89 e5 mov %esp,%ebp <-- trapping instruction 26: 0f 34 sysenter 28: cd 80 int $0x80 2a: 5d pop %rbp 2b: 5a pop %rdx 2c: 59 pop %rcx 2d: c3 retq 2e: 90 nop 2f: 90 nop 30: 90 nop 31: 90 nop 32: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi 39: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi Code starting with the faulting instruction =========================================== 0: 5d pop %rbp 1: 5a pop %rdx 2: 59 pop %rcx 3: c3 retq 4: 90 nop 5: 90 nop 6: 90 nop 7: 90 nop 8: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi f: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi [ 20.370969][ T130] RSP: 002b:00000000ffd9b010 EFLAGS: 00200206 ORIG_RAX: 0000000000000127 [ 20.372809][ T130] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: 00000000578c6540 [ 20.374652][ T130] RDX: 00000000000a8800 RSI: 0000000000000000 RDI: 000000005663b8ad [ 20.376552][ T130] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 20.378381][ T130] R10: 0000000000000000 R11: 0000000000200206 R12: 0000000000000000 [ 20.380235][ T130] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 20.381888][ T130] </TASK> [ 20.382968][ T130] ---[ end trace 0000000000000000 ]--- LKP: ttyS0: 194: /lkp/lkp/src/bin/run-lkp /lkp/jobs/scheduled/vm-meta-307/boot-1-debian-11.1-i386-20220923.cgz-01671335e600-20230622-46155-yec2o-1.yaml [ 29.071854][ T211] is_virt=true [ 29.071868][ T211] [ 30.394619][ T211] lkp: kernel tainted state: 0 [ 30.394639][ T211] [ 31.373739][ T211] LKP: stdout: 194: Kernel tests: Boot OK! [ 31.373784][ T211] LKP: ttyS0: 194: LKP: rebooting forcely [ 35.497053][ T211] LKP: stdout: 194: HOSTNAME vm-snb, MAC 52:54:00:12:34:56, kernel 6.4.0-rc4-00504-g01671335e600 1 [ 35.497067][ T211] [ 35.674956][ T211] install debs round one: dpkg -i --force-confdef --force-depends /opt/deb/gawk_1%3a5.1.0-1_i386.deb [ 35.674975][ T211] [ 35.679306][ T211] Selecting previously unselected package gawk. [ 35.679315][ T211] [ 35.684028][ T211] (Reading database ... 16439 files and directories currently installed.) [ 35.684040][ T211] [ 35.688683][ T211] Preparing to unpack .../deb/gawk_1%3a5.1.0-1_i386.deb ... [ 35.688693][ T211] [ 35.691823][ T211] Unpacking gawk (1:5.1.0-1) ... [ 35.691830][ T211] [ 35.694690][ T211] Setting up gawk (1:5.1.0-1) ... [ 35.694697][ T211] [ 35.696731][ T211] NO_NETWORK= [ 35.696739][ T211] [ 35.730000][ T194] sysrq: Emergency Sync [ 35.730860][ T57] Emergency Sync complete [ 35.731988][ T194] sysrq: Resetting Kboot worker: lkp-worker23 Elapsed time: 60 kvm=( qemu-system-x86_64 -enable-kvm -cpu SandyBridge -kernel $kernel -initrd initrd-vm-meta-307.cgz -m 16384 -smp 2 To reproduce: # build kernel cd linux cp config-6.4.0-rc4-00504-g01671335e600 .config make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install cd <mod-install-dir> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email # if come across any failure that blocks the test, # please remove ~/.lkp and /lkp dir to run from a clean state.
diff --git a/block/bdev.c b/block/bdev.c index 9bb54d9d02a6..db8cede8a320 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -126,6 +126,7 @@ static void set_init_blocksize(struct block_device *bdev) { unsigned int bsize = bdev_logical_block_size(bdev); loff_t size = i_size_read(bdev->bd_inode); + int order, folio_order; while (bsize < PAGE_SIZE) { if (size & bsize) @@ -133,6 +134,14 @@ static void set_init_blocksize(struct block_device *bdev) bsize <<= 1; } bdev->bd_inode->i_blkbits = blksize_bits(bsize); + order = bdev->bd_inode->i_blkbits - PAGE_SHIFT; + folio_order = mapping_min_folio_order(bdev->bd_inode->i_mapping); + + if (!IS_ENABLED(CONFIG_BUFFER_HEAD)) { + /* Do not allow changing the folio order after it is set */ + WARN_ON_ONCE(folio_order && (folio_order != order)); + mapping_set_folio_orders(bdev->bd_inode->i_mapping, order, 31); + } } int set_blocksize(struct block_device *bdev, int size)