Message ID | 20240704070357.1993-4-kundan.kumar@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Series | block: add larger order folio instead of pages |
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Hello,

kernel test robot noticed "WARNING:at_mm/gup.c:#try_grab_page" on:

commit: 69b40318d4fdb6a9ac6bb833618e4cd954db4946 ("[PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio")
url: https://github.com/intel-lab-lkp/linux/commits/Kundan-Kumar/block-Added-folio-lized-version-of-bvec_try_merge_hw_page/20240705-055633
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/20240704070357.1993-4-kundan.kumar@samsung.com/
patch subject: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240706
with following parameters:

	disk: 1HDD
	fs: ext4
	test: ltp-aiodio.part2-00

compiler: gcc-13
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407081641.6a640f9e-oliver.sang@intel.com

kern :warn : [ 327.605962] ------------[ cut here ]------------
kern :warn : [ 327.606706] WARNING: CPU: 1 PID: 5867 at mm/gup.c:229 try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.607701] Modules linked in: intel_rapl_msr intel_rapl_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 rapl btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_intel sd_mod sg intel_cstate ahci mei_me nvme libahci ipmi_devintf ipmi_msghandler intel_uncore intel_wmi_thunderbolt wmi_bmof mxm_wmi wdat_wdt libata nvme_core i2c_i801 ioatdma mei i2c_smbus dca wmi binfmt_misc drm fuse loop dm_mod ip_tables
kern :warn : [ 327.613419] CPU: 1 PID: 5867 Comm: dio_sparse Not tainted 6.10.0-rc6-00246-g69b40318d4fd #1
kern :warn : [ 327.614547] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
kern :warn : [ 327.615788] RIP: 0010:try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.616621] Code: 40 f6 c5 01 0f 84 1a fe ff ff 48 83 ed 01 e9 14 fe ff ff be 04 00 00 00 4c 89 e7 e8 bd 68 14 00 f0 41 ff 04 24 e9 67 ff ff ff <0f> 0b b8 f4 ff ff ff 5b 5d 41 5c 41 5d c3 cc cc cc cc e8 9c 68 14
All code
========
   0:   40 f6 c5 01           test $0x1,%bpl
   4:   0f 84 1a fe ff ff     je 0xfffffffffffffe24
   a:   48 83 ed 01           sub $0x1,%rbp
   e:   e9 14 fe ff ff        jmp 0xfffffffffffffe27
  13:   be 04 00 00 00        mov $0x4,%esi
  18:   4c 89 e7              mov %r12,%rdi
  1b:   e8 bd 68 14 00        call 0x1468dd
  20:   f0 41 ff 04 24        lock incl (%r12)
  25:   e9 67 ff ff ff        jmp 0xffffffffffffff91
  2a:*  0f 0b                 ud2 <-- trapping instruction
  2c:   b8 f4 ff ff ff        mov $0xfffffff4,%eax
  31:   5b                    pop %rbx
  32:   5d                    pop %rbp
  33:   41 5c                 pop %r12
  35:   41 5d                 pop %r13
  37:   c3                    ret
  38:   cc                    int3
  39:   cc                    int3
  3a:   cc                    int3
  3b:   cc                    int3
  3c:   e8                    .byte 0xe8
  3d:   9c                    pushf
  3e:   68                    .byte 0x68
  3f:   14                    .byte 0x14

Code starting with the faulting instruction
===========================================
   0:   0f 0b                 ud2
   2:   b8 f4 ff ff ff        mov $0xfffffff4,%eax
   7:   5b                    pop %rbx
   8:   5d                    pop %rbp
   9:   41 5c                 pop %r12
   b:   41 5d                 pop %r13
   d:   c3                    ret
   e:   cc                    int3
   f:   cc                    int3
  10:   cc                    int3
  11:   cc                    int3
  12:   e8                    .byte 0xe8
  13:   9c                    pushf
  14:   68                    .byte 0x68
  15:   14                    .byte 0x14
kern :warn : [ 327.619059] RSP: 0018:ffffc9000d61f2f8 EFLAGS: 00010246
kern :warn : [ 327.619993] RAX: 0000000000000000 RBX: ffffea000bc10000 RCX: ffffffff8194dcbb
kern :warn : [ 327.621085] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffea000bc10034
kern :warn : [ 327.622179] RBP: ffffea000bc10000 R08: 0000000000000000 R09: fffff94001782006
kern :warn : [ 327.623273] R10: ffffea000bc10037 R11: 00007fd4b2aa6fff R12: ffffea000bc10034
kern :warn : [ 327.624366] R13: 0000000000290000 R14: ffff8881267fec40 R15: 0000000000000000
kern :warn : [ 327.625460] FS: 00007fd4b2929740(0000) GS:ffff889f8ae80000(0000) knlGS:0000000000000000
kern :warn : [ 327.626636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 327.627622] CR2: 00007f517b030000 CR3: 0000002081e22001 CR4: 00000000003706f0
kern :warn : [ 327.628727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kern :warn : [ 327.629832] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kern :warn : [ 327.631001] Call Trace:
kern :warn : [ 327.631743] <TASK>
kern :warn : [ 327.632445] ? __warn (kernel/panic.c:693)
kern :warn : [ 327.633243] ? try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.634128] ? report_bug (lib/bug.c:180 lib/bug.c:219)
kern :warn : [ 327.634985] ? handle_bug (arch/x86/kernel/traps.c:239 (discriminator 1))
kern :warn : [ 327.635835] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
kern :warn : [ 327.636695] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
kern :warn : [ 327.637587] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229)
kern :warn : [ 327.638509] ? try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.639418] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229)
kern :warn : [ 327.640285] follow_huge_pmd (mm/gup.c:809)
kern :warn : [ 327.641162] follow_pmd_mask+0x3f8/0x7e0
kern :warn : [ 327.642090] ? __pfx_follow_pmd_mask+0x10/0x10
kern :warn : [ 327.643125] ? find_vma (mm/mmap.c:1944)
kern :warn : [ 327.643948] ? __pfx_find_vma (mm/mmap.c:1944)
kern :warn : [ 327.644804] follow_page_mask (mm/gup.c:1116 mm/gup.c:1162)
kern :warn : [ 327.645674] __get_user_pages (mm/gup.c:1588 (discriminator 1))
kern :warn : [ 327.646601] ? __pfx___get_user_pages (mm/gup.c:1522)
kern :warn : [ 327.647533] ? down_read_killable (arch/x86/include/asm/atomic64_64.h:20 include/linux/atomic/atomic-arch-fallback.h:2629 include/linux/atomic/atomic-long.h:79 include/linux/atomic/atomic-instrumented.h:3224 kernel/locking/rwsem.c:176 kernel/locking/rwsem.c:181 kernel/locking/rwsem.c:249 kernel/locking/rwsem.c:241 kernel/locking/rwsem.c:1249 kernel/locking/rwsem.c:1273 kernel/locking/rwsem.c:1551)
kern :warn : [ 327.648428] ? __pfx_down_read_killable (kernel/locking/rwsem.c:1547)
kern :warn : [ 327.649345] __gup_longterm_locked (mm/gup.c:1859 mm/gup.c:2556)
kern :warn : [ 327.650237] gup_fast_fallback (mm/gup.c:3476)
kern :warn : [ 327.651111] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
kern :warn : [ 327.652030] ? __pfx_gup_fast_fallback (mm/gup.c:3439)
kern :warn : [ 327.652890] ? __link_object (include/linux/rculist.h:79 (discriminator 1) include/linux/rculist.h:128 (discriminator 1) mm/kmemleak.c:728 (discriminator 1))
kern :warn : [ 327.653674] iov_iter_extract_pages (lib/iov_iter.c:1584 (discriminator 1) lib/iov_iter.c:1646 (discriminator 1))
kern :warn : [ 327.654501] __bio_iov_iter_get_pages (block/bio.c:1353)
kern :warn : [ 327.655382] ? __pfx___bio_iov_iter_get_pages (block/bio.c:1324)
kern :warn : [ 327.656273] ? bio_init (arch/x86/include/asm/atomic.h:28 include/linux/atomic/atomic-arch-fallback.h:503 include/linux/atomic/atomic-instrumented.h:68 block/bio.c:279)
kern :warn : [ 327.656980] ? bio_alloc_bioset (block/bio.c:578)
kern :warn : [ 327.657721] bio_iov_iter_get_pages (block/bio.c:1446 (discriminator 3))
kern :warn : [ 327.658474] iomap_dio_bio_iter (fs/iomap/direct-io.c:388)
kern :warn : [ 327.659248] __iomap_dio_rw (fs/iomap/direct-io.c:501 fs/iomap/direct-io.c:660)
kern :warn : [ 327.659982] ? __pfx___iomap_dio_rw (fs/iomap/direct-io.c:544)
kern :warn : [ 327.660729] ? __pfx_ext4_dio_write_checks (fs/ext4/file.c:424)
kern :warn : [ 327.661518] ? __pfx_ext4_dio_write_end_io (fs/ext4/file.c:376)
kern :warn : [ 327.662299] ? iomap_dio_complete (fs/iomap/direct-io.c:133)
kern :warn : [ 327.663052] iomap_dio_rw (fs/iomap/direct-io.c:749)
kern :warn : [ 327.663723] ext4_dio_write_iter (fs/ext4/file.c:577)
kern :warn : [ 327.664439] ? __pfx_ext4_dio_write_iter (fs/ext4/file.c:499)
kern :warn : [ 327.665185] ? folio_unlock (arch/x86/include/asm/bitops.h:101 include/asm-generic/bitops/instrumented-lock.h:80 include/linux/page-flags.h:762 mm/filemap.c:1508)
kern :warn : [ 327.665838] ? do_wp_page (include/linux/vmstat.h:71 (discriminator 1) mm/memory.c:3193 (discriminator 1) mm/memory.c:3663 (discriminator 1))
kern :warn : [ 327.666494] ? __pfx___might_resched (kernel/sched/core.c:10151)
kern :warn : [ 327.667254] vfs_write (fs/read_write.c:497 fs/read_write.c:590)
kern :warn : [ 327.667915] ? __pfx___might_resched (kernel/sched/core.c:10151)
kern :warn : [ 327.668657] ? __pfx_vfs_write (fs/read_write.c:571)
kern :warn : [ 327.669342] ? __pfx___might_resched (kernel/sched/core.c:10151)
kern :warn : [ 327.670062] ? __pfx_put_timespec64 (kernel/time/time.c:904)
kern :warn : [ 327.670774] ksys_write (fs/read_write.c:643)
kern :warn : [ 327.671409] ? __pfx_ksys_write (fs/read_write.c:633)
kern :warn : [ 327.672129] ? __pfx___x64_sys_clock_gettime (kernel/time/posix-timers.c:1132)
kern :warn : [ 327.672939] ? fpregs_restore_userregs (arch/x86/include/asm/bitops.h:75 include/asm-generic/bitops/instrumented-atomic.h:42 include/linux/thread_info.h:94 arch/x86/kernel/fpu/context.h:79)
kern :warn : [ 327.673697] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
kern :warn : [ 327.674363] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
kern :warn : [ 327.675145] RIP: 0033:0x7fd4b2a24240
kern :warn : [ 327.675803] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
All code
========
   0:   40 00 48 8b           rex add %cl,-0x75(%rax)
   4:   15 c1 9b 0d 00        adc $0xd9bc1,%eax
   9:   f7 d8                 neg %eax
   b:   64 89 02              mov %eax,%fs:(%rdx)
   e:   48 c7 c0 ff ff ff ff  mov $0xffffffffffffffff,%rax
  15:   eb b7                 jmp 0xffffffffffffffce
  17:   0f 1f 00              nopl (%rax)
  1a:   80 3d a1 23 0e 00 00  cmpb $0x0,0xe23a1(%rip) # 0xe23c2
  21:   74 17                 je 0x3a
  23:   b8 01 00 00 00        mov $0x1,%eax
  28:   0f 05                 syscall
  2a:*  48 3d 00 f0 ff ff     cmp $0xfffffffffffff000,%rax <-- trapping instruction
  30:   77 58                 ja 0x8a
  32:   c3                    ret
  33:   0f 1f 80 00 00 00 00  nopl 0x0(%rax)
  3a:   48 83 ec 28           sub $0x28,%rsp
  3e:   48                    rex.W
  3f:   89                    .byte 0x89

Code starting with the faulting instruction
===========================================
   0:   48 3d 00 f0 ff ff     cmp $0xfffffffffffff000,%rax
   6:   77 58                 ja 0x60
   8:   c3                    ret
   9:   0f 1f 80 00 00 00 00  nopl 0x0(%rax)
  10:   48 83 ec 28           sub $0x28,%rsp
  14:   48                    rex.W
  15:   89                    .byte 0x89

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240708/202407081641.6a640f9e-oliver.sang@intel.com
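In the decoded kernel code above, the instruction right after the trapping ud2 in try_grab_page() is b8 f4 ff ff ff (mov $0xfffffff4,%eax), followed by the function epilogue, so this warning path returns the 32-bit value 0xfffffff4. The minimal user-space helper below, a sketch assuming x86's little-endian encoding of the mov $imm32,%eax form (opcode 0xb8), rebuilds that immediate from the printed bytes; decode_mov_eax_imm32() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <errno.h>   /* ENOMEM, for comparing the decoded value */
#include <stdint.h>

/*
 * Rebuild the signed immediate of "mov $imm32,%eax" (opcode 0xb8) from
 * the five instruction bytes printed on an oops "Code:" line. x86 stores
 * the immediate little-endian, least significant byte first.
 */
int32_t decode_mov_eax_imm32(const unsigned char insn[5])
{
	uint32_t v;

	assert(insn[0] == 0xb8);	/* only the mov $imm32,%eax form */
	v = (uint32_t)insn[1] |
	    ((uint32_t)insn[2] << 8) |
	    ((uint32_t)insn[3] << 16) |
	    ((uint32_t)insn[4] << 24);

	/* Two's-complement reinterpretation (modulo 2^32 on GCC/Clang) */
	return (int32_t)v;
}
```

Fed the bytes { 0xb8, 0xf4, 0xff, 0xff, 0xff } from the oops, the helper yields -12, which is -ENOMEM on Linux. Assembling the value with shifts rather than copying the bytes into an int32_t keeps the sketch independent of the host's byte order.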
diff --git a/block/bio.c b/block/bio.c
index 05d624f016f0..32c9c6d80384 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1243,8 +1243,8 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio_set_flag(bio, BIO_CLONED);
 }
 
-static int bio_iov_add_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int offset)
+static int bio_iov_add_folio(struct bio *bio, struct folio *folio, size_t len,
+			     size_t offset)
 {
 	bool same_page = false;
 
@@ -1253,30 +1253,61 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
 
 	if (bio->bi_vcnt > 0 &&
 	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
-				page, len, offset, &same_page)) {
+				folio_page(folio, 0), len, offset,
+				&same_page)) {
 		bio->bi_iter.bi_size += len;
 		if (same_page)
-			bio_release_page(bio, page);
+			bio_release_page(bio, folio_page(folio, 0));
 		return 0;
 	}
-	__bio_add_page(bio, page, len, offset);
+	bio_add_folio_nofail(bio, folio, len, offset);
 	return 0;
 }
 
-static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int offset)
+static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio,
+					 size_t len, size_t offset)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	bool same_page = false;
 
-	if (bio_add_hw_page(q, bio, page, len, offset,
+	if (bio_add_hw_folio(q, bio, folio, len, offset,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		bio_release_page(bio, page);
+		bio_release_page(bio, folio_page(folio, 0));
 	return 0;
 }
 
+static unsigned int get_contig_folio_len(unsigned int *num_pages,
+					 struct page **pages, unsigned int i,
+					 struct folio *folio, size_t left,
+					 size_t offset)
+{
+	size_t bytes = left;
+	size_t contig_sz = min_t(size_t, PAGE_SIZE - offset, bytes);
+	unsigned int j;
+
+	/*
+	 * We might COW a single page in the middle of
+	 * a large folio, so we have to check that all
+	 * pages belong to the same folio.
+	 */
+	bytes -= contig_sz;
+	for (j = i + 1; j < i + *num_pages; j++) {
+		size_t next = min_t(size_t, PAGE_SIZE, bytes);
+
+		if (page_folio(pages[j]) != folio ||
+		    pages[j] != pages[j - 1] + 1) {
+			break;
+		}
+		contig_sz += next;
+		bytes -= next;
+	}
+	*num_pages = j - i;
+
+	return contig_sz;
+}
+
 #define PAGE_PTRS_PER_BVEC	(sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1296,9 +1327,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
-	unsigned len, i = 0;
-	size_t offset;
+	ssize_t size;
+	unsigned int i = 0, num_pages;
+	size_t offset, folio_offset, left, len;
 	int ret = 0;
 
 	/*
@@ -1340,15 +1371,29 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	for (left = size, i = 0; left > 0; left -= len, i++) {
 		struct page *page = pages[i];
+		struct folio *folio = page_folio(page);
+
+		folio_offset = ((size_t)folio_page_idx(folio, page) <<
+			       PAGE_SHIFT) + offset;
+
+		len = min_t(size_t, (folio_size(folio) - folio_offset), left);
+
+		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+
+		if (num_pages > 1)
+			len = get_contig_folio_len(&num_pages, pages, i,
+						   folio, left, offset);
 
-		len = min_t(size_t, PAGE_SIZE - offset, left);
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			ret = bio_iov_add_zone_append_page(bio, page, len,
-					offset);
+			ret = bio_iov_add_zone_append_folio(bio, folio, len,
+					folio_offset);
 			if (ret)
 				break;
 		} else
-			bio_iov_add_page(bio, page, len, offset);
+			bio_iov_add_folio(bio, folio, len, folio_offset);
+
+		/* Skip the pages which got added */
+		i = i + (num_pages - 1);
 
 		offset = 0;
 	}
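The new get_contig_folio_len() helper only extends a bvec across pages that both belong to the same folio and directly follow one another. The user-space model below sketches the same control flow. It is not kernel code: the kernel compares page_folio() results and struct page pointers (pages[j] != pages[j - 1] + 1), whereas this model assumes a hypothetical struct sketch_page carrying a PFN and a folio id, and assumes 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE	((size_t)4096)

/* Hypothetical stand-in for struct page: physical frame and owning folio */
struct sketch_page {
	unsigned long pfn;
	int folio_id;
};

static inline size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Same control flow as get_contig_folio_len(): start at pages[i], then
 * extend over at most *num_pages pages while each next page belongs to
 * the same folio as pages[i] and directly follows the previous page.
 * On return, *num_pages is the number of pages consumed and the result
 * is their combined length in bytes.
 */
size_t sketch_contig_len(unsigned int *num_pages,
			 const struct sketch_page *pages, unsigned int i,
			 size_t left, size_t offset)
{
	size_t bytes = left;
	size_t contig_sz = min_size(SKETCH_PAGE_SIZE - offset, bytes);
	unsigned int j;

	assert(*num_pages >= 1 && offset < SKETCH_PAGE_SIZE);
	bytes -= contig_sz;
	for (j = i + 1; j < i + *num_pages; j++) {
		size_t next = min_size(SKETCH_PAGE_SIZE, bytes);

		/* A COWed page sits in another folio; stop before it. */
		if (pages[j].folio_id != pages[i].folio_id ||
		    pages[j].pfn != pages[j - 1].pfn + 1)
			break;
		contig_sz += next;
		bytes -= next;
	}
	*num_pages = j - i;
	return contig_sz;
}
```

For example, with four pages where the third was replaced by a COW copy (different folio id and PFN), a call starting at index 0 with *num_pages = 4 and 16384 bytes left stops after two pages and returns 8192. In the patched loop, i then advances past those two pages, and the COWed page is handled on the next iteration.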
Add a bigger size from folio to bio and skip merge processing for pages.

Fetch the offset of the page within its folio. Depending on the size of
the folio and folio_offset, fetch a larger length. This length may span
multiple contiguous pages if the folio is multi-order.

Using this length, calculate the number of pages that will be added to
the bio and advance the loop counter to skip those pages.

Use a helper function to check that the pages are contiguous and belong
to the same folio. This is needed because a COW may replace a single page
in the middle of a large folio and break the contiguous mapping of its
pages.

This technique avoids the overhead of merging pages that belong to the
same large-order folio.

Also folio-lize the functions bio_iov_add_page() and
bio_iov_add_zone_append_page().

Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
 block/bio.c | 77 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 61 insertions(+), 16 deletions(-)
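The length calculation described in the commit message (offset of the page inside its folio, a length capped by both the rest of the folio and the remaining bytes, then a page count rounded up) can be checked with a small user-space model. This is a sketch assuming 4 KiB pages; struct folio_span and folio_span_calc() are hypothetical names that mirror the arithmetic in the patched __bio_iov_iter_get_pages() loop, not kernel APIs. The resulting num_pages is only an upper bound: in the patch it is then passed to get_contig_folio_len(), which may shrink it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages, as on x86-64 */
#define SKETCH_PAGE_SHIFT		12
#define SKETCH_PAGE_SIZE		((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Hypothetical result type; not part of the kernel */
struct folio_span {
	size_t folio_offset;	/* byte offset of the data within the folio */
	size_t len;		/* bytes that can be taken from this folio */
	unsigned int num_pages;	/* pages spanned, before the contiguity check */
};

/*
 * page_idx:   index of the page within its folio (cf. folio_page_idx())
 * offset:     offset of the data within that page
 * folio_size: size of the folio in bytes (cf. folio_size())
 * left:       bytes still to be added to the bio
 */
struct folio_span folio_span_calc(size_t page_idx, size_t offset,
				  size_t folio_size, size_t left)
{
	struct folio_span s;
	size_t room;

	assert(offset < SKETCH_PAGE_SIZE);
	s.folio_offset = (page_idx << SKETCH_PAGE_SHIFT) + offset;
	assert(s.folio_offset < folio_size);

	/* len = min(folio_size - folio_offset, left) */
	room = folio_size - s.folio_offset;
	s.len = room < left ? room : left;

	/* Pages touched by [offset, offset + len) starting at page_idx */
	s.num_pages = (unsigned int)SKETCH_DIV_ROUND_UP(offset + s.len,
							 SKETCH_PAGE_SIZE);
	return s;
}
```

For a 64 KiB folio and a buffer that starts 512 bytes into the folio's page index 3, with plenty of data left, this gives folio_offset = 3 * 4096 + 512 = 12800, len = 65536 - 12800 = 52736 and num_pages = 13, i.e. pages 3 through 15 of the folio.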