diff mbox series

[v7,3/4] block: introduce folio awareness and add a bigger size from folio

Message ID 20240704070357.1993-4-kundan.kumar@samsung.com (mailing list archive)
State New, archived
Headers show
Series block: add larger order folio instead of pages | expand

Commit Message

Kundan Kumar July 4, 2024, 7:03 a.m. UTC
Add a bigger size from folio to bio and skip merge processing for pages.

Fetch the offset of page within a folio. Depending on the size of folio
and folio_offset, fetch a larger length. This length may consist of
multiple contiguous pages if folio is multiorder.

Using the length calculate number of pages which will be added to bio and
increment the loop counter to skip those pages.

Using a helper function check if pages are contiguous and belong to same
folio, this is done as a COW may happen and change contiguous mapping of
pages of folio.

This technique helps to avoid overhead of merging pages which belong to
same large order folio.

Also folio-lize the functions bio_iov_add_page() and
bio_iov_add_zone_append_page()

Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
 block/bio.c | 77 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 61 insertions(+), 16 deletions(-)

Comments

Christoph Hellwig July 6, 2024, 8:03 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
kernel test robot July 8, 2024, 8:24 a.m. UTC | #2
Hello,

kernel test robot noticed "WARNING:at_mm/gup.c:#try_grab_page" on:

commit: 69b40318d4fdb6a9ac6bb833618e4cd954db4946 ("[PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio")
url: https://github.com/intel-lab-lkp/linux/commits/Kundan-Kumar/block-Added-folio-lized-version-of-bvec_try_merge_hw_page/20240705-055633
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/20240704070357.1993-4-kundan.kumar@samsung.com/
patch subject: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240706
with following parameters:

	disk: 1HDD
	fs: ext4
	test: ltp-aiodio.part2-00



compiler: gcc-13
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407081641.6a640f9e-oliver.sang@intel.com


kern  :warn  : [  327.605962] ------------[ cut here ]------------
kern :warn : [  327.606706] WARNING: CPU: 1 PID: 5867 at mm/gup.c:229 try_grab_page (mm/gup.c:229 (discriminator 1)) 
kern  :warn  : [  327.607701] Modules linked in: intel_rapl_msr intel_rapl_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 rapl btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_intel sd_mod sg intel_cstate ahci mei_me nvme libahci ipmi_devintf ipmi_msghandler intel_uncore intel_wmi_thunderbolt wmi_bmof mxm_wmi wdat_wdt libata nvme_core i2c_i801 ioatdma mei i2c_smbus dca wmi binfmt_misc drm fuse loop dm_mod ip_tables
kern  :warn  : [  327.613419] CPU: 1 PID: 5867 Comm: dio_sparse Not tainted 6.10.0-rc6-00246-g69b40318d4fd #1
kern  :warn  : [  327.614547] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
kern :warn : [  327.615788] RIP: 0010:try_grab_page (mm/gup.c:229 (discriminator 1)) 
kern :warn : [ 327.616621] Code: 40 f6 c5 01 0f 84 1a fe ff ff 48 83 ed 01 e9 14 fe ff ff be 04 00 00 00 4c 89 e7 e8 bd 68 14 00 f0 41 ff 04 24 e9 67 ff ff ff <0f> 0b b8 f4 ff ff ff 5b 5d 41 5c 41 5d c3 cc cc cc cc e8 9c 68 14
All code
========
   0:	40 f6 c5 01          	test   $0x1,%bpl
   4:	0f 84 1a fe ff ff    	je     0xfffffffffffffe24
   a:	48 83 ed 01          	sub    $0x1,%rbp
   e:	e9 14 fe ff ff       	jmp    0xfffffffffffffe27
  13:	be 04 00 00 00       	mov    $0x4,%esi
  18:	4c 89 e7             	mov    %r12,%rdi
  1b:	e8 bd 68 14 00       	call   0x1468dd
  20:	f0 41 ff 04 24       	lock incl (%r12)
  25:	e9 67 ff ff ff       	jmp    0xffffffffffffff91
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	b8 f4 ff ff ff       	mov    $0xfffffff4,%eax
  31:	5b                   	pop    %rbx
  32:	5d                   	pop    %rbp
  33:	41 5c                	pop    %r12
  35:	41 5d                	pop    %r13
  37:	c3                   	ret
  38:	cc                   	int3
  39:	cc                   	int3
  3a:	cc                   	int3
  3b:	cc                   	int3
  3c:	e8                   	.byte 0xe8
  3d:	9c                   	pushf
  3e:	68                   	.byte 0x68
  3f:	14                   	.byte 0x14

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	b8 f4 ff ff ff       	mov    $0xfffffff4,%eax
   7:	5b                   	pop    %rbx
   8:	5d                   	pop    %rbp
   9:	41 5c                	pop    %r12
   b:	41 5d                	pop    %r13
   d:	c3                   	ret
   e:	cc                   	int3
   f:	cc                   	int3
  10:	cc                   	int3
  11:	cc                   	int3
  12:	e8                   	.byte 0xe8
  13:	9c                   	pushf
  14:	68                   	.byte 0x68
  15:	14                   	.byte 0x14
kern  :warn  : [  327.619059] RSP: 0018:ffffc9000d61f2f8 EFLAGS: 00010246
kern  :warn  : [  327.619993] RAX: 0000000000000000 RBX: ffffea000bc10000 RCX: ffffffff8194dcbb
kern  :warn  : [  327.621085] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffea000bc10034
kern  :warn  : [  327.622179] RBP: ffffea000bc10000 R08: 0000000000000000 R09: fffff94001782006
kern  :warn  : [  327.623273] R10: ffffea000bc10037 R11: 00007fd4b2aa6fff R12: ffffea000bc10034
kern  :warn  : [  327.624366] R13: 0000000000290000 R14: ffff8881267fec40 R15: 0000000000000000
kern  :warn  : [  327.625460] FS:  00007fd4b2929740(0000) GS:ffff889f8ae80000(0000) knlGS:0000000000000000
kern  :warn  : [  327.626636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern  :warn  : [  327.627622] CR2: 00007f517b030000 CR3: 0000002081e22001 CR4: 00000000003706f0
kern  :warn  : [  327.628727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kern  :warn  : [  327.629832] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kern  :warn  : [  327.631001] Call Trace:
kern  :warn  : [  327.631743]  <TASK>
kern :warn : [  327.632445] ? __warn (kernel/panic.c:693) 
kern :warn : [  327.633243] ? try_grab_page (mm/gup.c:229 (discriminator 1)) 
kern :warn : [  327.634128] ? report_bug (lib/bug.c:180 lib/bug.c:219) 
kern :warn : [  327.634985] ? handle_bug (arch/x86/kernel/traps.c:239 (discriminator 1)) 
kern :warn : [  327.635835] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) 
kern :warn : [  327.636695] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) 
kern :warn : [  327.637587] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229) 
kern :warn : [  327.638509] ? try_grab_page (mm/gup.c:229 (discriminator 1)) 
kern :warn : [  327.639418] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229) 
kern :warn : [  327.640285] follow_huge_pmd (mm/gup.c:809) 
kern :warn : [  327.641162] follow_pmd_mask+0x3f8/0x7e0 
kern :warn : [  327.642090] ? __pfx_follow_pmd_mask+0x10/0x10 
kern :warn : [  327.643125] ? find_vma (mm/mmap.c:1944) 
kern :warn : [  327.643948] ? __pfx_find_vma (mm/mmap.c:1944) 
kern :warn : [  327.644804] follow_page_mask (mm/gup.c:1116 mm/gup.c:1162) 
kern :warn : [  327.645674] __get_user_pages (mm/gup.c:1588 (discriminator 1)) 
kern :warn : [  327.646601] ? __pfx___get_user_pages (mm/gup.c:1522) 
kern :warn : [  327.647533] ? down_read_killable (arch/x86/include/asm/atomic64_64.h:20 include/linux/atomic/atomic-arch-fallback.h:2629 include/linux/atomic/atomic-long.h:79 include/linux/atomic/atomic-instrumented.h:3224 kernel/locking/rwsem.c:176 kernel/locking/rwsem.c:181 kernel/locking/rwsem.c:249 kernel/locking/rwsem.c:241 kernel/locking/rwsem.c:1249 kernel/locking/rwsem.c:1273 kernel/locking/rwsem.c:1551) 
kern :warn : [  327.648428] ? __pfx_down_read_killable (kernel/locking/rwsem.c:1547) 
kern :warn : [  327.649345] __gup_longterm_locked (mm/gup.c:1859 mm/gup.c:2556) 
kern :warn : [  327.650237] gup_fast_fallback (mm/gup.c:3476) 
kern :warn : [  327.651111] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161) 
kern :warn : [  327.652030] ? __pfx_gup_fast_fallback (mm/gup.c:3439) 
kern :warn : [  327.652890] ? __link_object (include/linux/rculist.h:79 (discriminator 1) include/linux/rculist.h:128 (discriminator 1) mm/kmemleak.c:728 (discriminator 1)) 
kern :warn : [  327.653674] iov_iter_extract_pages (lib/iov_iter.c:1584 (discriminator 1) lib/iov_iter.c:1646 (discriminator 1)) 
kern :warn : [  327.654501] __bio_iov_iter_get_pages (block/bio.c:1353) 
kern :warn : [  327.655382] ? __pfx___bio_iov_iter_get_pages (block/bio.c:1324) 
kern :warn : [  327.656273] ? bio_init (arch/x86/include/asm/atomic.h:28 include/linux/atomic/atomic-arch-fallback.h:503 include/linux/atomic/atomic-instrumented.h:68 block/bio.c:279) 
kern :warn : [  327.656980] ? bio_alloc_bioset (block/bio.c:578) 
kern :warn : [  327.657721] bio_iov_iter_get_pages (block/bio.c:1446 (discriminator 3)) 
kern :warn : [  327.658474] iomap_dio_bio_iter (fs/iomap/direct-io.c:388) 
kern :warn : [  327.659248] __iomap_dio_rw (fs/iomap/direct-io.c:501 fs/iomap/direct-io.c:660) 
kern :warn : [  327.659982] ? __pfx___iomap_dio_rw (fs/iomap/direct-io.c:544) 
kern :warn : [  327.660729] ? __pfx_ext4_dio_write_checks (fs/ext4/file.c:424) 
kern :warn : [  327.661518] ? __pfx_ext4_dio_write_end_io (fs/ext4/file.c:376) 
kern :warn : [  327.662299] ? iomap_dio_complete (fs/iomap/direct-io.c:133) 
kern :warn : [  327.663052] iomap_dio_rw (fs/iomap/direct-io.c:749) 
kern :warn : [  327.663723] ext4_dio_write_iter (fs/ext4/file.c:577) 
kern :warn : [  327.664439] ? __pfx_ext4_dio_write_iter (fs/ext4/file.c:499) 
kern :warn : [  327.665185] ? folio_unlock (arch/x86/include/asm/bitops.h:101 include/asm-generic/bitops/instrumented-lock.h:80 include/linux/page-flags.h:762 mm/filemap.c:1508) 
kern :warn : [  327.665838] ? do_wp_page (include/linux/vmstat.h:71 (discriminator 1) mm/memory.c:3193 (discriminator 1) mm/memory.c:3663 (discriminator 1)) 
kern :warn : [  327.666494] ? __pfx___might_resched (kernel/sched/core.c:10151) 
kern :warn : [  327.667254] vfs_write (fs/read_write.c:497 fs/read_write.c:590) 
kern :warn : [  327.667915] ? __pfx___might_resched (kernel/sched/core.c:10151) 
kern :warn : [  327.668657] ? __pfx_vfs_write (fs/read_write.c:571) 
kern :warn : [  327.669342] ? __pfx___might_resched (kernel/sched/core.c:10151) 
kern :warn : [  327.670062] ? __pfx_put_timespec64 (kernel/time/time.c:904) 
kern :warn : [  327.670774] ksys_write (fs/read_write.c:643) 
kern :warn : [  327.671409] ? __pfx_ksys_write (fs/read_write.c:633) 
kern :warn : [  327.672129] ? __pfx___x64_sys_clock_gettime (kernel/time/posix-timers.c:1132) 
kern :warn : [  327.672939] ? fpregs_restore_userregs (arch/x86/include/asm/bitops.h:75 include/asm-generic/bitops/instrumented-atomic.h:42 include/linux/thread_info.h:94 arch/x86/kernel/fpu/context.h:79) 
kern :warn : [  327.673697] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) 
kern :warn : [  327.674363] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
kern  :warn  : [  327.675145] RIP: 0033:0x7fd4b2a24240
kern :warn : [ 327.675803] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
All code
========
   0:	40 00 48 8b          	rex add %cl,-0x75(%rax)
   4:	15 c1 9b 0d 00       	adc    $0xd9bc1,%eax
   9:	f7 d8                	neg    %eax
   b:	64 89 02             	mov    %eax,%fs:(%rdx)
   e:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
  15:	eb b7                	jmp    0xffffffffffffffce
  17:	0f 1f 00             	nopl   (%rax)
  1a:	80 3d a1 23 0e 00 00 	cmpb   $0x0,0xe23a1(%rip)        # 0xe23c2
  21:	74 17                	je     0x3a
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 58                	ja     0x8a
  32:	c3                   	ret
  33:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  3a:	48 83 ec 28          	sub    $0x28,%rsp
  3e:	48                   	rex.W
  3f:	89                   	.byte 0x89

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 58                	ja     0x60
   8:	c3                   	ret
   9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  10:	48 83 ec 28          	sub    $0x28,%rsp
  14:	48                   	rex.W
  15:	89                   	.byte 0x89


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240708/202407081641.6a640f9e-oliver.sang@intel.com
diff mbox series

Patch

diff --git a/block/bio.c b/block/bio.c
index 05d624f016f0..32c9c6d80384 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1243,8 +1243,8 @@  void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio_set_flag(bio, BIO_CLONED);
 }
 
-static int bio_iov_add_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int offset)
+static int bio_iov_add_folio(struct bio *bio, struct folio *folio, size_t len,
+			     size_t offset)
 {
 	bool same_page = false;
 
@@ -1253,30 +1253,61 @@  static int bio_iov_add_page(struct bio *bio, struct page *page,
 
 	if (bio->bi_vcnt > 0 &&
 	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
-				page, len, offset, &same_page)) {
+				folio_page(folio, 0), len, offset,
+				&same_page)) {
 		bio->bi_iter.bi_size += len;
 		if (same_page)
-			bio_release_page(bio, page);
+			bio_release_page(bio, folio_page(folio, 0));
 		return 0;
 	}
-	__bio_add_page(bio, page, len, offset);
+	bio_add_folio_nofail(bio, folio, len, offset);
 	return 0;
 }
 
-static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int offset)
+static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio,
+					 size_t len, size_t offset)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	bool same_page = false;
 
-	if (bio_add_hw_page(q, bio, page, len, offset,
+	if (bio_add_hw_folio(q, bio, folio, len, offset,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		bio_release_page(bio, page);
+		bio_release_page(bio, folio_page(folio, 0));
 	return 0;
 }
 
+static unsigned int get_contig_folio_len(unsigned int *num_pages,
+					 struct page **pages, unsigned int i,
+					 struct folio *folio, size_t left,
+					 size_t offset)
+{
+	size_t bytes = left;
+	size_t contig_sz = min_t(size_t, PAGE_SIZE - offset, bytes);
+	unsigned int j;
+
+	/*
+	 * We might COW a single page in the middle of
+	 * a large folio, so we have to check that all
+	 * pages belong to the same folio.
+	 */
+	bytes -= contig_sz;
+	for (j = i + 1; j < i + *num_pages; j++) {
+		size_t next = min_t(size_t, PAGE_SIZE, bytes);
+
+		if (page_folio(pages[j]) != folio ||
+		    pages[j] != pages[j - 1] + 1) {
+			break;
+		}
+		contig_sz += next;
+		bytes -= next;
+	}
+	*num_pages = j - i;
+
+	return contig_sz;
+}
+
 #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1296,9 +1327,9 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
-	unsigned len, i = 0;
-	size_t offset;
+	ssize_t size;
+	unsigned int i = 0, num_pages;
+	size_t offset, folio_offset, left, len;
 	int ret = 0;
 
 	/*
@@ -1340,15 +1371,29 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	for (left = size, i = 0; left > 0; left -= len, i++) {
 		struct page *page = pages[i];
+		struct folio *folio = page_folio(page);
+
+		folio_offset = ((size_t)folio_page_idx(folio, page) <<
+				PAGE_SHIFT) + offset;
+
+		len = min_t(size_t, (folio_size(folio) - folio_offset), left);
+
+		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+
+		if (num_pages > 1)
+			len = get_contig_folio_len(&num_pages, pages, i,
+						   folio, left, offset);
 
-		len = min_t(size_t, PAGE_SIZE - offset, left);
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			ret = bio_iov_add_zone_append_page(bio, page, len,
-					offset);
+			ret = bio_iov_add_zone_append_folio(bio, folio, len,
+					folio_offset);
 			if (ret)
 				break;
 		} else
-			bio_iov_add_page(bio, page, len, offset);
+			bio_iov_add_folio(bio, folio, len, folio_offset);
+
+		/* Skip the pages which got added */
+		i = i + (num_pages - 1);
 
 		offset = 0;
 	}