[RFCv3,7/7] iomap: Optimize data access patterns for filesystems with indirect mappings

This patch optimizes the data access patterns for filesystems with
indirect block mapping by implementing BH_Boundary handling within
iomap.

Currently the bios for reads within iomap are only submitted at
2 places -
1. If we cannot merge the new req. with previous bio, only then we
   submit the previous bio.
2. Submit the bio at the end of the entire read processing.

This means for filesystems with indirect block mapping, we call into
->iomap_begin() again w/o submitting the previous bios. That causes
unoptimized data access patterns for blocks which are of BH_Boundary type.

For e.g. consider the file mapping
logical block(4k) 		physical block(4k)
0-11 				1000-1011
12-15 				1013-1016

In above physical block 1012 is an indirect metadata block which has the
mapping information for next set of indirect blocks (1013-1016).
With iomap buffered reads for reading 1st 16 logical blocks of a file
(0-15), we get below I/O pattern
	- submit a bio for 1012
	- complete the bio for 1012
	- submit a bio for 1000-1011
	- submit a bio for 1013-1016
	- complete the bios for 1000-1011
	- complete the bios for 1013-1016

So as we can see, above is an non-optimal I/O access pattern and also we
get 3 bio completions instead of 2.

This patch changes this behavior by doing submit_bio() if there are any
bios already processed, before calling ->iomap_begin() again.
That means if there are any blocks which are already processed, gets
submitted for I/O earlier and then within ->iomap_begin(), if we get a
request for reading an indirect metadata block, then block layer can merge
those bios with the already submitted read request to reduce the no. of bio
completions.

Now, for bs < ps or for large folios, this patch requires proper handling
of "ifs->read_bytes_pending". In that we first set ifs->read_bytes_pending
to folio_size. Then handle all the cases where we need to subtract
ifs->read_bytes_pending either during the submission side
(if we don't need to submit any I/O - for e.g. for uptodate sub blocks),
or during an I/O error, or at the completion of an I/O.

Here is the ftrace output of iomap and block layer with ext2 iomap
conversion patches -

root# filefrag -b512 -v /mnt1/test/f1
Filesystem type is: ef53
Filesystem cylinder groups approximately 32
File size of /mnt1/test/f1 is 65536 (128 blocks of 512 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      95:      98304..     98399:     96:             merged
   1:       96..     127:      98408..     98439:     32:      98400: last,merged,eof
/mnt1/test/f1: 2 extents found

root# #This reads 4 blocks starting from lblk 10, 11, 12, 13
root# xfs_io -c "pread -b$((4*4096)) $((10*4096)) $((4*4096))" /mnt1/test/f1

w/o this patch - (indirect block is submitted before and does not get merged, resulting in 3 bios completion)
      xfs_io-907     [002] .....   185.608791: iomap_readahead: dev 8:16 ino 0xc nr_pages 4
      xfs_io-907     [002] .....   185.608819: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x4000 processed 0 flags  (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x9d/0x2c0
      xfs_io-907     [002] .....   185.608823: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300a000 offset 0xa000 length 0x2000 type MAPPED flags MERGED
      xfs_io-907     [002] .....   185.608831: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x2000 processed 8192 flags  (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1e1/0x2c0
      xfs_io-907     [002] .....   185.608859: block_bio_queue: 8,16 R 98400 + 8 [xfs_io]
      xfs_io-907     [002] .....   185.608865: block_getrq: 8,16 R 98400 + 8 [xfs_io]
      xfs_io-907     [002] .....   185.608867: block_io_start: 8,16 R 4096 () 98400 + 8 [xfs_io]
      xfs_io-907     [002] .....   185.608869: block_plug: [xfs_io]
      xfs_io-907     [002] .....   185.608872: block_unplug: [xfs_io] 1
      xfs_io-907     [002] .....   185.608874: block_rq_insert: 8,16 R 4096 () 98400 + 8 [xfs_io]
kworker/2:1H-198     [002] .....   185.608908: block_rq_issue: 8,16 R 4096 () 98400 + 8 [kworker/2:1H]
      <idle>-0       [002] d.h2.   185.609579: block_rq_complete: 8,16 R () 98400 + 8 [0]
      <idle>-0       [002] dNh2.   185.609631: block_io_done: 8,16 R 0 () 98400 + 0 [swapper/2]
      xfs_io-907     [002] .....   185.609694: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300d000 offset 0xc000 length 0x2000 type MAPPED flags MERGED
      xfs_io-907     [002] .....   185.609704: block_bio_queue: 8,16 RA 98384 + 16 [xfs_io]
      xfs_io-907     [002] .....   185.609718: block_getrq: 8,16 RA 98384 + 16 [xfs_io]
      xfs_io-907     [002] .....   185.609721: block_io_start: 8,16 RA 8192 () 98384 + 16 [xfs_io]
      xfs_io-907     [002] .....   185.609726: block_plug: [xfs_io]
      xfs_io-907     [002] .....   185.609735: iomap_iter: dev 8:16 ino 0xc pos 0xc000 length 0x2000 processed 8192 flags  (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1e1/0x2c0
      xfs_io-907     [002] .....   185.609736: block_bio_queue: 8,16 RA 98408 + 16 [xfs_io]
      xfs_io-907     [002] .....   185.609740: block_getrq: 8,16 RA 98408 + 16 [xfs_io]
      xfs_io-907     [002] .....   185.609741: block_io_start: 8,16 RA 8192 () 98408 + 16 [xfs_io]
      xfs_io-907     [002] .....   185.609756: block_rq_issue: 8,16 RA 8192 () 98408 + 16 [xfs_io]
      xfs_io-907     [002] .....   185.609769: block_rq_issue: 8,16 RA 8192 () 98384 + 16 [xfs_io]
      <idle>-0       [002] d.H2.   185.610280: block_rq_complete: 8,16 RA () 98408 + 16 [0]
      <idle>-0       [002] d.H2.   185.610289: block_io_done: 8,16 RA 0 () 98408 + 0 [swapper/2]
      <idle>-0       [002] d.H2.   185.610292: block_rq_complete: 8,16 RA () 98384 + 16 [0]
      <idle>-0       [002] dNH2.   185.610301: block_io_done: 8,16 RA 0 () 98384 + 0 [swapper/2]

v/s with the patch - (optimzed I/O access pattern and bio gets merged resulting in only 2 bios completion)
      xfs_io-944     [005] .....    99.926187: iomap_readahead: dev 8:16 ino 0xc nr_pages 4
      xfs_io-944     [005] .....    99.926208: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x4000 processed 0 flags  (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x9d/0x2c0
      xfs_io-944     [005] .....    99.926211: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300a000 offset 0xa000 length 0x2000 type MAPPED flags MERGED
      xfs_io-944     [005] .....    99.926222: block_bio_queue: 8,16 RA 98384 + 16 [xfs_io]
      xfs_io-944     [005] .....    99.926232: block_getrq: 8,16 RA 98384 + 16 [xfs_io]
      xfs_io-944     [005] .....    99.926233: block_io_start: 8,16 RA 8192 () 98384 + 16 [xfs_io]
      xfs_io-944     [005] .....    99.926234: block_plug: [xfs_io]
      xfs_io-944     [005] .....    99.926235: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x2000 processed 8192 flags  (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1f9/0x2c0
      xfs_io-944     [005] .....    99.926261: block_bio_queue: 8,16 R 98400 + 8 [xfs_io]
      xfs_io-944     [005] .....    99.926266: block_bio_backmerge: 8,16 R 98400 + 8 [xfs_io]
      xfs_io-944     [005] .....    99.926271: block_unplug: [xfs_io] 1
      xfs_io-944     [005] .....    99.926272: block_rq_insert: 8,16 RA 12288 () 98384 + 24 [xfs_io]
kworker/5:1H-234     [005] .....    99.926314: block_rq_issue: 8,16 RA 12288 () 98384 + 24 [kworker/5:1H]
      <idle>-0       [005] d.h2.    99.926905: block_rq_complete: 8,16 RA () 98384 + 24 [0]
      <idle>-0       [005] dNh2.    99.926931: block_io_done: 8,16 RA 0 () 98384 + 0 [swapper/5]
      xfs_io-944     [005] .....    99.926971: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300d000 offset 0xc000 length 0x2000 type MAPPED flags MERGED
      xfs_io-944     [005] .....    99.926981: block_bio_queue: 8,16 RA 98408 + 16 [xfs_io]
      xfs_io-944     [005] .....    99.926989: block_getrq: 8,16 RA 98408 + 16 [xfs_io]
      xfs_io-944     [005] .....    99.926989: block_io_start: 8,16 RA 8192 () 98408 + 16 [xfs_io]
      xfs_io-944     [005] .....    99.926991: block_plug: [xfs_io]
      xfs_io-944     [005] .....    99.926993: iomap_iter: dev 8:16 ino 0xc pos 0xc000 length 0x2000 processed 8192 flags  (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1f9/0x2c0
      xfs_io-944     [005] .....    99.927001: block_rq_issue: 8,16 RA 8192 () 98408 + 16 [xfs_io]
      <idle>-0       [005] d.h2.    99.927397: block_rq_complete: 8,16 RA () 98408 + 16 [0]
      <idle>-0       [005] dNh2.    99.927414: block_io_done: 8,16 RA 0 () 98408 + 0 [swapper/5]

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
 fs/iomap/buffered-io.c | 112 +++++++++++++++++++++++++++++++----------
 1 file changed, 85 insertions(+), 27 deletions(-)

--
2.44.0

Message ID	4e2752e99f55469c4eb5f2fe83e816d529110192.1714046808.git.ritesh.list@gmail.com (mailing list archive)
State	New, archived
Headers	show Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16FE6149C5E; Thu, 25 Apr 2024 13:29:29 +0000 (UTC) From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox <willy@infradead.org>, "Darrick J . Wong" <djwong@kernel.org>, Ojaswin Mujoo <ojaswin@linux.ibm.com>, Ritesh Harjani <ritesh.list@gmail.com>, Jan Kara <jack@suse.cz> Subject: [RFCv3 7/7] iomap: Optimize data access patterns for filesystems with indirect mappings Date: Thu, 25 Apr 2024 18:58:51 +0530 Message-ID: <4e2752e99f55469c4eb5f2fe83e816d529110192.1714046808.git.ritesh.list@gmail.com> In-Reply-To: <cover.1714046808.git.ritesh.list@gmail.com> References: <cover.1714046808.git.ritesh.list@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	ext2 iomap changes and iomap improvements \| expand [RFCv3,0/7] ext2 iomap changes and iomap improvements [RFCv3,1/7] ext2: Remove comment related to journal handle [RFCv3,2/7] ext2: Convert ext2 regular file buffered I/O to use iomap [RFCv3,3/7] ext2: Enable large folio support [RFCv3,4/7] ext2: Implement seq counter for validating cached iomap [RFCv3,5/7] iomap: Fix iomap_adjust_read_range for plen calculation [RFCv3,6/7] iomap: Optimize iomap_read_folio [RFCv3,7/7] iomap: Optimize data access patterns for filesystems with indirect mappings

[RFCv3,7/7] iomap: Optimize data access patterns for filesystems with indirect mappings

Commit Message

Comments

Patch