diff mbox series

[v2] iomap: avoid redundant fault_in_iov_iter_readable() judgement when use larger chunks

Message ID 20240517201407.2144528-1-xu.yang_2@nxp.com (mailing list archive)
State New, archived
Headers show
Series [v2] iomap: avoid redundant fault_in_iov_iter_readable() judgement when use larger chunks | expand

Commit Message

Xu Yang May 17, 2024, 8:14 p.m. UTC
Since commit (5d8edfb900d5 "iomap: Copy larger chunks from userspace"),
iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
mapping doesn't support large folio, only one page of maximum 4KB will
be created and 4KB data will be writen to pagecache each time. Then,
next 4KB will be handled in next iteration.

If chunk is 2MB, total 512 pages need to be handled finally. During this
period, fault_in_iov_iter_readable() is called to check iov_iter readable
validity. Since only 4KB will be handled each time, below address space
will be checked over and over again:

start         	end
-
buf,    	buf+2MB
buf+4KB, 	buf+2MB
buf+8KB, 	buf+2MB
...
buf+2044KB 	buf+2MB

Obviously the checking size is wrong since only 4KB will be handled each
time. So this will get a correct bytes before fault_in_iov_iter_readable()
to let iomap work well in non-large folio case.

Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>

---
Change in v2:
 - fix address space description in message
---
 fs/iomap/buffered-io.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Matthew Wilcox May 17, 2024, 2:38 p.m. UTC | #1
On Sat, May 18, 2024 at 04:14:07AM +0800, Xu Yang wrote:
> Since commit (5d8edfb900d5 "iomap: Copy larger chunks from userspace"),
> iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
> mapping doesn't support large folio, only one page of maximum 4KB will
> be created and 4KB data will be writen to pagecache each time. Then,
> next 4KB will be handled in next iteration.
> 
> If chunk is 2MB, total 512 pages need to be handled finally. During this
> period, fault_in_iov_iter_readable() is called to check iov_iter readable
> validity. Since only 4KB will be handled each time, below address space
> will be checked over and over again:
> 
> start         	end
> -
> buf,    	buf+2MB
> buf+4KB, 	buf+2MB
> buf+8KB, 	buf+2MB
> ...
> buf+2044KB 	buf+2MB
> 
> Obviously the checking size is wrong since only 4KB will be handled each
> time. So this will get a correct bytes before fault_in_iov_iter_readable()
> to let iomap work well in non-large folio case.

You haven't talked at all about why this is important.  Is it a
performance problem?  If so, numbers please.  Particularly if you want
this backported to stable.

I alos think this is the wrong way to solve the problem.  We should
instead adjust 'chunk'.  Given everything else going on, I think we
want:

(in filemap.h):

static inline size_t mapping_max_folio_size(struct address_space *mapping)
{
	if (mapping_large_folio_support(mapping))
		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
	return PAGE_SIZE;
}

and then in iomap,

-	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
+	size_t chunk = mapping_max_folio_size(mapping);

(and move the initialisation of 'mapping' to before this)
Xu Yang May 20, 2024, 2:28 a.m. UTC | #2
On Fri, May 17, 2024 at 03:38:06PM +0100, Matthew Wilcox wrote:
> On Sat, May 18, 2024 at 04:14:07AM +0800, Xu Yang wrote:
> > Since commit (5d8edfb900d5 "iomap: Copy larger chunks from userspace"),
> > iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
> > mapping doesn't support large folio, only one page of maximum 4KB will
> > be created and 4KB data will be writen to pagecache each time. Then,
> > next 4KB will be handled in next iteration.
> > 
> > If chunk is 2MB, total 512 pages need to be handled finally. During this
> > period, fault_in_iov_iter_readable() is called to check iov_iter readable
> > validity. Since only 4KB will be handled each time, below address space
> > will be checked over and over again:
> > 
> > start         	end
> > -
> > buf,    	buf+2MB
> > buf+4KB, 	buf+2MB
> > buf+8KB, 	buf+2MB
> > ...
> > buf+2044KB 	buf+2MB
> > 
> > Obviously the checking size is wrong since only 4KB will be handled each
> > time. So this will get a correct bytes before fault_in_iov_iter_readable()
> > to let iomap work well in non-large folio case.
> 
> You haven't talked at all about why this is important.  Is it a
> performance problem?  If so, numbers please.  Particularly if you want
> this backported to stable.

Yes, it's indeed a performance problem. I meet the issue on all my ARM64
devices. Simply running the 'dd' command with varying block sizes will
result in different write speeds.

I totally write 4GB data to sda for each test, the results as below:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)

> 
> I alos think this is the wrong way to solve the problem.  We should
> instead adjust 'chunk'.  Given everything else going on, I think we
> want:
> 
> (in filemap.h):
> 
> static inline size_t mapping_max_folio_size(struct address_space *mapping)
> {
> 	if (mapping_large_folio_support(mapping))
> 		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
> 	return PAGE_SIZE;
> }
> 
> and then in iomap,
> 
> -	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
> +	size_t chunk = mapping_max_folio_size(mapping);
> 
> (and move the initialisation of 'mapping' to before this)
> 

Thanks for your suggestions. This should be a better way to improve the
logic. With above changes, I can see stable and high write speed now:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)

I will send v3 later.

Thanks,
Xu Yang
diff mbox series

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 41c8f0c68ef5..51ca31cd94ae 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -925,6 +925,9 @@  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		if (bytes > length)
 			bytes = length;
 
+		if (!mapping_large_folio_support(iter->inode->i_mapping))
+			bytes = min_t(size_t, bytes, PAGE_SIZE - offset_in_page(pos));
+
 		/*
 		 * Bring in the user page that we'll copy from _first_.
 		 * Otherwise there's a nasty deadlock on copying from the