
[v2] mm: allow read-ahead with IOCB_NOWAIT set

Message ID 20240820022639.89562-1-laoar.shao@gmail.com (mailing list archive)
State New
Series [v2] mm: allow read-ahead with IOCB_NOWAIT set

Commit Message

Yafang Shao Aug. 20, 2024, 2:26 a.m. UTC
Readahead support for IOCB_NOWAIT was introduced in commit 2e85abf053b9
("mm: allow read-ahead with IOCB_NOWAIT set"). However, that implementation
broke the semantics of IOCB_NOWAIT, because the readahead allocations could
wait on I/O during memory reclaim. Commit efa8480a8316 ("fs: RWF_NOWAIT
should imply IOCB_NOIO") later addressed this by making RWF_NOWAIT also set
IOCB_NOIO, which disables readahead entirely for such reads: on a page cache
miss, filemap_get_pages() returns -EAGAIN without starting readahead.
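A simplified userspace model may help illustrate the flag flow described above. It uses hypothetical bit values and helper names (M_RWF_NOWAIT, model_rw_flags(), model_cache_miss()); it mirrors only the RWF_NOWAIT branch of kiocb_set_rw_flags() and the cache-miss branch of filemap_get_pages(), not the real kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real RWF_* and IOCB_* constants differ. */
#define M_RWF_NOWAIT  0x1u
#define M_IOCB_NOWAIT 0x1u
#define M_IOCB_NOIO   0x2u

enum miss_action { MISS_EAGAIN, MISS_READAHEAD };

/* Simplified model of the RWF_NOWAIT branch of kiocb_set_rw_flags(). */
static unsigned int model_rw_flags(unsigned int rwf, bool nowait_implies_noio)
{
    unsigned int ki = 0;

    if (rwf & M_RWF_NOWAIT) {
        ki |= M_IOCB_NOWAIT;
        if (nowait_implies_noio)    /* behaviour since efa8480a8316 */
            ki |= M_IOCB_NOIO;
    }
    return ki;
}

/* Simplified model of the page-cache-miss branch of filemap_get_pages(). */
static enum miss_action model_cache_miss(unsigned int ki_flags)
{
    if (ki_flags & M_IOCB_NOIO)
        return MISS_EAGAIN;         /* -EAGAIN, no readahead started */
    return MISS_READAHEAD;          /* page_cache_sync_readahead() runs */
}
```

With nowait_implies_noio set, a NOWAIT read that misses the cache gets -EAGAIN and never starts readahead; with it cleared, as this patch does, the same read falls through to readahead.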

To avoid blocking on I/O during memory reclaim, we can instead wrap the
readahead call in memalloc_noio_{save,restore}, so that allocations made
within that scope do not issue I/O. This restores the original
functionality, allowing preadv2(RWF_NOWAIT) to trigger readahead when the
file content is not present in the page cache.
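The save/restore pairing can be sketched in userspace as below. This is a model under stated assumptions: m_task_flags stands in for current->flags, M_PF_MEMALLOC_NOIO is a hypothetical bit, and m_cache_miss() mimics only the shape of the patched branch in filemap_get_pages(). Saving the previous state of the bit, rather than clearing it unconditionally on restore, is what keeps nested scopes correct.

```c
#include <assert.h>

/* Hypothetical bit; the real PF_MEMALLOC_NOIO value is kernel-defined. */
#define M_PF_MEMALLOC_NOIO 0x1u

static unsigned int m_task_flags;  /* stands in for current->flags */
static int m_seen_noio;            /* scope observed by the "readahead" */

/* Shaped like memalloc_noio_save(): set the bit, return its old state. */
static unsigned int m_noio_save(void)
{
    unsigned int old = m_task_flags & M_PF_MEMALLOC_NOIO;

    m_task_flags |= M_PF_MEMALLOC_NOIO;
    return old;
}

/* Shaped like memalloc_noio_restore(): put back only the saved bit. */
static void m_noio_restore(unsigned int old)
{
    m_task_flags = (m_task_flags & ~M_PF_MEMALLOC_NOIO) | old;
}

/* Stand-in for page_cache_sync_readahead(): records the active scope. */
static void m_readahead(void)
{
    m_seen_noio = (m_task_flags & M_PF_MEMALLOC_NOIO) != 0;
}

/* Mimics the patched cache-miss branch: NOIO scope only for NOWAIT. */
static void m_cache_miss(int nowait)
{
    unsigned int flags = 0;

    if (nowait)
        flags = m_noio_save();
    m_readahead();
    if (nowait)
        m_noio_restore(flags);
}
```

The model initialises flags to 0 so compilers cannot warn about a maybe-uninitialised use; the patch itself relies on both branches testing the same condition.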

While this path may still enter direct memory reclaim, the readahead GFP
flags include __GFP_NORETRY, so the allocation fails early rather than
retrying, and the NOIO scope prevents reclaim from waiting on I/O.
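The resulting allocation mask can be illustrated with another hypothetical model. The bit values below are invented; m_readahead_gfp() follows the shape of readahead_gfp_mask() (the mapping's mask plus __GFP_NORETRY and __GFP_NOWARN), and m_noio_context() follows what current_gfp_context() does inside a memalloc_noio scope (clearing __GFP_IO and __GFP_FS).

```c
#include <assert.h>

/* Hypothetical bit values; the real __GFP_* constants differ. */
#define M_GFP_IO      0x01u
#define M_GFP_FS      0x02u
#define M_GFP_RECLAIM 0x04u
#define M_GFP_NORETRY 0x08u
#define M_GFP_NOWARN  0x10u
#define M_GFP_KERNEL  (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

/* Shaped like readahead_gfp_mask(): mapping mask plus NORETRY and NOWARN. */
static unsigned int m_readahead_gfp(unsigned int mapping_gfp)
{
    return mapping_gfp | M_GFP_NORETRY | M_GFP_NOWARN;
}

/* Shaped like current_gfp_context() under a NOIO scope: drop IO and FS. */
static unsigned int m_noio_context(unsigned int gfp)
{
    return gfp & ~(M_GFP_IO | M_GFP_FS);
}
```

Starting from a GFP_KERNEL-like mask, the effective mask still permits direct reclaim, carries NORETRY, and has neither IO nor FS set.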

A use case for this change is triggering readahead from preadv2(2) when the
file's data is absent from the page cache, without waiting on filesystem
locks such as xfs_ilock. A simple example is as follows:

retry:
    ret = preadv2(fd, iovec, cnt, offset, RWF_NOWAIT);
    if (ret < 0 && errno == EAGAIN) {
        /* Not cached yet; readahead may now be in flight. */
        do_other_work();
        goto retry;
    }
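A fuller userspace version of the loop above might look like the following sketch. read_nowait_retry() and its other_work callback are hypothetical names; the retry bound and the blocking fallback are assumptions added so the loop always terminates. RWF_NOWAIT is defined from the Linux UAPI value in case the C library headers lack it, and a glibc providing preadv2() (2.26 or later) is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value from the Linux UAPI <linux/fs.h> */
#endif

/*
 * Hypothetical helper: try a non-blocking read first; on EAGAIN, run
 * other_work() and retry up to max_tries times, then fall back to an
 * ordinary blocking read. With this patch applied, a failed NOWAIT
 * attempt may have started readahead, so a later retry can hit the cache.
 */
static ssize_t read_nowait_retry(int fd, void *buf, size_t len, off_t off,
                                 int max_tries, void (*other_work)(void))
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    for (int i = 0; i < max_tries; i++) {
        /* May return a short count if only part of the range is cached. */
        ssize_t ret = preadv2(fd, &iov, 1, off, RWF_NOWAIT);

        if (ret >= 0)
            return ret;
        if (errno != EAGAIN)
            break;          /* e.g. EOPNOTSUPP: NOWAIT not usable here */
        if (other_work)
            other_work();   /* do something useful, then retry */
    }
    return preadv2(fd, &iov, 1, off, 0);    /* blocking fallback */
}
```

Errors other than EAGAIN, such as EOPNOTSUPP on a file opened without FMODE_NOWAIT support, skip straight to the blocking read.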

Link: https://lore.gnuweeb.org/io-uring/20200624164127.GP21350@casper.infradead.org/
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
---
 include/linux/fs.h | 1 -
 mm/filemap.c       | 6 ++++++
 2 files changed, 6 insertions(+), 1 deletion(-)

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..ced74b1b350d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3455,7 +3455,6 @@  static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
 	if (flags & RWF_NOWAIT) {
 		if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
 			return -EOPNOTSUPP;
-		kiocb_flags |= IOCB_NOIO;
 	}
 	if (flags & RWF_ATOMIC) {
 		if (rw_type != WRITE)
diff --git a/mm/filemap.c b/mm/filemap.c
index 657bcd887fdb..fe325c2ea626 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -46,6 +46,7 @@ 
 #include <linux/pipe_fs_i.h>
 #include <linux/splice.h>
 #include <linux/rcupdate_wait.h>
+#include <linux/sched/mm.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -2514,6 +2515,7 @@  static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
 	pgoff_t last_index;
 	struct folio *folio;
+	unsigned int flags;
 	int err = 0;
 
 	/* "last_index" is the index of the page beyond the end of the read */
@@ -2526,8 +2528,12 @@  static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	if (!folio_batch_count(fbatch)) {
 		if (iocb->ki_flags & IOCB_NOIO)
 			return -EAGAIN;
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			flags = memalloc_noio_save();
 		page_cache_sync_readahead(mapping, ra, filp, index,
 				last_index - index);
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			memalloc_noio_restore(flags);
 		filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
 	}
 	if (!folio_batch_count(fbatch)) {