[04/12] mm: add support for async page locking

Message ID 20200526195123.29053-5-axboe@kernel.dk (mailing list archive)
State New, archived
Series Add support for async buffered reads

Commit Message

Jens Axboe May 26, 2020, 7:51 p.m. UTC
Normally waiting for a page to become unlocked, or locking the page,
requires waiting for IO to complete. Add support for lock_page_async()
and wait_on_page_locked_async(), which are callback-based instead. This
allows a caller to get notified when a page becomes unlocked, rather
than wait for it.

We add a new iocb field, ki_waitq, to pass in the necessary data for this
to happen. We can place this in a union with ki_cookie, since that is
only used for polled IO. Polled IO can never co-exist with async
callbacks, as it relies (by definition) on polled completions. struct
wait_page_key is made public, and struct wait_page_queue serves as the
interface between the caller and the core.
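
As a rough illustration of the intended use (a hypothetical caller
sketch; the example_* names are made up here, and the real callers are
introduced later in the series), the kiocb owner supplies a wake
callback, arms ki_waitq, and then attempts the lock:

static int example_async_wake(struct wait_queue_entry *wait, unsigned mode,
			      int sync, void *key)
{
	struct wait_page_queue *wpq = container_of(wait,
					struct wait_page_queue, wait);
	struct wait_page_key *wkey = key;

	/* ignore wakeups for other pages/bits hashed to the same queue */
	if (!wake_page_match(wpq, wkey))
		return 0;

	/* the page is now unlocked: detach and kick the retry path here */
	list_del_init(&wait->entry);
	return 1;
}

static int example_arm_and_lock(struct kiocb *iocb,
				struct wait_page_queue *wpq,
				struct page *page)
{
	init_waitqueue_func_entry(&wpq->wait, example_async_wake);
	wpq->wait.private = iocb;	/* whatever context the wake needs */
	INIT_LIST_HEAD(&wpq->wait.entry);

	iocb->ki_flags |= IOCB_WAITQ;
	iocb->ki_waitq = wpq;

	/* 0: page locked inline; -EIOCBQUEUED: wake callback armed */
	return lock_page_async(page, wpq);
}

If lock_page_async() returns -EIOCBQUEUED, the caller must not touch the
page; the wake callback runs once the page is unlocked.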

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/fs.h      |  7 ++++++-
 include/linux/pagemap.h |  9 +++++++++
 mm/filemap.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 1 deletion(-)

Comments

Johannes Weiner May 26, 2020, 9:59 p.m. UTC | #1
On Tue, May 26, 2020 at 01:51:15PM -0600, Jens Axboe wrote:
> Normally waiting for a page to become unlocked, or locking the page,
> requires waiting for IO to complete. Add support for lock_page_async()
> and wait_on_page_locked_async(), which are callback-based instead. This

wait_on_page_locked_async() is actually in the next patch, requiring
some back and forth to review. I wonder if this and the next patch
could be merged to have the new API and callers introduced together?

> allows a caller to get notified when a page becomes unlocked, rather
> than wait for it.
> 
> We add a new iocb field, ki_waitq, to pass in the necessary data for this
> to happen. We can place this in a union with ki_cookie, since that is
> only used for polled IO. Polled IO can never co-exist with async
> callbacks, as it relies (by definition) on polled completions. struct
> wait_page_key is made public, and struct wait_page_queue serves as the
> interface between the caller and the core.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Jens Axboe May 26, 2020, 10:01 p.m. UTC | #2
On 5/26/20 3:59 PM, Johannes Weiner wrote:
> On Tue, May 26, 2020 at 01:51:15PM -0600, Jens Axboe wrote:
>> Normally waiting for a page to become unlocked, or locking the page,
>> requires waiting for IO to complete. Add support for lock_page_async()
>> and wait_on_page_locked_async(), which are callback-based instead. This
> 
> wait_on_page_locked_async() is actually in the next patch, requiring
> some back and forth to review. I wonder if this and the next patch
> could be merged to have the new API and callers introduced together?

I'm fine with that, if that is preferable. Don't feel strongly about
that at all, just tried to do it as piecemeal as possible to make
it easier to review.
Johannes Weiner May 27, 2020, 4:02 p.m. UTC | #3
On Tue, May 26, 2020 at 04:01:07PM -0600, Jens Axboe wrote:
> On 5/26/20 3:59 PM, Johannes Weiner wrote:
> > On Tue, May 26, 2020 at 01:51:15PM -0600, Jens Axboe wrote:
> >> Normally waiting for a page to become unlocked, or locking the page,
> >> requires waiting for IO to complete. Add support for lock_page_async()
> >> and wait_on_page_locked_async(), which are callback-based instead. This
> > 
> > wait_on_page_locked_async() is actually in the next patch, requiring
> > some back and forth to review. I wonder if this and the next patch
> > could be merged to have the new API and callers introduced together?
> 
> I'm fine with that, if that is preferable. Don't feel strongly about
> that at all, just tried to do it as piecemeal as possible to make
> it easier to review.

Not worth sending a new iteration over, IMO.
Matthew Wilcox (Oracle) June 1, 2020, 2:26 p.m. UTC | #4
On Tue, May 26, 2020 at 01:51:15PM -0600, Jens Axboe wrote:
> +static int __wait_on_page_locked_async(struct page *page,
> +				       struct wait_page_queue *wait, bool set)
> +{
> +	struct wait_queue_head *q = page_waitqueue(page);
> +	int ret = 0;
> +
> +	wait->page = page;
> +	wait->bit_nr = PG_locked;
> +
> +	spin_lock_irq(&q->lock);
> +	if (set)
> +		ret = !trylock_page(page);
> +	else
> +		ret = PageLocked(page);
> +	if (ret) {
> +		__add_wait_queue_entry_tail(q, &wait->wait);
> +		SetPageWaiters(page);
> +		if (set)
> +			ret = !trylock_page(page);
> +		else
> +			ret = PageLocked(page);

Between the callers and this function, we actually look at PG_locked three
times; once in the caller, then after taking the spinlock, then after
adding ourselves to the waitqueue.  I understand the first and third, but
is it really worth doing the second test?  It feels unlikely to succeed
and only saves us setting PageWaiters.
Jens Axboe June 1, 2020, 5:15 p.m. UTC | #5
On 6/1/20 8:26 AM, Matthew Wilcox wrote:
> On Tue, May 26, 2020 at 01:51:15PM -0600, Jens Axboe wrote:
>> +static int __wait_on_page_locked_async(struct page *page,
>> +				       struct wait_page_queue *wait, bool set)
>> +{
>> +	struct wait_queue_head *q = page_waitqueue(page);
>> +	int ret = 0;
>> +
>> +	wait->page = page;
>> +	wait->bit_nr = PG_locked;
>> +
>> +	spin_lock_irq(&q->lock);
>> +	if (set)
>> +		ret = !trylock_page(page);
>> +	else
>> +		ret = PageLocked(page);
>> +	if (ret) {
>> +		__add_wait_queue_entry_tail(q, &wait->wait);
>> +		SetPageWaiters(page);
>> +		if (set)
>> +			ret = !trylock_page(page);
>> +		else
>> +			ret = PageLocked(page);
> 
> Between the callers and this function, we actually look at PG_locked three
> times; once in the caller, then after taking the spinlock, then after
> adding ourselves to the waitqueue.  I understand the first and third, but
> is it really worth doing the second test?  It feels unlikely to succeed
> and only saves us setting PageWaiters.

That's probably true, and we can skip the 2nd one. I'll make the change.
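
With the second test dropped, the helper would presumably end up looking
something like this (untested sketch of the change discussed above):

static int __wait_on_page_locked_async(struct page *page,
				       struct wait_page_queue *wait, bool set)
{
	struct wait_queue_head *q = page_waitqueue(page);
	int ret;

	wait->page = page;
	wait->bit_nr = PG_locked;

	spin_lock_irq(&q->lock);
	/* queue unconditionally; a single test once we're on the waitqueue */
	__add_wait_queue_entry_tail(q, &wait->wait);
	SetPageWaiters(page);
	if (set)
		ret = !trylock_page(page);
	else
		ret = PageLocked(page);
	/*
	 * If we were successful now, we know we're still on the
	 * waitqueue as we're still under the lock. This means it's
	 * safe to remove and return success; the callback can't fire.
	 */
	if (!ret)
		__remove_wait_queue(q, &wait->wait);
	else
		ret = -EIOCBQUEUED;
	spin_unlock_irq(&q->lock);
	return ret;
}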

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index d3ebb49189df..ba1fff0e7bca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -314,6 +314,8 @@  enum rw_hint {
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
+/* iocb->ki_waitq is valid */
+#define IOCB_WAITQ		(1 << 8)
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -327,7 +329,10 @@  struct kiocb {
 	int			ki_flags;
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	unsigned int		ki_cookie; /* for ->iopoll */
+	union {
+		unsigned int		ki_cookie; /* for ->iopoll */
+		struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	};
 
 	randomized_struct_fields_end
 };
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 53d980f2208d..d3e63c9c61ae 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -495,6 +495,7 @@  static inline int wake_page_match(struct wait_page_queue *wait_page,
 
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
+extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 extern void unlock_page(struct page *page);
@@ -531,6 +532,14 @@  static inline int lock_page_killable(struct page *page)
 	return 0;
 }
 
+static inline int lock_page_async(struct page *page,
+				  struct wait_page_queue *wait)
+{
+	if (!trylock_page(page))
+		return __lock_page_async(page, wait);
+	return 0;
+}
+
 /*
  * lock_page_or_retry - Lock the page, unless this would block and the
  * caller indicated that it can handle a retry.
diff --git a/mm/filemap.c b/mm/filemap.c
index e891b5bee8fd..c746541b1d49 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1183,6 +1183,42 @@  int wait_on_page_bit_killable(struct page *page, int bit_nr)
 }
 EXPORT_SYMBOL(wait_on_page_bit_killable);
 
+static int __wait_on_page_locked_async(struct page *page,
+				       struct wait_page_queue *wait, bool set)
+{
+	struct wait_queue_head *q = page_waitqueue(page);
+	int ret = 0;
+
+	wait->page = page;
+	wait->bit_nr = PG_locked;
+
+	spin_lock_irq(&q->lock);
+	if (set)
+		ret = !trylock_page(page);
+	else
+		ret = PageLocked(page);
+	if (ret) {
+		__add_wait_queue_entry_tail(q, &wait->wait);
+		SetPageWaiters(page);
+		if (set)
+			ret = !trylock_page(page);
+		else
+			ret = PageLocked(page);
+		/*
+		 * If we were successful now, we know we're still on the
+		 * waitqueue as we're still under the lock. This means it's
+		 * safe to remove and return success, we know the callback
+		 * isn't going to trigger.
+		 */
+		if (!ret)
+			__remove_wait_queue(q, &wait->wait);
+		else
+			ret = -EIOCBQUEUED;
+	}
+	spin_unlock_irq(&q->lock);
+	return ret;
+}
+
 /**
  * put_and_wait_on_page_locked - Drop a reference and wait for it to be unlocked
  * @page: The page to wait for.
@@ -1345,6 +1381,11 @@  int __lock_page_killable(struct page *__page)
 }
 EXPORT_SYMBOL_GPL(__lock_page_killable);
 
+int __lock_page_async(struct page *page, struct wait_page_queue *wait)
+{
+	return __wait_on_page_locked_async(page, wait, true);
+}
+
 /*
  * Return values:
  * 1 - page is locked; mmap_sem is still held.
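
For completeness, a rough sketch of how a read-side caller might choose
between the blocking and async lock, along the lines of what the next
patch in the series does in the buffered read path (the function name is
illustrative):

static int example_lock_for_read(struct kiocb *iocb, struct page *page)
{
	/* async context: 0 on success, -EIOCBQUEUED once the wake is armed */
	if (iocb->ki_flags & IOCB_WAITQ)
		return lock_page_async(page, iocb->ki_waitq);

	/* otherwise fall back to the usual blocking lock */
	return lock_page_killable(page);
}

-EIOCBQUEUED then propagates back to the submitter, which retries the
read once the wake callback reports the page unlocked.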