diff mbox series

[RFC,1/7] mm/gup: introduce pin_user_page()

Message ID 20220225085025.3052894-2-jhubbard@nvidia.com (mailing list archive)
State New
Headers show
Series block, fs: convert Direct IO to FOLL_PIN | expand

Commit Message

John Hubbard Feb. 25, 2022, 8:50 a.m. UTC
pin_user_page() is an externally-usable version of try_grab_page(), but
with semantics that match get_page(), so that it can act as a drop-in
replacement for get_page(). Specifically, pin_user_page() has a void
return type.

pin_user_page() elevates a page's refcount is using FOLL_PIN rules. This
means that the caller must release the page via unpin_user_page().

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

Comments

David Hildenbrand Feb. 28, 2022, 1:27 p.m. UTC | #1
On 25.02.22 09:50, John Hubbard wrote:
> pin_user_page() is an externally-usable version of try_grab_page(), but
> with semantics that match get_page(), so that it can act as a drop-in
> replacement for get_page(). Specifically, pin_user_page() has a void
> return type.
> 
> pin_user_page() elevates a page's refcount is using FOLL_PIN rules. This
> means that the caller must release the page via unpin_user_page().
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/mm.h |  1 +
>  mm/gup.c           | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 929488a47181..bb51f5487aef 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1914,6 +1914,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
>  long get_user_pages(unsigned long start, unsigned long nr_pages,
>  			    unsigned int gup_flags, struct page **pages,
>  			    struct vm_area_struct **vmas);
> +void pin_user_page(struct page *page);
>  long pin_user_pages(unsigned long start, unsigned long nr_pages,
>  		    unsigned int gup_flags, struct page **pages,
>  		    struct vm_area_struct **vmas);
> diff --git a/mm/gup.c b/mm/gup.c
> index 5c3f6ede17eb..44446241c3a9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3034,6 +3034,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
>  }
>  EXPORT_SYMBOL(pin_user_pages);
>  
> +/**
> + * pin_user_page() - apply a FOLL_PIN reference to a page ()
> + *
> + * @page: the page to be pinned.
> + *
> + * Similar to get_user_pages(), in that the page's refcount is elevated using
> + * FOLL_PIN rules.
> + *
> + * IMPORTANT: That means that the caller must release the page via
> + * unpin_user_page().
> + *
> + */
> +void pin_user_page(struct page *page)
> +{
> +	struct folio *folio = page_folio(page);
> +
> +	WARN_ON_ONCE(folio_ref_count(folio) <= 0);
> +
> +	/*
> +	 * Similar to try_grab_page(): be sure to *also*
> +	 * increment the normal page refcount field at least once,
> +	 * so that the page really is pinned.
> +	 */
> +	if (folio_test_large(folio)) {
> +		folio_ref_add(folio, 1);
> +		atomic_add(1, folio_pincount_ptr(folio));
> +	} else {
> +		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
> +	}
> +
> +	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
> +}
> +EXPORT_SYMBOL(pin_user_page);
> +
>  /*
>   * pin_user_pages_unlocked() is the FOLL_PIN variant of
>   * get_user_pages_unlocked(). Behavior is the same, except that this one sets

I assume that function will only get called on a page that has been
obtained by a previous pin_user_pages_fast(), correct?
John Hubbard Feb. 28, 2022, 9:14 p.m. UTC | #2
On 2/28/22 05:27, David Hildenbrand wrote:
...
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 5c3f6ede17eb..44446241c3a9 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -3034,6 +3034,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
>>   }
>>   EXPORT_SYMBOL(pin_user_pages);
>>   
>> +/**
>> + * pin_user_page() - apply a FOLL_PIN reference to a page ()
>> + *
>> + * @page: the page to be pinned.
>> + *
>> + * Similar to get_user_pages(), in that the page's refcount is elevated using
>> + * FOLL_PIN rules.
>> + *
>> + * IMPORTANT: That means that the caller must release the page via
>> + * unpin_user_page().
>> + *
>> + */
>> +void pin_user_page(struct page *page)
>> +{
>> +	struct folio *folio = page_folio(page);
>> +
>> +	WARN_ON_ONCE(folio_ref_count(folio) <= 0);
>> +
>> +	/*
>> +	 * Similar to try_grab_page(): be sure to *also*
>> +	 * increment the normal page refcount field at least once,
>> +	 * so that the page really is pinned.
>> +	 */
>> +	if (folio_test_large(folio)) {
>> +		folio_ref_add(folio, 1);
>> +		atomic_add(1, folio_pincount_ptr(folio));
>> +	} else {
>> +		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
>> +	}
>> +
>> +	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
>> +}
>> +EXPORT_SYMBOL(pin_user_page);
>> +
>>   /*
>>    * pin_user_pages_unlocked() is the FOLL_PIN variant of
>>    * get_user_pages_unlocked(). Behavior is the same, except that this one sets
> 
> I assume that function will only get called on a page that has been
> obtained by a previous pin_user_pages_fast(), correct?
> 

Well, no. This is meant to be used in place of get_page(), for code that
knows that the pages will be released via unpin_user_page(). So there is
no special prerequisite there.


thanks,
David Hildenbrand March 1, 2022, 8:11 a.m. UTC | #3
On 28.02.22 22:14, John Hubbard wrote:
> On 2/28/22 05:27, David Hildenbrand wrote:
> ...
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 5c3f6ede17eb..44446241c3a9 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -3034,6 +3034,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
>>>   }
>>>   EXPORT_SYMBOL(pin_user_pages);
>>>   
>>> +/**
>>> + * pin_user_page() - apply a FOLL_PIN reference to a page ()
>>> + *
>>> + * @page: the page to be pinned.
>>> + *
>>> + * Similar to get_user_pages(), in that the page's refcount is elevated using
>>> + * FOLL_PIN rules.
>>> + *
>>> + * IMPORTANT: That means that the caller must release the page via
>>> + * unpin_user_page().
>>> + *
>>> + */
>>> +void pin_user_page(struct page *page)
>>> +{
>>> +	struct folio *folio = page_folio(page);
>>> +
>>> +	WARN_ON_ONCE(folio_ref_count(folio) <= 0);
>>> +
>>> +	/*
>>> +	 * Similar to try_grab_page(): be sure to *also*
>>> +	 * increment the normal page refcount field at least once,
>>> +	 * so that the page really is pinned.
>>> +	 */
>>> +	if (folio_test_large(folio)) {
>>> +		folio_ref_add(folio, 1);
>>> +		atomic_add(1, folio_pincount_ptr(folio));
>>> +	} else {
>>> +		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
>>> +	}
>>> +
>>> +	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
>>> +}
>>> +EXPORT_SYMBOL(pin_user_page);
>>> +
>>>   /*
>>>    * pin_user_pages_unlocked() is the FOLL_PIN variant of
>>>    * get_user_pages_unlocked(). Behavior is the same, except that this one sets
>>
>> I assume that function will only get called on a page that has been
>> obtained by a previous pin_user_pages_fast(), correct?
>>
> 
> Well, no. This is meant to be used in place of get_page(), for code that
> knows that the pages will be released via unpin_user_page(). So there is
> no special prerequisite there.

That might be problematic and possibly the wrong approach, depending on
*what* we're actually pinning and what we're intending to do with that.

My assumption would have been that this interface is to duplicate a pin
on a page, which would be perfectly fine, because the page actually saw
a FOLL_PIN previously.

We're taking a pin on a page that we haven't obtained via FOLL_PIN if I
understand correctly. Which raises the questions, how do we end up with
the pages here, and what are we doing to do with them (use them like we
obtained them via FOLL_PIN?)?


If it's converting FOLL_GET -> FOLL_PIN manually, then we're bypassing
FOLL_PIN special handling in GUP code:

page = get_user_pages(FOLL_GET)
pin_user_page(page)
put_page(page)


For anonymous pages, we'll bail out for example once we have

https://lkml.kernel.org/r/20220224122614.94921-14-david@redhat.com

Because the conditions for pinned anonymous pages might no longer hold.

If we won't call pin_user_page() on anonymous pages, it would be fine.
But then, I still wonder how we come up the "struct page" here.
John Hubbard March 1, 2022, 8:40 a.m. UTC | #4
On 3/1/22 00:11, David Hildenbrand wrote:
>> ...
>>>> +EXPORT_SYMBOL(pin_user_page);
>>>> +
>>>>    /*
>>>>     * pin_user_pages_unlocked() is the FOLL_PIN variant of
>>>>     * get_user_pages_unlocked(). Behavior is the same, except that this one sets
>>>
>>> I assume that function will only get called on a page that has been
>>> obtained by a previous pin_user_pages_fast(), correct?
>>>
>>
>> Well, no. This is meant to be used in place of get_page(), for code that
>> knows that the pages will be released via unpin_user_page(). So there is
>> no special prerequisite there.
> 
> That might be problematic and possibly the wrong approach, depending on
> *what* we're actually pinning and what we're intending to do with that.
> 
> My assumption would have been that this interface is to duplicate a pin

I see that I need to put more documentation here, so people don't have
to assume things... :)

> on a page, which would be perfectly fine, because the page actually saw
> a FOLL_PIN previously.
> 
> We're taking a pin on a page that we haven't obtained via FOLL_PIN if I
> understand correctly. Which raises the questions, how do we end up with
> the pages here, and what are we doing to do with them (use them like we
> obtained them via FOLL_PIN?)?
> 
> 
> If it's converting FOLL_GET -> FOLL_PIN manually, then we're bypassing
> FOLL_PIN special handling in GUP code:
> 
> page = get_user_pages(FOLL_GET)
> pin_user_page(page)
> put_page(page)

No, that's not where this is going at all. The idea, which  I now see
needs better documentation, is to handle file-backed pages. Only.

We're not converting from one type to another, nor are we doubling up.
We're just keeping the pin type consistent so that the vast block-
processing machinery can take pages in and handle them, then release
them at the end with bio_release_pages(), which will call
unpin_user_pages().

> 
> 
> For anonymous pages, we'll bail out for example once we have
> 
> https://lkml.kernel.org/r/20220224122614.94921-14-david@redhat.com
> 
> Because the conditions for pinned anonymous pages might no longer hold.
> 
> If we won't call pin_user_page() on anonymous pages, it would be fine.

We won't, and in fact, I should add WARN_ON_ONCE(PageAnon(page)) to
this function.

> But then, I still wonder how we come up the "struct page" here.
> 

 From the file system. For example, the NFS-direct and fuse conversions
in the last patches show how that works.

Thanks for this feedback, this is very helpful.

thanks,
David Hildenbrand March 1, 2022, 9:30 a.m. UTC | #5
>>>> That might be problematic and possibly the wrong approach, depending on
>> *what* we're actually pinning and what we're intending to do with that.
>>
>> My assumption would have been that this interface is to duplicate a pin
> 
> I see that I need to put more documentation here, so people don't have
> to assume things... :)
> 

Yes, please :)

>> on a page, which would be perfectly fine, because the page actually saw
>> a FOLL_PIN previously.
>>
>> We're taking a pin on a page that we haven't obtained via FOLL_PIN if I
>> understand correctly. Which raises the questions, how do we end up with
>> the pages here, and what are we doing to do with them (use them like we
>> obtained them via FOLL_PIN?)?
>>
>>
>> If it's converting FOLL_GET -> FOLL_PIN manually, then we're bypassing
>> FOLL_PIN special handling in GUP code:
>>
>> page = get_user_pages(FOLL_GET)
>> pin_user_page(page)
>> put_page(page)
> 
> No, that's not where this is going at all. The idea, which  I now see
> needs better documentation, is to handle file-backed pages. Only.
> 
> We're not converting from one type to another, nor are we doubling up.
> We're just keeping the pin type consistent so that the vast block-
> processing machinery can take pages in and handle them, then release
> them at the end with bio_release_pages(), which will call
> unpin_user_pages().
> 

Ah, okay, that makes sense. Glad to hear that we're intending to use
this with !anon pages only.

>>
>>
>> For anonymous pages, we'll bail out for example once we have
>>
>> https://lkml.kernel.org/r/20220224122614.94921-14-david@redhat.com
>>
>> Because the conditions for pinned anonymous pages might no longer hold.
>>
>> If we won't call pin_user_page() on anonymous pages, it would be fine.
> 
> We won't, and in fact, I should add WARN_ON_ONCE(PageAnon(page)) to
> this function.

Exactly what I would have suggested,

> 
>> But then, I still wonder how we come up the "struct page" here.
>>
> 
>  From the file system. For example, the NFS-direct and fuse conversions
> in the last patches show how that works.
> 
> Thanks for this feedback, this is very helpful.

Thanks for clarifying, John!
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 929488a47181..bb51f5487aef 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1914,6 +1914,7 @@  long pin_user_pages_remote(struct mm_struct *mm,
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
 			    struct vm_area_struct **vmas);
+void pin_user_page(struct page *page);
 long pin_user_pages(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages,
 		    struct vm_area_struct **vmas);
diff --git a/mm/gup.c b/mm/gup.c
index 5c3f6ede17eb..44446241c3a9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3034,6 +3034,40 @@  long pin_user_pages(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(pin_user_pages);
 
+/**
+ * pin_user_page() - apply a FOLL_PIN reference to a page ()
+ *
+ * @page: the page to be pinned.
+ *
+ * Similar to get_user_pages(), in that the page's refcount is elevated using
+ * FOLL_PIN rules.
+ *
+ * IMPORTANT: That means that the caller must release the page via
+ * unpin_user_page().
+ *
+ */
+void pin_user_page(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	WARN_ON_ONCE(folio_ref_count(folio) <= 0);
+
+	/*
+	 * Similar to try_grab_page(): be sure to *also*
+	 * increment the normal page refcount field at least once,
+	 * so that the page really is pinned.
+	 */
+	if (folio_test_large(folio)) {
+		folio_ref_add(folio, 1);
+		atomic_add(1, folio_pincount_ptr(folio));
+	} else {
+		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+	}
+
+	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+}
+EXPORT_SYMBOL(pin_user_page);
+
 /*
  * pin_user_pages_unlocked() is the FOLL_PIN variant of
  * get_user_pages_unlocked(). Behavior is the same, except that this one sets