Message ID | 20220225085025.3052894-2-jhubbard@nvidia.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | block, fs: convert Direct IO to FOLL_PIN | expand |
On 25.02.22 09:50, John Hubbard wrote: > pin_user_page() is an externally-usable version of try_grab_page(), but > with semantics that match get_page(), so that it can act as a drop-in > replacement for get_page(). Specifically, pin_user_page() has a void > return type. > > pin_user_page() elevates a page's refcount is using FOLL_PIN rules. This > means that the caller must release the page via unpin_user_page(). > > Signed-off-by: John Hubbard <jhubbard@nvidia.com> > --- > include/linux/mm.h | 1 + > mm/gup.c | 34 ++++++++++++++++++++++++++++++++++ > 2 files changed, 35 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 929488a47181..bb51f5487aef 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1914,6 +1914,7 @@ long pin_user_pages_remote(struct mm_struct *mm, > long get_user_pages(unsigned long start, unsigned long nr_pages, > unsigned int gup_flags, struct page **pages, > struct vm_area_struct **vmas); > +void pin_user_page(struct page *page); > long pin_user_pages(unsigned long start, unsigned long nr_pages, > unsigned int gup_flags, struct page **pages, > struct vm_area_struct **vmas); > diff --git a/mm/gup.c b/mm/gup.c > index 5c3f6ede17eb..44446241c3a9 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -3034,6 +3034,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages, > } > EXPORT_SYMBOL(pin_user_pages); > > +/** > + * pin_user_page() - apply a FOLL_PIN reference to a page () > + * > + * @page: the page to be pinned. > + * > + * Similar to get_user_pages(), in that the page's refcount is elevated using > + * FOLL_PIN rules. > + * > + * IMPORTANT: That means that the caller must release the page via > + * unpin_user_page(). > + * > + */ > +void pin_user_page(struct page *page) > +{ > + struct folio *folio = page_folio(page); > + > + WARN_ON_ONCE(folio_ref_count(folio) <= 0); > + > + /* > + * Similar to try_grab_page(): be sure to *also* > + * increment the normal page refcount field at least once, > + * so that the page really is pinned. > + */ > + if (folio_test_large(folio)) { > + folio_ref_add(folio, 1); > + atomic_add(1, folio_pincount_ptr(folio)); > + } else { > + folio_ref_add(folio, GUP_PIN_COUNTING_BIAS); > + } > + > + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1); > +} > +EXPORT_SYMBOL(pin_user_page); > + > /* > * pin_user_pages_unlocked() is the FOLL_PIN variant of > * get_user_pages_unlocked(). Behavior is the same, except that this one sets I assume that function will only get called on a page that has been obtained by a previous pin_user_pages_fast(), correct?
On 2/28/22 05:27, David Hildenbrand wrote: ... >> diff --git a/mm/gup.c b/mm/gup.c >> index 5c3f6ede17eb..44446241c3a9 100644 >> --- a/mm/gup.c >> +++ b/mm/gup.c >> @@ -3034,6 +3034,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages, >> } >> EXPORT_SYMBOL(pin_user_pages); >> >> +/** >> + * pin_user_page() - apply a FOLL_PIN reference to a page () >> + * >> + * @page: the page to be pinned. >> + * >> + * Similar to get_user_pages(), in that the page's refcount is elevated using >> + * FOLL_PIN rules. >> + * >> + * IMPORTANT: That means that the caller must release the page via >> + * unpin_user_page(). >> + * >> + */ >> +void pin_user_page(struct page *page) >> +{ >> + struct folio *folio = page_folio(page); >> + >> + WARN_ON_ONCE(folio_ref_count(folio) <= 0); >> + >> + /* >> + * Similar to try_grab_page(): be sure to *also* >> + * increment the normal page refcount field at least once, >> + * so that the page really is pinned. >> + */ >> + if (folio_test_large(folio)) { >> + folio_ref_add(folio, 1); >> + atomic_add(1, folio_pincount_ptr(folio)); >> + } else { >> + folio_ref_add(folio, GUP_PIN_COUNTING_BIAS); >> + } >> + >> + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1); >> +} >> +EXPORT_SYMBOL(pin_user_page); >> + >> /* >> * pin_user_pages_unlocked() is the FOLL_PIN variant of >> * get_user_pages_unlocked(). Behavior is the same, except that this one sets > > I assume that function will only get called on a page that has been > obtained by a previous pin_user_pages_fast(), correct? > Well, no. This is meant to be used in place of get_page(), for code that knows that the pages will be released via unpin_user_page(). So there is no special prerequisite there. thanks,
On 28.02.22 22:14, John Hubbard wrote: > On 2/28/22 05:27, David Hildenbrand wrote: > ... >>> diff --git a/mm/gup.c b/mm/gup.c >>> index 5c3f6ede17eb..44446241c3a9 100644 >>> --- a/mm/gup.c >>> +++ b/mm/gup.c >>> @@ -3034,6 +3034,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages, >>> } >>> EXPORT_SYMBOL(pin_user_pages); >>> >>> +/** >>> + * pin_user_page() - apply a FOLL_PIN reference to a page () >>> + * >>> + * @page: the page to be pinned. >>> + * >>> + * Similar to get_user_pages(), in that the page's refcount is elevated using >>> + * FOLL_PIN rules. >>> + * >>> + * IMPORTANT: That means that the caller must release the page via >>> + * unpin_user_page(). >>> + * >>> + */ >>> +void pin_user_page(struct page *page) >>> +{ >>> + struct folio *folio = page_folio(page); >>> + >>> + WARN_ON_ONCE(folio_ref_count(folio) <= 0); >>> + >>> + /* >>> + * Similar to try_grab_page(): be sure to *also* >>> + * increment the normal page refcount field at least once, >>> + * so that the page really is pinned. >>> + */ >>> + if (folio_test_large(folio)) { >>> + folio_ref_add(folio, 1); >>> + atomic_add(1, folio_pincount_ptr(folio)); >>> + } else { >>> + folio_ref_add(folio, GUP_PIN_COUNTING_BIAS); >>> + } >>> + >>> + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1); >>> +} >>> +EXPORT_SYMBOL(pin_user_page); >>> + >>> /* >>> * pin_user_pages_unlocked() is the FOLL_PIN variant of >>> * get_user_pages_unlocked(). Behavior is the same, except that this one sets >> >> I assume that function will only get called on a page that has been >> obtained by a previous pin_user_pages_fast(), correct? >> > > Well, no. This is meant to be used in place of get_page(), for code that > knows that the pages will be released via unpin_user_page(). So there is > no special prerequisite there. That might be problematic and possibly the wrong approach, depending on *what* we're actually pinning and what we're intending to do with that. My assumption would have been that this interface is to duplicate a pin on a page, which would be perfectly fine, because the page actually saw a FOLL_PIN previously. We're taking a pin on a page that we haven't obtained via FOLL_PIN if I understand correctly. Which raises the questions, how do we end up with the pages here, and what are we doing to do with them (use them like we obtained them via FOLL_PIN?)? If it's converting FOLL_GET -> FOLL_PIN manually, then we're bypassing FOLL_PIN special handling in GUP code: page = get_user_pages(FOLL_GET) pin_user_page(page) put_page(page) For anonymous pages, we'll bail out for example once we have https://lkml.kernel.org/r/20220224122614.94921-14-david@redhat.com Because the conditions for pinned anonymous pages might no longer hold. If we won't call pin_user_page() on anonymous pages, it would be fine. But then, I still wonder how we come up the "struct page" here.
On 3/1/22 00:11, David Hildenbrand wrote: >> ... >>>> +EXPORT_SYMBOL(pin_user_page); >>>> + >>>> /* >>>> * pin_user_pages_unlocked() is the FOLL_PIN variant of >>>> * get_user_pages_unlocked(). Behavior is the same, except that this one sets >>> >>> I assume that function will only get called on a page that has been >>> obtained by a previous pin_user_pages_fast(), correct? >>> >> >> Well, no. This is meant to be used in place of get_page(), for code that >> knows that the pages will be released via unpin_user_page(). So there is >> no special prerequisite there. > > That might be problematic and possibly the wrong approach, depending on > *what* we're actually pinning and what we're intending to do with that. > > My assumption would have been that this interface is to duplicate a pin I see that I need to put more documentation here, so people don't have to assume things... :) > on a page, which would be perfectly fine, because the page actually saw > a FOLL_PIN previously. > > We're taking a pin on a page that we haven't obtained via FOLL_PIN if I > understand correctly. Which raises the questions, how do we end up with > the pages here, and what are we doing to do with them (use them like we > obtained them via FOLL_PIN?)? > > > If it's converting FOLL_GET -> FOLL_PIN manually, then we're bypassing > FOLL_PIN special handling in GUP code: > > page = get_user_pages(FOLL_GET) > pin_user_page(page) > put_page(page) No, that's not where this is going at all. The idea, which I now see needs better documentation, is to handle file-backed pages. Only. We're not converting from one type to another, nor are we doubling up. We're just keeping the pin type consistent so that the vast block- processing machinery can take pages in and handle them, then release them at the end with bio_release_pages(), which will call unpin_user_pages(). > > > For anonymous pages, we'll bail out for example once we have > > https://lkml.kernel.org/r/20220224122614.94921-14-david@redhat.com > > Because the conditions for pinned anonymous pages might no longer hold. > > If we won't call pin_user_page() on anonymous pages, it would be fine. We won't, and in fact, I should add WARN_ON_ONCE(PageAnon(page)) to this function. > But then, I still wonder how we come up the "struct page" here. > From the file system. For example, the NFS-direct and fuse conversions in the last patches show how that works. Thanks for this feedback, this is very helpful. thanks,
>>>> That might be problematic and possibly the wrong approach, depending on >> *what* we're actually pinning and what we're intending to do with that. >> >> My assumption would have been that this interface is to duplicate a pin > > I see that I need to put more documentation here, so people don't have > to assume things... :) > Yes, please :) >> on a page, which would be perfectly fine, because the page actually saw >> a FOLL_PIN previously. >> >> We're taking a pin on a page that we haven't obtained via FOLL_PIN if I >> understand correctly. Which raises the questions, how do we end up with >> the pages here, and what are we doing to do with them (use them like we >> obtained them via FOLL_PIN?)? >> >> >> If it's converting FOLL_GET -> FOLL_PIN manually, then we're bypassing >> FOLL_PIN special handling in GUP code: >> >> page = get_user_pages(FOLL_GET) >> pin_user_page(page) >> put_page(page) > > No, that's not where this is going at all. The idea, which I now see > needs better documentation, is to handle file-backed pages. Only. > > We're not converting from one type to another, nor are we doubling up. > We're just keeping the pin type consistent so that the vast block- > processing machinery can take pages in and handle them, then release > them at the end with bio_release_pages(), which will call > unpin_user_pages(). > Ah, okay, that makes sense. Glad to hear that we're intending to use this with !anon pages only. >> >> >> For anonymous pages, we'll bail out for example once we have >> >> https://lkml.kernel.org/r/20220224122614.94921-14-david@redhat.com >> >> Because the conditions for pinned anonymous pages might no longer hold. >> >> If we won't call pin_user_page() on anonymous pages, it would be fine. > > We won't, and in fact, I should add WARN_ON_ONCE(PageAnon(page)) to > this function. Exactly what I would have suggested, > >> But then, I still wonder how we come up the "struct page" here. >> > > From the file system. For example, the NFS-direct and fuse conversions > in the last patches show how that works. > > Thanks for this feedback, this is very helpful. Thanks for clarifying, John!
diff --git a/include/linux/mm.h b/include/linux/mm.h index 929488a47181..bb51f5487aef 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1914,6 +1914,7 @@ long pin_user_pages_remote(struct mm_struct *mm, long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); +void pin_user_page(struct page *page); long pin_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); diff --git a/mm/gup.c b/mm/gup.c index 5c3f6ede17eb..44446241c3a9 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -3034,6 +3034,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages, } EXPORT_SYMBOL(pin_user_pages); +/** + * pin_user_page() - apply a FOLL_PIN reference to a page () + * + * @page: the page to be pinned. + * + * Similar to get_user_pages(), in that the page's refcount is elevated using + * FOLL_PIN rules. + * + * IMPORTANT: That means that the caller must release the page via + * unpin_user_page(). + * + */ +void pin_user_page(struct page *page) +{ + struct folio *folio = page_folio(page); + + WARN_ON_ONCE(folio_ref_count(folio) <= 0); + + /* + * Similar to try_grab_page(): be sure to *also* + * increment the normal page refcount field at least once, + * so that the page really is pinned. + */ + if (folio_test_large(folio)) { + folio_ref_add(folio, 1); + atomic_add(1, folio_pincount_ptr(folio)); + } else { + folio_ref_add(folio, GUP_PIN_COUNTING_BIAS); + } + + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1); +} +EXPORT_SYMBOL(pin_user_page); + /* * pin_user_pages_unlocked() is the FOLL_PIN variant of * get_user_pages_unlocked(). Behavior is the same, except that this one sets
pin_user_page() is an externally-usable version of try_grab_page(), but with semantics that match get_page(), so that it can act as a drop-in replacement for get_page(). Specifically, pin_user_page() has a void return type. pin_user_page() elevates a page's refcount is using FOLL_PIN rules. This means that the caller must release the page via unpin_user_page(). Signed-off-by: John Hubbard <jhubbard@nvidia.com> --- include/linux/mm.h | 1 + mm/gup.c | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+)