Message ID | 20220102215729.2943705-6-willy@infradead.org (mailing list archive) |
---|---|
State | New |
Series | Convert GUP to folios |
On Sun, Jan 02, 2022 at 09:57:17PM +0000, Matthew Wilcox (Oracle) wrote:
> This replaces try_get_compound_head(). It includes a small optimisation
> for the race where a folio is split between being looked up from its
> tail page and the reference count being obtained. Before, it returned
> NULL, which presumably triggered a retry under the mmap_lock, whereas
> now it will retry without the lock. Finding a frozen page will still
> return NULL.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Although I think splitting out the retry optimization into a separate
patch would be nicer, both to document the change and to ease bisection
if the need arises.
On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> This replaces try_get_compound_head(). It includes a small optimisation
> for the race where a folio is split between being looked up from its
> tail page and the reference count being obtained. Before, it returned
> NULL, which presumably triggered a retry under the mmap_lock, whereas
> now it will retry without the lock. Finding a frozen page will still
> return NULL.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/gup.c | 69 +++++++++++++++++++++++++++++---------------------------
>  1 file changed, 36 insertions(+), 33 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 2c51e9748a6a..58e5cfaaa676 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,12 +29,11 @@ struct follow_page_context {
>  	unsigned int page_mask;
>  };
>
> -static void hpage_pincount_add(struct page *page, int refs)
> +static void folio_pincount_add(struct folio *folio, int refs)
>  {
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> -	VM_BUG_ON_PAGE(page != compound_head(page), page);
> +	VM_BUG_ON_FOLIO(!folio_pincount_available(folio), folio);
>
> -	atomic_add(refs, compound_pincount_ptr(page));
> +	atomic_add(refs, folio_pincount_ptr(folio));
>  }
>
>  static void hpage_pincount_sub(struct page *page, int refs)
> @@ -63,33 +62,35 @@ static void put_page_refs(struct page *page, int refs)
>  }
>
>  /*
> - * Return the compound head page with ref appropriately incremented,
> + * Return the folio with ref appropriately incremented,
>   * or NULL if that failed.
>   */
> -static inline struct page *try_get_compound_head(struct page *page, int refs)
> +static inline struct folio *try_get_folio(struct page *page, int refs)
>  {
> -	struct page *head = compound_head(page);
> +	struct folio *folio;
>
> -	if (WARN_ON_ONCE(page_ref_count(head) < 0))
> +retry:

Yes, this new retry looks like a solid improvement. Retrying at this
low level makes a lot of sense, given that it is racing with a very
transient sort of behavior.

> +	folio = page_folio(page);
> +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
>  		return NULL;
> -	if (unlikely(!page_cache_add_speculative(head, refs)))
> +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))

I'm a little lost about the meaning and intended use of the _rcu aspects
of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
require that callers are in an rcu read section, right? This is probably
just a documentation question, sorry if it's obvious--I wasn't able to
work it out on my own.

>  		return NULL;
>
>  	/*
> -	 * At this point we have a stable reference to the head page; but it
> -	 * could be that between the compound_head() lookup and the refcount
> -	 * increment, the compound page was split, in which case we'd end up
> -	 * holding a reference on a page that has nothing to do with the page
> +	 * At this point we have a stable reference to the folio; but it
> +	 * could be that between calling page_folio() and the refcount
> +	 * increment, the folio was split, in which case we'd end up
> +	 * holding a reference on a folio that has nothing to do with the page
>  	 * we were given anymore.
> -	 * So now that the head page is stable, recheck that the pages still
> -	 * belong together.
> +	 * So now that the folio is stable, recheck that the page still
> +	 * belongs to this folio.
>  	 */
> -	if (unlikely(compound_head(page) != head)) {
> -		put_page_refs(head, refs);
> -		return NULL;
> +	if (unlikely(page_folio(page) != folio)) {
> +		folio_put_refs(folio, refs);
> +		goto retry;
>  	}
>
> -	return head;
> +	return folio;
>  }
>
>  /**
> @@ -128,8 +129,10 @@ struct page *try_grab_compound_head(struct page *page,
>  				    int refs, unsigned int flags)
>  {
>  	if (flags & FOLL_GET)
> -		return try_get_compound_head(page, refs);
> +		return &try_get_folio(page, refs)->page;

Did you want to use folio_page() here, instead?

>  	else if (flags & FOLL_PIN) {
> +		struct folio *folio;
> +
>  		/*
>  		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>  		 * right zone, so fail and let the caller fall back to the slow
> @@ -143,29 +146,29 @@ struct page *try_grab_compound_head(struct page *page,
>  		 * CAUTION: Don't use compound_head() on the page before this
>  		 * point, the result won't be stable.
>  		 */
> -		page = try_get_compound_head(page, refs);
> -		if (!page)
> +		folio = try_get_folio(page, refs);
> +		if (!folio)
>  			return NULL;
>
>  		/*
> -		 * When pinning a compound page of order > 1 (which is what
> +		 * When pinning a folio of order > 1 (which is what
>  		 * hpage_pincount_available() checks for), use an exact count to
> -		 * track it, via hpage_pincount_add/_sub().
> +		 * track it, via folio_pincount_add/_sub().
>  		 *
> -		 * However, be sure to *also* increment the normal page refcount
> -		 * field at least once, so that the page really is pinned.
> +		 * However, be sure to *also* increment the normal folio refcount
> +		 * field at least once, so that the folio really is pinned.
>  		 * That's why the refcount from the earlier
> -		 * try_get_compound_head() is left intact.
> +		 * try_get_folio() is left intact.
>  		 */
> -		if (hpage_pincount_available(page))
> -			hpage_pincount_add(page, refs);
> +		if (folio_pincount_available(folio))
> +			folio_pincount_add(folio, refs);
>  		else
> -			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> +			folio_ref_add(folio,
> +					refs * (GUP_PIN_COUNTING_BIAS - 1));
>
> -		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
> -				    refs);
> +		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
>
> -		return page;
> +		return &folio->page;
>  	}
>
>  	WARN_ON_ONCE(1);

Just minor questions above. This all looks solid though.

thanks,
On 1/4/22 17:25, John Hubbard wrote:
>> @@ -128,8 +129,10 @@ struct page *try_grab_compound_head(struct page *page,
>>  				    int refs, unsigned int flags)
>>  {
>>  	if (flags & FOLL_GET)
>> -		return try_get_compound_head(page, refs);
>> +		return &try_get_folio(page, refs)->page;
>
>
> Did you want to use folio_page() here, instead?
>

...never mind about that, because I see that patch 08 gets rid of the
"->page", anyway.

thanks,
On Tue, Jan 04, 2022 at 05:25:07PM -0800, John Hubbard wrote:
> > +	folio = page_folio(page);
> > +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
> >  		return NULL;
> > -	if (unlikely(!page_cache_add_speculative(head, refs)))
> > +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
>
> I'm a little lost about the meaning and intended use of the _rcu aspects
> of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
> require that callers are in an rcu read section, right? This is probably
> just a documentation question, sorry if it's obvious--I wasn't able to
> work it out on my own.

Normally this is called under gup_fast, so it is operating under the
quasi-RCU protection that gup_fast provides. For preempt kernels I have
no idea whether this should be marked with rcu or whether the
local_irq_save is good enough.

The exceptional case is get_gate_page(), which I suppose relies on the
mmap lock to ensure that the special gate page cannot be concurrently
freed.

Then there are the cases under slow GUP (eg follow_page_pte()) which
don't need "try" at all, because they hold the PTLs, so it is guaranteed
the refcount should be non-zero AFAIK.

It wouldn't be a bad idea to have a non-try version of this, just for
clarity; could it perform a bit better?

	/*
	 * try_grab_page() should always succeed here, because: a) we
	 * hold the pmd (ptl) lock, and b) we've just checked that the
	 * huge pmd (head) page is present in the page tables. The ptl
	 * prevents the head page and tail pages from being rearranged
	 * in any way. So this page must be available at this point,
	 * unless the page refcount overflowed:
	 */
	if (WARN_ON_ONCE(!try_grab_page(page, flags))) {

Jason
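For illustration only, a non-try helper along the lines suggested above
might look roughly like the sketch below; the name grab_folio_locked()
is made up here and is not part of this series:

	/*
	 * Hypothetical sketch, not part of this patch.  In the slow-GUP
	 * paths the caller holds the page table lock, so the folio cannot
	 * be freed or split underneath us and an unconditional grab is
	 * enough; the "try" dance is only needed on the lockless path.
	 */
	static void grab_folio_locked(struct folio *folio, int refs)
	{
		/* With the PTL held, the refcount must already be positive. */
		VM_BUG_ON_FOLIO(folio_ref_count(folio) <= 0, folio);
		folio_ref_add(folio, refs);
	}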
On Tue, Jan 04, 2022 at 05:25:07PM -0800, John Hubbard wrote:
> On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
> > +	folio = page_folio(page);
> > +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
> >  		return NULL;
> > -	if (unlikely(!page_cache_add_speculative(head, refs)))
> > +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
>
> I'm a little lost about the meaning and intended use of the _rcu aspects
> of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
> require that callers are in an rcu read section, right? This is probably
> just a documentation question, sorry if it's obvious--I wasn't able to
> work it out on my own.

page_cache_add_speculative() always assumed that you were at least as
protected as RCU. Quoting v5.4's pagemap.h:

 * speculatively take a reference to a page.
 * If the page is free (_refcount == 0), then _refcount is untouched, and 0
 * is returned. Otherwise, _refcount is incremented by 1 and 1 is returned.
 *
 * This function must be called inside the same rcu_read_lock() section as has
 * been used to lookup the page in the pagecache radix-tree (or page table):
 * this allows allocators to use a synchronize_rcu() to stabilize _refcount.

... so is it the addition of "rcu" in the name that's scared you?
If so, that seems like a good thing, because previously you were using
page_cache_add_speculative() without realising that RCU protection
was required.

I think there is sufficient protection; either we have interrupts disabled
in gup-fast (which prevents RCU from finishing a grace period), or we
have the mmap_sem held in gup-slow (which prevents pages from being
removed from the page tables).

But maybe I've missed something?
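As a reading aid for the protection rules described above, a minimal
sketch of the speculative-reference pattern could look like the
following; lookup_entry_sketch() is a hypothetical stand-in for reading
a page pointer out of an RCU-protected structure, and this is not
actual gup code:

	/* Hypothetical: fetch a page pointer from an RCU-protected structure. */
	struct page *lookup_entry_sketch(void);

	/*
	 * The lookup and the speculative refcount increment sit inside one
	 * RCU read-side section (gup-fast gets the same effect by disabling
	 * interrupts), so a freeing path that uses synchronize_rcu() cannot
	 * complete in between and _refcount stays meaningful.
	 */
	static struct folio *speculative_get_sketch(void)
	{
		struct folio *folio = NULL;
		struct page *page;

		rcu_read_lock();
		page = lookup_entry_sketch();
		if (page)
			folio = try_get_folio(page, 1);	/* NULL if frozen/freed */
		rcu_read_unlock();

		/* On NULL the caller retries or falls back to a locked path. */
		return folio;
	}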
On 1/7/22 17:37, Matthew Wilcox wrote:
> On Tue, Jan 04, 2022 at 05:25:07PM -0800, John Hubbard wrote:
>> On 1/2/22 13:57, Matthew Wilcox (Oracle) wrote:
>>> +	folio = page_folio(page);
>>> +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
>>>  		return NULL;
>>> -	if (unlikely(!page_cache_add_speculative(head, refs)))
>>> +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
>>
>> I'm a little lost about the meaning and intended use of the _rcu aspects
>> of folio_ref_try_add_rcu() here. For example, try_get_folio() does not
>> require that callers are in an rcu read section, right? This is probably
>> just a documentation question, sorry if it's obvious--I wasn't able to
>> work it out on my own.
>
> page_cache_add_speculative() always assumed that you were at least as
> protected as RCU. Quoting v5.4's pagemap.h:
>
>  * speculatively take a reference to a page.
>  * If the page is free (_refcount == 0), then _refcount is untouched, and 0
>  * is returned. Otherwise, _refcount is incremented by 1 and 1 is returned.
>  *
>  * This function must be called inside the same rcu_read_lock() section as has
>  * been used to lookup the page in the pagecache radix-tree (or page table):
>  * this allows allocators to use a synchronize_rcu() to stabilize _refcount.

Thanks for the v5.4 documentation reference. I did read and understand
that a while back, but it didn't show up at all when I was looking
through the code during this review, so I missed a chance to re-read it.
I'd forgotten about the RCU qualifier there.

>
> ... so is it the addition of "rcu" in the name that's scared you?
> If so, that seems like a good thing, because previously you were using
> page_cache_add_speculative() without realising that RCU protection
> was required.
>
> I think there is sufficient protection; either we have interrupts disabled
> in gup-fast (which prevents RCU from finishing a grace period), or we
> have the mmap_sem held in gup-slow (which prevents pages from being
> removed from the page tables).
>
> But maybe I've missed something?

Clearly, I'm not the right person to answer that question, at this
point. :) FWIW, I went back and looked again, and it seems like
everything is covered.

On the documentation front, I do miss the v5.4 helpful comment block
above __page_cache_add_speculative(); it would be nice to have an
updated version of that revived, but of course that's a separate point
from this patchset.

thanks,
On Sat, Jan 08, 2022 at 01:37:09AM +0000, Matthew Wilcox wrote:
> I think there is sufficient protection; either we have interrupts
> disabled in gup-fast (which prevents RCU from finishing a grace
> period), or we have the mmap_sem held in gup-slow (which prevents
> pages from being removed from the page tables).

The mmap sem doesn't prevent pages from being removed; for instance,
unmap_mapping_pages() doesn't hold it. It is safe because gup slow
holds the PTLs when it calls this.

No idea how get_gate_page() works.

Jason
diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..58e5cfaaa676 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,12 +29,11 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
-static void hpage_pincount_add(struct page *page, int refs)
+static void folio_pincount_add(struct folio *folio, int refs)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	VM_BUG_ON_PAGE(page != compound_head(page), page);
+	VM_BUG_ON_FOLIO(!folio_pincount_available(folio), folio);
 
-	atomic_add(refs, compound_pincount_ptr(page));
+	atomic_add(refs, folio_pincount_ptr(folio));
 }
 
 static void hpage_pincount_sub(struct page *page, int refs)
@@ -63,33 +62,35 @@ static void put_page_refs(struct page *page, int refs)
 }
 
 /*
- * Return the compound head page with ref appropriately incremented,
+ * Return the folio with ref appropriately incremented,
  * or NULL if that failed.
  */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
+static inline struct folio *try_get_folio(struct page *page, int refs)
 {
-	struct page *head = compound_head(page);
+	struct folio *folio;
 
-	if (WARN_ON_ONCE(page_ref_count(head) < 0))
+retry:
+	folio = page_folio(page);
+	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
 		return NULL;
-	if (unlikely(!page_cache_add_speculative(head, refs)))
+	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
 		return NULL;
 
 	/*
-	 * At this point we have a stable reference to the head page; but it
-	 * could be that between the compound_head() lookup and the refcount
-	 * increment, the compound page was split, in which case we'd end up
-	 * holding a reference on a page that has nothing to do with the page
+	 * At this point we have a stable reference to the folio; but it
+	 * could be that between calling page_folio() and the refcount
+	 * increment, the folio was split, in which case we'd end up
+	 * holding a reference on a folio that has nothing to do with the page
 	 * we were given anymore.
-	 * So now that the head page is stable, recheck that the pages still
-	 * belong together.
+	 * So now that the folio is stable, recheck that the page still
+	 * belongs to this folio.
 	 */
-	if (unlikely(compound_head(page) != head)) {
-		put_page_refs(head, refs);
-		return NULL;
+	if (unlikely(page_folio(page) != folio)) {
+		folio_put_refs(folio, refs);
+		goto retry;
 	}
 
-	return head;
+	return folio;
 }
 
 /**
@@ -128,8 +129,10 @@ struct page *try_grab_compound_head(struct page *page,
 				    int refs, unsigned int flags)
 {
 	if (flags & FOLL_GET)
-		return try_get_compound_head(page, refs);
+		return &try_get_folio(page, refs)->page;
 	else if (flags & FOLL_PIN) {
+		struct folio *folio;
+
 		/*
 		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 		 * right zone, so fail and let the caller fall back to the slow
@@ -143,29 +146,29 @@ struct page *try_grab_compound_head(struct page *page,
 		 * CAUTION: Don't use compound_head() on the page before this
 		 * point, the result won't be stable.
 		 */
-		page = try_get_compound_head(page, refs);
-		if (!page)
+		folio = try_get_folio(page, refs);
+		if (!folio)
 			return NULL;
 
 		/*
-		 * When pinning a compound page of order > 1 (which is what
+		 * When pinning a folio of order > 1 (which is what
 		 * hpage_pincount_available() checks for), use an exact count to
-		 * track it, via hpage_pincount_add/_sub().
+		 * track it, via folio_pincount_add/_sub().
 		 *
-		 * However, be sure to *also* increment the normal page refcount
-		 * field at least once, so that the page really is pinned.
+		 * However, be sure to *also* increment the normal folio refcount
+		 * field at least once, so that the folio really is pinned.
 		 * That's why the refcount from the earlier
-		 * try_get_compound_head() is left intact.
+		 * try_get_folio() is left intact.
 		 */
-		if (hpage_pincount_available(page))
-			hpage_pincount_add(page, refs);
+		if (folio_pincount_available(folio))
+			folio_pincount_add(folio, refs);
 		else
-			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
+			folio_ref_add(folio,
+					refs * (GUP_PIN_COUNTING_BIAS - 1));
 
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
-				    refs);
+		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
 
-		return page;
+		return &folio->page;
 	}
 
 	WARN_ON_ONCE(1);
This replaces try_get_compound_head(). It includes a small optimisation
for the race where a folio is split between being looked up from its
tail page and the reference count being obtained. Before, it returned
NULL, which presumably triggered a retry under the mmap_lock, whereas
now it will retry without the lock. Finding a frozen page will still
return NULL.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 69 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 36 insertions(+), 33 deletions(-)
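To make the "retry under the mmap_lock" remark concrete, here is a
rough, hypothetical caller-side sketch; fast_walk_sketch() and
locked_walk_sketch() are made-up stand-ins, not real kernel functions.
Before this patch, a transient folio split made the lockless walk give
up on that page and the remainder of the range was redone under
mmap_lock; with the in-place retry, that particular race no longer
forces the fallback:

	/* Hypothetical stand-ins for the lockless and locked page walks. */
	long fast_walk_sketch(struct mm_struct *mm, unsigned long start,
			      long nr_pages, struct page **pages);
	long locked_walk_sketch(struct mm_struct *mm, unsigned long start,
				long nr_pages, struct page **pages);

	long gup_sketch(struct mm_struct *mm, unsigned long start,
			long nr_pages, struct page **pages)
	{
		/* Lockless walk: page tables are read speculatively. */
		long nr = fast_walk_sketch(mm, start, nr_pages, pages);

		if (nr == nr_pages)
			return nr;

		/* Fallback: redo whatever the fast walk gave up on, locked. */
		mmap_read_lock(mm);
		nr += locked_walk_sketch(mm, start + nr * PAGE_SIZE,
					 nr_pages - nr, pages + nr);
		mmap_read_unlock(mm);

		return nr;
	}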