Message ID | 20241105032944.141488-2-jhubbard@nvidia.com
State      | New
Series     | mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
On 05.11.24 04:29, John Hubbard wrote:
> commit 53ba78de064b ("mm/gup: introduce
> check_and_migrate_movable_folios()") created a new constraint on the
> pin_user_pages*() API family: a potentially large internal allocation
> must now occur, for FOLL_LONGTERM cases.
...
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>
>  	return ret;
>  }
>
> +static long
> +check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> +{
> +	LIST_HEAD(movable_folio_list);
> +	unsigned long collected;
> +
> +	collected =
> +		collect_longterm_unpinnable_folios(&movable_folio_list, pofs);

Nit: We're allowed to use more than 80 characters :)

(I would prefer the old way it was split across more lines if we really
want to split; this way here is less common)

Thanks!

Acked-by: David Hildenbrand <david@redhat.com>
On 11/5/24 12:47 AM, David Hildenbrand wrote:
> On 05.11.24 04:29, John Hubbard wrote:
...
>>  	return ret;
>>  }
>>
>> +static long
>> +check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>> +{
>> +	LIST_HEAD(movable_folio_list);
>> +	unsigned long collected;
>> +
>> +	collected =
>> +		collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>
> Nit: We're allowed to use more than 80 characters :)

Yes, I'm aware. Most of gup.c stays nicely within 80 cols, so it's nice
to keep that if not too awkward...

> (I would prefer the old way it was split across more lines if we really
> want to split; this way here is less common)

OK, if I need to respin I'll apply this, to restore the old way. If not,
maybe Andrew can fix it up please?

diff --git a/mm/gup.c b/mm/gup.c
index 0f5121ad0b9f..0a22f7def83c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2435,8 +2435,8 @@ check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
 	LIST_HEAD(movable_folio_list);
 	unsigned long collected;
 
-	collected =
-		collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+	collected = collect_longterm_unpinnable_folios(&movable_folio_list,
+						       pofs);
 	if (!collected)
 		return 0;

> Thanks!
>
> Acked-by: David Hildenbrand <david@redhat.com>

Appreciate the ack, and of course the idea and implementation as well! :)

thanks,
On Mon, Nov 04, 2024 at 07:29:44PM -0800, John Hubbard wrote:
> commit 53ba78de064b ("mm/gup: introduce
> check_and_migrate_movable_folios()") created a new constraint on the
> pin_user_pages*() API family: a potentially large internal allocation
> must now occur, for FOLL_LONGTERM cases.
...
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>

Hi John, thanks for doing this.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Nit below:

> +static int
> +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
> +				   struct pages_or_folios *pofs)
>  {
>  	int ret;
>  	unsigned long i;
>
> -	for (i = 0; i < nr_folios; i++) {
> -		struct folio *folio = folios[i];
> +	for (i = 0; i < pofs->nr_entries; i++) {
> +		struct folio *folio = pofs_get_folio(pofs, i);
>
>  		if (folio_is_device_coherent(folio)) {
>  			/*
> @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios(
>  			 * convert the pin on the source folio to a normal
>  			 * reference.
>  			 */
> -			folios[i] = NULL;
> +			pofs_clear_entry(pofs, i);
>  			folio_get(folio);
>  			gup_put_folio(folio, 1, FOLL_PIN);
>
> @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
>  		 * calling folio_isolate_lru() which takes a reference so the
>  		 * folio won't be freed if it's migrating.
>  		 */
> -		unpin_folio(folios[i]);
> -		folios[i] = NULL;
> +		unpin_folio(pofs_get_folio(pofs, i));

We already retrieved the folio before, cannot we just bypass
pofs_get_folio() here?
On 11/6/24 1:23 AM, Oscar Salvador wrote:
> On Mon, Nov 04, 2024 at 07:29:44PM -0800, John Hubbard wrote:
...
> Hi John, thanks for doing this.
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks for the review!

> Nit below:
...
>> @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
>>  		 * calling folio_isolate_lru() which takes a reference so the
>>  		 * folio won't be freed if it's migrating.
>>  		 */
>> -		unpin_folio(folios[i]);
>> -		folios[i] = NULL;
>> +		unpin_folio(pofs_get_folio(pofs, i));
>
> We already retrieved the folio before, cannot we just bypass
> pofs_get_folio() here?

Yes, you are right.

Andrew, can we please add this fixup on top of the commits in today's
mm-hotfixes-unstable (or, let me know if you'd prefer a v3 instead):

diff --git a/mm/gup.c b/mm/gup.c
index 0a22f7def83c..ad0c8922dac3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2399,7 +2399,7 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
 		 * calling folio_isolate_lru() which takes a reference so the
 		 * folio won't be freed if it's migrating.
 		 */
-		unpin_folio(pofs_get_folio(pofs, i));
+		unpin_folio(folio);
 		pofs_clear_entry(pofs, i);
 	}

thanks,
diff --git a/mm/gup.c b/mm/gup.c
index 4637dab7b54f..0f5121ad0b9f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2273,20 +2273,57 @@ struct page *get_dump_page(unsigned long addr)
 #endif /* CONFIG_ELF_CORE */
 
 #ifdef CONFIG_MIGRATION
+
+/*
+ * An array of either pages or folios ("pofs"). Although it may seem tempting to
+ * avoid this complication, by simply interpreting a list of folios as a list of
+ * pages, that approach won't work in the longer term, because eventually the
+ * layouts of struct page and struct folio will become completely different.
+ * Furthermore, this pof approach avoids excessive page_folio() calls.
+ */
+struct pages_or_folios {
+	union {
+		struct page **pages;
+		struct folio **folios;
+		void **entries;
+	};
+	bool has_folios;
+	long nr_entries;
+};
+
+static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i)
+{
+	if (pofs->has_folios)
+		return pofs->folios[i];
+	return page_folio(pofs->pages[i]);
+}
+
+static void pofs_clear_entry(struct pages_or_folios *pofs, long i)
+{
+	pofs->entries[i] = NULL;
+}
+
+static void pofs_unpin(struct pages_or_folios *pofs)
+{
+	if (pofs->has_folios)
+		unpin_folios(pofs->folios, pofs->nr_entries);
+	else
+		unpin_user_pages(pofs->pages, pofs->nr_entries);
+}
+
 /*
  * Returns the number of collected folios. Return value is always >= 0.
  */
 static unsigned long collect_longterm_unpinnable_folios(
-		struct list_head *movable_folio_list,
-		unsigned long nr_folios,
-		struct folio **folios)
+		struct list_head *movable_folio_list,
+		struct pages_or_folios *pofs)
 {
 	unsigned long i, collected = 0;
 	struct folio *prev_folio = NULL;
 	bool drain_allow = true;
 
-	for (i = 0; i < nr_folios; i++) {
-		struct folio *folio = folios[i];
+	for (i = 0; i < pofs->nr_entries; i++) {
+		struct folio *folio = pofs_get_folio(pofs, i);
 
 		if (folio == prev_folio)
 			continue;
@@ -2327,16 +2364,15 @@ static unsigned long collect_longterm_unpinnable_folios(
  * Returns -EAGAIN if all folios were successfully migrated or -errno for
  * failure (or partial success).
  */
-static int migrate_longterm_unpinnable_folios(
-		struct list_head *movable_folio_list,
-		unsigned long nr_folios,
-		struct folio **folios)
+static int
+migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
+				   struct pages_or_folios *pofs)
 {
 	int ret;
 	unsigned long i;
 
-	for (i = 0; i < nr_folios; i++) {
-		struct folio *folio = folios[i];
+	for (i = 0; i < pofs->nr_entries; i++) {
+		struct folio *folio = pofs_get_folio(pofs, i);
 
 		if (folio_is_device_coherent(folio)) {
 			/*
@@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios(
 			 * convert the pin on the source folio to a normal
 			 * reference.
 			 */
-			folios[i] = NULL;
+			pofs_clear_entry(pofs, i);
 			folio_get(folio);
 			gup_put_folio(folio, 1, FOLL_PIN);
 
@@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
 		 * calling folio_isolate_lru() which takes a reference so the
 		 * folio won't be freed if it's migrating.
 		 */
-		unpin_folio(folios[i]);
-		folios[i] = NULL;
+		unpin_folio(pofs_get_folio(pofs, i));
+		pofs_clear_entry(pofs, i);
 	}
 
 	if (!list_empty(movable_folio_list)) {
@@ -2387,12 +2423,26 @@ static int migrate_longterm_unpinnable_folios(
 	return -EAGAIN;
 
 err:
-	unpin_folios(folios, nr_folios);
+	pofs_unpin(pofs);
 	putback_movable_pages(movable_folio_list);
 
 	return ret;
 }
 
+static long
+check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
+{
+	LIST_HEAD(movable_folio_list);
+	unsigned long collected;
+
+	collected =
+		collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+	if (!collected)
+		return 0;
+
+	return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
+}
+
 /*
  * Check whether all folios are *allowed* to be pinned indefinitely (long term).
  * Rather confusingly, all folios in the range are required to be pinned via
@@ -2417,16 +2467,13 @@ static int migrate_longterm_unpinnable_folios(
 static long check_and_migrate_movable_folios(unsigned long nr_folios,
 					     struct folio **folios)
 {
-	unsigned long collected;
-	LIST_HEAD(movable_folio_list);
+	struct pages_or_folios pofs = {
+		.folios = folios,
+		.has_folios = true,
+		.nr_entries = nr_folios,
+	};
 
-	collected = collect_longterm_unpinnable_folios(&movable_folio_list,
-						       nr_folios, folios);
-	if (!collected)
-		return 0;
-
-	return migrate_longterm_unpinnable_folios(&movable_folio_list,
-						  nr_folios, folios);
+	return check_and_migrate_movable_pages_or_folios(&pofs);
 }
 
 /*
@@ -2436,22 +2483,13 @@ static long check_and_migrate_movable_folios(unsigned long nr_folios,
 static long check_and_migrate_movable_pages(unsigned long nr_pages,
 					    struct page **pages)
 {
-	struct folio **folios;
-	long i, ret;
+	struct pages_or_folios pofs = {
+		.pages = pages,
+		.has_folios = false,
+		.nr_entries = nr_pages,
+	};
 
-	folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL);
-	if (!folios) {
-		unpin_user_pages(pages, nr_pages);
-		return -ENOMEM;
-	}
-
-	for (i = 0; i < nr_pages; i++)
-		folios[i] = page_folio(pages[i]);
-
-	ret = check_and_migrate_movable_folios(nr_pages, folios);
-
-	kfree(folios);
-	return ret;
+	return check_and_migrate_movable_pages_or_folios(&pofs);
 }
 #else
 static long check_and_migrate_movable_pages(unsigned long nr_pages,
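[To illustrate the union-based pattern that the new struct pages_or_folios
relies on, here is a minimal standalone sketch in plain userspace C. The
struct page / struct folio definitions, page_folio(), and the main() driver
below are simplified stand-ins for illustration only, not the kernel's types.]

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct page  { int id; };
struct folio { struct page head; };

struct pages_or_folios {
	union {
		struct page **pages;
		struct folio **folios;
		void **entries;	/* common view; used only for clearing slots */
	};
	bool has_folios;
	long nr_entries;
};

/* Stand-in for the kernel's page_folio(): treat a page as its own folio. */
static struct folio *page_folio(struct page *page)
{
	return (struct folio *)page;
}

static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i)
{
	if (pofs->has_folios)
		return pofs->folios[i];
	return page_folio(pofs->pages[i]);
}

static void pofs_clear_entry(struct pages_or_folios *pofs, long i)
{
	/* All three union members alias the same storage, so one helper can
	 * clear a slot regardless of which view the caller filled in. */
	pofs->entries[i] = NULL;
}

int main(void)
{
	struct page p0 = { .id = 0 }, p1 = { .id = 1 };
	struct page *pages[] = { &p0, &p1 };
	struct pages_or_folios pofs = {
		.pages = pages,
		.has_folios = false,
		.nr_entries = 2,
	};

	printf("folio of entry 1 has id %d\n", pofs_get_folio(&pofs, 1)->head.id);
	pofs_clear_entry(&pofs, 0);
	printf("entry 0 cleared: %s\n", pages[0] == NULL ? "yes" : "no");
	return 0;
}

Because pages, folios, and entries all name the same storage, one set of
helpers serves both callers, which is what lets check_and_migrate_movable_pages()
and check_and_migrate_movable_folios() share a single migration path with no
intermediate allocation.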
commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation
must now occur, for FOLL_LONGTERM cases.

A user-visible consequence has now appeared: user space can no longer
pin more than 2GB of memory on x86_64. That's because, on a 4KB
PAGE_SIZE system, when user space tries to (indirectly, via a device
driver that calls pin_user_pages()) pin 2GB, this requires an
allocation of an array of folio pointers of MAX_PAGE_ORDER size, which
is the limit for kmalloc().

In addition to the directly visible effect described above, there is
also the problem of adding an unnecessary allocation. The **pages
array argument has already been allocated, and there is no need for a
redundant **folios array allocation in this case.

Fix this by avoiding the new allocation entirely. This is done by
referring to either the original page[i] within **pages, or to the
associated folio. Thanks to David Hildenbrand for suggesting this
approach and for providing the initial implementation (which I've
tested and adjusted slightly) as well.

Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: linux-stable@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 mm/gup.c | 116 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 77 insertions(+), 39 deletions(-)
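[As a sanity check of the 2GB figure above, here is a small standalone
calculation. It is a sketch, assuming the usual x86_64 defaults of 4KB pages,
8-byte pointers, and MAX_PAGE_ORDER = 10, which caps a single kmalloc() at
4MB; other configurations would shift the limit.]

#include <stdio.h>

int main(void)
{
	const unsigned long page_size   = 4096;           /* 4KB PAGE_SIZE */
	const unsigned long ptr_size    = sizeof(void *); /* 8 bytes on x86_64 */
	const unsigned long kmalloc_max = 4096UL << 10;   /* PAGE_SIZE << MAX_PAGE_ORDER = 4MB */

	/* Largest pin whose folio-pointer array still fits in one kmalloc(): */
	unsigned long max_pin = (kmalloc_max / ptr_size) * page_size;

	printf("max pinnable via the extra array: %lu GB\n", max_pin >> 30);
	return 0;
}

A 4MB kmalloc() holds 524288 folio pointers, and 524288 4KB pages is exactly
2GB; any larger pin fails the internal allocation, which is the user-visible
regression this patch removes.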