Message ID | 20241018011711.183642-2-jhubbard@nvidia.com (mailing list archive) |
---|---|
State | New |
Series | mm/gup: stop leaking pinned pages in low memory conditions |
On 18.10.24 03:17, John Hubbard wrote:
> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
> family of functions, and requests "too many" pages, then the call will
> erroneously leave pages pinned. This is visible in user space as an
> actual memory leak.
>
> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls
> to exhaust memory.
>
> The root cause of the problem is this sequence, within
> __gup_longterm_locked():
>
>     __get_user_pages_locked()
>     rc = check_and_migrate_movable_pages()
>
> ...which gets retried in a loop. The loop error handling is incomplete,
> clearly due to a somewhat unusual and complicated tri-state error API.
> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
> happily returns the error, while leaving the pages pinned.

Sorry for another comment, I am taking my time to look into the code
again in more detail ...

migrate_longterm_unpinnable_folios() will always unpin all pages: no
matter which error it returns.

a) If it returns -EAGAIN, it unpinned all folios
b) If it returns any error it first calls unpin_folios().

So shouldn't the fix just be in check_and_migrate_movable_pages()?

diff --git a/mm/gup.c b/mm/gup.c
index a82890b46a36..81fc8314e687 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2403,8 +2403,9 @@ static int migrate_longterm_unpinnable_folios(
  * -EAGAIN. The caller should re-pin the entire range with FOLL_PIN and then
  * call this routine again.
  *
- * If an error other than -EAGAIN occurs, this indicates a migration failure.
- * The caller should give up, and propagate the error back up the call stack.
+ * If an error occurs, all folios are unpinned. If an error other than
+ * -EAGAIN occurs, this indicates a migration failure. The caller should give
+ * up, and propagate the error back up the call stack.
  *
  * If everything is OK and all folios in the range are allowed to be pinned,
  * then this routine leaves all folios pinned and returns zero for success.
@@ -2437,8 +2438,10 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 	long i, ret;
 
 	folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL);
-	if (!folios)
+	if (!folios) {
+		unpin_user_pages(pages, nr_pages);
 		return -ENOMEM;
+	}
 
 	for (i = 0; i < nr_pages; i++)
 		folios[i] = page_folio(pages[i]);

Then, check_and_migrate_movable_pages() will never return with an error and
having folios pinned.

If check_and_migrate_movable_pages() -> check_and_migrate_movable_folios()
returns "0", all folios remain pinned and no harm is done.

Consequently, I think patch #2 is not really required, because it doesn't
perform the temporary allocation that could fail with -ENOMEM.

Sorry for taking a closer look only now ...

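For illustration only, the contract proposed above ("if this returns an
error, nothing is left pinned") can be modelled in user space. This is a
minimal sketch, not kernel code: struct toy_page, toy_fail_alloc and all
toy_* helpers are hypothetical stand-ins, malloc() stands in for
kmalloc_array(), and the migration step is assumed to always succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for a page: it only tracks how many pins it holds. */
struct toy_page {
	int pincount;
};

/* Hypothetical knob: when nonzero, the temporary allocation "fails". */
static int toy_fail_alloc;

static void toy_pin_pages(struct toy_page **pages, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++)
		pages[i]->pincount++;
}

static void toy_unpin_pages(struct toy_page **pages, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++) {
		assert(pages[i]->pincount > 0);	/* catch a double unpin */
		pages[i]->pincount--;
	}
}

/* Sum of all pins held across the array; 0 means nothing leaked. */
static long toy_total_pins(struct toy_page **pages, unsigned long nr)
{
	long total = 0;

	for (unsigned long i = 0; i < nr; i++)
		total += pages[i]->pincount;
	return total;
}

/*
 * Models the proposed contract: on an error return every page has
 * already been unpinned by the callee; on 0 all pages stay pinned.
 */
static long toy_check_and_migrate(unsigned long nr, struct toy_page **pages)
{
	struct toy_page **tmp = NULL;

	if (!toy_fail_alloc)
		tmp = malloc(nr * sizeof(*tmp));
	if (!tmp) {
		toy_unpin_pages(pages, nr);	/* the line the diff adds */
		return -ENOMEM;
	}

	/* Assumption: every page is pinnable, so no migration is needed. */
	free(tmp);
	return 0;
}
```

With this shape the caller can propagate any error without doing its own
cleanup, which is the layering trade-off discussed in the replies below.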
On 10/18/24 12:47 AM, David Hildenbrand wrote:
> On 18.10.24 03:17, John Hubbard wrote:
>> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
>> family of functions, and requests "too many" pages, then the call will
>> erroneously leave pages pinned. This is visible in user space as an
>> actual memory leak.
>>
>> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls
>> to exhaust memory.
>>
>> The root cause of the problem is this sequence, within
>> __gup_longterm_locked():
>>
>>     __get_user_pages_locked()
>>     rc = check_and_migrate_movable_pages()
>>
>> ...which gets retried in a loop. The loop error handling is incomplete,
>> clearly due to a somewhat unusual and complicated tri-state error API.
>> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
>> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
>> happily returns the error, while leaving the pages pinned.
>
> Sorry for another comment, I am taking my time to look into the code
> again in more detail ...
>
> migrate_longterm_unpinnable_folios() will always unpin all pages: no
> matter which error it returns.
>
> a) If it returns -EAGAIN, it unpinned all folios
> b) If it returns any error it first calls unpin_folios().
>
> So shouldn't the fix just be in check_and_migrate_movable_pages()?

OK, sure. It's a little odd from a layering point of view, because the
callee "helpfully" unpins the pages for you (wheee!), but the updated
comment highlights that, at least.

And actually this whole thing of "pin the pages, just for a short time, even
though you're not allowed to" is partly why this area is so entertaining.

>
> diff --git a/mm/gup.c b/mm/gup.c
> index a82890b46a36..81fc8314e687 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2403,8 +2403,9 @@ static int migrate_longterm_unpinnable_folios(
>   * -EAGAIN. The caller should re-pin the entire range with FOLL_PIN and then
>   * call this routine again.
>   *
> - * If an error other than -EAGAIN occurs, this indicates a migration failure.
> - * The caller should give up, and propagate the error back up the call stack.
> + * If an error occurs, all folios are unpinned. If an error other than
> + * -EAGAIN occurs, this indicates a migration failure. The caller should give
> + * up, and propagate the error back up the call stack.
>   *
>   * If everything is OK and all folios in the range are allowed to be pinned,
>   * then this routine leaves all folios pinned and returns zero for success.
> @@ -2437,8 +2438,10 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  	long i, ret;
>
>  	folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL);
> -	if (!folios)
> +	if (!folios) {
> +		unpin_user_pages(pages, nr_pages);
>  		return -ENOMEM;
> +	}
>
>  	for (i = 0; i < nr_pages; i++)
>  		folios[i] = page_folio(pages[i]);
>
>
> Then, check_and_migrate_movable_pages() will never return with an error and
> having folios pinned.
>
> If check_and_migrate_movable_pages() -> check_and_migrate_movable_folios()
> returns "0", all folios remain pinned and no harm is done.
>
> Consequently, I think patch #2 is not really required, because it doesn't
> perform the temporary allocation that could fail with -ENOMEM.
>

Yes!

>
> Sorry for taking a closer look only now ...
>

It's all still in review, so the timing is perfectly fine. I really
appreciate the closer look, it's definitely making things better.

thanks,

John Hubbard <jhubbard@nvidia.com> writes:

> On 10/18/24 12:47 AM, David Hildenbrand wrote:
>> On 18.10.24 03:17, John Hubbard wrote:

[...]

> And actually this whole thing of "pin the pages, just for a short time, even
> though you're not allowed to" is partly why this area is so entertaining.

I'm looking at your v3 but as an aside I disagree with this
statement. AFAIK you're always allowed to pin the pages for a short time
(ie. !FOLL_LONGTERM), or did I misunderstand your comment?

>> diff --git a/mm/gup.c b/mm/gup.c
>> index a82890b46a36..81fc8314e687 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2403,8 +2403,9 @@ static int migrate_longterm_unpinnable_folios(
>>   * -EAGAIN. The caller should re-pin the entire range with FOLL_PIN and then
>>   * call this routine again.
>>   *
>> - * If an error other than -EAGAIN occurs, this indicates a migration failure.
>> - * The caller should give up, and propagate the error back up the call stack.
>> + * If an error occurs, all folios are unpinned. If an error other than
>> + * -EAGAIN occurs, this indicates a migration failure. The caller should give
>> + * up, and propagate the error back up the call stack.
>>   *
>>   * If everything is OK and all folios in the range are allowed to be pinned,
>>   * then this routine leaves all folios pinned and returns zero for success.
>> @@ -2437,8 +2438,10 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>>  	long i, ret;
>>  	folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL);
>> -	if (!folios)
>> +	if (!folios) {
>> +		unpin_user_pages(pages, nr_pages);
>>  		return -ENOMEM;
>> +	}
>>  	for (i = 0; i < nr_pages; i++)
>>  		folios[i] = page_folio(pages[i]);
>>
>> Then, check_and_migrate_movable_pages() will never return with an error and
>> having folios pinned.
>>
>> If check_and_migrate_movable_pages() -> check_and_migrate_movable_folios()
>> returns "0", all folios remain pinned and no harm is done.
>>
>> Consequently, I think patch #2 is not really required, because it doesn't
>> perform the temporary allocation that could fail with -ENOMEM.
>>
>
> Yes!
>
>> Sorry for taking a closer look only now ...
>>
>
> It's all still in review, so the timing is perfectly fine. I really
> appreciate the closer look, it's definitely making things better.
>
>
> thanks,

On 10/20/24 3:59 PM, Alistair Popple wrote:
> John Hubbard <jhubbard@nvidia.com> writes:
>> On 10/18/24 12:47 AM, David Hildenbrand wrote:
>>> On 18.10.24 03:17, John Hubbard wrote:
> [...]
>> And actually this whole thing of "pin the pages, just for a short time, even
>> though you're not allowed to" is partly why this area is so entertaining.
>
> I'm looking at your v3 but as an aside I disagree with this
> statement. AFAIK you're always allowed to pin the pages for a short time
> (ie. !FOLL_LONGTERM), or did I misunderstand your comment?

Sort of: short term pins are allowed, but at this point in the code, here:

    pin_user_pages(FOLL_PIN | FOLL_LONGTERM)
        __gup_longterm_locked()
            __get_user_pages_locked(FOLL_PIN | FOLL_LONGTERM)

, just before calling check_and_migrate_movable_pages(), we have already
filtered out any cases other than (FOLL_PIN | FOLL_LONGTERM). And that
means that code has taken a *longterm* pin of presumably short duration
(this incongruity bothers me), on pages that are not actually allowed to be
long term pinned.

That also feels imperfect, even though it is supposedly short
duration...except that page migration is only sort of short...hmmm. I'm
starting to think that migrating any ZONE_MOVABLE pages away first might be
better.

Since I'm already preparing that "wait for folio refcount" idea for
migration, which is almost related, I'll take a closer look at this idea
while I'm at it.

thanks,

diff --git a/mm/gup.c b/mm/gup.c
index a82890b46a36..233c284e8e66 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,
 
 		/* FOLL_LONGTERM implies FOLL_PIN */
 		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
+
+		/*
+		 * The __get_user_pages_locked() call happens before we know if
+		 * it's possible to successfully complete the whole operation.
+		 * To compensate for this, if we get an unexpected error (such
+		 * as -ENOMEM) then we must unpin everything, before erroring
+		 * out.
+		 */
+		if (rc != -EAGAIN && rc != 0)
+			unpin_user_pages(pages, nr_pinned_pages);
+
 	} while (rc == -EAGAIN);
 	memalloc_pin_restore(flags);
 	return rc ? rc : nr_pinned_pages;
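
As a rough user-space illustration of the tri-state handling in the hunk
above (0 = keep the pins, -EAGAIN = re-pin and retry, anything else = give
up), the retry loop can be sketched as follows. Everything here is
hypothetical: struct toy_state, toy_pin(), toy_check() and
toy_gup_longterm() are invented stand-ins, and the results returned by the
"check" step are scripted by the caller. The sketch assumes the callee
leaves pages pinned on an unexpected error, which according to the review
in this thread holds only for the allocation-failure path; on the other
error paths the callee has already unpinned, and a caller-side unpin would
then drop the pins twice.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bookkeeping for the toy model. */
struct toy_state {
	long pinned;		/* pins currently held */
	const long *script;	/* results the toy check returns, in order */
	int step;		/* index of the next scripted result */
};

/* Stand-in for __get_user_pages_locked(): pins nr pages. */
static long toy_pin(struct toy_state *s, long nr)
{
	s->pinned += nr;
	return nr;
}

/*
 * Stand-in for check_and_migrate_movable_pages(). On -EAGAIN the callee
 * has dropped the pins itself; on any other error this model leaves the
 * pins in place (the assumption stated above); on 0 the pins are kept.
 */
static long toy_check(struct toy_state *s, long nr)
{
	long rc = s->script[s->step++];

	assert(s->pinned >= nr);
	if (rc == -EAGAIN)
		s->pinned -= nr;
	return rc;
}

/* Retry loop shaped like the patched __gup_longterm_locked(). */
static long toy_gup_longterm(struct toy_state *s, long nr)
{
	long rc, nr_pinned;

	do {
		nr_pinned = toy_pin(s, nr);
		if (nr_pinned <= 0) {	/* assumed early exit, not in the hunk */
			rc = nr_pinned;
			break;
		}

		rc = toy_check(s, nr_pinned);

		/* Unexpected error: drop the pins before erroring out. */
		if (rc != -EAGAIN && rc != 0)
			s->pinned -= nr_pinned;
	} while (rc == -EAGAIN);

	return rc ? rc : nr_pinned;
}
```

Scripting the check as { -EAGAIN, -ENOMEM } exercises one retry followed
by the error path; scripting { -EAGAIN, 0 } exercises one retry followed by
success.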