Message ID | 20240502085259.103784-3-david@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL | expand |
On 02.05.24 16:28, Peter Xu wrote: > On Thu, May 02, 2024 at 10:52:59AM +0200, David Hildenbrand wrote: >> Let's document why hugetlb still uses folio_mapcount() and is prone to >> leaking memory between processes, for example using vmsplice() that >> still uses FOLL_GET. >> >> More details can be found in [1], especially around how hugetlb pages >> cannot really be overcommitted, and why we don't particularly care about >> these vmsplice() leaks for hugetlb -- in contrast to ordinary memory. >> >> [1] https://lore.kernel.org/all/8b42a24d-caf0-46ef-9e15-0f88d47d2f21@redhat.com/ >> >> Suggested-by: Peter Xu <peterx@redhat.com> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/hugetlb.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 417fc5cdb6eeb..a7efb350f5d07 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -5963,6 +5963,13 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, >> /* >> * If no-one else is actually using this page, we're the exclusive >> * owner and can reuse this page. >> + * >> + * Note that we don't rely on the (safer) folio refcount here, because >> + * copying the hugetlb folio when there are unexpected (temporary) >> + * folio references could harm simple fork()+exit() users when >> + * we run out of free hugetlb folios: we would have to kill processes >> + * in scenarios that used to work. As a side effect, there can still >> + * be leaks between processes, for example, with FOLL_GET users. >> */ >> if (folio_mapcount(old_folio) == 1 && folio_test_anon(old_folio)) { >> if (!PageAnonExclusive(&old_folio->page)) { > > Thanks for preparing such updates, David. > > However is fork+exit the real problem? E.g. if a child simply fork, do > something, then exit, I don't see a major issue even if we follow refcount > here (despite the "check against 1 or 2 or 3" issue, where hugetlb_fault > can take one already). As long as the child quits, all ref / map counts > will be released then. If the child needs to write to ANON|PRIV it needs > to manage hugetlb reservations anyways. The PAE flag was cleared and if there is any unexpected (temporary) reference (page migration, lockless swapcache lookups, whatever), we're in trouble. > > In the case of vmsplice it's kind of malicious, and holding that refcount > (with 0 mapcount) doesn't sound the common scenario to me. Yes, I'm not that concerned about something that that APP would be triggering (vmsplice). > > IIUC if we need to keep this, it was more about the case where (as you > correctly mentioned in another follow up reply) hugetlb isn't that flexible > to memory overcommits, and in many cases it won't have extra pages floating > around to allow adhoc CoWs? While random refcount boost is easy to happen, > and here the problem is we simply cannot identify that v.s. vmsplice() > malicious takers. Yes, exactly.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 417fc5cdb6eeb..a7efb350f5d07 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5963,6 +5963,13 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, /* * If no-one else is actually using this page, we're the exclusive * owner and can reuse this page. + * + * Note that we don't rely on the (safer) folio refcount here, because + * copying the hugetlb folio when there are unexpected (temporary) + * folio references could harm simple fork()+exit() users when + * we run out of free hugetlb folios: we would have to kill processes + * in scenarios that used to work. As a side effect, there can still + * be leaks between processes, for example, with FOLL_GET users. */ if (folio_mapcount(old_folio) == 1 && folio_test_anon(old_folio)) { if (!PageAnonExclusive(&old_folio->page)) {
Let's document why hugetlb still uses folio_mapcount() and is prone to leaking memory between processes, for example using vmsplice() that still uses FOLL_GET. More details can be found in [1], especially around how hugetlb pages cannot really be overcommitted, and why we don't particularly care about these vmsplice() leaks for hugetlb -- in contrast to ordinary memory. [1] https://lore.kernel.org/all/8b42a24d-caf0-46ef-9e15-0f88d47d2f21@redhat.com/ Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/hugetlb.c | 7 +++++++ 1 file changed, 7 insertions(+)