Message ID | cover.1702882426.git.xuyu@linux.alibaba.com (mailing list archive) |
---|---|
Series | attempt to map anonymous pte-mapped THPs by pmds |
Hey Xu,

Thanks for the patches.

As a precursor, can you help understand what the use case is for these patches? In-place collapse of anon memory is something I've thought about before, but the opportunity has never been especially clear.

In particular, your patches take an order-9 compound page, and just try to see if we can update the mappings to it (like we do with file/shmem). Functionally this seems fine, but the difference is that with file/shmem, it's quite easy to have a pte-mapped-hugepage arise naturally (the formation of the hugepage happening in the pagecache being logically separate from the pmd-mapping of w/e task is mapping it).

For anonymous memory, the only time I can see us having a pte-mapped hugepage (that isn't destined for splitting on deferred split list) that we want to remap by a pmd is if we cause a VMA split + remerge by mucking with VMA attributes.

In my mind, what I had been thinking of w.r.t in-place anon collapse was for the case where we've split a THP with MADV_FREE/MADV_DONTNEED (i.e. to subrelease pages back to kernel), but later want to reform the THP. In particular, if, for example, we only subrelease O(10s) of order-0 pages, it seems wasteful to have to reallocate a fresh hugepage, then copy over O(100s) of pages, on collapse. If we were able to attempt to first migrate-away any of those previously subreleased pages (now possibly backing some other memory entirely), it could save us from having to allocate a fresh order-9 page. Under memory pressure / fragmentation, this could mean the difference between success and failure.

Thanks for your help here,
Zach

On Sun, Dec 17, 2023 at 11:06 PM Xu Yu <xuyu@linux.alibaba.com> wrote:
>
> Result of tools/testing/selftests/mm/cow.c tests:
> # [RUN] Basic COW after fork() when collapsing before fork()
> ok 145 No leak from parent into child
> # [RUN] Basic COW after fork() when collapsing after fork() (fully shared)
> ok 146 No leak from parent into child
> # [RUN] Basic COW after fork() when collapsing after fork() (lower shared)
> ok 147 No leak from parent into child
> # [RUN] Basic COW after fork() when collapsing after fork() (upper shared)
> ok 148 No leak from parent into child
>
> A long run (w/ CONFIG_DEBUG_VM enabled) shows no panic or memory leaks.
>
> Changes since v2:
> - Use folios in the new code, as suggested by David.
> - Handle folio refcount and rmap properly, as suggested by David.
> - minor modification includes 1) advance vma write lock, 2) remove
>   redundant rollback logic, 3) clear old ptes in pgtable before deposit.
>
> Changes since v1:
> - Deal with PageAnonExclusive properly, as suggested by David.
>
> Xu Yu (2):
>   mm/khugepaged: map RO non-exclusive pte-mapped anon THPs by pmds
>   mm/khugepaged: map exclusive anonymous pte-mapped THPs by pmds
>
>  mm/khugepaged.c | 229 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 229 insertions(+)
>
> --
> 2.37.1
>
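To put rough numbers on the subrelease argument in the mail above, here is a back-of-the-envelope sketch in C. It is only an illustration, not kernel code: `struct collapse_cost`, `copy_collapse_cost()` and `inplace_collapse_cost()` are hypothetical names, the 4 KiB base page size (so 512 pages per order-9 folio) is an assumption, and the in-place figure assumes the worst case where every subreleased slot has since been reused and has to be migrated away.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages, so an order-9 folio spans 512 of them. */
#define HPAGE_ORDER    9
#define HPAGE_NR_PAGES ((size_t)1 << HPAGE_ORDER)

/* Hypothetical cost summary, for illustration only. */
struct collapse_cost {
    size_t order9_allocs;  /* fresh order-9 allocations required */
    size_t pages_copied;   /* order-0 pages whose contents get copied */
    size_t pages_migrated; /* foreign pages moved out of the old range */
};

/* Copy-based collapse: allocate a new hugepage, copy every page still present. */
static struct collapse_cost copy_collapse_cost(size_t subreleased)
{
    assert(subreleased <= HPAGE_NR_PAGES);
    struct collapse_cost c = { 1, HPAGE_NR_PAGES - subreleased, 0 };
    return c;
}

/*
 * Hypothetical in-place collapse: no new hugepage and no bulk copy.
 * Worst case assumed: each subreleased slot now backs other memory
 * and must be migrated away before the range can be reformed.
 */
static struct collapse_cost inplace_collapse_cost(size_t subreleased)
{
    assert(subreleased <= HPAGE_NR_PAGES);
    struct collapse_cost c = { 0, 0, subreleased };
    return c;
}
```

Under these assumptions, with 16 subreleased pages the copy-based path needs one order-9 allocation and 496 page copies, while the in-place path needs no order-9 allocation and at most 16 migrations.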
On 21.12.23 21:40, Zach O'Keefe wrote:
> Hey Xu,
>
> Thanks for the patches.
>
> As a precursor, can you help understand what the use case is for these
> patches? In-place collapse of anon memory is something I've thought
> about before, but the opportunity has never been especially clear.
>
> In particular, your patches take an order-9 compound page, and just
> try to see if we can update the mappings to it (like we do with
> file/shmem). Functionally this seems fine, but the difference is that
> with file/shmem, it's quite easy to have a pte-mapped-hugepage arise
> naturally (the formation of the hugepage happening in the pagecache
> being logically separate from the pmd-mapping of w/e task is mapping
> it).
>
> For anonymous memory, the only time I can see us having a pte-mapped
> hugepage (that isn't destined for splitting on deferred split list)
> that we want to remap by a pmd is if we cause a VMA split + remerge by
> mucking with VMA attributes.

Yes, mostly because of madvise(), mprotect(), mremap().

But also, when putting a THP into the swap cache right now. When refaulting, you get a PTE-mapped THP.

There are some other odd cases, and there might be more in the future (below)

>
> In my mind, what I had been thinking of w.r.t in-place anon collapse
> was for the case where we've split a THP with MADV_FREE/MADV_DONTNEED
> (i.e. to subrelease pages back to kernel), but later want to reform
> the THP. In particular, if, for example, we only subrelease O(10s) of

Right, and in-place collapse even works if the folio has been pinned, which is nice.

> order-0 pages, it seems wasteful to have to reallocate a fresh
> hugepage, then copy over O(100s) of pages, on collapse. If we were
> able to attempt to first migrate-away any of those previously
> subreleased pages (now possibly backing some other memory entirely),
> it could save us from having to allocate a fresh order-9 page. Under
> memory pressure / fragmentation, this could mean the difference
> between success and failure.
>

One thing that popped up a couple of times already is that we might want to PTE-map a PMD-sized THP for a couple of reasons (IIRC, FreeBSD does some of that).

For example:

* Lazily zero the pages of the folio on demand, keeping all non-zeroed parts protnone. At a certain time (e.g., all zeroed), simply remap using a PMD.
* Detecting sub-page access by temporarily mapping the THP using PTEs.

Maybe, also some uffd optimizations, whereby protnone parts are not faulted in yet.