Message ID | fca2f694-2098-b0ef-d4e-f1d8b94d318c@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm,huge,rmap: unify and speed up compound mapcounts | expand |
On Wed, Nov 9, 2022 at 6:18 PM Hugh Dickins <hughd@google.com> wrote: > > Commit ("mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts") > propagated the "if (compound) {lock} else if (PageCompound) {lock} else > {atomic}" pattern throughout; but Linus hated the way that gives primacy > to the uncommon case: switch to "if (!PageCompound) {atomic} else if > (compound) {lock} else {lock}" throughout. Side note, that 'compound' naming is also on my list of "I'm _really_ not a fan". We actually have a completely different meaning for PageCompound() than the meaning of 'compound' in the rmap functions, and those functions literally mix those meanings if not on the same line, then at least right next to each other. What 'rmap' actually means with 'compound' in the add/remove functions is basically 'not PAGE_SIZE' as far as I can tell. So if I get the energy to do the rmap counts, I will *also* be renaming that horrible thing completely. In fact, I'd be inclined to just pass in the actual page size (possibly a page shift order), which some of the functions want anyway, and which would be a lot clearer than the horrid "compound" name. One reason I find the "compound" name so horrifying is that it is used very much for HUGETLB pages, which I don't think end up ever being marked as PageCompund(), and which are - for various historical reasons - doubly confusing because they use a "pte_t" to describe themselves, even when they are actually using a "pmd_t" or a "pud_t" to actually map the page. So a HUGETLB entry really is (for historical reasons) designed to look like a single last-level pte_t entry, but from an rmap perspective it is explicitly *not* made to look like that at all, completely violating the HUGETLB design. So the rmap code has several really REALLY confusing cases: - the common one: just a page mapped at a *real* pte_t level. To make that more confusing, it can actually be a single-page _part_ of a compound page in the PageCompound() sense, but the rmap 'compound' argument will *not* be set, because from a *mmap* standpoint it's mapped as a single page. This is generally recognized by the rmap code by 'compound' being zero. - a HUGETLB mapping, which uses '->pte' in the page walking (please kill me now) and is *not* using a PageCompund() page, but 'compound' is still set, because from a *mapping* standpoint it's not a final pte_t entry (buit from a MM design standpoint it _was_ supposed to be designed like a single page). This is randomly recognized by looking at the vma flags (using "is_vm_hugetlb_page(vma)") or just inherent in the code itself (ie the 'hugetlb()' functions are only called by code that has tested this situation one way or another) To make things more confusing, some places use PageHeadHuge() instead (but the folio version of said test is called "folio_test_hugetlb()", just so that nobody could possibly ever accuse the HUGETLB code to have consistency). You'd think that PageHeadHuge() is one of the functions that checks the page flag bits. You'd be wrong. It's very very special. - an *actual* PageCompound() page, mapped as such as a THP page (ie mapped by a pmd, not a pte). This may be the same page size as a HUGETLB mapping (and often is), but it's a completely different case in every single way. But like the HUGETLB case, the 'compound' argument will be set, and now it's actually a compound page (but hey, so could the single page mapping case be too). Unlike the HUGETLB case, the page walker does not use ->pte for this, and some of the walkers will very much use that, ie folio_referenced_one() will do if (pvmw.pte) { to distinguish the "natively mapped PageCompound()" case (no pte) from the "map a single page" or from the HUGETLB case (yes pte). There may be more cases than those three, and I may have confused myself and gotten some of the details wrong, but I did want to write the above diatribe out to (a) see if somebody corrects me for any of the cases I enumerated (b) see if somebody can point to yet another odd case (c) see if somebody has suggestions for better and more obvious names for that 'compound' argument in the rmap code I do wish the HUGETLB case didn't use 'pte' for its notion of how HUGETLB entries are mapped, but that's literally how HUGETLB is designed: it started life as a larger last-level pte. It just means that it ends up being very confusing when from a page table walk perspective, you're walking a pud or a pmd entry, and then you see a 'pte_t' instead. An example of that confusion is visible in try_to_unmap_one(), which can be called with a HUGEPTE page (well, folio), and that does while (page_vma_mapped_walk(&pvmw)) { to find the rmap entries, but it can't do that if (pvmw.pte) { test to see what mapping it's walking (since both regular pages and HUGETLB pages use that), so then it just keeps testing what kind of page that was passed in. Which really smells very odd to me, but apparently it works, presumably because unlike THP there can be no splitting. But it's a case where the whole "was it a regular page or a HUGETLB page" is really really confusing/ And mm/hugetlb.c (and places like mm/pagewalk.c too) has a fair number of random casts as a result of this "it's not really a pte_t, but it's designed to look like one" thing. This all really is understandable from a historical context, and from HUGETLB really being just kind of designed to be a bigger page (not a collection of small pages that can be mapped as a bigger entity), but it really does mean that 'rmap' calling those HUGETLB pages 'compound' is conceptually very very wrong. Oh well. That whole HUGETLB model isn't getting fixed, but I think the naming confusion about 'compound' *can* be fixed fairly easily, and we could try to at least avoid having 'compound' and 'PageCompound()' mean totally different things in the same function. I'm not going to do any of this cleanup now, but I wanted to at least voice my concerns. Maybe I'll get around to actually trying to clarify the code later. Because this was all stuff that was *very* confusing when I did the rmap simplification in that (now completely rewritten to explicitly _not_ touch rmap at all) original version of the delayed rmap patch series. Linus
On Wed, 9 Nov 2022, Linus Torvalds wrote: > On Wed, Nov 9, 2022 at 6:18 PM Hugh Dickins <hughd@google.com> wrote: > > > > Commit ("mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts") > > propagated the "if (compound) {lock} else if (PageCompound) {lock} else > > {atomic}" pattern throughout; but Linus hated the way that gives primacy > > to the uncommon case: switch to "if (!PageCompound) {atomic} else if > > (compound) {lock} else {lock}" throughout. > > Side note, that 'compound' naming is also on my list of "I'm _really_ > not a fan". > > We actually have a completely different meaning for PageCompound() > than the meaning of 'compound' in the rmap functions, and those > functions literally mix those meanings if not on the same line, then > at least right next to each other. > > What 'rmap' actually means with 'compound' in the add/remove functions > is basically 'not PAGE_SIZE' as far as I can tell. > > So if I get the energy to do the rmap counts, See other mail, I got some zest to try your idea on the counts. > I will *also* be renaming that horrible thing completely. But I don't suppose I'll spend time on that part, I don't really see the problem. "compound" might be better named, say, "large_rmap" (I'd have said "pmd_mapped" or "pmd_rmap", but you raise the spectre of hugetlb below, and powerpc as usual does hugetlb very differently), but compound seems okay to me, and consistent with usage elsewhere. > > In fact, I'd be inclined to just pass in the actual page size > (possibly a page shift order), which some of the functions want > anyway, and which would be a lot clearer than the horrid "compound" > name. But yes, I think that would be an improvement; yet you might find a reason why so often we don't do that - there's often an awkward BUILD_BUG when you build without CONFIG_TRANSPARENT_HUGEPAGE=y. And much as I've often wanted to remove it, it does give some assurance that we're not bloating THP-disabled configs. Maybe the steady growth of compound_nr() usage gets around that better now (or will you be renaming that too ?-) > > One reason I find the "compound" name so horrifying is that it is used > very much for HUGETLB pages, which I don't think end up ever being > marked as PageCompund(), and which are - for various historical hugetlb pages are always PageCompound. Shoot me if they're not. > reasons - doubly confusing because they use a "pte_t" to describe > themselves, even when they are actually using a "pmd_t" or a "pud_t" > to actually map the page. Yes, I wish we would undo that hugetlb deception: it would probably be much more (un)doable, were it not for powerpc (and ia64 iirc). > > So a HUGETLB entry really is (for historical reasons) designed to look > like a single last-level pte_t entry, but from an rmap perspective it > is explicitly *not* made to look like that at all, completely > violating the HUGETLB design. > > So the rmap code has several really REALLY confusing cases: > > - the common one: just a page mapped at a *real* pte_t level. > > To make that more confusing, it can actually be a single-page > _part_ of a compound page in the PageCompound() sense, but the rmap > 'compound' argument will *not* be set, because from a *mmap* > standpoint it's mapped as a single page. Yes. Most pages are unambiguous, but when a PageHead page arrives at page_add/remove_rmap(), we have to do different things, according to whether it's mapped with a large or a small entry. But I'm going away at this point, you write much faster than I can read and understand and respond. I'm responding in part to "fix" my stupid typo on Johannes's address. Hugh > > This is generally recognized by the rmap code by 'compound' being zero. > > - a HUGETLB mapping, which uses '->pte' in the page walking (please > kill me now) and is *not* using a PageCompund() page, but 'compound' > is still set, because from a *mapping* standpoint it's not a final > pte_t entry (buit from a MM design standpoint it _was_ supposed to be > designed like a single page). > > This is randomly recognized by looking at the vma flags (using > "is_vm_hugetlb_page(vma)") or just inherent in the code itself (ie the > 'hugetlb()' functions are only called by code that has tested this > situation one way or another) > > To make things more confusing, some places use PageHeadHuge() > instead (but the folio version of said test is called > "folio_test_hugetlb()", just so that nobody could possibly ever accuse > the HUGETLB code to have consistency). > > You'd think that PageHeadHuge() is one of the functions that > checks the page flag bits. You'd be wrong. It's very very special. > > - an *actual* PageCompound() page, mapped as such as a THP page (ie > mapped by a pmd, not a pte). > > This may be the same page size as a HUGETLB mapping (and often is), > but it's a completely different case in every single way. > > But like the HUGETLB case, the 'compound' argument will be set, and > now it's actually a compound page (but hey, so could the single page > mapping case be too). > > Unlike the HUGETLB case, the page walker does not use ->pte for > this, and some of the walkers will very much use that, ie > folio_referenced_one() will do > > if (pvmw.pte) { > > to distinguish the "natively mapped PageCompound()" case (no pte) > from the "map a single page" or from the HUGETLB case (yes pte). > > There may be more cases than those three, and I may have confused > myself and gotten some of the details wrong, but I did want to write > the above diatribe out to > > (a) see if somebody corrects me for any of the cases I enumerated > > (b) see if somebody can point to yet another odd case > > (c) see if somebody has suggestions for better and more obvious names > for that 'compound' argument in the rmap code > > I do wish the HUGETLB case didn't use 'pte' for its notion of how > HUGETLB entries are mapped, but that's literally how HUGETLB is > designed: it started life as a larger last-level pte. > > It just means that it ends up being very confusing when from a page > table walk perspective, you're walking a pud or a pmd entry, and then > you see a 'pte_t' instead. > > An example of that confusion is visible in try_to_unmap_one(), which > can be called with a HUGEPTE page (well, folio), and that does > > while (page_vma_mapped_walk(&pvmw)) { > > to find the rmap entries, but it can't do that > > if (pvmw.pte) { > > test to see what mapping it's walking (since both regular pages and > HUGETLB pages use that), so then it just keeps testing what kind of > page that was passed in. > > Which really smells very odd to me, but apparently it works, > presumably because unlike THP there can be no splitting. But it's a > case where the whole "was it a regular page or a HUGETLB page" is > really really confusing/ > > And mm/hugetlb.c (and places like mm/pagewalk.c too) has a fair number > of random casts as a result of this "it's not really a pte_t, but it's > designed to look like one" thing. > > This all really is understandable from a historical context, and from > HUGETLB really being just kind of designed to be a bigger page (not a > collection of small pages that can be mapped as a bigger entity), but > it really does mean that 'rmap' calling those HUGETLB pages 'compound' > is conceptually very very wrong. > > Oh well. That whole HUGETLB model isn't getting fixed, but I think the > naming confusion about 'compound' *can* be fixed fairly easily, and we > could try to at least avoid having 'compound' and 'PageCompound()' > mean totally different things in the same function. > > I'm not going to do any of this cleanup now, but I wanted to at least > voice my concerns. Maybe I'll get around to actually trying to clarify > the code later. > > Because this was all stuff that was *very* confusing when I did the > rmap simplification in that (now completely rewritten to explicitly > _not_ touch rmap at all) original version of the delayed rmap patch > series. > > Linus
On Wed, Nov 09, 2022 at 07:23:08PM -0800, Linus Torvalds wrote: > On Wed, Nov 9, 2022 at 6:18 PM Hugh Dickins <hughd@google.com> wrote: > > > > Commit ("mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts") > > propagated the "if (compound) {lock} else if (PageCompound) {lock} else > > {atomic}" pattern throughout; but Linus hated the way that gives primacy > > to the uncommon case: switch to "if (!PageCompound) {atomic} else if > > (compound) {lock} else {lock}" throughout. > > Side note, that 'compound' naming is also on my list of "I'm _really_ > not a fan". > > We actually have a completely different meaning for PageCompound() > than the meaning of 'compound' in the rmap functions, and those > functions literally mix those meanings if not on the same line, then > at least right next to each other. > > What 'rmap' actually means with 'compound' in the add/remove functions > is basically 'not PAGE_SIZE' as far as I can tell. Ah. I've been trying to understand what that 'compound' really means, and what the difference is to 'PageCompound()' and why we need both. Thanks! > One reason I find the "compound" name so horrifying is that it is used > very much for HUGETLB pages, which I don't think end up ever being > marked as PageCompund(), and which are - for various historical > reasons - doubly confusing because they use a "pte_t" to describe > themselves, even when they are actually using a "pmd_t" or a "pud_t" > to actually map the page. HugeTLB pages _are_ marked as Compound. There's some fairly horrific code to manually make them compound when they have to be allocated piecemeal (because they're 1GB and too large for the page allocator). > To make things more confusing, some places use PageHeadHuge() > instead (but the folio version of said test is called > "folio_test_hugetlb()", just so that nobody could possibly ever accuse > the HUGETLB code to have consistency). That one's my fault, but it's a reaction to all the times that I and others have got confused between PageHuge and PageTransHuge. I suppose we could do a big sed s/PageHuge/PageHugeTLB/, but I'm hopeful the entire hugetlb codebase is either converted to folios or unified with THP handling. > I do wish the HUGETLB case didn't use 'pte' for its notion of how > HUGETLB entries are mapped, but that's literally how HUGETLB is > designed: it started life as a larger last-level pte. > > It just means that it ends up being very confusing when from a page > table walk perspective, you're walking a pud or a pmd entry, and then > you see a 'pte_t' instead. Yes, one of the long-term things I want to try is making the hugetlb code use the pmd/pud types like the THP code does.
On Thu, Nov 10, 2022 at 8:31 AM Matthew Wilcox <willy@infradead.org> wrote: > > Ah. I've been trying to understand what that 'compound' really means, > and what the difference is to 'PageCompound()' and why we need both. Yeah, so the 'why' is: (a) to distinguish the case of "I'm mapping the first sub-page of a compound page as a _single_ page page entry in the pte" from "I'm mapping the whole compound/THP/HUGETLB page as a pmd" The actual 'page' pointer can be the same in both cases, so you can't tell from that: PageCompound() will be true in both cases. Of course, sometimes you *can* tell from the page pointer too (eg the HUGETLB case can never be mapped as a small page), but not always. (b) because we do completely different things from a page locking and statistics standpoint for the two cases. That (b) is obviously related to (a), but it's effectively the main reason why rmap needs to be able to tell the difference in the first place. > HugeTLB pages _are_ marked as Compound. Oh, ok. It's not clear why they would be, and historically I don't think they were, but I guess it's for random implementation details (probably to look up the head page logic). > There's some fairly horrific > code to manually make them compound when they have to be allocated > piecemeal (because they're 1GB and too large for the page allocator). Yeah, the HUGETLB case is a mess these days, but it made sense historically, because it was a much simpler thing than the THP pages that have all the fragmentation cases. Now that we handle pmd-sized pages anyway, the HUGETLB case is mostly just a nasty oddity, but we obviously also do the pud case with HUGETLB. And who knows what ia64 did with its completely random page-size thing. I don't even want to think about it, and thankfully these days I don't feel like I need to care any more ;) > > To make things more confusing, some places use PageHeadHuge() > > instead (but the folio version of said test is called > > "folio_test_hugetlb()", just so that nobody could possibly ever accuse > > the HUGETLB code to have consistency). > > That one's my fault, but it's a reaction to all the times that I and > others have got confused between PageHuge and PageTransHuge. I suppose > we could do a big sed s/PageHuge/PageHugeTLB/, but I'm hopeful the > entire hugetlb codebase is either converted to folios or unified with > THP handling. Yeah, it would be lovely to make HUGETLB some THP special case some day. Linus
diff --git a/mm/rmap.c b/mm/rmap.c index 512e53cae2ca..4833d28c5e1a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1311,7 +1311,11 @@ void page_add_anon_rmap(struct page *page, else VM_BUG_ON_PAGE(!PageLocked(page), page); - if (compound && PageTransHuge(page)) { + if (likely(!PageCompound(page))) { + first = atomic_inc_and_test(&page->_mapcount); + nr = first; + + } else if (compound && PageTransHuge(page)) { lock_compound_mapcounts(page, &mapcounts); first = !mapcounts.compound_mapcount; mapcounts.compound_mapcount++; @@ -1321,8 +1325,7 @@ void page_add_anon_rmap(struct page *page, nr = nr_subpages_unmapped(page, nr_pmdmapped); } unlock_compound_mapcounts(page, &mapcounts); - - } else if (PageCompound(page)) { + } else { struct page *head = compound_head(page); lock_compound_mapcounts(head, &mapcounts); @@ -1330,10 +1333,6 @@ void page_add_anon_rmap(struct page *page, first = subpage_mapcount_inc(page); nr = first && !mapcounts.compound_mapcount; unlock_compound_mapcounts(head, &mapcounts); - - } else { - first = atomic_inc_and_test(&page->_mapcount); - nr = first; } VM_BUG_ON_PAGE(!first && (flags & RMAP_EXCLUSIVE), page); @@ -1373,20 +1372,23 @@ void page_add_anon_rmap(struct page *page, void page_add_new_anon_rmap(struct page *page, struct vm_area_struct *vma, unsigned long address) { - const bool compound = PageCompound(page); - int nr = compound ? thp_nr_pages(page) : 1; + int nr; VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); __SetPageSwapBacked(page); - if (compound) { + + if (likely(!PageCompound(page))) { + /* increment count (starts at -1) */ + atomic_set(&page->_mapcount, 0); + nr = 1; + } else { VM_BUG_ON_PAGE(!PageTransHuge(page), page); /* increment count (starts at -1) */ atomic_set(compound_mapcount_ptr(page), 0); + nr = thp_nr_pages(page); __mod_lruvec_page_state(page, NR_ANON_THPS, nr); - } else { - /* increment count (starts at -1) */ - atomic_set(&page->_mapcount, 0); } + __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr); __page_set_anon_rmap(page, vma, address, 1); } @@ -1409,7 +1411,11 @@ void page_add_file_rmap(struct page *page, VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page); lock_page_memcg(page); - if (compound && PageTransHuge(page)) { + if (likely(!PageCompound(page))) { + first = atomic_inc_and_test(&page->_mapcount); + nr = first; + + } else if (compound && PageTransHuge(page)) { lock_compound_mapcounts(page, &mapcounts); first = !mapcounts.compound_mapcount; mapcounts.compound_mapcount++; @@ -1419,8 +1425,7 @@ void page_add_file_rmap(struct page *page, nr = nr_subpages_unmapped(page, nr_pmdmapped); } unlock_compound_mapcounts(page, &mapcounts); - - } else if (PageCompound(page)) { + } else { struct page *head = compound_head(page); lock_compound_mapcounts(head, &mapcounts); @@ -1428,10 +1433,6 @@ void page_add_file_rmap(struct page *page, first = subpage_mapcount_inc(page); nr = first && !mapcounts.compound_mapcount; unlock_compound_mapcounts(head, &mapcounts); - - } else { - first = atomic_inc_and_test(&page->_mapcount); - nr = first; } if (nr_pmdmapped) @@ -1471,7 +1472,11 @@ void page_remove_rmap(struct page *page, lock_page_memcg(page); /* page still mapped by someone else? */ - if (compound && PageTransHuge(page)) { + if (likely(!PageCompound(page))) { + last = atomic_add_negative(-1, &page->_mapcount); + nr = last; + + } else if (compound && PageTransHuge(page)) { lock_compound_mapcounts(page, &mapcounts); mapcounts.compound_mapcount--; last = !mapcounts.compound_mapcount; @@ -1481,8 +1486,7 @@ void page_remove_rmap(struct page *page, nr = nr_subpages_unmapped(page, nr_pmdmapped); } unlock_compound_mapcounts(page, &mapcounts); - - } else if (PageCompound(page)) { + } else { struct page *head = compound_head(page); lock_compound_mapcounts(head, &mapcounts); @@ -1490,10 +1494,6 @@ void page_remove_rmap(struct page *page, last = subpage_mapcount_dec(page); nr = last && !mapcounts.compound_mapcount; unlock_compound_mapcounts(head, &mapcounts); - - } else { - last = atomic_add_negative(-1, &page->_mapcount); - nr = last; } if (nr_pmdmapped) {
Commit ("mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts") propagated the "if (compound) {lock} else if (PageCompound) {lock} else {atomic}" pattern throughout; but Linus hated the way that gives primacy to the uncommon case: switch to "if (!PageCompound) {atomic} else if (compound) {lock} else {lock}" throughout. Linus has a bigger idea for how to improve it all, but here just make that rearrangement. Signed-off-by: Hugh Dickins <hughd@google.com> --- mm/rmap.c | 54 +++++++++++++++++++++++++++--------------------------- 1 file changed, 27 insertions(+), 27 deletions(-)