Message ID: 20200804214807.169256-1-jhubbard@nvidia.com
State:      New, archived
Series:     [v2] mm, dump_page: do not crash with bad compound_mapcount()
On 8/4/20 11:48 PM, John Hubbard wrote:
> If a compound page is being split while dump_page() is being run on that
> page, we can end up calling compound_mapcount() on a page that is no
> longer compound. This leads to a crash (already seen at least once in
> the field), due to the VM_BUG_ON_PAGE() assertion inside
> compound_mapcount().
> 
> (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)
> 
> A similar problem is possible, via compound_pincount() instead of
> compound_mapcount().
> 
> In order to avoid this kind of crash, make dump_page() slightly more
> robust, by providing a pair of simpler routines that don't contain
> assertions: head_mapcount() and head_pincount().
> 
> For debug tools, we don't want to go *too* far in this direction, but
> this is a simple small fix, and the crash has already been seen, so it's
> a good trade-off.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
> Hi,
> 
> I'm assuming that a fix is not required for -stable, but let me know if
> others feel differently. The dump_page() code has changed a lot in that
> area.

I'd say only if we backport all those changes, as most were to avoid
similar kinds of crashes?

How about this additional patch now that we have head_mapcount()? (I
wouldn't go for squashing as the goal and scope is too different.)

----8<----
From 3b3d5b4639eea9c0787eed510a32acdc918d569f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 6 Aug 2020 13:33:58 +0200
Subject: [PATCH] mm, thp: use head_mapcount when we know we have a head page

Patch "mm, dump_page: do not crash with bad compound_mapcount()" has
introduced head_mapcount() to split out the part of compound_mapcount()
where we already know/assume we have a head page. We can use the new
function in more places where we already have a head page, to avoid the
overhead of compound_head() and (with DEBUG_VM) a debug check. This
patch does that. There are a few more applicable places, but they are
behind DEBUG_VM so performance is not important there, and the extra
debug check in compound_mapcount() could be useful instead.

The bloat-o-meter difference without DEBUG_VM is the following:

add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
Function                     old     new   delta
__split_huge_pmd            2867    2899     +32
shrink_page_list            3860    3847     -13
reuse_swap_page              762     748     -14
page_trans_huge_mapcount     153     139     -14
total_mapcount               187     172     -15
Total: Before=8687306, After=8687282, chg -0.00%

This just shows that the compiler wasn't able to prove we have a head
page by itself. In __split_huge_pmd() the eliminated check probably led
to different optimization decisions, thus code size increased.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/huge_memory.c | 6 +++---
 mm/swapfile.c    | 2 +-
 mm/vmscan.c      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 90733cefa528..5927874b7894 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 * Set PG_double_map before dropping compound_mapcount to avoid
 	 * false-negative page_mapped().
 	 */
-	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
+	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
 		for (i = 0; i < HPAGE_PMD_NR; i++)
 			atomic_inc(&page[i]._mapcount);
 	}
@@ -2467,7 +2467,7 @@ int total_mapcount(struct page *page)
 	if (likely(!PageCompound(page)))
 		return atomic_read(&page->_mapcount) + 1;
 
-	compound = compound_mapcount(page);
+	compound = head_mapcount(page);
 	if (PageHuge(page))
 		return compound;
 	ret = compound;
@@ -2531,7 +2531,7 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
 		ret -= 1;
 		_total_mapcount -= HPAGE_PMD_NR;
 	}
-	mapcount = compound_mapcount(page);
+	mapcount = head_mapcount(page);
 	ret += mapcount;
 	_total_mapcount += mapcount;
 	if (total_mapcount)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9ee4211835c6..c5e722de38b8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1673,7 +1673,7 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
 		map_swapcount -= 1;
 		_total_mapcount -= HPAGE_PMD_NR;
 	}
-	mapcount = compound_mapcount(page);
+	mapcount = head_mapcount(page);
 	map_swapcount += mapcount;
 	_total_mapcount += mapcount;
 	if (total_mapcount)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a086c104a9a6..72218cdfd902 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1248,7 +1248,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 					 * away. Chances are some or all of the
 					 * tail pages can be freed without IO.
 					 */
-					if (!compound_mapcount(page) &&
+					if (!head_mapcount(page) &&
 					    split_huge_page_to_list(page,
 								    page_list))
 						goto activate_locked;
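(Aside: the size comparison quoted above comes from the kernel's
scripts/bloat-o-meter helper, which diffs symbol sizes between two builds.
Assuming the vmlinux binaries from before and after the patch were kept, a
typical invocation looks like:

	./scripts/bloat-o-meter vmlinux.before vmlinux.after

The "Function old new delta" table in the commit message is its output.)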
On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
> How about this additional patch now that we have head_mapcount()? (I
> wouldn't go for squashing as the goal and scope is too different.)

I like it. It bothers me that the compiler doesn't know that
compound_head(compound_head(x)) == compound_head(x). I updated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
able to tell the compiler that compound_head() is idempotent.

> The bloat-o-meter difference without DEBUG_VM is the following:
> 
> add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
> Function                     old     new   delta
> __split_huge_pmd            2867    2899     +32
> shrink_page_list            3860    3847     -13
> reuse_swap_page              762     748     -14
> page_trans_huge_mapcount     153     139     -14
> total_mapcount               187     172     -15
> Total: Before=8687306, After=8687282, chg -0.00%

That's great. I'm expecting improvements from my thp_head() macro when
that lands (currently in Andrew's tree). I have been reluctant to replace
current callers of compound_head() with thp_head(), but I suspect PF_HEAD
could use thp_head() and save a few bytes on a tinyconfig build.

> +++ b/mm/huge_memory.c
> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	 * Set PG_double_map before dropping compound_mapcount to avoid
>  	 * false-negative page_mapped().
>  	 */
> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {

I'm a little nervous about this one. The page does actually come from
pmd_page(), and today that's guaranteed to be a head page. But I'm
not convinced that's going to still be true in twenty years. With the
current THP patchset, I won't allocate pages larger than PMD order, but
I can see there being interest in tracking pages in chunks larger than
2MB in the future. And then pmd_page() might well return a tail page.
So it might be a good idea to not convert this one.

> @@ -2467,7 +2467,7 @@ int total_mapcount(struct page *page)
>  	if (likely(!PageCompound(page)))
>  		return atomic_read(&page->_mapcount) + 1;
> 
> -	compound = compound_mapcount(page);
> +	compound = head_mapcount(page);
>  	if (PageHuge(page))
>  		return compound;
>  	ret = compound;

Yes. This function is a little confusing because it uses PageCompound()
all the way through when it really should be using PageHead ... because
the opening line of the function is:

	VM_BUG_ON_PAGE(PageTail(page), page);

> @@ -2531,7 +2531,7 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
>  		ret -= 1;
>  		_total_mapcount -= HPAGE_PMD_NR;
>  	}
> -	mapcount = compound_mapcount(page);
> +	mapcount = head_mapcount(page);
>  	ret += mapcount;
>  	_total_mapcount += mapcount;
>  	if (total_mapcount)

Yes, we called compound_head() earlier in the function. Safe.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 9ee4211835c6..c5e722de38b8 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1673,7 +1673,7 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
>  		map_swapcount -= 1;
>  		_total_mapcount -= HPAGE_PMD_NR;
>  	}
> -	mapcount = compound_mapcount(page);
> +	mapcount = head_mapcount(page);
>  	map_swapcount += mapcount;
>  	_total_mapcount += mapcount;
>  	if (total_mapcount)

Yes. page is a head page at this point.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a086c104a9a6..72218cdfd902 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1248,7 +1248,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  					 * away. Chances are some or all of the
>  					 * tail pages can be freed without IO.
>  					 */
> -					if (!compound_mapcount(page) &&
> +					if (!head_mapcount(page) &&
>  					    split_huge_page_to_list(page,
>  								    page_list))
>  						goto activate_locked;

Yes. We don't put (can't put!) tail pages on the lists, so this must
be a head page.
On 8/6/20 3:48 PM, Matthew Wilcox wrote:
> On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
>> How about this additional patch now that we have head_mapcount()? (I
>> wouldn't go for squashing as the goal and scope is too different.)
> 
> I like it. It bothers me that the compiler doesn't know that
> compound_head(compound_head(x)) == compound_head(x). I updated
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
> able to tell the compiler that compound_head() is idempotent.

Yeah, it would be nice to get the benefits everywhere automatically. But I
guess the compiler would have to discard the idempotence assumptions if there
are multiple consecutive (perhaps hidden behind page flag accesses)
compound_head(page) calls in a function, as soon as we modify the struct page
somewhere.

>> The bloat-o-meter difference without DEBUG_VM is the following:
>>
>> add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
>> Function                     old     new   delta
>> __split_huge_pmd            2867    2899     +32
>> shrink_page_list            3860    3847     -13
>> reuse_swap_page              762     748     -14
>> page_trans_huge_mapcount     153     139     -14
>> total_mapcount               187     172     -15
>> Total: Before=8687306, After=8687282, chg -0.00%
> 
> That's great. I'm expecting improvements from my thp_head() macro when
> that lands (currently in Andrew's tree). I have been reluctant to replace
> current callers of compound_head() with thp_head(), but I suspect PF_HEAD
> could use thp_head() and save a few bytes on a tinyconfig build.
> 
>> +++ b/mm/huge_memory.c
>> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  	 * Set PG_double_map before dropping compound_mapcount to avoid
>>  	 * false-negative page_mapped().
>>  	 */
>> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> 
> I'm a little nervous about this one. The page does actually come from
> pmd_page(), and today that's guaranteed to be a head page. But I'm
> not convinced that's going to still be true in twenty years. With the
> current THP patchset, I won't allocate pages larger than PMD order, but
> I can see there being interest in tracking pages in chunks larger than
> 2MB in the future. And then pmd_page() might well return a tail page.
> So it might be a good idea to not convert this one.

Hmm, the function converts the compound mapcount of the whole page to
HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger
than a PMD, then I guess this wouldn't work properly anymore without changes
anyway? Maybe we could stick something like VM_BUG_ON(PageTransHuge(page))
there as "enforced documentation" for now?
On Thu, Aug 06, 2020 at 05:13:05PM +0200, Vlastimil Babka wrote:
> On 8/6/20 3:48 PM, Matthew Wilcox wrote:
> > On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
> >> How about this additional patch now that we have head_mapcount()? (I
> >> wouldn't go for squashing as the goal and scope is too different.)
> > 
> > I like it. It bothers me that the compiler doesn't know that
> > compound_head(compound_head(x)) == compound_head(x). I updated
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
> > able to tell the compiler that compound_head() is idempotent.
> 
> Yeah, it would be nice to get the benefits everywhere automatically. But I
> guess the compiler would have to discard the idempotence assumptions if there
> are multiple consecutive (perhaps hidden behind page flag accesses)
> compound_head(page) calls in a function, as soon as we modify the struct page
> somewhere.

The only place where we modify the struct page is the split code, and
I think we're pretty careful to handle the pages appropriately there.
The buddy allocator has its own way of combining pages.

> >> +++ b/mm/huge_memory.c
> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
> >>  	 * false-negative page_mapped().
> >>  	 */
> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > 
> > I'm a little nervous about this one. The page does actually come from
> > pmd_page(), and today that's guaranteed to be a head page. But I'm
> > not convinced that's going to still be true in twenty years. With the
> > current THP patchset, I won't allocate pages larger than PMD order, but
> > I can see there being interest in tracking pages in chunks larger than
> > 2MB in the future. And then pmd_page() might well return a tail page.
> > So it might be a good idea to not convert this one.
> 
> Hmm, the function converts the compound mapcount of the whole page to
> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger
> than a PMD, then I guess this wouldn't work properly anymore without changes
> anyway? Maybe we could stick something like VM_BUG_ON(PageTransHuge(page))
> there as "enforced documentation" for now?

I think it would work as-is. But also I may have totally misunderstood it.
I'll write this declaratively and specifically for x86 (PMD order is 9)
... tell me when I've made a mistake ;-)

This function is for splitting the PMD. We're leaving the underlying
page intact and just changing the page table. So if, say, we have an
underlying 4MB page (and maybe the pages are mapped as PMDs in this
process), we might get subpage number 512 of this order-10 page. We'd
need to check the DoubleMap bit on subpage 1, and the compound_mapcount
also stored in page 1, but we'd only want to spread the mapcount out
over the 512 subpages from 512-1023; we wouldn't want to spread it out
over 0-511 because they aren't affected by this particular PMD.

Having to reason about stuff like this is why I limited the THP code to
stop at PMD order ... I don't want to make my life even more complicated
than I have to!
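(Aside: GCC's existing __attribute__((pure)) covers part of the idempotence
discussion above: it lets the compiler merge repeated identical calls when no
stores intervene, but it cannot express that nested calls collapse, i.e. that
compound_head(compound_head(x)) == compound_head(x), which is what the
bugzilla request asks for. A minimal standalone sketch using a mock struct
page, not the kernel's real definitions:

	#include <stdio.h>

	struct page {
		unsigned long compound_head;	/* bit 0 set: points at head */
	};

	/*
	 * "pure" tells GCC the result depends only on the argument and on
	 * memory that is not written between calls, so two identical calls
	 * can be folded into one. It does NOT teach the compiler that the
	 * function is idempotent under nesting.
	 */
	__attribute__((pure))
	static struct page *compound_head(struct page *page)
	{
		unsigned long head = page->compound_head;

		return (head & 1) ? (struct page *)(head - 1) : page;
	}

	int main(void)
	{
		struct page pages[2] = {
			{ 0 },				/* head page */
			{ (unsigned long)&pages[0] + 1 },	/* tail page */
		};

		/* Both calls resolve to the head page, &pages[0]. */
		printf("%p %p\n", (void *)compound_head(&pages[0]),
		       (void *)compound_head(&pages[1]));
		return 0;
	}
)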
On 8/6/20 5:39 PM, Matthew Wilcox wrote:
>> >> +++ b/mm/huge_memory.c
>> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
>> >>  	 * false-negative page_mapped().
>> >>  	 */
>> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> >
>> > I'm a little nervous about this one. The page does actually come from
>> > pmd_page(), and today that's guaranteed to be a head page. But I'm
>> > not convinced that's going to still be true in twenty years. With the
>> > current THP patchset, I won't allocate pages larger than PMD order, but
>> > I can see there being interest in tracking pages in chunks larger than
>> > 2MB in the future. And then pmd_page() might well return a tail page.
>> > So it might be a good idea to not convert this one.
>>
>> Hmm, the function converts the compound mapcount of the whole page to
>> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger
>> than a PMD, then I guess this wouldn't work properly anymore without changes
>> anyway? Maybe we could stick something like VM_BUG_ON(PageTransHuge(page))
>> there as "enforced documentation" for now?
> 
> I think it would work as-is. But also I may have totally misunderstood it.
> I'll write this declaratively and specifically for x86 (PMD order is 9)
> ... tell me when I've made a mistake ;-)
> 
> This function is for splitting the PMD. We're leaving the underlying
> page intact and just changing the page table. So if, say, we have an
> underlying 4MB page (and maybe the pages are mapped as PMDs in this
> process), we might get subpage number 512 of this order-10 page. We'd
> need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> also stored in page 1, but we'd only want to spread the mapcount out
> over the 512 subpages from 512-1023; we wouldn't want to spread it out
> over 0-511 because they aren't affected by this particular PMD.

Yeah, and then we decrease the compound mapcount, which is a counter of "how
many times is this compound page mapped as a whole". But we only removed (the
second) half of the compound mapping, so IMHO that would be wrong?

> Having to reason about stuff like this is why I limited the THP code to
> stop at PMD order ... I don't want to make my life even more complicated
> than I have to!

Kirill might correct me, but I'd expect the THP code right now has baked in
many assumptions about THP pages being exactly HPAGE_PMD_ORDER large?
On Thu, Aug 06, 2020 at 05:53:10PM +0200, Vlastimil Babka wrote:
> On 8/6/20 5:39 PM, Matthew Wilcox wrote:
> >> >> +++ b/mm/huge_memory.c
> >> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
> >> >>  	 * false-negative page_mapped().
> >> >>  	 */
> >> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> >
> >> > I'm a little nervous about this one. The page does actually come from
> >> > pmd_page(), and today that's guaranteed to be a head page. But I'm
> >> > not convinced that's going to still be true in twenty years. With the
> >> > current THP patchset, I won't allocate pages larger than PMD order, but
> >> > I can see there being interest in tracking pages in chunks larger than
> >> > 2MB in the future. And then pmd_page() might well return a tail page.
> >> > So it might be a good idea to not convert this one.
> >>
> >> Hmm, the function converts the compound mapcount of the whole page to
> >> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger
> >> than a PMD, then I guess this wouldn't work properly anymore without changes
> >> anyway? Maybe we could stick something like VM_BUG_ON(PageTransHuge(page))
> >> there as "enforced documentation" for now?
> >
> > I think it would work as-is. But also I may have totally misunderstood it.
> > I'll write this declaratively and specifically for x86 (PMD order is 9)
> > ... tell me when I've made a mistake ;-)
> >
> > This function is for splitting the PMD. We're leaving the underlying
> > page intact and just changing the page table. So if, say, we have an
> > underlying 4MB page (and maybe the pages are mapped as PMDs in this
> > process), we might get subpage number 512 of this order-10 page. We'd
> > need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> > also stored in page 1, but we'd only want to spread the mapcount out
> > over the 512 subpages from 512-1023; we wouldn't want to spread it out
> > over 0-511 because they aren't affected by this particular PMD.
> 
> Yeah, and then we decrease the compound mapcount, which is a counter of "how
> many times is this compound page mapped as a whole". But we only removed (the
> second) half of the compound mapping, so IMHO that would be wrong?

I'd expect that count to be incremented by 1 for each PMD that it's
mapped to? I.e., change the definition of that counter slightly.

> > Having to reason about stuff like this is why I limited the THP code to
> > stop at PMD order ... I don't want to make my life even more complicated
> > than I have to!
> 
> Kirill might correct me, but I'd expect the THP code right now has baked in
> many assumptions about THP pages being exactly HPAGE_PMD_ORDER large?

There are somewhat fewer places that make that assumption after applying
the ~80 patches here ... http://git.infradead.org/users/willy/pagecache.git

I have mostly not touched the anonymous THPs (obviously some of the code
paths are shared), although both Kirill & I think there's a win to be
had there too.
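(Aside: to make the windowing arithmetic in Matthew's scenario concrete, here
is a small illustrative program. The numbers follow his x86 example, where a
PMD covers 512 base pages; nothing here is kernel code:

	#include <stdio.h>

	#define PMD_NR	512	/* base pages per PMD on x86-64 */

	/*
	 * For a compound page of (1 << order) base pages, the PMD containing
	 * subpage `sub` covers only one PMD_NR-aligned window of subpages.
	 * Splitting that PMD should spread mapcount over that window alone.
	 */
	static void pmd_window(unsigned int order, unsigned int sub)
	{
		unsigned int first = (sub / PMD_NR) * PMD_NR;

		printf("order-%u page, subpage %u: PMD covers subpages %u-%u\n",
		       order, sub, first, first + PMD_NR - 1);
	}

	int main(void)
	{
		pmd_window(10, 512);	/* Matthew's example: window is 512-1023 */
		return 0;
	}
)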
On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> If a compound page is being split while dump_page() is being run on that
> page, we can end up calling compound_mapcount() on a page that is no
> longer compound. This leads to a crash (already seen at least once in
> the field), due to the VM_BUG_ON_PAGE() assertion inside
> compound_mapcount().
> 
> (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)
> 
> A similar problem is possible, via compound_pincount() instead of
> compound_mapcount().
> 
> In order to avoid this kind of crash, make dump_page() slightly more
> robust, by providing a pair of simpler routines that don't contain
> assertions: head_mapcount() and head_pincount().

I find the naming misleading. head_mapcount() and head_pincount() sound
like the mapcount/pincount of the head page, but they're not. They are
the mapcount and pincount of the compound page. Maybe
compound_mapcount_head() and compound_pincount_head()? Or
__compound_mapcount() and __compound_pincount().

> For debug tools, we don't want to go *too* far in this direction, but
> this is a simple small fix, and the crash has already been seen, so it's
> a good trade-off.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
> Hi,
> 
> I'm assuming that a fix is not required for -stable, but let me know if
> others feel differently. The dump_page() code has changed a lot in that
> area.
> 
> Changes since v1 [1]:
> 
> 1) Rebased onto mmotm
> 
> 2) Used a simpler head_*count() approach.
> 
> 3) Added Matthew's Suggested-by: tag
> 
> 4) Support pincount as well as mapcount.
> 
> [1] https://lore.kernel.org/linux-mm/20200804183943.1244828-1-jhubbard@nvidia.com/
> 
> thanks,
> John Hubbard
> 
>  include/linux/mm.h | 14 ++++++++++++--
>  mm/debug.c         |  6 +++---
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4f12b2465e80..8ab941cf73f4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -776,6 +776,11 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
>  extern void kvfree(const void *addr);
>  extern void kvfree_sensitive(const void *addr, size_t len);
> 
> +static inline int head_mapcount(struct page *head)
> +{

Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?

> +	return atomic_read(compound_mapcount_ptr(head)) + 1;
> +}
> +
>  /*
>   * Mapcount of compound page as a whole, does not include mapped sub-pages.
>   *
> @@ -785,7 +790,7 @@ static inline int compound_mapcount(struct page *page)
>  {
>  	VM_BUG_ON_PAGE(!PageCompound(page), page);
>  	page = compound_head(page);
> -	return atomic_read(compound_mapcount_ptr(page)) + 1;
> +	return head_mapcount(page);
>  }
> 
>  /*
> @@ -898,11 +903,16 @@ static inline bool hpage_pincount_available(struct page *page)
>  	return PageCompound(page) && compound_order(page) > 1;
>  }
> 
> +static inline int head_pincount(struct page *head)
> +{

Ditto.

> +	return atomic_read(compound_pincount_ptr(head));
> +}
> +
>  static inline int compound_pincount(struct page *page)
>  {
>  	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
>  	page = compound_head(page);
> -	return atomic_read(compound_pincount_ptr(page));
> +	return head_pincount(page);
>  }
> 
>  static inline void set_compound_order(struct page *page, unsigned int order)
> diff --git a/mm/debug.c b/mm/debug.c
> index c27fff1e3ca8..69b60637112b 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -102,12 +102,12 @@ void __dump_page(struct page *page, const char *reason)
>  		if (hpage_pincount_available(page)) {
>  			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
>  				head, compound_order(head),
> -				compound_mapcount(head),
> -				compound_pincount(head));
> +				head_mapcount(head),
> +				head_pincount(head));
>  		} else {
>  			pr_warn("head:%p order:%u compound_mapcount:%d\n",
>  				head, compound_order(head),
> -				compound_mapcount(head));
> +				head_mapcount(head));
>  		}
>  	}
>  	if (PageKsm(page))
> -- 
> 2.28.0
On Thu, Aug 06, 2020 at 06:15:00PM +0100, Matthew Wilcox wrote:
> On Thu, Aug 06, 2020 at 05:53:10PM +0200, Vlastimil Babka wrote:
> > On 8/6/20 5:39 PM, Matthew Wilcox wrote:
> > >> >> +++ b/mm/huge_memory.c
> > >> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> > >> >>  	 * Set PG_double_map before dropping compound_mapcount to avoid
> > >> >>  	 * false-negative page_mapped().
> > >> >>  	 */
> > >> >> -	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > >> >> +	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > >> >
> > >> > I'm a little nervous about this one. The page does actually come from
> > >> > pmd_page(), and today that's guaranteed to be a head page. But I'm
> > >> > not convinced that's going to still be true in twenty years. With the
> > >> > current THP patchset, I won't allocate pages larger than PMD order, but
> > >> > I can see there being interest in tracking pages in chunks larger than
> > >> > 2MB in the future. And then pmd_page() might well return a tail page.
> > >> > So it might be a good idea to not convert this one.
> > >>
> > >> Hmm, the function converts the compound mapcount of the whole page to
> > >> HPAGE_PMD_NR base-page mapcounts. If the compound page were suddenly bigger
> > >> than a PMD, then I guess this wouldn't work properly anymore without changes
> > >> anyway? Maybe we could stick something like VM_BUG_ON(PageTransHuge(page))
> > >> there as "enforced documentation" for now?
> > >
> > > I think it would work as-is. But also I may have totally misunderstood it.
> > > I'll write this declaratively and specifically for x86 (PMD order is 9)
> > > ... tell me when I've made a mistake ;-)
> > >
> > > This function is for splitting the PMD. We're leaving the underlying
> > > page intact and just changing the page table. So if, say, we have an
> > > underlying 4MB page (and maybe the pages are mapped as PMDs in this
> > > process), we might get subpage number 512 of this order-10 page. We'd
> > > need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> > > also stored in page 1, but we'd only want to spread the mapcount out
> > > over the 512 subpages from 512-1023; we wouldn't want to spread it out
> > > over 0-511 because they aren't affected by this particular PMD.
> > 
> > Yeah, and then we decrease the compound mapcount, which is a counter of "how
> > many times is this compound page mapped as a whole". But we only removed (the
> > second) half of the compound mapping, so IMHO that would be wrong?
> 
> I'd expect that count to be incremented by 1 for each PMD that it's
> mapped to? I.e., change the definition of that counter slightly.
> 
> > > Having to reason about stuff like this is why I limited the THP code to
> > > stop at PMD order ... I don't want to make my life even more complicated
> > > than I have to!
> > 
> > Kirill might correct me, but I'd expect the THP code right now has baked in
> > many assumptions about THP pages being exactly HPAGE_PMD_ORDER large?

That will be true for PMD-mapped THP pages after applying Matthew's
patchset.

> There are somewhat fewer places that make that assumption after applying
> the ~80 patches here ... http://git.infradead.org/users/willy/pagecache.git

The patchset allows for THP to be anywhere between order-2 and order-9
(on x86-64).

> I have mostly not touched the anonymous THPs (obviously some of the code
> paths are shared), although both Kirill & I think there's a win to be
> had there too.

Yeah. Reducing LRU handling overhead alone should be enough to justify
the effort. But we still would need to have numbers.
On Fri, Aug 07, 2020 at 05:35:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> > If a compound page is being split while dump_page() is being run on that
> > page, we can end up calling compound_mapcount() on a page that is no
> > longer compound. This leads to a crash (already seen at least once in
> > the field), due to the VM_BUG_ON_PAGE() assertion inside
> > compound_mapcount().

[...]

> > +static inline int head_mapcount(struct page *head)
> > +{
> 
> Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?

Well, no. That was the point of the bug report -- by the time we called
compound_mapcount, the page was no longer a head page.

> > A similar problem is possible, via compound_pincount() instead of
> > compound_mapcount().
> > 
> > In order to avoid this kind of crash, make dump_page() slightly more
> > robust, by providing a pair of simpler routines that don't contain
> > assertions: head_mapcount() and head_pincount().
> 
> I find the naming misleading. head_mapcount() and head_pincount() sound
> like the mapcount/pincount of the head page, but they're not. They are
> the mapcount and pincount of the compound page.

OK, point taken. I might go for head_compound_mapcount()? Or as I
originally suggested, just opencoding it like we do in __page_mapcount().
On Fri, Aug 07, 2020 at 04:10:29PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 07, 2020 at 05:35:04PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> > > If a compound page is being split while dump_page() is being run on that
> > > page, we can end up calling compound_mapcount() on a page that is no
> > > longer compound. This leads to a crash (already seen at least once in
> > > the field), due to the VM_BUG_ON_PAGE() assertion inside
> > > compound_mapcount().
> 
> [...]
> 
> > > +static inline int head_mapcount(struct page *head)
> > > +{
> > 
> > Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?
> 
> Well, no. That was the point of the bug report -- by the time we called
> compound_mapcount, the page was no longer a head page.

Right. VM_BUG_ON_PAGE(PageTail(head), head)?

> > > A similar problem is possible, via compound_pincount() instead of
> > > compound_mapcount().
> > > 
> > > In order to avoid this kind of crash, make dump_page() slightly more
> > > robust, by providing a pair of simpler routines that don't contain
> > > assertions: head_mapcount() and head_pincount().
> > 
> > I find the naming misleading. head_mapcount() and head_pincount() sound
> > like the mapcount/pincount of the head page, but they're not. They are
> > the mapcount and pincount of the compound page.
> 
> OK, point taken. I might go for head_compound_mapcount()? Or as I
> originally suggested, just opencoding it like we do in __page_mapcount().

I'm fine either way.
On 8/7/20 9:48 AM, Kirill A. Shutemov wrote:
>> [...]
>>>> +static inline int head_mapcount(struct page *head)
>>>> +{
>>> 
>>> Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?
>> 
>> Well, no. That was the point of the bug report -- by the time we called
>> compound_mapcount, the page was no longer a head page.
> 
> Right. VM_BUG_ON_PAGE(PageTail(head), head)?

Sorry for overlooking that feedback point. Looking at it now, I'd much
rather not put any assertions at all here. This is supposed to be for
implementing the failure case, and we've moved past assertions at that
point. In other words, dump_page() is part of *implementing* an
assertion, so calling assertions from within it is undesirable.

It's better to put the assertions in code that would call these inner
routines. Then, if we want to use these routines outside of mm/debug.c,
as per the other thread, we should provide a wrapper that has such
assertions.

thanks,
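(Aside: the split John describes could look roughly like the sketch below.
head_mapcount() is the assertion-free helper from the patch; the checked
wrapper, including its name checked_head_mapcount(), is invented here purely
for illustration and is not part of the posted patch:

	/* The assertion-free helper from the patch, as used by dump_page(). */
	static inline int head_mapcount(struct page *head)
	{
		return atomic_read(compound_mapcount_ptr(head)) + 1;
	}

	/*
	 * Hypothetical checked wrapper for callers outside mm/debug.c,
	 * carrying Kirill's suggested tail-page check.
	 */
	static inline int checked_head_mapcount(struct page *head)
	{
		VM_BUG_ON_PAGE(PageTail(head), head);
		return head_mapcount(head);
	}
)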
If a compound page is being split while dump_page() is being run on that
page, we can end up calling compound_mapcount() on a page that is no
longer compound. This leads to a crash (already seen at least once in
the field), due to the VM_BUG_ON_PAGE() assertion inside
compound_mapcount().

(The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)

A similar problem is possible, via compound_pincount() instead of
compound_mapcount().

In order to avoid this kind of crash, make dump_page() slightly more
robust, by providing a pair of simpler routines that don't contain
assertions: head_mapcount() and head_pincount().

For debug tools, we don't want to go *too* far in this direction, but
this is a simple small fix, and the crash has already been seen, so it's
a good trade-off.

Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
Hi,

I'm assuming that a fix is not required for -stable, but let me know if
others feel differently. The dump_page() code has changed a lot in that
area.

Changes since v1 [1]:

1) Rebased onto mmotm

2) Used a simpler head_*count() approach.

3) Added Matthew's Suggested-by: tag

4) Support pincount as well as mapcount.

[1] https://lore.kernel.org/linux-mm/20200804183943.1244828-1-jhubbard@nvidia.com/

thanks,
John Hubbard

 include/linux/mm.h | 14 ++++++++++++--
 mm/debug.c         |  6 +++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4f12b2465e80..8ab941cf73f4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -776,6 +776,11 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
+static inline int head_mapcount(struct page *head)
+{
+	return atomic_read(compound_mapcount_ptr(head)) + 1;
+}
+
 /*
  * Mapcount of compound page as a whole, does not include mapped sub-pages.
  *
@@ -785,7 +790,7 @@ static inline int compound_mapcount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 	page = compound_head(page);
-	return atomic_read(compound_mapcount_ptr(page)) + 1;
+	return head_mapcount(page);
 }
 
 /*
@@ -898,11 +903,16 @@ static inline bool hpage_pincount_available(struct page *page)
 	return PageCompound(page) && compound_order(page) > 1;
 }
 
+static inline int head_pincount(struct page *head)
+{
+	return atomic_read(compound_pincount_ptr(head));
+}
+
 static inline int compound_pincount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
 	page = compound_head(page);
-	return atomic_read(compound_pincount_ptr(page));
+	return head_pincount(page);
 }
 
 static inline void set_compound_order(struct page *page, unsigned int order)
diff --git a/mm/debug.c b/mm/debug.c
index c27fff1e3ca8..69b60637112b 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -102,12 +102,12 @@ void __dump_page(struct page *page, const char *reason)
 		if (hpage_pincount_available(page)) {
 			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
 				head, compound_order(head),
-				compound_mapcount(head),
-				compound_pincount(head));
+				head_mapcount(head),
+				head_pincount(head));
 		} else {
 			pr_warn("head:%p order:%u compound_mapcount:%d\n",
 				head, compound_order(head),
-				compound_mapcount(head));
+				head_mapcount(head));
 		}
 	}
 	if (PageKsm(page))