Message ID: 20211012180148.1669685-1-hannes@cmpxchg.org (mailing list archive)
Series: PageSlab: eliminate unnecessary compound_head() calls
On Tue, Oct 12, 2021 at 02:01:37PM -0400, Johannes Weiner wrote:
> PageSlab() currently imposes a compound_head() call on all callsites
> even though only very few situations encounter tailpages. This short
> series bubbles tailpage resolution up to the few sites that need it,
> and eliminates it everywhere else.
>
> This is a stand-alone improvement. However, it's inspired by Willy's
> patches to split struct slab from struct page. Those patches currently
> resolve a slab object pointer to its struct slab as follows:
>
>	slab = virt_to_slab(p);		/* tailpage resolution */
>	if (slab_test_cache(slab)) {	/* slab or page alloc bypass? */
>		do_slab_stuff(slab);
>	} else {
>		page = (struct page *)slab;
>		do_page_stuff(page);
>	}
>
> which makes struct slab an ambiguous type that needs to self-identify
> with the slab_test_cache() test (which in turn relies on PG_slab in
> the flags field shared between page and slab).
>
> It would be preferable to do:
>
>	page = virt_to_head_page(p);	/* tailpage resolution */
>	if (PageSlab(page)) {		/* slab or page alloc bypass? */
>		slab = page_slab(page);
>		do_slab_stuff(slab);
>	} else {
>		do_page_stuff(page);
>	}
>
> and leave the ambiguity and the need to self-identify with struct
> page, so that struct slab is a strong and unambiguous type, and a
> non-NULL struct slab encountered in the wild is always a valid object
> without the need to check another dedicated flag for validity first.
>
> However, because PageSlab() currently implies tailpage resolution,
> writing the virt->slab lookup in the preferred way would add yet more
> unnecessary compound_head() calls to the hottest MM paths.
>
> The page flag helpers should eventually all be weaned off of those
> compound_head() calls for their unnecessary overhead alone. But this
> one in particular is now getting in the way of writing code in the
> preferred manner, and bleeding page ambiguity into the new types that
> are supposed to eliminate specifically that. It's ripe for a cleanup.

So what I had in mind was more the patch at the end (which I now realise
is missing the corresponding changes to __ClearPageSlab()). There is,
however, some weirdness with kfence, which appears to be abusing PageSlab
by setting it on pages which are not slab pages???

	page = virt_to_page(p);
	if (PageSlab(page)) {		/* slab or page alloc bypass? */
		slab = page_slab(page);	/* tail page resolution */
		do_slab_stuff(slab);
	} else {
		do_page_stuff(page);	/* or possibly compound_head(page) */
	}

There could also be a PageTail check in there for some of the cases --
catch people doing something like:

	kfree(kmalloc(65536, GFP_KERNEL) + 16384);

which happens to work today, but should probably not.

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8951da34aa51..b4b62fa100eb 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -344,7 +344,7 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
 	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
-__PAGEFLAG(Slab, slab, PF_NO_TAIL)
+__PAGEFLAG(Slab, slab, PF_ANY)
 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
 PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	/* Used by some filesystems */
diff --git a/mm/slab.c b/mm/slab.c
index d0f725637663..c8c6e8576936 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1371,6 +1371,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 		int nodeid)
 {
 	struct page *page;
+	int i;
 
 	flags |= cachep->allocflags;
 
@@ -1381,7 +1382,8 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	}
 
 	account_slab_page(page, cachep->gfporder, cachep, flags);
-	__SetPageSlab(page);
+	for (i = 0; i < compound_nr(page); i++)
+		__SetPageSlab(page + i);
 
 	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
 	if (sk_memalloc_socks() && page_is_pfmemalloc(page))
 		SetPageSlabPfmemalloc(page);
diff --git a/mm/slub.c b/mm/slub.c
index f7ac28646580..e442cebda4ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1916,7 +1916,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	account_slab_page(page, oo_order(oo), s, flags);
 	page->slab_cache = s;
-	__SetPageSlab(page);
+	for (idx = 0; idx < compound_nr(page); idx++)
+		__SetPageSlab(page + idx);
 	if (page_is_pfmemalloc(page))
 		SetPageSlabPfmemalloc(page);
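For readers outside mm: the per-callsite cost under discussion comes from the fact that a PF_NO_TAIL flag test must first resolve a possible tail page to its head, while the PF_ANY policy in the patch above reads the flag on whichever page it is handed. A toy userspace model of the two test styles (a simplification of the kernel's real encoding, not the actual implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the compound-page encoding: bit 0 of ->compound_head
 * marks a tail page and the remaining bits point at the head page;
 * bit 0 of ->flags stands in for PG_slab. */
struct page {
	unsigned long flags;
	unsigned long compound_head;
};

static struct page *compound_head(struct page *page)
{
	unsigned long head = page->compound_head;

	if (head & 1)			/* tail page: follow the pointer */
		return (struct page *)(head - 1);
	return page;
}

/* PF_NO_TAIL-style PageSlab(): always resolves to the head first,
 * which is the per-callsite overhead the series wants to eliminate. */
static bool PageSlab_no_tail(struct page *page)
{
	return compound_head(page)->flags & 1;
}

/* PF_ANY-style PageSlab(): reads the given page directly; this only
 * works if the flag was set on every page of the compound, which is
 * what the allocation-path loops in the patch above are for. */
static bool PageSlab_any(struct page *page)
{
	return page->flags & 1;
}
```

The helper names with `_no_tail`/`_any` suffixes are illustrative only; in the kernel both variants would simply be called PageSlab(), generated with different PF_ policies.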
On Tue, Oct 12, 2021 at 08:23:26PM +0100, Matthew Wilcox wrote:
> On Tue, Oct 12, 2021 at 02:01:37PM -0400, Johannes Weiner wrote:
> > PageSlab() currently imposes a compound_head() call on all callsites
> > even though only very few situations encounter tailpages.
[...]
> So what I had in mind was more the patch at the end (which I now realise
> is missing the corresponding changes to __ClearPageSlab()). There is,
> however, some weirdness with kfence, which appears to be abusing PageSlab
> by setting it on pages which are not slab pages???
>
>	page = virt_to_page(p);
>	if (PageSlab(page)) {		/* slab or page alloc bypass? */
>		slab = page_slab(page);	/* tail page resolution */
>		do_slab_stuff(slab);
>	} else {
>		do_page_stuff(page);	/* or possibly compound_head(page) */
>	}

Can you elaborate why you think this would be better?

If the object is sitting in a tailpage, the flag test itself could
avoid the compound_head(), but at the end of the day it's the slab or
the headpage that needs to be operated on in the fastpaths, and we
need to do the compound_head() whether the flag is set or not.

I suppose it could make some debugging checks marginally cheaper?

But OTOH it comes at the cost of the flag setting and clearing loops
in the slab allocation path, even when debugging checks are disabled.

And it would further complicate the compound page model by introducing
another distinct flag handling scheme (would there be other users for
it?). The open-coded loops as a means to ensure flag integrity seem
error prone; but creating Set and Clear variants that encapsulate the
loops sounds like a move in the wrong direction, given the pain the
compound_head() alone in them has already created.

> There could also be a PageTail check in there for some of the cases --
> catch people doing something like:
>	kfree(kmalloc(65536, GFP_KERNEL) + 16384);
> which happens to work today, but should probably not.

I actually wondered about that when looking at the slob code. Its
kfree does this:

	sp = virt_to_page(block);
	if (PageSlab(compound_head(sp))) {
		int align = max_t(size_t, ARCH_KMALLOC_MINALIGN,
				  ARCH_SLAB_MINALIGN);
		unsigned int *m = (unsigned int *)(block - align);
		slob_free(m, *m + align);
	} else {
		unsigned int order = compound_order(sp);
		mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
				    -(PAGE_SIZE << order));
		__free_pages(sp, order);
	}

Note the virt_to_page(), instead of virt_to_head_page(). It does test
PG_slab correctly, but if this is a bypass page, it operates on
whatever tail page the kfree() argument points into. If you did what
you write above, it would leak the pages before the object.
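The leak Johannes describes can be made concrete: the allocation order is recorded on the head page only, so evaluating compound_order() through a tail page yields 0 and the subsequent __free_pages() returns a single page. A toy model (a hypothetical simplification; the real kernel stores the order in the first tail page and checks PG_head):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the order is meaningful only on the head page, so
 * reading it through a tail page silently yields 0, mirroring the
 * kernel's compound_order() on a page without PG_head set. */
struct page {
	bool head;
	unsigned int order;	/* valid only when head is true */
};

static unsigned int compound_order(struct page *page)
{
	return page->head ? page->order : 0;
}

/* How many base pages a __free_pages(sp, compound_order(sp)) call
 * would actually hand back, given the starting page. */
static unsigned int pages_returned(struct page *sp)
{
	return 1u << compound_order(sp);
}
```

Freeing an order-4 (16-page) bypass allocation through a pointer into a tail page would thus return one page and strand the other fifteen, which is why slob's reliance on virt_to_page() only works because it never hands out multi-page bypass objects this way.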
On Tue, Oct 12, 2021 at 04:30:20PM -0400, Johannes Weiner wrote:
> On Tue, Oct 12, 2021 at 08:23:26PM +0100, Matthew Wilcox wrote:
> > On Tue, Oct 12, 2021 at 02:01:37PM -0400, Johannes Weiner wrote:
[...]
> Can you elaborate why you think this would be better?
>
> If the object is sitting in a tailpage, the flag test itself could
> avoid the compound_head(), but at the end of the day it's the slab or
> the headpage that needs to be operated on in the fastpaths, and we
> need to do the compound_head() whether the flag is set or not.
>
> I suppose it could make some debugging checks marginally cheaper?
>
> But OTOH it comes at the cost of the flag setting and clearing loops
> in the slab allocation path, even when debugging checks are disabled.
>
> And it would further complicate the compound page model by introducing
> another distinct flag handling scheme (would there be other users for
> it?). The open-coded loops as a means to ensure flag integrity seem
> error prone; but creating Set and Clear variants that encapsulate the
> loops sounds like a move in the wrong direction, given the pain the
> compound_head() alone in them has already created.

I see this as a move towards the dynamically allocated future.
In that future, I think we'll do something like:

	static inline struct slab *alloc_slab_pages(...)
	{
		struct page *page;
		struct slab *slab = kmalloc(sizeof(struct slab), gfp);
		if (!slab)
			return -ENOMEM;
		... init slab ...
		page = palloc(slab, MEM_TYPE_SLAB, node, gfp, order);
		if (!page) {
			kfree(slab);
			return -ENOMEM;
		}
		slab->virt = page_address(page);

		return slab;
	}

where palloc() does something like:

	unsigned long page_desc = mem_type | (unsigned long)mem_desc;

	... allocates the struct pages ...

	for (i = 0; i < (1 << order); i++)
		page[i] = page_desc;

	return page;
	}

For today, testing PageSlab on the tail page helps the test proceed
in parallel with the action. Looking at slub's kfree() for an example:

	page = virt_to_head_page(x);
	if (unlikely(!PageSlab(page))) {
		free_nonslab_page(page, object);
		return;
	}
	slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);

Your proposal is certainly an improvement (since gcc doesn't know
that compound_head(compound_head(x)) == compound_head(x)), but I
think checking on the tail page is even better:

	page = virt_to_page(x);
	if (unlikely(!PageSlab(page))) {
		free_nonslab_page(compound_head(page), object);
		return;
	}
	slab = page_slab(page);
	slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);

The compound_head() parts can proceed in parallel with the check of
PageSlab().

As far as the cost of setting PageSlab, those cachelines are already
dirty because we set compound_head on each of those pages already
(or in the case of freeing, we're about to clear compound_head on
each of those pages).

> > There could also be a PageTail check in there for some of the cases --
> > catch people doing something like:
> >	kfree(kmalloc(65536, GFP_KERNEL) + 16384);
> > which happens to work today, but should probably not.
>
> I actually wondered about that when looking at the slob code. Its
> kfree does this:
>
>	sp = virt_to_page(block);
>	if (PageSlab(compound_head(sp))) {
[...]
>
> Note the virt_to_page(), instead of virt_to_head_page(). It does test
> PG_slab correctly, but if this is a bypass page, it operates on
> whatever tail page the kfree() argument points into. If you did what
> you write above, it would leak the pages before the object.

slob doesn't have to be as careful because it falls back to the page
allocator for all allocations > PAGE_SIZE (see calls to
slob_new_pages()).
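The shape Willy proposes above can be modeled end to end: with a PF_ANY flag the branch can be decided on whatever page virt_to_page() returned, and only page_slab() pays for head resolution, after the branch. A toy sketch (hypothetical helpers; in Willy's actual series page_slab() casts the head struct page to struct slab):

```c
#include <assert.h>
#include <stdbool.h>

struct slab;	/* in the real series, overlays the head struct page */

/* Toy page: bit 0 of ->flags models PG_slab, set on every page of
 * the compound under the PF_ANY scheme; ->compound_head stores the
 * head pointer with bit 0 set on tail pages. */
struct page {
	unsigned long flags;
	unsigned long compound_head;
};

static struct page *compound_head(struct page *page)
{
	if (page->compound_head & 1)
		return (struct page *)(page->compound_head - 1);
	return page;
}

/* Flag test on the page we were handed: no tail resolution needed. */
static bool PageSlab(struct page *page)
{
	return page->flags & 1;
}

/* Tail resolution happens here, after the branch has been decided,
 * which is what lets it proceed in parallel with the flag check. */
static struct slab *page_slab(struct page *page)
{
	return (struct slab *)compound_head(page);
}
```

The point of the ordering is that the load feeding PageSlab() and the arithmetic feeding page_slab() are independent, so an out-of-order core can overlap them; the PF_NO_TAIL form chains them.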
On Wed, Oct 13, 2021 at 04:19:18AM +0100, Matthew Wilcox wrote:
> On Tue, Oct 12, 2021 at 04:30:20PM -0400, Johannes Weiner wrote:
> > On Tue, Oct 12, 2021 at 08:23:26PM +0100, Matthew Wilcox wrote:
[...]
> I see this as a move towards the dynamically allocated future.
> In that future, I think we'll do something like:
>
>	static inline struct slab *alloc_slab_pages(...)
>	{
>		struct page *page;
>		struct slab *slab = kmalloc(sizeof(struct slab), gfp);
>		if (!slab)
>			return -ENOMEM;
>		... init slab ...
>		page = palloc(slab, MEM_TYPE_SLAB, node, gfp, order);
>		if (!page) {
>			kfree(slab);
>			return -ENOMEM;
>		}
>		slab->virt = page_address(page);
>
>		return slab;
>	}
>
> where palloc() does something like:
>
>	unsigned long page_desc = mem_type | (unsigned long)mem_desc;
>
>	... allocates the struct pages ...
>
>	for (i = 0; i < (1 << order); i++)
>		page[i] = page_desc;
>
>	return page;
>	}

Yeah, having the page point to the full descriptor makes sense. That's
efficient. And it's very simple, conceptually...

> For today, testing PageSlab on the tail page helps the test proceed
> in parallel with the action. Looking at slub's kfree() for an example:
>
>	page = virt_to_head_page(x);
>	if (unlikely(!PageSlab(page))) {
>		free_nonslab_page(page, object);
>		return;
>	}
>	slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
>
> Your proposal is certainly an improvement (since gcc doesn't know
> that compound_head(compound_head(x)) == compound_head(x)), but I
> think checking on the tail page is even better:
>
>	page = virt_to_page(x);
>	if (unlikely(!PageSlab(page))) {
>		free_nonslab_page(compound_head(page), object);
>		return;
>	}
>	slab = page_slab(page);
>	slab_free(slab->slab_cache, slab, object, NULL, 1, _RET_IP_);
>
> The compound_head() parts can proceed in parallel with the check of
> PageSlab().
>
> As far as the cost of setting PageSlab, those cachelines are already
> dirty because we set compound_head on each of those pages already
> (or in the case of freeing, we're about to clear compound_head on
> each of those pages).

... but this is not. I think the performance gains from this would
have to be significant to justify complicating page flags further.
On Wed, Oct 13, 2021 at 09:49:31AM -0400, Johannes Weiner wrote:
> On Wed, Oct 13, 2021 at 04:19:18AM +0100, Matthew Wilcox wrote:
> > For today, testing PageSlab on the tail page helps the test proceed
> > in parallel with the action.
[...]
> > As far as the cost of setting PageSlab, those cachelines are already
> > dirty because we set compound_head on each of those pages already
> > (or in the case of freeing, we're about to clear compound_head on
> > each of those pages).
>
> ... but this is not. I think the performance gains from this would
> have to be significant to justify complicating page flags further.

My argument isn't really "this is more efficient", because I think
the performance gains are pretty minimal. More that I would like to
be able to write code in the style which we'll want to use when we're
using dynamically allocated memory descriptors. It's all just code,
and we can change it at any time, but better to change it to something
that continues to work well in the future.

I don't think we end up with "virt_to_head_page()" in a dynamically
allocated memory descriptor world. The head page contains no different
information from the tail pages, and indeed the tail pages don't know
that they're tail pages, or where the head page is. Or maybe they're
all tail pages.

I could see a world where we had a virt_to_memdesc() which returned
a generic memory descriptor that could be cast to a struct slab if
the flags within that memdesc said it was a slab. But I think it works
out better to tag the memory descriptor pointer with a discriminator
that defines what the pointer is. Plus it saves a page flag.

Maybe that's the best way to approach it -- how would you want to write
this part of kfree() when memory descriptors are dynamically allocated?
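The discriminator idea can be sketched concretely: because descriptors come from kmalloc(), their addresses are aligned and the low pointer bits are free to encode the type, so no page flag is consumed. All names here (palloc_tag, page_memdesc, page_type) are hypothetical; no such API existed at the time of this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Low two bits of the per-page descriptor word encode what the rest
 * of the word points to; descriptors are assumed at least 4-aligned. */
enum mem_type { MEM_TYPE_PAGE = 0, MEM_TYPE_SLAB = 1 };

struct page { uintptr_t desc; };

/* Model of palloc(): stamp the tagged descriptor pointer into every
 * page of the allocation, head and tail alike. */
static void palloc_tag(struct page *pages, unsigned int order,
		       void *memdesc, enum mem_type type)
{
	uintptr_t tagged = (uintptr_t)memdesc | (uintptr_t)type;

	for (unsigned int i = 0; i < (1u << order); i++)
		pages[i].desc = tagged;
}

static enum mem_type page_type(const struct page *page)
{
	return (enum mem_type)(page->desc & 3);
}

static void *page_memdesc(const struct page *page)
{
	return (void *)(page->desc & ~(uintptr_t)3);
}
```

With this scheme there is no head/tail distinction left to resolve: any page of the allocation answers both "what kind of memory is this?" and "where is its descriptor?" with a single load, which is the world Willy's kfree() question is aiming at.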
On Wed, Oct 13, 2021 at 06:55:46PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 13, 2021 at 09:49:31AM -0400, Johannes Weiner wrote:
[...]
> My argument isn't really "this is more efficient", because I think
> the performance gains are pretty minimal. More that I would like to
> be able to write code in the style which we'll want to use when we're
> using dynamically allocated memory descriptors. It's all just code,
> and we can change it at any time, but better to change it to something
> that continues to work well in the future.
>
> I don't think we end up with "virt_to_head_page()" in a dynamically
> allocated memory descriptor world. The head page contains no different
> information from the tail pages, and indeed the tail pages don't know
> that they're tail pages, or where the head page is. Or maybe they're
> all tail pages.

I agree with that, but future-provisioning is a tradeoff.

It'll be trivial to replace virt_to_head_page() with virt_to_page()
and remove compound_head() calls when whatever is left of struct page
will unconditionally point to a memory descriptor. And that can be
part of the changes that make it so.

But in today's codebase, maintaining type flags in the tail pages
while having to go through the headpage to find the type descriptor
makes things unnecessarily complicated in an area that has already
accrued too much tech debt. I don't think that's a sensible thing to
do as of today.

> I could see a world where we had a virt_to_memdesc() which returned
> a generic memory descriptor that could be cast to a struct slab if
> the flags within that memdesc said it was a slab. But I think it works
> out better to tag the memory descriptor pointer with a discriminator
> that defines what the pointer is. Plus it saves a page flag.
>
> Maybe that's the best way to approach it -- how would you want to write
> this part of kfree() when memory descriptors are dynamically allocated?

There are still many question marks around what the split-out memory
descriptors will actually look like, and which state is tracked where.
'struct slab' is an excellent trial balloon.

It's good to have common north stars to set the direction of where to
place efforts ("small struct page, dynamically allocated descriptors
etc.") but I don't think it makes sense to take on yet more tech debt
in this area for a future that may not pan out the way we think now.
On Wed, Oct 13, 2021 at 03:33:00PM -0400, Johannes Weiner wrote:
> On Wed, Oct 13, 2021 at 06:55:46PM +0100, Matthew Wilcox wrote:
> > On Wed, Oct 13, 2021 at 09:49:31AM -0400, Johannes Weiner wrote:
[...]
> I agree with that, but future-provisioning is a tradeoff.
>
> It'll be trivial to replace virt_to_head_page() with virt_to_page()
> and remove compound_head() calls when whatever is left of struct page
> will unconditionally point to a memory descriptor. And that can be
> part of the changes that make it so.

I.e. remove all the *to_head() stuff when head/tail pages actually
cease to be a thing, not earlier.

Essentially, I think it's the right direction to pursue, but I'm not
sure yet that it's exactly where we will end up.

> > I could see a world where we had a virt_to_memdesc() which returned
> > a generic memory descriptor that could be cast to a struct slab if
> > the flags within that memdesc said it was a slab. But I think it works
> > out better to tag the memory descriptor pointer with a discriminator
> > that defines what the pointer is. Plus it saves a page flag.
> >
> > Maybe that's the best way to approach it -- how would you want to write
> > this part of kfree() when memory descriptors are dynamically allocated?

Yeah, or as Kent put it, "how would you like the code to look with
infinite refactoring?"

But that also implies we can do it in incremental, self-contained
steps that each leave the code base in a better place than before.
Which avoids building up dependencies on future code and unimplemented
ideas that are vague, likely look different in everybody's head, or
may not pan out at all.
> It's good to have common north stars to set the direction of where to
> place efforts ("small struct page, dynamically allocated descriptors
> etc.") but I don't think it makes sense to take on yet more tech debt
> in this area for a future that may not pan out the way we think now.

That precisely sums up my thoughts, thanks Johannes.
On 10/12/21 20:01, Johannes Weiner wrote:
> PageSlab() currently imposes a compound_head() call on all callsites
> even though only very few situations encounter tailpages. This short
> series bubbles tailpage resolution up to the few sites that need it,
> and eliminates it everywhere else.
[...]
> It would be preferable to do:
>
>	page = virt_to_head_page(p);	/* tailpage resolution */
>	if (PageSlab(page)) {		/* slab or page alloc bypass? */
>		slab = page_slab(page);
>		do_slab_stuff(slab);
>	} else {
>		do_page_stuff(page);
>	}

I agree with this. I'm also not fond of setting PG_slab on all tail
pages as willy proposed. But I would also like to avoid the tree-wide
changes to PageSlab tailpage resolution that this series is doing.

What we could do is what you suggest above, but the few hotpaths in
slab itself would use a __PageSlab() variant that just doesn't do
tailpage resolution. The rest of the tree could keep doing PageSlab
with tailpage resolution for now, and then we can improve on that
later. Would that be acceptable?

With that, I'm proposing to take over willy's series (as he asked for
in the cover letter), which would be modified in the above sense, and
maybe in other ways I can't immediately predict. But I do want to at
least try an approach where we deal with the "boundary" functions
first (those that need to deal with struct page) and then convert the
bulk of "struct page" to "struct slab" at once with the help of a
coccinelle semantic patch. I'm hoping this wouldn't be a "slapdash"
conversion, while still avoiding a large series of incremental patches
- because the result of such automation wouldn't need as close a
manual review as human-written patches do.

> and leave the ambiguity and the need to self-identify with struct
> page, so that struct slab is a strong and unambiguous type, and a
> non-NULL struct slab encountered in the wild is always a valid object
> without the need to check another dedicated flag for validity first.
[...]
> Based on v5.15-rc4.
>
>  arch/ia64/kernel/mca_drv.c |  2 +-
>  drivers/ata/libata-sff.c   |  2 +-
>  fs/proc/page.c             | 12 +++++++-----
>  include/linux/net.h        |  2 +-
>  include/linux/page-flags.h | 10 +++++-----
>  mm/kasan/generic.c         |  2 +-
>  mm/kasan/kasan.h           |  2 +-
>  mm/kasan/report.c          |  4 ++--
>  mm/kasan/report_tags.c     |  2 +-
>  mm/memory-failure.c        |  6 +++---
>  mm/memory.c                |  3 ++-
>  mm/slab.h                  |  2 +-
>  mm/slob.c                  |  4 ++--
>  mm/slub.c                  |  6 +++---
>  mm/util.c                  |  2 +-
>  15 files changed, 32 insertions(+), 29 deletions(-)
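Vlastimil's compromise, a head-only test variant for slab-internal hot paths, would look roughly like this. Toy definitions only: in the kernel, both helpers would be generated by the PAGEFLAG()/__PAGEFLAG() macro machinery in page-flags.h, and the double-underscore name here just marks the "caller guarantees a head page" contract:

```c
#include <assert.h>
#include <stdbool.h>

#define PG_slab 1UL	/* toy flag bit */

struct page {
	unsigned long flags;
	unsigned long compound_head;	/* head ptr | 1 on tail pages */
};

static struct page *compound_head(struct page *page)
{
	if (page->compound_head & 1)
		return (struct page *)(page->compound_head - 1);
	return page;
}

/* Tree-wide helper: keeps the tailpage resolution, as today. */
static bool PageSlab(struct page *page)
{
	return compound_head(page)->flags & PG_slab;
}

/* Slab-internal hot paths: the caller promises @page is a head page,
 * so the compound_head() resolution can be skipped entirely. */
static bool __PageSlab(struct page *page)
{
	return page->flags & PG_slab;
}
```

This keeps the tree-wide semantics unchanged while removing the resolution cost exactly where the caller already knows it is redundant, which is why it avoids both the PF_ANY flag loops and the tree-wide callsite churn.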