Message ID | 20220528161157.3934825-1-bh1scw@gmail.com (mailing list archive)
---|---
State | New
Series | mm/slub: replace alloc_pages with folio_alloc
On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> From: Fanjun Kong <bh1scw@gmail.com>
>
> This patch will use folio allocation functions for allocating pages.

That's not actually a good idea.  folio_alloc() will do the
prep_transhuge_page() step which isn't needed for slab.

> Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
> ---
>  mm/slub.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index e5535020e0fd..00c4049a17d6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1794,9 +1794,9 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
> 	unsigned int order = oo_order(oo);
>
> 	if (node == NUMA_NO_NODE)
> -		folio = (struct folio *)alloc_pages(flags, order);
> +		folio = (struct folio *)folio_alloc(flags, order);
> 	else
> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
> +		folio = (struct folio *)__folio_alloc_node(node, flags, order);
>
> 	if (!folio)
> 		return NULL;
> --
> 2.36.0
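For context, the prep_transhuge_page() step mentioned above sits in the folio
allocation path itself; roughly, paraphrasing mm/page_alloc.c as of about
v5.18 (a sketch, not verbatim, so check the actual tree for details):

	/* Paraphrased from mm/page_alloc.c around v5.18; approximate. */
	struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
			nodemask_t *nodemask)
	{
		/* folio allocation always forces a compound page ... */
		struct page *page = __alloc_pages(gfp | __GFP_COMP, order,
				preferred_nid, nodemask);

		/* ... and sets up the THP deferred-split state for order > 1 */
		if (page && order > 1)
			prep_transhuge_page(page);
		return (struct folio *)page;
	}

Slab never splits its pages and never uses the deferred list, so that extra
preparation is pure overhead on this path.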
On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> > From: Fanjun Kong <bh1scw@gmail.com>
> >
> > This patch will use folio allocation functions for allocating pages.
>
> That's not actually a good idea.  folio_alloc() will do the
> prep_transhuge_page() step which isn't needed for slab.
>

You mean folio_alloc() is dedicated to THP allocation?  That is a little
surprising to me.  I thought folio_alloc() was just a variant of
alloc_page() that returns a struct folio instead of a struct page, but it
seems I was wrong.  May I ask what made us decide to do this?

Thanks.
On Sun, May 29, 2022 at 10:58:18AM +0800, Muchun Song wrote:
> On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> > On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> > > From: Fanjun Kong <bh1scw@gmail.com>
> > >
> > > This patch will use folio allocation functions for allocating pages.
> >
> > That's not actually a good idea.  folio_alloc() will do the
> > prep_transhuge_page() step which isn't needed for slab.
>
> You mean folio_alloc() is dedicated to THP allocation?  That is a little
> surprising to me.  I thought folio_alloc() was just a variant of
> alloc_page() that returns a struct folio instead of a struct page, but it
> seems I was wrong.  May I ask what made us decide to do this?

Yeah, the naming isn't great here.  The problem didn't really occur to me
until I saw this patch, and I don't have a good solution yet.  We're in the
middle of a transition, but the transition is likely to take years, and I
don't think we necessarily have the final form of the transition fully
agreed on or understood, so we should come up with something better for the
interim.

Naming aside, memory allocated to filesystems can be split, but the split
can fail, so those folios need the page deferred list and the THP
destructor.  Memory allocated to slab can never be split, so initialising
the page deferred list there is a waste of time.

The end goal is to split apart allocating the memory from allocating its
memory descriptor (which I like to call a memdesc).  For filesystem folios,
we'd call slab to allocate a struct folio and then tell the buddy allocator
"here is a memdesc of type folio; allocate me 2^n pages and make
pfn_to_memdesc() return this memdesc for each of the 2^n pages in it".  In
that end goal, slab would also allocate a struct slab (... there's a
recursion problem here which has a solution ...) and then allocate 2^n
pages.  But until we're ready to shrink struct page down to one or two
words, doing this today is just a waste of memory and time.

So I still don't have a good solution for receiving patches like this,
other than maybe adding a comment like

	/* Do not change this to allocate a folio */

which will be ignored.
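The page deferred list and destructor mentioned above are what
prep_transhuge_page() initialises; approximately, again paraphrasing the
v5.18-era source (mm/huge_memory.c), as a sketch:

	/* Paraphrased from mm/huge_memory.c around v5.18; approximate. */
	void prep_transhuge_page(struct page *page)
	{
		/*
		 * The deferred-split list_head reuses page->mapping and
		 * page->index of the second tail page, so this assumes a
		 * compound page of order >= 2.
		 */
		INIT_LIST_HEAD(page_deferred_list(page));
		set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
	}

Filesystem folios need this so a later split (which can fail or be deferred)
has somewhere to queue the page; slab pages are never split, which is the
wasted work being described.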
On Sun, May 29, 2022 at 04:31:07AM +0100, Matthew Wilcox wrote:
> On Sun, May 29, 2022 at 10:58:18AM +0800, Muchun Song wrote:
> > On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> > > On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> > > > From: Fanjun Kong <bh1scw@gmail.com>
> > > >
> > > > This patch will use folio allocation functions for allocating pages.
> > >
> > > That's not actually a good idea.  folio_alloc() will do the
> > > prep_transhuge_page() step which isn't needed for slab.
> >
> > You mean folio_alloc() is dedicated to THP allocation?  That is a little
> > surprising to me.  I thought folio_alloc() was just a variant of
> > alloc_page() that returns a struct folio instead of a struct page, but it
> > seems I was wrong.  May I ask what made us decide to do this?
>
> Yeah, the naming isn't great here.  The problem didn't really occur
> to me until I saw this patch, and I don't have a good solution yet.

OK, I have an idea.  None of the sl*b allocators use the page refcount, so
the atomic operations on it are just a waste of time.  If we add an
alloc_unref_page() to match our free_unref_page(), that will be enough of a
difference to stop people sending "helpful" patches.  Also, it would be a
(small?) performance improvement for slab.
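As a purely hypothetical sketch of the proposed interface (alloc_unref_page()
does not exist in the tree; the name and the placeholder body below only
illustrate the intent, by analogy with free_unref_page() in mm/page_alloc.c):

	/*
	 * Hypothetical sketch only, not real kernel API.  A real version
	 * would live inside the page allocator and return pages without
	 * initialising page->_refcount, since the sl*b allocators never
	 * use the refcount.
	 */
	static inline struct page *alloc_unref_page(gfp_t gfp, unsigned int order)
	{
		/* Placeholder body: this still sets up the refcount today. */
		return alloc_pages(gfp, order);
	}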
Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 4c863a2fd728c505831f4d200c4d689b8b389a70 ("[PATCH] mm/slub: replace alloc_pages with folio_alloc")
url: https://github.com/intel-lab-lkp/linux/commits/bh1scw-gmail-com/mm-slub-replace-alloc_pages-with-folio_alloc/20220529-001434
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/linux-mm/20220528161157.3934825-1-bh1scw@gmail.com

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

+---------------------------------------------+------------+------------+
|                                             | dee992745d | 4c863a2fd7 |
+---------------------------------------------+------------+------------+
| boot_successes                              | 22         | 0          |
| boot_failures                               | 0          | 18         |
| WARNING:at_mm/page_alloc.c:#__alloc_pages   | 0          | 18         |
| RIP:__alloc_pages                           | 0          | 18         |
| BUG:kernel_NULL_pointer_dereference,address | 0          | 18         |
| Oops:#[##]                                  | 0          | 18         |
| RIP:__irq_domain_alloc_irqs                 | 0          | 18         |
| Kernel_panic-not_syncing:Fatal_exception    | 0          | 18         |
+---------------------------------------------+------------+------------+

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@intel.com>

[    0.335509][    T0] ------------[ cut here ]------------
[    0.336091][    T0] WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:5483 __alloc_pages+0x5e/0x1e3
[    0.336971][    T0] Modules linked in:
[    0.337362][    T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-10160-g4c863a2fd728 #1
[    0.338213][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[    0.339282][    T0] RIP: 0010:__alloc_pages+0x5e/0x1e3
[    0.339843][    T0] Code: 41 0f ba e0 0d c7 44 24 08 01 00 00 00 f3 ab 72 20 83 fe 0a 76 24 80 3d c6 b0 93 01 00 0f 85 64 01 00 00 c6 05 b9 b0 93 01 01 <0f> 0b e9 56 01 00 00 83 fe 0a 0f 87 4d 01 00 00 8b 3d e2 af 95 01
[    0.341881][    T0] RSP: 0000:ffffffff82603da0 EFLAGS: 00010046
[    0.342495][    T0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    0.343324][    T0] RDX: 0000000000000000 RSI: 0000000000012000 RDI: ffffffff82603dd8
[    0.344164][    T0] RBP: 0000000000000000 R08: 0000000000040000 R09: 0000000000000003
[    0.344979][    T0] R10: 0000000000001000 R11: 0000000000000100 R12: 0000000000000040
[    0.345793][    T0] R13: 0000000000012000 R14: 0000000000000000 R15: 0000000000000000
[    0.346605][    T0] FS:  0000000000000000(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[    0.347516][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.348204][    T0] CR2: ffff88843ffff000 CR3: 0000000002612000 CR4: 00000000000406b0
[    0.349023][    T0] Call Trace:
[    0.349357][    T0]  <TASK>
[    0.349651][    T0]  __folio_alloc+0x15/0x32
[    0.350105][    T0]  alloc_slab_page+0x48/0x61
[    0.350572][    T0]  allocate_slab+0x5b/0x1b2
[    0.351031][    T0]  init_kmem_cache_nodes+0x4f/0x1cc
[    0.351679][    T0]  kmem_cache_open+0x128/0x18b
[    0.352161][    T0]  __kmem_cache_create+0x11/0x52
[    0.352664][    T0]  create_boot_cache+0x6c/0x96
[    0.353150][    T0]  kmem_cache_init+0x89/0x150
[    0.353628][    T0]  start_kernel+0x210/0x4ae
[    0.354085][    T0]  secondary_startup_64_no_verify+0xe0/0xeb
[    0.354695][    T0]  </TASK>
[    0.354999][    T0] ---[ end trace 0000000000000000 ]---

To reproduce:

	# build kernel
	cd linux
	cp config-5.18.0-10160-g4c863a2fd728 .config
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp qemu -k <bzImage> -m modules.cgz job-script  # job-script is attached in this email

	# if come across any failure that blocks the test,
	# please remove ~/.lkp and /lkp dir to run from a clean state.
diff --git a/mm/slub.c b/mm/slub.c
index e5535020e0fd..00c4049a17d6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1794,9 +1794,9 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
 	unsigned int order = oo_order(oo);
 
 	if (node == NUMA_NO_NODE)
-		folio = (struct folio *)alloc_pages(flags, order);
+		folio = (struct folio *)folio_alloc(flags, order);
 	else
-		folio = (struct folio *)__alloc_pages_node(node, flags, order);
+		folio = (struct folio *)__folio_alloc_node(node, flags, order);
 
 	if (!folio)
 		return NULL;