
mm/slub: replace alloc_pages with folio_alloc

Message ID 20220528161157.3934825-1-bh1scw@gmail.com (mailing list archive)
State New
Series mm/slub: replace alloc_pages with folio_alloc

Commit Message

Fanjun Kong May 28, 2022, 4:11 p.m. UTC
From: Fanjun Kong <bh1scw@gmail.com>

This patch will use folio allocation functions for allocating pages.

Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
---
 mm/slub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox (Oracle) May 28, 2022, 4:27 p.m. UTC | #1
On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> From: Fanjun Kong <bh1scw@gmail.com>
> 
> This patch will use folio allocation functions for allocating pages.

That's not actually a good idea.  folio_alloc() will do the
prep_transhuge_page() step which isn't needed for slab.
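
For reference, the v5.18-era folio_alloc() is essentially a thin wrapper
around alloc_pages() that forces __GFP_COMP and then runs the THP prep
step mentioned above.  A lightly paraphrased sketch (not a verbatim copy
of mm/mempolicy.c):

struct folio *folio_alloc(gfp_t gfp, unsigned int order)
{
	struct page *page = alloc_pages(gfp | __GFP_COMP, order);

	if (page && order > 1)
		prep_transhuge_page(page);	/* the extra work slab never needs */
	return (struct folio *)page;
}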

> Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
> ---
>  mm/slub.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index e5535020e0fd..00c4049a17d6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1794,9 +1794,9 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>  	unsigned int order = oo_order(oo);
>  
>  	if (node == NUMA_NO_NODE)
> -		folio = (struct folio *)alloc_pages(flags, order);
> +		folio = (struct folio *)folio_alloc(flags, order);
>  	else
> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
> +		folio = (struct folio *)__folio_alloc_node(node, flags, order);
>  
>  	if (!folio)
>  		return NULL;
> -- 
> 2.36.0
> 
>
Muchun Song May 29, 2022, 2:58 a.m. UTC | #2
On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> > From: Fanjun Kong <bh1scw@gmail.com>
> > 
> > This patch will use folio allocation functions for allocating pages.
> 
> That's not actually a good idea.  folio_alloc() will do the
> prep_transhuge_page() step which isn't needed for slab.
>

You mean folio_alloc() is dedicated to THP allocation?  That is a little
surprising to me.  I thought folio_alloc() was just a variant of alloc_page()
that returns a folio struct instead of a page.  Seems like I was wrong.
May I ask what made us decide to do this?

Thanks.
Matthew Wilcox (Oracle) May 29, 2022, 3:31 a.m. UTC | #3
On Sun, May 29, 2022 at 10:58:18AM +0800, Muchun Song wrote:
> On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> > On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> > > From: Fanjun Kong <bh1scw@gmail.com>
> > > 
> > > This patch will use folio allocation functions for allocating pages.
> > 
> > That's not actually a good idea.  folio_alloc() will do the
> > prep_transhuge_page() step which isn't needed for slab.
> 
> You mean folio_alloc() is dedicated to THP allocation?  That is a little
> surprising to me.  I thought folio_alloc() was just a variant of alloc_page()
> that returns a folio struct instead of a page.  Seems like I was wrong.
> May I ask what made us decide to do this?

Yeah, the naming isn't great here.  The problem didn't really occur
to me until I saw this patch, and I don't have a good solution yet.
We're in the middle of a transition that is likely to take years, and I
don't think its final form is fully agreed on or understood yet, so we
should come up with something better for the interim.

Ignoring the naming here, memory allocated to filesystems can be split,
but the split can fail, so they need the page-deferred-list and the
DTOR.  Memory allocated to slab cannot be split, so initialising the
page-deferred-list is a waste of time.
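
Concretely, prep_transhuge_page() (roughly, per the v5.18 code) just
initialises that deferred split list and installs the THP destructor,
which is pure overhead for a slab page:

void prep_transhuge_page(struct page *page)
{
	/* the second tail page's mapping/index are reused as a list_head */
	INIT_LIST_HEAD(page_deferred_list(page));
	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
}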

The end-goal is to split apart allocating the memory from allocating
its memory descriptor (which I like to call memdesc).  So for filesystem
folios, we'd call slab to allocate a struct folio and then tell the
buddy allocator "here is the memdesc of type folio, allocate
me 2^n pages and make pfn_to_memdesc return this memdesc for each of
the 2^n pages in it".

In this end-goal, slab would also allocate a struct slab (... there's
a recursion problem here which has a solution ...), and then allocate
2^n pages.  But until we're ready to shrink struct page down to one
or two words, doing this is just a waste of memory and time.
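
Purely as an illustration of that end-goal (none of these names exist
today, and the real interface may well look quite different):

	/* hypothetical: the memdesc comes from slab first ... */
	struct folio *folio = kmem_cache_alloc(folio_cachep, gfp);

	/* ... then the buddy allocator is told which memdesc owns the pages */
	if (folio && !alloc_pages_memdesc(gfp, order, memdesc(folio, MEMDESC_FOLIO))) {
		kmem_cache_free(folio_cachep, folio);
		folio = NULL;
	}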

So I still don't have a good solution to receiving patches like this
other than maybe adding a comment like

	/* Do not change this to allocate a folio */

which will be ignored.
Matthew Wilcox (Oracle) May 30, 2022, 12:55 a.m. UTC | #4
On Sun, May 29, 2022 at 04:31:07AM +0100, Matthew Wilcox wrote:
> On Sun, May 29, 2022 at 10:58:18AM +0800, Muchun Song wrote:
> > On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> > > On Sun, May 29, 2022 at 12:11:58AM +0800, bh1scw@gmail.com wrote:
> > > > From: Fanjun Kong <bh1scw@gmail.com>
> > > > 
> > > > This patch will use folio allocation functions for allocating pages.
> > > 
> > > That's not actually a good idea.  folio_alloc() will do the
> > > prep_transhuge_page() step which isn't needed for slab.
> > 
> > You mean folio_alloc() is dedicated to THP allocation?  That is a little
> > surprising to me.  I thought folio_alloc() was just a variant of alloc_page()
> > that returns a folio struct instead of a page.  Seems like I was wrong.
> > May I ask what made us decide to do this?
> 
> Yeah, the naming isn't great here.  The problem didn't really occur
> to me until I saw this patch, and I don't have a good solution yet.

OK, I have an idea.

None of the sl*b allocators use the page refcount.  So the
atomic operations on it are just a waste of time.  If we add an
alloc_unref_page() to match our free_unref_page(), that'll be enough
difference to stop people sending "helpful" patches.  Also, it'll be a
(small?) performance improvement for slab.
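
A rough sketch of how the alloc_slab_page() calls might then look; note
that alloc_unref_page() and alloc_unref_page_node() are only proposed
names here and do not exist yet:

	if (node == NUMA_NO_NODE)
		folio = (struct folio *)alloc_unref_page(flags, order);
	else
		folio = (struct folio *)alloc_unref_page_node(node, flags, order);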
kernel test robot May 30, 2022, 4:10 a.m. UTC | #5
Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 4c863a2fd728c505831f4d200c4d689b8b389a70 ("[PATCH] mm/slub: replace alloc_pages with folio_alloc")
url: https://github.com/intel-lab-lkp/linux/commits/bh1scw-gmail-com/mm-slub-replace-alloc_pages-with-folio_alloc/20220529-001434
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/linux-mm/20220528161157.3934825-1-bh1scw@gmail.com

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+---------------------------------------------+------------+------------+
|                                             | dee992745d | 4c863a2fd7 |
+---------------------------------------------+------------+------------+
| boot_successes                              | 22         | 0          |
| boot_failures                               | 0          | 18         |
| WARNING:at_mm/page_alloc.c:#__alloc_pages   | 0          | 18         |
| RIP:__alloc_pages                           | 0          | 18         |
| BUG:kernel_NULL_pointer_dereference,address | 0          | 18         |
| Oops:#[##]                                  | 0          | 18         |
| RIP:__irq_domain_alloc_irqs                 | 0          | 18         |
| Kernel_panic-not_syncing:Fatal_exception    | 0          | 18         |
+---------------------------------------------+------------+------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@intel.com>


[    0.335509][    T0] ------------[ cut here ]------------
[    0.336091][    T0] WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:5483 __alloc_pages+0x5e/0x1e3
[    0.336971][    T0] Modules linked in:
[    0.337362][    T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-10160-g4c863a2fd728 #1
[    0.338213][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[    0.339282][    T0] RIP: 0010:__alloc_pages+0x5e/0x1e3
[    0.339843][    T0] Code: 41 0f ba e0 0d c7 44 24 08 01 00 00 00 f3 ab 72 20 83 fe 0a 76 24 80 3d c6 b0 93 01 00 0f 85 64 01 00 00 c6 05 b9 b0 93 01 01 <0f> 0b e9 56 01 00 00 83 fe 0a 0f 87 4d 01 00 00 8b 3d e2 af 95 01
[    0.341881][    T0] RSP: 0000:ffffffff82603da0 EFLAGS: 00010046
[    0.342495][    T0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    0.343324][    T0] RDX: 0000000000000000 RSI: 0000000000012000 RDI: ffffffff82603dd8
[    0.344164][    T0] RBP: 0000000000000000 R08: 0000000000040000 R09: 0000000000000003
[    0.344979][    T0] R10: 0000000000001000 R11: 0000000000000100 R12: 0000000000000040
[    0.345793][    T0] R13: 0000000000012000 R14: 0000000000000000 R15: 0000000000000000
[    0.346605][    T0] FS:  0000000000000000(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[    0.347516][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.348204][    T0] CR2: ffff88843ffff000 CR3: 0000000002612000 CR4: 00000000000406b0
[    0.349023][    T0] Call Trace:
[    0.349357][    T0]  <TASK>
[    0.349651][    T0]  __folio_alloc+0x15/0x32
[    0.350105][    T0]  alloc_slab_page+0x48/0x61
[    0.350572][    T0]  allocate_slab+0x5b/0x1b2
[    0.351031][    T0]  init_kmem_cache_nodes+0x4f/0x1cc
[    0.351679][    T0]  kmem_cache_open+0x128/0x18b
[    0.352161][    T0]  __kmem_cache_create+0x11/0x52
[    0.352664][    T0]  create_boot_cache+0x6c/0x96
[    0.353150][    T0]  kmem_cache_init+0x89/0x150
[    0.353628][    T0]  start_kernel+0x210/0x4ae
[    0.354085][    T0]  secondary_startup_64_no_verify+0xe0/0xeb
[    0.354695][    T0]  </TASK>
[    0.354999][    T0] ---[ end trace 0000000000000000 ]---



To reproduce:

        # build kernel
	cd linux
	cp config-5.18.0-10160-g4c863a2fd728 .config
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and the /lkp dir to run from a clean state.

Patch

diff --git a/mm/slub.c b/mm/slub.c
index e5535020e0fd..00c4049a17d6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1794,9 +1794,9 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
 	unsigned int order = oo_order(oo);
 
 	if (node == NUMA_NO_NODE)
-		folio = (struct folio *)alloc_pages(flags, order);
+		folio = (struct folio *)folio_alloc(flags, order);
 	else
-		folio = (struct folio *)__alloc_pages_node(node, flags, order);
+		folio = (struct folio *)__folio_alloc_node(node, flags, order);
 
 	if (!folio)
 		return NULL;