
[v4,3/3] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages

Message ID: 20211223094435.248523-4-bhe@redhat.com
State: New
Series: Handle warning of allocation failure on DMA zone w/o managed pages

Commit Message

Baoquan He Dec. 23, 2021, 9:44 a.m. UTC
In the kdump kernel on x86_64, a page allocation failure is observed:

 kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
 Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
 Workqueue: events_unbound async_run_entry_fn
 Call Trace:
  <TASK>
  dump_stack_lvl+0x48/0x5e
  warn_alloc.cold+0x72/0xd6
  __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
  __alloc_pages+0x1df/0x210
  new_slab+0x389/0x4d0
  ___slab_alloc+0x58f/0x770
  __slab_alloc.constprop.0+0x4a/0x80
  kmem_cache_alloc_trace+0x24b/0x2c0
  sr_probe+0x1db/0x620
  ......
  device_add+0x405/0x920
  ......
  __scsi_add_device+0xe5/0x100
  ata_scsi_scan_host+0x97/0x1d0
  async_run_entry_fn+0x30/0x130
  process_one_work+0x1e8/0x3c0
  worker_thread+0x50/0x3b0
  ? rescuer_thread+0x350/0x350
  kthread+0x16b/0x190
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x22/0x30
  </TASK>
 Mem-Info:
 ......

The above failure happens when calling kmalloc() to allocate a buffer with
GFP_DMA. It requests a slab page from the DMA zone while there are no
managed pages at all in that zone:
 sr_probe()
 --> get_capabilities()
     --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

In the current kernel, the dma-kmalloc caches are created as long as
CONFIG_ZONE_DMA is enabled. However, the kdump kernel on x86_64 has had no
managed pages in the DMA zone since commit 6f599d84231f ("x86/kdump: Always
reserve the low 1M when the crashkernel option is specified"), so the
failure can always be reproduced.

For now, mute the allocation failure warning when pages are requested from
the DMA zone while it has no managed pages.

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: stable@vger.kernel.org
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
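
[Editor's note] The has_managed_dma() check used by this patch is a helper
introduced in patch 2/3 of this series. For reference, a minimal sketch of
such a helper, assuming the mm-internal for_each_online_pgdat() iterator
and managed_zone() test (a sketch per the series description, not a
verbatim copy of the posted patch):

 #include <linux/mmzone.h>

 bool has_managed_dma(void)
 {
 #ifdef CONFIG_ZONE_DMA
 	struct pglist_data *pgdat;

 	/* Check the DMA zone of every online node. */
 	for_each_online_pgdat(pgdat) {
 		struct zone *zone = &pgdat->node_zones[ZONE_DMA];

 		/* managed_zone() is true when the zone has managed pages. */
 		if (managed_zone(zone))
 			return true;
 	}
 #endif
 	return false;
 }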

Comments

John Donnelly Dec. 23, 2021, 3:01 p.m. UTC | #1
On 12/23/21 3:44 AM, Baoquan He wrote:
> In the kdump kernel on x86_64, a page allocation failure is observed:
> 
>   kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>   CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
>   Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
>   Workqueue: events_unbound async_run_entry_fn
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x48/0x5e
>    warn_alloc.cold+0x72/0xd6
>    __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
>    __alloc_pages+0x1df/0x210
>    new_slab+0x389/0x4d0
>    ___slab_alloc+0x58f/0x770
>    __slab_alloc.constprop.0+0x4a/0x80
>    kmem_cache_alloc_trace+0x24b/0x2c0
>    sr_probe+0x1db/0x620
>    ......
>    device_add+0x405/0x920
>    ......
>    __scsi_add_device+0xe5/0x100
>    ata_scsi_scan_host+0x97/0x1d0
>    async_run_entry_fn+0x30/0x130
>    process_one_work+0x1e8/0x3c0
>    worker_thread+0x50/0x3b0
>    ? rescuer_thread+0x350/0x350
>    kthread+0x16b/0x190
>    ? set_kthread_struct+0x40/0x40
>    ret_from_fork+0x22/0x30
>    </TASK>
>   Mem-Info:
>   ......
> 
> The above failure happens when calling kmalloc() to allocate a buffer with
> GFP_DMA. It requests a slab page from the DMA zone while there are no
> managed pages at all in that zone:
>   sr_probe()
>   --> get_capabilities()
>       --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
> 
> In the current kernel, the dma-kmalloc caches are created as long as
> CONFIG_ZONE_DMA is enabled. However, the kdump kernel on x86_64 has had no
> managed pages in the DMA zone since commit 6f599d84231f ("x86/kdump: Always
> reserve the low 1M when the crashkernel option is specified"), so the
> failure can always be reproduced.
> 
> For now, mute the allocation failure warning when pages are requested from
> the DMA zone while it has no managed pages.
> 
> Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
> Cc: stable@vger.kernel.org
> Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>


> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> ---
>   mm/page_alloc.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7c7a0b5de2ff..843bc8e5550a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
>   	va_list args;
>   	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
>   
> -	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
> +		(gfp_mask & __GFP_DMA) && !has_managed_dma())
>   		return;
>   
>   	va_start(args, fmt);
Hyeonggon Yoo Dec. 25, 2021, 5:53 a.m. UTC | #2
On Thu, Dec 23, 2021 at 05:44:35PM +0800, Baoquan He wrote:
> In the kdump kernel on x86_64, a page allocation failure is observed:
> 
>  kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>  CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
>  Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
>  Workqueue: events_unbound async_run_entry_fn
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x48/0x5e
>   warn_alloc.cold+0x72/0xd6
>   __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
>   __alloc_pages+0x1df/0x210
>   new_slab+0x389/0x4d0
>   ___slab_alloc+0x58f/0x770
>   __slab_alloc.constprop.0+0x4a/0x80
>   kmem_cache_alloc_trace+0x24b/0x2c0
>   sr_probe+0x1db/0x620
>   ......
>   device_add+0x405/0x920
>   ......
>   __scsi_add_device+0xe5/0x100
>   ata_scsi_scan_host+0x97/0x1d0
>   async_run_entry_fn+0x30/0x130
>   process_one_work+0x1e8/0x3c0
>   worker_thread+0x50/0x3b0
>   ? rescuer_thread+0x350/0x350
>   kthread+0x16b/0x190
>   ? set_kthread_struct+0x40/0x40
>   ret_from_fork+0x22/0x30
>   </TASK>
>  Mem-Info:
>  ......
> 
> The above failure happens when calling kmalloc() to allocate a buffer with
> GFP_DMA. It requests a slab page from the DMA zone while there are no
> managed pages at all in that zone:
>  sr_probe()
>  --> get_capabilities()
>      --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
> 
> In the current kernel, the dma-kmalloc caches are created as long as
> CONFIG_ZONE_DMA is enabled. However, the kdump kernel on x86_64 has had no
> managed pages in the DMA zone since commit 6f599d84231f ("x86/kdump: Always
> reserve the low 1M when the crashkernel option is specified"), so the
> failure can always be reproduced.
> 
> For now, mute the allocation failure warning when pages are requested from
> the DMA zone while it has no managed pages.
> 
> Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
> Cc: stable@vger.kernel.org
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7c7a0b5de2ff..843bc8e5550a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
>  	va_list args;
>  	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
>  
> -	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
> +		(gfp_mask & __GFP_DMA) && !has_managed_dma())
>  		return;
>

Warning when the DMA zone never has any pages is unnecessary and only
confuses users.

The patch looks good.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Also, there are some drivers that allocate memory with GFP_DMA even though
that flag is unnecessary. We need to clean those up later.

Baoquan, are you planning to do that soon?
I want to help with it.

Merry Christmas,
Hyeonggon

>  	va_start(args, fmt);
> -- 
> 2.26.3
> 
>
Baoquan He Dec. 27, 2021, 8:32 a.m. UTC | #3
On 12/25/21 at 05:53am, Hyeonggon Yoo wrote:
> On Thu, Dec 23, 2021 at 05:44:35PM +0800, Baoquan He wrote:
...... 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 7c7a0b5de2ff..843bc8e5550a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
> >  	va_list args;
> >  	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
> >  
> > -	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
> > +	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
> > +		(gfp_mask & __GFP_DMA) && !has_managed_dma())
> >  		return;
> >
> 
> Warning when the DMA zone never has any pages is unnecessary and only
> confuses users.
> 
> The patch looks good.
> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Also, there are some drivers that allocate memory with GFP_DMA even though
> that flag is unnecessary. We need to clean those up later.

Thanks for reviewing and for the great suggestions.

> 
> Baoquan, are you planning to do that soon?
> I want to help with it.

Yes, I had that plan and have already done a small part of it. I talked to
Christoph about my idea. I planned to collect all kmalloc(GFP_DMA) callsites
and post an RFC mail, CCing the related mailing lists and maintainers.
Anyone who is interested, or who knows one or several callsites well, can help.

Christoph has now handled everything under drivers/scsi and posted patches
to fix those callsites. I have gone through the remaining places and found
the callsites below, where we can simply drop GFP_DMA from the kmalloc()
call since it is not necessary. I even found one place using kmalloc(GFP_DMA32):

(HEAD -> master) vxge: don't use GFP_DMA
mtd: rawnand: marvell: don't use GFP_DMA
HID: intel-ish-hid: remove wrong GFP_DMA32 flag
ps3disk: don't use GFP_DMA
atm: iphase: don't use GFP_DMA

Next, I will send an RFC mail containing those suspect callsites. We can
track them and help where needed. I suggest changing them in one of two
ways (a sketch of option 1 follows this list):
1) use dma_alloc_xx, or dma_map_xx after kmalloc()
2) use alloc_pages(GFP_DMA) instead
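
[Editor's note] As an illustration of option 1, here is a hedged sketch
(not taken from any posted patch; the function name, device pointer, and
512-byte buffer are hypothetical stand-ins) of converting a
kmalloc(GFP_DMA) callsite to the DMA mapping API:

 #include <linux/dma-mapping.h>
 #include <linux/slab.h>

 /* Sketch only: replace a GFP_DMA allocation with dma_map_single(). */
 static int example_read_caps(struct device *dev)
 {
 	dma_addr_t dma_addr;
 	void *buf;

 	/* Before: buf = kmalloc(512, GFP_KERNEL | GFP_DMA); */
 	buf = kmalloc(512, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;

 	/*
 	 * The DMA API maps (or bounces) the buffer to an address the
 	 * device can reach, so ZONE_DMA memory is no longer required.
 	 */
 	dma_addr = dma_map_single(dev, buf, 512, DMA_FROM_DEVICE);
 	if (dma_mapping_error(dev, dma_addr)) {
 		kfree(buf);
 		return -EIO;
 	}

 	/* ... perform the transfer using dma_addr ... */

 	dma_unmap_single(dev, dma_addr, 512, DMA_FROM_DEVICE);
 	kfree(buf);
 	return 0;
 }

Option 2 keeps GFP_DMA but calls alloc_pages() directly, which only suits
callsites where the hardware genuinely needs ZONE_DMA memory and whole
pages are acceptable.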

When we fix a callsite, we should all post the patch with the subject
keywords 'xxxx: don't use GFP_DMA'. Christoph has already posted patches
with similar subjects, so we can search the subjects to collect all the
related patches for later backporting.

I will add you to CC when sending it, possibly tomorrow. Any suggestions or thoughts?

Thanks
Baoquan

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7c7a0b5de2ff..843bc8e5550a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4204,7 +4204,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	va_list args;
 	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
 
-	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
+		(gfp_mask & __GFP_DMA) && !has_managed_dma())
 		return;
 
 	va_start(args, fmt);