
zram_drv: add __GFP_NOMEMALLOC not to use ALLOC_NO_WATERMARKS

Message ID 20220603055747.11694-1-jaewon31.kim@samsung.com (mailing list archive)
State New
Series zram_drv: add __GFP_NOMEMALLOC not to use ALLOC_NO_WATERMARKS

Commit Message

Jaewon Kim June 3, 2022, 5:57 a.m. UTC
Atomic page allocation failures sometimes happen, and most of them
seem to occur during boot time.

<4>[   59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
<4>[   59.707676] CPU: 5 PID: 1209 Comm: system_server Tainted: G S O      5.4.161-qgki-24219806-abA236USQU0AVE1 #1
<4>[   59.707691] Call trace:
<4>[   59.707702]  dump_backtrace.cfi_jt+0x0/0x4
<4>[   59.707712]  show_stack+0x18/0x24
<4>[   59.707719]  dump_stack+0xa4/0xe0
<4>[   59.707728]  warn_alloc+0x114/0x194
<4>[   59.707734]  __alloc_pages_slowpath+0x828/0x83c
<4>[   59.707740]  __alloc_pages_nodemask+0x2b4/0x310
<4>[   59.707747]  alloc_slab_page+0x40/0x5c8
<4>[   59.707753]  new_slab+0x404/0x420
<4>[   59.707759]  ___slab_alloc+0x224/0x3b0
<4>[   59.707765]  __kmalloc+0x37c/0x394
<4>[   59.707773]  context_struct_to_string+0x110/0x1b8
<4>[   59.707778]  context_add_hash+0x6c/0xc8
<4>[   59.707785]  security_compute_sid.llvm.13699573597798246927+0x508/0x5d8
<4>[   59.707792]  security_transition_sid+0x2c/0x38
<4>[   59.707804]  selinux_socket_create+0xa0/0xd8
<4>[   59.707811]  security_socket_create+0x68/0xbc
<4>[   59.707818]  __sock_create+0x8c/0x2f8
<4>[   59.707823]  __sys_socket+0x94/0x19c
<4>[   59.707829]  __arm64_sys_socket+0x20/0x30
<4>[   59.707836]  el0_svc_common+0x100/0x1e0
<4>[   59.707841]  el0_svc_handler+0x68/0x74
<4>[   59.707848]  el0_svc+0x8/0xc
<4>[   59.707853] Mem-Info:
<4>[   59.707890] active_anon:223569 inactive_anon:74412 isolated_anon:0
<4>[   59.707890]  active_file:51395 inactive_file:176622 isolated_file:0
<4>[   59.707890]  unevictable:1018 dirty:211 writeback:4 unstable:0
<4>[   59.707890]  slab_reclaimable:14398 slab_unreclaimable:61909
<4>[   59.707890]  mapped:134779 shmem:1231 pagetables:26706 bounce:0
<4>[   59.707890]  free:528 free_pcp:844 free_cma:147
<4>[   59.707900] Node 0 active_anon:894276kB inactive_anon:297648kB active_file:205580kB inactive_file:706488kB unevictable:4072kB isolated(anon):0kB isolated(file):0kB mapped:539116kB dirty:844kB writeback:16kB shmem:4924kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
<4>[   59.707912] Normal free:2112kB min:7244kB low:68892kB high:72180kB active_anon:893140kB inactive_anon:297660kB active_file:204740kB inactive_file:706396kB unevictable:4072kB writepending:860kB present:3626812kB managed:3288700kB mlocked:4068kB kernel_stack:62416kB shadow_call_stack:15656kB pagetables:106824kB bounce:0kB free_pcp:3372kB local_pcp:176kB free_cma:588kB
<4>[   59.707915] lowmem_reserve[]: 0 0
<4>[   59.707922] Normal: 8*4kB (H) 5*8kB (H) 13*16kB (H) 25*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1080kB
<4>[   59.707942] 242549 total pagecache pages
<4>[   59.707951] 12446 pages in swap cache
<4>[   59.707956] Swap cache stats: add 212408, delete 199969, find 36869/71571
<4>[   59.707961] Free swap  = 3445756kB
<4>[   59.707965] Total swap = 4194300kB
<4>[   59.707969] 906703 pages RAM
<4>[   59.707973] 0 pages HighMem/MovableOnly
<4>[   59.707978] 84528 pages reserved
<4>[   59.707982] 49152 pages cma reserved
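As a side note on the log above, the per-order counts in the buddy "Normal:" free-list line (the (H) entries are highatomic reserve blocks) do add up to the reported total, which is a quick way to sanity-check such dumps:

```python
# Per-order free-list entries taken from the "Normal:" line in the log:
# "8*4kB (H) 5*8kB (H) 13*16kB (H) 25*32kB (H) ... = 1080kB"
free_lists = [(8, 4), (5, 8), (13, 16), (25, 32)]  # (blocks, kB per block)

total_kb = sum(count * size_kb for count, size_kb in free_lists)
print(total_kb)  # 1080, matching the "= 1080kB" total in the log
```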

Kswapd or other reclaim contexts may not prepare enough free pages
when too many atomic allocations occur in a short time. But zram may
not be helpful for these atomic allocations even though zram is used
for reclaim.

To get one zs object of a specific size, zram may allocate several
pages. This can also happen for different class sizes at the same
time, which means zram may consume several pages to reclaim only one
page. This inefficiency may let a process with PF_MEMALLOC, such as
kswapd, consume all the free pages below the min watermark.

We can avoid this by adding __GFP_NOMEMALLOC, so that a PF_MEMALLOC
process won't use ALLOC_NO_WATERMARKS.
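For context, a simplified userspace sketch of the page-allocator decision this patch targets (the flag values and function name are made up here; the real logic lives in __gfp_pfmemalloc_flags() in mm/page_alloc.c): a PF_MEMALLOC task such as kswapd is normally granted ALLOC_NO_WATERMARKS, but a request carrying __GFP_NOMEMALLOC opts out of that bypass:

```c
#include <assert.h>

/* Illustrative flag bits -- values are arbitrary, not the kernel's. */
#define SKETCH_GFP_NOMEMALLOC 0x1u
#define SKETCH_GFP_MEMALLOC   0x2u

/* Sketch of whether an allocation may dip below the min watermark. */
int may_ignore_watermarks(unsigned int gfp_mask, int task_has_pf_memalloc)
{
	if (gfp_mask & SKETCH_GFP_NOMEMALLOC)
		return 0;	/* explicit opt-out always wins */
	if (gfp_mask & SKETCH_GFP_MEMALLOC)
		return 1;	/* caller asked for emergency reserves */
	if (task_has_pf_memalloc)
		return 1;	/* reclaimer context, e.g. kswapd */
	return 0;
}
```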

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
---
 drivers/block/zram/zram_drv.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Minchan Kim June 6, 2022, 7:46 p.m. UTC | #1
On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> Atomic page allocation failures sometimes happen, and most of them
> seem to occur during boot time.
> 
> <4>[   59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
> <4>[   59.707676] CPU: 5 PID: 1209 Comm: system_server Tainted: G S O      5.4.161-qgki-24219806-abA236USQU0AVE1 #1
> <4>[   59.707691] Call trace:
> <4>[   59.707702]  dump_backtrace.cfi_jt+0x0/0x4
> <4>[   59.707712]  show_stack+0x18/0x24
> <4>[   59.707719]  dump_stack+0xa4/0xe0
> <4>[   59.707728]  warn_alloc+0x114/0x194
> <4>[   59.707734]  __alloc_pages_slowpath+0x828/0x83c
> <4>[   59.707740]  __alloc_pages_nodemask+0x2b4/0x310
> <4>[   59.707747]  alloc_slab_page+0x40/0x5c8
> <4>[   59.707753]  new_slab+0x404/0x420
> <4>[   59.707759]  ___slab_alloc+0x224/0x3b0
> <4>[   59.707765]  __kmalloc+0x37c/0x394
> <4>[   59.707773]  context_struct_to_string+0x110/0x1b8
> <4>[   59.707778]  context_add_hash+0x6c/0xc8
> <4>[   59.707785]  security_compute_sid.llvm.13699573597798246927+0x508/0x5d8
> <4>[   59.707792]  security_transition_sid+0x2c/0x38
> <4>[   59.707804]  selinux_socket_create+0xa0/0xd8
> <4>[   59.707811]  security_socket_create+0x68/0xbc
> <4>[   59.707818]  __sock_create+0x8c/0x2f8
> <4>[   59.707823]  __sys_socket+0x94/0x19c
> <4>[   59.707829]  __arm64_sys_socket+0x20/0x30
> <4>[   59.707836]  el0_svc_common+0x100/0x1e0
> <4>[   59.707841]  el0_svc_handler+0x68/0x74
> <4>[   59.707848]  el0_svc+0x8/0xc
> <4>[   59.707853] Mem-Info:
> <4>[   59.707890] active_anon:223569 inactive_anon:74412 isolated_anon:0
> <4>[   59.707890]  active_file:51395 inactive_file:176622 isolated_file:0
> <4>[   59.707890]  unevictable:1018 dirty:211 writeback:4 unstable:0
> <4>[   59.707890]  slab_reclaimable:14398 slab_unreclaimable:61909
> <4>[   59.707890]  mapped:134779 shmem:1231 pagetables:26706 bounce:0
> <4>[   59.707890]  free:528 free_pcp:844 free_cma:147
> <4>[   59.707900] Node 0 active_anon:894276kB inactive_anon:297648kB active_file:205580kB inactive_file:706488kB unevictable:4072kB isolated(anon):0kB isolated(file):0kB mapped:539116kB dirty:844kB writeback:16kB shmem:4924kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> <4>[   59.707912] Normal free:2112kB min:7244kB low:68892kB high:72180kB active_anon:893140kB inactive_anon:297660kB active_file:204740kB inactive_file:706396kB unevictable:4072kB writepending:860kB present:3626812kB managed:3288700kB mlocked:4068kB kernel_stack:62416kB shadow_call_stack:15656kB pagetables:106824kB bounce:0kB free_pcp:3372kB local_pcp:176kB free_cma:588kB
> <4>[   59.707915] lowmem_reserve[]: 0 0
> <4>[   59.707922] Normal: 8*4kB (H) 5*8kB (H) 13*16kB (H) 25*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1080kB
> <4>[   59.707942] 242549 total pagecache pages
> <4>[   59.707951] 12446 pages in swap cache
> <4>[   59.707956] Swap cache stats: add 212408, delete 199969, find 36869/71571
> <4>[   59.707961] Free swap  = 3445756kB
> <4>[   59.707965] Total swap = 4194300kB
> <4>[   59.707969] 906703 pages RAM
> <4>[   59.707973] 0 pages HighMem/MovableOnly
> <4>[   59.707978] 84528 pages reserved
> <4>[   59.707982] 49152 pages cma reserved
> 
> Kswapd or other reclaim contexts may not prepare enough free pages
> when too many atomic allocations occur in a short time. But zram may
> not be helpful for these atomic allocations even though zram is used
> for reclaim.
> 
> To get one zs object of a specific size, zram may allocate several
> pages. This can also happen for different class sizes at the same
> time, which means zram may consume several pages to reclaim only one
> page. This inefficiency may let a process with PF_MEMALLOC, such as
> kswapd, consume all the free pages below the min watermark.

However, that's how zram has worked for a long time (allocating memory
under memory pressure), and many folks have already raised
min_free_kbytes when they use zram as swap. If we don't allow the
allocation, swap-out fails more easily than before, which would break
existing tunes.

> 
> We can avoid this by adding __GFP_NOMEMALLOC, so that a PF_MEMALLOC
> process won't use ALLOC_NO_WATERMARKS.
> 
> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  drivers/block/zram/zram_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index b8549c61ff2c..39cd1397ed3b 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1383,6 +1383,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
>  
>  	handle = zs_malloc(zram->mem_pool, comp_len,
>  			__GFP_KSWAPD_RECLAIM |
> +			__GFP_NOMEMALLOC |
>  			__GFP_NOWARN |
>  			__GFP_HIGHMEM |
>  			__GFP_MOVABLE);
> -- 
> 2.17.1
> 
>
Andrew Morton June 6, 2022, 7:59 p.m. UTC | #2
On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:

> On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> > Atomic page allocation failures sometimes happen, and most of them
> > seem to occur during boot time.
> > 
> > <4>[   59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
>
> ...
>
> > 
> > Kswapd or other reclaim contexts may not prepare enough free pages
> > when too many atomic allocations occur in a short time. But zram may
> > not be helpful for these atomic allocations even though zram is used
> > for reclaim.
> > 
> > To get one zs object of a specific size, zram may allocate several
> > pages. This can also happen for different class sizes at the same
> > time, which means zram may consume several pages to reclaim only one
> > page. This inefficiency may let a process with PF_MEMALLOC, such as
> > kswapd, consume all the free pages below the min watermark.
> 
> However, that's how zram has worked for a long time (allocating memory
> under memory pressure), and many folks have already raised
> min_free_kbytes when they use zram as swap. If we don't allow the
> allocation, swap-out fails more easily than before, which would break
> existing tunes.

So is there a better way of preventing this warning?  Just suppress it
with __GFP_NOWARN?
Minchan Kim June 6, 2022, 8:48 p.m. UTC | #3
On Mon, Jun 06, 2022 at 12:59:39PM -0700, Andrew Morton wrote:
> On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:
> 
> > On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> > > Atomic page allocation failures sometimes happen, and most of them
> > > seem to occur during boot time.
> > > 
> > > <4>[   59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
> >
> > ...
> >
> > > 
> > > Kswapd or other reclaim contexts may not prepare enough free pages
> > > when too many atomic allocations occur in a short time. But zram may
> > > not be helpful for these atomic allocations even though zram is used
> > > for reclaim.
> > > 
> > > To get one zs object of a specific size, zram may allocate several
> > > pages. This can also happen for different class sizes at the same
> > > time, which means zram may consume several pages to reclaim only one
> > > page. This inefficiency may let a process with PF_MEMALLOC, such as
> > > kswapd, consume all the free pages below the min watermark.
> > 
> > However, that's how zram has worked for a long time (allocating memory
> > under memory pressure), and many folks have already raised
> > min_free_kbytes when they use zram as swap. If we don't allow the
> > allocation, swap-out fails more easily than before, which would break
> > existing tunes.
> 
> So is there a better way of preventing this warning?  Just suppress it
> with __GFP_NOWARN?

For me, I usually try to remove GFP_ATOMIC allocations, since atomic
allocations can fail easily (zram is not the only source of them).
Otherwise, increase min_free_kbytes?
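For reference, the knob Minchan mentions can be inspected and tuned as below; the value shown is purely illustrative, not a recommendation (the thread itself notes there is no universal golden value):

```shell
# Inspect the current global minimum watermark (in kB); this is the
# vm.min_free_kbytes sysctl.
cat /proc/sys/vm/min_free_kbytes

# To raise it at runtime (illustrative value, requires root):
#   sysctl -w vm.min_free_kbytes=16384
# To persist it across reboots, add to /etc/sysctl.conf:
#   vm.min_free_kbytes = 16384
```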
Jaewon Kim June 7, 2022, 1:17 a.m. UTC | #4
> 
> 
>--------- Original Message ---------
>Sender : Minchan Kim <minchan@kernel.org>
>Date : 2022-06-07 05:48 (GMT+9)
>Title : Re: [PATCH] zram_drv: add __GFP_NOMEMALLOC not to use ALLOC_NO_WATERMARKS
> 
>On Mon, Jun 06, 2022 at 12:59:39PM -0700, Andrew Morton wrote:
>> On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:
>> 
>> > On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
>> > > Atomic page allocation failures sometimes happen, and most of them
>> > > seem to occur during boot time.
>> > > 
>> > > <4>[   59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
>> >
>> > ...
>> >
>> > > 
>> > > Kswapd or other reclaim contexts may not prepare enough free pages
>> > > when too many atomic allocations occur in a short time. But zram may
>> > > not be helpful for these atomic allocations even though zram is used
>> > > for reclaim.
>> > > 
>> > > To get one zs object of a specific size, zram may allocate several
>> > > pages. This can also happen for different class sizes at the same
>> > > time, which means zram may consume several pages to reclaim only one
>> > > page. This inefficiency may let a process with PF_MEMALLOC, such as
>> > > kswapd, consume all the free pages below the min watermark.
>> > 
>> > However, that's how zram has worked for a long time (allocating memory
>> > under memory pressure), and many folks have already raised
>> > min_free_kbytes when they use zram as swap. If we don't allow the
>> > allocation, swap-out fails more easily than before, which would break
>> > existing tunes.


Hello.

Yes, correct. We may need to tune again to swap out as much as we did.

But in my experiment, there were quite a lot of zram allocations that
might have failed without ALLOC_NO_WATERMARKS. I thought the zram
allocations seem to easily affect atomic allocation failures.

>> 
>> So is there a better way of preventing this warning?  Just suppress it
>> with __GFP_NOWARN?
> 
>For me, I usually try to remove GFP_ATOMIC allocations, since atomic
>allocations can fail easily (zram is not the only source of them).
>Otherwise, increase min_free_kbytes?
> 

I also hope driver developers will handle this atomic allocation
failure. However, this selinux function, context_struct_to_string, is
out of their domain. Do I need to report this to the selinux
community? Actually, I got several different call paths that reach
context_struct_to_string.

Yes, we may need to increase min_free_kbytes. But I had an experience
last year where changing wmark_min from 4MB to 8MB did not work. Could
you share some advice about the size?

Thank you
Minchan Kim June 7, 2022, 11:47 p.m. UTC | #5
Hi Jaewon,

On Tue, Jun 07, 2022 at 10:17:02AM +0900, Jaewon Kim wrote:
> > 
> > 
> >--------- Original Message ---------
> >Sender : Minchan Kim <minchan@kernel.org>
> >Date : 2022-06-07 05:48 (GMT+9)
> >Title : Re: [PATCH] zram_drv: add __GFP_NOMEMALLOC not to use ALLOC_NO_WATERMARKS
> > 
> >On Mon, Jun 06, 2022 at 12:59:39PM -0700, Andrew Morton wrote:
> >> On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:
> >> 
> >> > On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> >> > > Atomic page allocation failures sometimes happen, and most of them
> >> > > seem to occur during boot time.
> >> > > 
> >> > > <4>[   59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
> >> >
> >> > ...
> >> >
> >> > > 
> >> > > Kswapd or other reclaim contexts may not prepare enough free pages
> >> > > when too many atomic allocations occur in a short time. But zram may
> >> > > not be helpful for these atomic allocations even though zram is used
> >> > > for reclaim.
> >> > > 
> >> > > To get one zs object of a specific size, zram may allocate several
> >> > > pages. This can also happen for different class sizes at the same
> >> > > time, which means zram may consume several pages to reclaim only one
> >> > > page. This inefficiency may let a process with PF_MEMALLOC, such as
> >> > > kswapd, consume all the free pages below the min watermark.
> >> > 
> >> > However, that's how zram has worked for a long time (allocating memory
> >> > under memory pressure), and many folks have already raised
> >> > min_free_kbytes when they use zram as swap. If we don't allow the
> >> > allocation, swap-out fails more easily than before, which would break
> >> > existing tunes.
> 
> 
> Hello.
> 
> Yes, correct. We may need to tune again to swap out as much as we did.
> 
> But in my experiment, there were quite a lot of zram allocations that
> might have failed without ALLOC_NO_WATERMARKS. I thought the zram
> allocations seem to easily affect atomic allocation failures.

I understand your concern, but the solution here would affect existing
common users too much.

> 
> >> 
> >> So is there a better way of preventing this warning?  Just suppress it
> >> with __GFP_NOWARN?
> > 
> >For me, I usually try to remove GFP_ATOMIC allocations, since atomic
> >allocations can fail easily (zram is not the only source of them).
> >Otherwise, increase min_free_kbytes?
> > 
> 
> I also hope driver developers will handle this atomic allocation
> failure. However, this selinux function, context_struct_to_string, is
> out of their domain. Do I need to report this to the selinux
> community? Actually, I got several different call paths that reach
> context_struct_to_string.

I am not familiar with the selinux code, but if GFP_ATOMIC failures
are common in that path, I think it should use __GFP_NOWARN or another
solution such as allocating the memory in advance.
(BTW, I had a similar problem before and fixed it by adding
__GFP_NOWARN in commit 648f2c6100cf, "selinux: use __GFP_NOWARN with
GFP_NOWAIT in the AVC".)

> 
> Yes, we may need to increase min_free_kbytes. But I had an experience
> last year where changing wmark_min from 4MB to 8MB did not work. Could
> you share some advice about the size?

I don't think we could have a universal golden value for it, since
every workload and configuration is different. Maybe your zram size is
rather big compared to system memory, and swappiness is rather high
for boot.

Patch

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b8549c61ff2c..39cd1397ed3b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1383,6 +1383,7 @@  static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 
 	handle = zs_malloc(zram->mem_pool, comp_len,
 			__GFP_KSWAPD_RECLAIM |
+			__GFP_NOMEMALLOC |
 			__GFP_NOWARN |
 			__GFP_HIGHMEM |
 			__GFP_MOVABLE);