Message ID | 20220603055747.11694-1-jaewon31.kim@samsung.com (mailing list archive)
---|---
State | New
Series | zram_drv: add __GFP_NOMEMALLOC not to use ALLOC_NO_WATERMARKS
On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> Atomic page allocation failures sometimes happen, and most of them
> seem to occur during boot time.
>
> <4>[ 59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
> <4>[ 59.707676] CPU: 5 PID: 1209 Comm: system_server Tainted: G S O 5.4.161-qgki-24219806-abA236USQU0AVE1 #1
> <4>[ 59.707691] Call trace:
> <4>[ 59.707702] dump_backtrace.cfi_jt+0x0/0x4
> <4>[ 59.707712] show_stack+0x18/0x24
> <4>[ 59.707719] dump_stack+0xa4/0xe0
> <4>[ 59.707728] warn_alloc+0x114/0x194
> <4>[ 59.707734] __alloc_pages_slowpath+0x828/0x83c
> <4>[ 59.707740] __alloc_pages_nodemask+0x2b4/0x310
> <4>[ 59.707747] alloc_slab_page+0x40/0x5c8
> <4>[ 59.707753] new_slab+0x404/0x420
> <4>[ 59.707759] ___slab_alloc+0x224/0x3b0
> <4>[ 59.707765] __kmalloc+0x37c/0x394
> <4>[ 59.707773] context_struct_to_string+0x110/0x1b8
> <4>[ 59.707778] context_add_hash+0x6c/0xc8
> <4>[ 59.707785] security_compute_sid.llvm.13699573597798246927+0x508/0x5d8
> <4>[ 59.707792] security_transition_sid+0x2c/0x38
> <4>[ 59.707804] selinux_socket_create+0xa0/0xd8
> <4>[ 59.707811] security_socket_create+0x68/0xbc
> <4>[ 59.707818] __sock_create+0x8c/0x2f8
> <4>[ 59.707823] __sys_socket+0x94/0x19c
> <4>[ 59.707829] __arm64_sys_socket+0x20/0x30
> <4>[ 59.707836] el0_svc_common+0x100/0x1e0
> <4>[ 59.707841] el0_svc_handler+0x68/0x74
> <4>[ 59.707848] el0_svc+0x8/0xc
> <4>[ 59.707853] Mem-Info:
> <4>[ 59.707890] active_anon:223569 inactive_anon:74412 isolated_anon:0
> <4>[ 59.707890]  active_file:51395 inactive_file:176622 isolated_file:0
> <4>[ 59.707890]  unevictable:1018 dirty:211 writeback:4 unstable:0
> <4>[ 59.707890]  slab_reclaimable:14398 slab_unreclaimable:61909
> <4>[ 59.707890]  mapped:134779 shmem:1231 pagetables:26706 bounce:0
> <4>[ 59.707890]  free:528 free_pcp:844 free_cma:147
> <4>[ 59.707900] Node 0 active_anon:894276kB inactive_anon:297648kB active_file:205580kB inactive_file:706488kB unevictable:4072kB isolated(anon):0kB isolated(file):0kB mapped:539116kB dirty:844kB writeback:16kB shmem:4924kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> <4>[ 59.707912] Normal free:2112kB min:7244kB low:68892kB high:72180kB active_anon:893140kB inactive_anon:297660kB active_file:204740kB inactive_file:706396kB unevictable:4072kB writepending:860kB present:3626812kB managed:3288700kB mlocked:4068kB kernel_stack:62416kB shadow_call_stack:15656kB pagetables:106824kB bounce:0kB free_pcp:3372kB local_pcp:176kB free_cma:588kB
> <4>[ 59.707915] lowmem_reserve[]: 0 0
> <4>[ 59.707922] Normal: 8*4kB (H) 5*8kB (H) 13*16kB (H) 25*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1080kB
> <4>[ 59.707942] 242549 total pagecache pages
> <4>[ 59.707951] 12446 pages in swap cache
> <4>[ 59.707956] Swap cache stats: add 212408, delete 199969, find 36869/71571
> <4>[ 59.707961] Free swap = 3445756kB
> <4>[ 59.707965] Total swap = 4194300kB
> <4>[ 59.707969] 906703 pages RAM
> <4>[ 59.707973] 0 pages HighMem/MovableOnly
> <4>[ 59.707978] 84528 pages reserved
> <4>[ 59.707982] 49152 pages cma reserved
>
> The kswapd or other reclaim contexts may not prepare enough free pages
> for too many atomic allocations occurring in a short time. But zram may
> not be helpful for these atomic allocations even though zram is used to
> reclaim.
>
> To get one zs object of a specific size, zram may allocate several
> pages. And this can happen for different class sizes at the same time.
> It means zram may consume more pages to reclaim only one page. This
> inefficiency may consume all free pages below the min watermark by a
> process having PF_MEMALLOC, like kswapd.

However, that's how zram has worked for a long time (allocate memory
under memory pressure) and many folks have already raised min_free_kbytes
when they use zram as swap. If we don't allow the allocation, swap out
fails more easily than before, which would break existing tunes.

> We can avoid this by adding __GFP_NOMEMALLOC. A PF_MEMALLOC process
> won't use ALLOC_NO_WATERMARKS.
>
> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  drivers/block/zram/zram_drv.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index b8549c61ff2c..39cd1397ed3b 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1383,6 +1383,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
>
>  	handle = zs_malloc(zram->mem_pool, comp_len,
>  			__GFP_KSWAPD_RECLAIM |
> +			__GFP_NOMEMALLOC |
>  			__GFP_NOWARN |
>  			__GFP_HIGHMEM |
>  			__GFP_MOVABLE);
> --
> 2.17.1
On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:

> On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> > Atomic page allocation failures sometimes happen, and most of them
> > seem to occur during boot time.
> >
> > <4>[ 59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
>
> ...
>
> > The kswapd or other reclaim contexts may not prepare enough free pages
> > for too many atomic allocations occurring in a short time. But zram may
> > not be helpful for these atomic allocations even though zram is used to
> > reclaim.
> >
> > To get one zs object of a specific size, zram may allocate several
> > pages. And this can happen for different class sizes at the same time.
> > It means zram may consume more pages to reclaim only one page. This
> > inefficiency may consume all free pages below the min watermark by a
> > process having PF_MEMALLOC, like kswapd.
>
> However, that's how zram has worked for a long time (allocate memory
> under memory pressure) and many folks have already raised min_free_kbytes
> when they use zram as swap. If we don't allow the allocation, swap out
> fails more easily than before, which would break existing tunes.

So is there a better way of preventing this warning?  Just suppress it
with __GFP_NOWARN?
On Mon, Jun 06, 2022 at 12:59:39PM -0700, Andrew Morton wrote:
> On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:
>
> > On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> > > Atomic page allocation failures sometimes happen, and most of them
> > > seem to occur during boot time.
> > >
> > > <4>[ 59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
> >
> > ...
> >
> > > The kswapd or other reclaim contexts may not prepare enough free pages
> > > for too many atomic allocations occurring in a short time. But zram may
> > > not be helpful for these atomic allocations even though zram is used to
> > > reclaim.
> > >
> > > To get one zs object of a specific size, zram may allocate several
> > > pages. And this can happen for different class sizes at the same time.
> > > It means zram may consume more pages to reclaim only one page. This
> > > inefficiency may consume all free pages below the min watermark by a
> > > process having PF_MEMALLOC, like kswapd.
> >
> > However, that's how zram has worked for a long time (allocate memory
> > under memory pressure) and many folks have already raised min_free_kbytes
> > when they use zram as swap. If we don't allow the allocation, swap out
> > fails more easily than before, which would break existing tunes.
>
> So is there a better way of preventing this warning?  Just suppress it
> with __GFP_NOWARN?

For me, I usually try to remove GFP_ATOMIC allocations, since atomic
allocations can fail easily (zram is not the only source of them).
Otherwise, increase min_free_kbytes?
> 
> 
>--------- Original Message ---------
>Sender : Minchan Kim <minchan@kernel.org>
>Date : 2022-06-07 05:48 (GMT+9)
>Title : Re: [PATCH] zram_drv: add __GFP_NOMEMALLOC not to use ALLOC_NO_WATERMARKS
>
>On Mon, Jun 06, 2022 at 12:59:39PM -0700, Andrew Morton wrote:
>> On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:
>>
>> > On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
>> > > Atomic page allocation failures sometimes happen, and most of them
>> > > seem to occur during boot time.
>> > >
>> > > <4>[ 59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
>> >
>> > ...
>> >
>> > > The kswapd or other reclaim contexts may not prepare enough free pages
>> > > for too many atomic allocations occurring in a short time. But zram may
>> > > not be helpful for these atomic allocations even though zram is used to
>> > > reclaim.
>> > >
>> > > To get one zs object of a specific size, zram may allocate several
>> > > pages. And this can happen for different class sizes at the same time.
>> > > It means zram may consume more pages to reclaim only one page. This
>> > > inefficiency may consume all free pages below the min watermark by a
>> > > process having PF_MEMALLOC, like kswapd.
>> >
>> > However, that's how zram has worked for a long time (allocate memory
>> > under memory pressure) and many folks have already raised min_free_kbytes
>> > when they use zram as swap. If we don't allow the allocation, swap out
>> > fails more easily than before, which would break existing tunes.

Hello.

Yes, correct. We may need to tune again to swap out as much as we did.

But in my experiment, there were quite a few zram allocations which might
have failed without ALLOC_NO_WATERMARKS. I thought zram allocations seem
likely to contribute to atomic allocation failures.

>>
>> So is there a better way of preventing this warning? Just suppress it
>> with __GFP_NOWARN?
>
>For me, I usually try to remove GFP_ATOMIC allocations, since atomic
>allocations can fail easily (zram is not the only source of them).
>Otherwise, increase min_free_kbytes?
>

I also hope driver developers handle this atomic allocation failure.
However this selinux code, context_struct_to_string, is out of their
domain. Do I need to report this to the selinux community? Actually I got
several different call paths reaching context_struct_to_string.

Yes, we may need to increase min_free_kbytes. But I had an experience
where changing wmark_min from 4MB to 8MB did not work last year. Could you
share some advice about the size?

Thank you
Hi Jaewon,

On Tue, Jun 07, 2022 at 10:17:02AM +0900, Jaewon Kim wrote:
> 
> 
> 
> >--------- Original Message ---------
> >Sender : Minchan Kim <minchan@kernel.org>
> >Date : 2022-06-07 05:48 (GMT+9)
> >Title : Re: [PATCH] zram_drv: add __GFP_NOMEMALLOC not to use ALLOC_NO_WATERMARKS
> >
> >On Mon, Jun 06, 2022 at 12:59:39PM -0700, Andrew Morton wrote:
> >> On Mon, 6 Jun 2022 12:46:38 -0700 Minchan Kim <minchan@kernel.org> wrote:
> >>
> >> > On Fri, Jun 03, 2022 at 02:57:47PM +0900, Jaewon Kim wrote:
> >> > > Atomic page allocation failures sometimes happen, and most of them
> >> > > seem to occur during boot time.
> >> > >
> >> > > <4>[ 59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
> >> >
> >> > ...
> >> >
> >> > > The kswapd or other reclaim contexts may not prepare enough free pages
> >> > > for too many atomic allocations occurring in a short time. But zram may
> >> > > not be helpful for these atomic allocations even though zram is used to
> >> > > reclaim.
> >> > >
> >> > > To get one zs object of a specific size, zram may allocate several
> >> > > pages. And this can happen for different class sizes at the same time.
> >> > > It means zram may consume more pages to reclaim only one page. This
> >> > > inefficiency may consume all free pages below the min watermark by a
> >> > > process having PF_MEMALLOC, like kswapd.
> >> >
> >> > However, that's how zram has worked for a long time (allocate memory
> >> > under memory pressure) and many folks have already raised min_free_kbytes
> >> > when they use zram as swap. If we don't allow the allocation, swap out
> >> > fails more easily than before, which would break existing tunes.
> 
> Hello.
> 
> Yes, correct. We may need to tune again to swap out as much as we did.
> 
> But in my experiment, there were quite a few zram allocations which might
> have failed without ALLOC_NO_WATERMARKS. I thought zram allocations seem
> likely to contribute to atomic allocation failures.

I understand your concern, but the solution here would affect existing
common users too much.

> 
> >>
> >> So is there a better way of preventing this warning? Just suppress it
> >> with __GFP_NOWARN?
> >
> >For me, I usually try to remove GFP_ATOMIC allocations, since atomic
> >allocations can fail easily (zram is not the only source of them).
> >Otherwise, increase min_free_kbytes?
> >
> 
> I also hope driver developers handle this atomic allocation failure.
> However this selinux code, context_struct_to_string, is out of their
> domain. Do I need to report this to the selinux community? Actually I got
> several different call paths reaching context_struct_to_string.

I am not familiar with the selinux code, but if GFP_ATOMIC failures are
common in that path, I think it should use __GFP_NOWARN or another
solution, such as allocating memory in advance.

(BTW, I had a similar problem before and fixed it by adding __GFP_NOWARN:
648f2c6100cf, "selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC")

> 
> Yes, we may need to increase min_free_kbytes. But I had an experience
> where changing wmark_min from 4MB to 8MB did not work last year. Could
> you share some advice about the size?

I don't think we could have a universal golden value for it, since every
workload and configuration differ between systems. Maybe your zram size is
rather big compared to system memory, and swappiness is rather high for
boot.
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b8549c61ff2c..39cd1397ed3b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1383,6 +1383,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 
 	handle = zs_malloc(zram->mem_pool, comp_len,
 			__GFP_KSWAPD_RECLAIM |
+			__GFP_NOMEMALLOC |
 			__GFP_NOWARN |
 			__GFP_HIGHMEM |
 			__GFP_MOVABLE);
Atomic page allocation failures sometimes happen, and most of them seem
to occur during boot time.

<4>[ 59.707645] system_server: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=foreground-boost,mems_allowed=0
<4>[ 59.707676] CPU: 5 PID: 1209 Comm: system_server Tainted: G S O 5.4.161-qgki-24219806-abA236USQU0AVE1 #1
<4>[ 59.707691] Call trace:
<4>[ 59.707702] dump_backtrace.cfi_jt+0x0/0x4
<4>[ 59.707712] show_stack+0x18/0x24
<4>[ 59.707719] dump_stack+0xa4/0xe0
<4>[ 59.707728] warn_alloc+0x114/0x194
<4>[ 59.707734] __alloc_pages_slowpath+0x828/0x83c
<4>[ 59.707740] __alloc_pages_nodemask+0x2b4/0x310
<4>[ 59.707747] alloc_slab_page+0x40/0x5c8
<4>[ 59.707753] new_slab+0x404/0x420
<4>[ 59.707759] ___slab_alloc+0x224/0x3b0
<4>[ 59.707765] __kmalloc+0x37c/0x394
<4>[ 59.707773] context_struct_to_string+0x110/0x1b8
<4>[ 59.707778] context_add_hash+0x6c/0xc8
<4>[ 59.707785] security_compute_sid.llvm.13699573597798246927+0x508/0x5d8
<4>[ 59.707792] security_transition_sid+0x2c/0x38
<4>[ 59.707804] selinux_socket_create+0xa0/0xd8
<4>[ 59.707811] security_socket_create+0x68/0xbc
<4>[ 59.707818] __sock_create+0x8c/0x2f8
<4>[ 59.707823] __sys_socket+0x94/0x19c
<4>[ 59.707829] __arm64_sys_socket+0x20/0x30
<4>[ 59.707836] el0_svc_common+0x100/0x1e0
<4>[ 59.707841] el0_svc_handler+0x68/0x74
<4>[ 59.707848] el0_svc+0x8/0xc
<4>[ 59.707853] Mem-Info:
<4>[ 59.707890] active_anon:223569 inactive_anon:74412 isolated_anon:0
<4>[ 59.707890]  active_file:51395 inactive_file:176622 isolated_file:0
<4>[ 59.707890]  unevictable:1018 dirty:211 writeback:4 unstable:0
<4>[ 59.707890]  slab_reclaimable:14398 slab_unreclaimable:61909
<4>[ 59.707890]  mapped:134779 shmem:1231 pagetables:26706 bounce:0
<4>[ 59.707890]  free:528 free_pcp:844 free_cma:147
<4>[ 59.707900] Node 0 active_anon:894276kB inactive_anon:297648kB active_file:205580kB inactive_file:706488kB unevictable:4072kB isolated(anon):0kB isolated(file):0kB mapped:539116kB dirty:844kB writeback:16kB shmem:4924kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
<4>[ 59.707912] Normal free:2112kB min:7244kB low:68892kB high:72180kB active_anon:893140kB inactive_anon:297660kB active_file:204740kB inactive_file:706396kB unevictable:4072kB writepending:860kB present:3626812kB managed:3288700kB mlocked:4068kB kernel_stack:62416kB shadow_call_stack:15656kB pagetables:106824kB bounce:0kB free_pcp:3372kB local_pcp:176kB free_cma:588kB
<4>[ 59.707915] lowmem_reserve[]: 0 0
<4>[ 59.707922] Normal: 8*4kB (H) 5*8kB (H) 13*16kB (H) 25*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1080kB
<4>[ 59.707942] 242549 total pagecache pages
<4>[ 59.707951] 12446 pages in swap cache
<4>[ 59.707956] Swap cache stats: add 212408, delete 199969, find 36869/71571
<4>[ 59.707961] Free swap = 3445756kB
<4>[ 59.707965] Total swap = 4194300kB
<4>[ 59.707969] 906703 pages RAM
<4>[ 59.707973] 0 pages HighMem/MovableOnly
<4>[ 59.707978] 84528 pages reserved
<4>[ 59.707982] 49152 pages cma reserved

The kswapd or other reclaim contexts may not prepare enough free pages
for too many atomic allocations occurring in a short time. But zram may
not be helpful for these atomic allocations even though zram is used to
reclaim.

To get one zs object of a specific size, zram may allocate several
pages. And this can happen for different class sizes at the same time.
It means zram may consume more pages to reclaim only one page. This
inefficiency may consume all free pages below the min watermark by a
process having PF_MEMALLOC, like kswapd.

We can avoid this by adding __GFP_NOMEMALLOC. A PF_MEMALLOC process
won't use ALLOC_NO_WATERMARKS.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
---
 drivers/block/zram/zram_drv.c | 1 +
 1 file changed, 1 insertion(+)