Message ID | 20210705170537.43060-1-urezki@gmail.com (mailing list archive)
State      | New
Series     | [v2,1/2] mm/vmalloc: Use batched page requests in bulk-allocator
On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:

> In case of simultaneous vmalloc allocations, for example it is 1GB and
> 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> kernel.
>
> <snip>
> ...
>
> are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> to reschedule. Batched value is hard-coded and is 100 pages per call.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Can we please have a Fixes: for this?

Is this fix important enough for 4.14-rcx?  I think so...

> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	 * to fails, fallback to a single page allocator that is
>  	 * more permissive.
>  	 */
> -	if (!order)
> -		nr_allocated = alloc_pages_bulk_array_node(
> -			gfp, nid, nr_pages, pages);
> -	else
> +	if (!order) {
> +		while (nr_allocated < nr_pages) {
> +			int nr, nr_pages_request;
> +
> +			/*
> +			 * A maximum allowed request is hard-coded and is 100
> +			 * pages per call. That is done in order to prevent a
> +			 * long preemption off scenario in the bulk-allocator
> +			 * so the range is [1:100].
> +			 */
> +			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));

Yes, the types are all over the place.

	nr_pages: unsigned long
	nr_allocated: unsigned int
	nr, nr_pages_request: int

Can we please choose the most appropriate type and use that
consistently?
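To make the review point concrete, here is one way the counters could be
unified. This is a sketch only, assuming nr_pages and nr_allocated are both
declared unsigned int so that min() sees matching types; it is not the
follow-up patch that was actually posted:

<snip>
	/*
	 * Sketch only (not the posted follow-up): all counters are
	 * unsigned int, which also removes the (int) cast in min().
	 */
	unsigned int nr, nr_pages_request;

	while (nr_allocated < nr_pages) {
		/* Cap each request to keep the preempt-off window short. */
		nr_pages_request = min(nr_pages - nr_allocated, 100U);

		nr = alloc_pages_bulk_array_node(gfp, nid,
				nr_pages_request, pages + nr_allocated);

		nr_allocated += nr;
		cond_resched();

		/* Zero or a partial batch: fall back to single pages. */
		if (nr != nr_pages_request)
			break;
	}
<snip>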
On Tue 06-07-21 13:26:53, Andrew Morton wrote:
> On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
>
> > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > kernel.
> >
> > <snip>
> > ...
> >
> > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > to reschedule. Batched value is hard-coded and is 100 pages per call.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Can we please have a Fixes: for this?

Is this a fix for any actual real life problem? I mean allocating 1GB of
vmalloc space back and forth sounds like a stretch to me.

> Is this fix important enough for 4.14-rcx?  I think so...

I do not think so. This is an improvement so that vmalloc behaves more
sanely for those abusers...
On Mon 05-07-21 19:05:36, Uladzislau Rezki (Sony) wrote:
> In case of simultaneous vmalloc allocations, for example it is 1GB and
> 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> kernel.
>
> <snip>
> [   62.512621] RIP: 0010:__alloc_pages_bulk+0xa9f/0xbb0
> [   62.512628] Code: ff 8b 44 24 48 44 29 f8 83 f8 01 0f 84 ea fe ff ff e9 07 f6 ff ff 48 8b 44 24 60 48 89 28 e9 00 f9 ff ff fb 66 0f 1f 44 00 00 <e9> e8 fd ff ff 65 48 01 51 10 e9 3e fe ff ff 48 8b 44 24 78 4d 89
> [   62.512629] RSP: 0018:ffffa7bfc29ffd20 EFLAGS: 00000206
> [   62.512631] RAX: 0000000000000200 RBX: ffffcd5405421888 RCX: ffff8c36ffdeb928
> [   62.512632] RDX: 0000000000040000 RSI: ffffa896f06b2ff8 RDI: ffffcd5405421880
> [   62.512633] RBP: ffffcd5405421880 R08: 000000000000007d R09: ffffffffffffffff
> [   62.512634] R10: ffffffff9d63c084 R11: 00000000ffffffff R12: ffff8c373ffaeb80
> [   62.512635] R13: ffff8c36ffdf65f8 R14: ffff8c373ffaeb80 R15: 0000000000040000
> [   62.512637] FS: 0000000000000000(0000) GS:ffff8c36ffdc0000(0000) knlGS:0000000000000000
> [   62.512638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   62.512639] CR2: 000055c8e2fe8610 CR3: 0000000c13e10000 CR4: 00000000000006e0
> [   62.512641] Call Trace:
> [   62.512646]  __vmalloc_node_range+0x11c/0x2d0
> [   62.512649]  ? full_fit_alloc_test+0x140/0x140 [test_vmalloc]
> [   62.512654]  __vmalloc_node+0x4b/0x70
> [   62.512656]  ? fix_size_alloc_test+0x44/0x60 [test_vmalloc]
> [   62.512659]  fix_size_alloc_test+0x44/0x60 [test_vmalloc]
> [   62.512662]  test_func+0xe7/0x1f0 [test_vmalloc]
> [   62.512666]  ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
> [   62.512668]  kthread+0x11a/0x140
> [   62.512671]  ? set_kthread_struct+0x40/0x40
> [   62.512672]  ret_from_fork+0x22/0x30
> <snip>
>
> To address this issue invoke a bulk-allocator many times until all pages
> are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> to reschedule. Batched value is hard-coded and is 100 pages per call.

Yes, this makes perfect sense to me. I would just be more explicit that
this is an artificially created problem, likely not an issue at the
moment, but why not prepare for the future.

> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/vmalloc.c | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index aaad569e8963..5297958ac7c5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	 * to fails, fallback to a single page allocator that is
>  	 * more permissive.
>  	 */
> -	if (!order)
> -		nr_allocated = alloc_pages_bulk_array_node(
> -			gfp, nid, nr_pages, pages);
> -	else
> +	if (!order) {
> +		while (nr_allocated < nr_pages) {
> +			int nr, nr_pages_request;
> +
> +			/*
> +			 * A maximum allowed request is hard-coded and is 100
> +			 * pages per call. That is done in order to prevent a
> +			 * long preemption off scenario in the bulk-allocator
> +			 * so the range is [1:100].
> +			 */
> +			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
> +
> +			nr = alloc_pages_bulk_array_node(gfp, nid,
> +				nr_pages_request, pages + nr_allocated);
> +
> +			nr_allocated += nr;
> +			cond_resched();
> +
> +			/*
> +			 * If zero or pages were obtained partly,
> +			 * fallback to a single page allocator.
> +			 */
> +			if (nr != nr_pages_request)
> +				break;
> +		}
> +	} else
>  		/*
>  		 * Compound pages required for remap_vmalloc_page if
>  		 * high-order pages.
> --
> 2.20.1
On Wed, Jul 07, 2021 at 10:42:29AM +0200, Michal Hocko wrote:
> On Tue 06-07-21 13:26:53, Andrew Morton wrote:
> > On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
> >
> > > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > > kernel.
> > >
> > > <snip>
> > > ...
> > >
> > > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > > to reschedule. Batched value is hard-coded and is 100 pages per call.
> > >
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> >
> > Can we please have a Fixes: for this?
>
> Is this a fix for any actual real life problem? I mean allocating 1GB of
> vmalloc space back and forth sounds like a stretch to me.
>
It is not a real scenario. I simulated it with stress-suite tests, so
the Fixes: tag is not needed, IMHO.

> > Is this fix important enough for 4.14-rcx?  I think so...
>
> I do not think so. This is an improvement so that vmalloc behaves more
> sanely for those abusers...
>
The bulk-allocator has only recently been introduced, so 4.x kernels do
not have it, i.e. this change is not applicable there and 4.x does not
suffer from the problem.

--
Vlad Rezki
On Tue, Jul 06, 2021 at 01:26:53PM -0700, Andrew Morton wrote:
> On Mon, 5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
>
> > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > kernel.
> >
> > <snip>
> > ...
> >
> > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > to reschedule. Batched value is hard-coded and is 100 pages per call.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Can we please have a Fixes: for this?
>
> Is this fix important enough for 4.14-rcx?  I think so...
>
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >  	 * to fails, fallback to a single page allocator that is
> >  	 * more permissive.
> >  	 */
> > -	if (!order)
> > -		nr_allocated = alloc_pages_bulk_array_node(
> > -			gfp, nid, nr_pages, pages);
> > -	else
> > +	if (!order) {
> > +		while (nr_allocated < nr_pages) {
> > +			int nr, nr_pages_request;
> > +
> > +			/*
> > +			 * A maximum allowed request is hard-coded and is 100
> > +			 * pages per call. That is done in order to prevent a
> > +			 * long preemption off scenario in the bulk-allocator
> > +			 * so the range is [1:100].
> > +			 */
> > +			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
>
> Yes, the types are all over the place.
>
> 	nr_pages: unsigned long
> 	nr_allocated: unsigned int
> 	nr, nr_pages_request: int
>
> Can we please choose the most appropriate type and use that
> consistently?
>
Let me think it over and see what I can do.

--
Vlad Rezki
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index aaad569e8963..5297958ac7c5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	 * to fails, fallback to a single page allocator that is
 	 * more permissive.
 	 */
-	if (!order)
-		nr_allocated = alloc_pages_bulk_array_node(
-			gfp, nid, nr_pages, pages);
-	else
+	if (!order) {
+		while (nr_allocated < nr_pages) {
+			int nr, nr_pages_request;
+
+			/*
+			 * A maximum allowed request is hard-coded and is 100
+			 * pages per call. That is done in order to prevent a
+			 * long preemption off scenario in the bulk-allocator
+			 * so the range is [1:100].
+			 */
+			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
+
+			nr = alloc_pages_bulk_array_node(gfp, nid,
+				nr_pages_request, pages + nr_allocated);
+
+			nr_allocated += nr;
+			cond_resched();
+
+			/*
+			 * If zero or pages were obtained partly,
+			 * fallback to a single page allocator.
+			 */
+			if (nr != nr_pages_request)
+				break;
+		}
+	} else
 		/*
 		 * Compound pages required for remap_vmalloc_page if
 		 * high-order pages.
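The break on a partial batch is safe because of the fallback code just
below this hunk: the single-page loop resumes at pages[nr_allocated], so
pages already obtained by the bulk call are kept. A paraphrased sketch of
that fallback, simplified to the order-0 case and not a verbatim quote of
mm/vmalloc.c:

<snip>
	/*
	 * Paraphrased, simplified fallback: resume at pages[nr_allocated]
	 * so that partial bulk results are not thrown away.
	 */
	while (nr_allocated < nr_pages) {
		struct page *page = alloc_pages_node(nid, gfp, order);

		if (!page)
			break;

		pages[nr_allocated++] = page;
	}
<snip>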
In case of simultaneous vmalloc allocations, for example it is 1GB and
12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
kernel.

<snip>
[   62.512621] RIP: 0010:__alloc_pages_bulk+0xa9f/0xbb0
[   62.512628] Code: ff 8b 44 24 48 44 29 f8 83 f8 01 0f 84 ea fe ff ff e9 07 f6 ff ff 48 8b 44 24 60 48 89 28 e9 00 f9 ff ff fb 66 0f 1f 44 00 00 <e9> e8 fd ff ff 65 48 01 51 10 e9 3e fe ff ff 48 8b 44 24 78 4d 89
[   62.512629] RSP: 0018:ffffa7bfc29ffd20 EFLAGS: 00000206
[   62.512631] RAX: 0000000000000200 RBX: ffffcd5405421888 RCX: ffff8c36ffdeb928
[   62.512632] RDX: 0000000000040000 RSI: ffffa896f06b2ff8 RDI: ffffcd5405421880
[   62.512633] RBP: ffffcd5405421880 R08: 000000000000007d R09: ffffffffffffffff
[   62.512634] R10: ffffffff9d63c084 R11: 00000000ffffffff R12: ffff8c373ffaeb80
[   62.512635] R13: ffff8c36ffdf65f8 R14: ffff8c373ffaeb80 R15: 0000000000040000
[   62.512637] FS: 0000000000000000(0000) GS:ffff8c36ffdc0000(0000) knlGS:0000000000000000
[   62.512638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   62.512639] CR2: 000055c8e2fe8610 CR3: 0000000c13e10000 CR4: 00000000000006e0
[   62.512641] Call Trace:
[   62.512646]  __vmalloc_node_range+0x11c/0x2d0
[   62.512649]  ? full_fit_alloc_test+0x140/0x140 [test_vmalloc]
[   62.512654]  __vmalloc_node+0x4b/0x70
[   62.512656]  ? fix_size_alloc_test+0x44/0x60 [test_vmalloc]
[   62.512659]  fix_size_alloc_test+0x44/0x60 [test_vmalloc]
[   62.512662]  test_func+0xe7/0x1f0 [test_vmalloc]
[   62.512666]  ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
[   62.512668]  kthread+0x11a/0x140
[   62.512671]  ? set_kthread_struct+0x40/0x40
[   62.512672]  ret_from_fork+0x22/0x30
<snip>

To address this issue invoke a bulk-allocator many times until all pages
are obtained, i.e. do batched page requests adding cond_resched() meanwhile
to reschedule. Batched value is hard-coded and is 100 pages per call.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)
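To put the batch size in perspective, a back-of-the-envelope calculation
(not from the patch itself):

	1 GiB / 4 KiB       = 262,144 order-0 pages per allocation
	ceil(262,144 / 100) = 2,622 bulk calls

So instead of one bulk call that tries to obtain all 262,144 pages with no
reschedule point, the batched loop makes about 2,622 calls of at most 100
pages each, with a cond_resched() after every call.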