| Message ID | 20191003090906.1261-1-dwagner@suse.de (mailing list archive) |
|---|---|
| State | New, archived |
| Series | mm: vmalloc: Use the vmap_area_lock to protect ne_fit_preload_node |
On Thu, Oct 03, 2019 at 11:09:06AM +0200, Daniel Wagner wrote:
> Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> spin_lock instead. Calling spin_lock() with preempt disabled is
> illegal for -rt. Furthermore, enabling preemption inside the
> spin_lock() doesn't really make sense.
>
> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for
> split purpose")
> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  mm/vmalloc.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 08c134aa7ff3..0d1175673583 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1091,11 +1091,11 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 * Even if it fails we do not really care about that. Just proceed
>  	 * as it is. "overflow" path will refill the cache we allocate from.
>  	 */
> -	preempt_disable();
> +	spin_lock(&vmap_area_lock);
>  	if (!__this_cpu_read(ne_fit_preload_node)) {
> -		preempt_enable();
> +		spin_unlock(&vmap_area_lock);
>  		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -		preempt_disable();
> +		spin_lock(&vmap_area_lock);
>
>  		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
>  			if (pva)
> @@ -1103,9 +1103,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		}
>  	}
>
> -	spin_lock(&vmap_area_lock);
> -	preempt_enable();
> -
>  	/*
>  	 * If an allocation fails, the "vend" address is
>  	 * returned. Therefore trigger the overflow path.
> --
> 2.16.4
>
Some background: the idea was to avoid touching vmap_area_lock several
times, therefore preempt_disable()/preempt_enable() are used instead,
in order to stay on the same CPU. When it comes to PREEMPT_RT this is
a problem, so

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Vlad Rezki
If you post something that is related to PREEMPT_RT please keep tglx and
me in Cc.

On 2019-10-03 11:09:06 [+0200], Daniel Wagner wrote:
> Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> spin_lock instead. Calling spin_lock() with preempt disabled is
> illegal for -rt. Furthermore, enabling preemption inside the
> spin_lock() doesn't really make sense.

This spin_lock will cause all CPUs to block on it while the
preempt_disable() does not have this limitation. I added a
migrate_disable() in my -next tree.
Looking at it again, I have reasonable doubt that this
preempt_disable() is needed.

> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for
> split purpose")
> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  mm/vmalloc.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 08c134aa7ff3..0d1175673583 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1091,11 +1091,11 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 * Even if it fails we do not really care about that. Just proceed
>  	 * as it is. "overflow" path will refill the cache we allocate from.
>  	 */
> -	preempt_disable();
> +	spin_lock(&vmap_area_lock);
>  	if (!__this_cpu_read(ne_fit_preload_node)) {
> -		preempt_enable();
> +		spin_unlock(&vmap_area_lock);
>  		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -		preempt_disable();
> +		spin_lock(&vmap_area_lock);
>
>  		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
>  			if (pva)
> @@ -1103,9 +1103,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		}
>  	}
>
> -	spin_lock(&vmap_area_lock);
> -	preempt_enable();
> -
>  	/*
>  	 * If an allocation fails, the "vend" address is
>  	 * returned. Therefore trigger the overflow path.
> --

Sebastian
On Fri, Oct 04, 2019 at 05:37:28PM +0200, Sebastian Andrzej Siewior wrote:
> If you post something that is related to PREEMPT_RT please keep tglx and
> me in Cc.
>
> On 2019-10-03 11:09:06 [+0200], Daniel Wagner wrote:
> > Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> > spin_lock instead. Calling spin_lock() with preempt disabled is
> > illegal for -rt. Furthermore, enabling preemption inside the
>
> Looking at it again, I have reasonable doubt that this
> preempt_disable() is needed.
>
The intention was to preload a current CPU with one extra object in
non-atomic context, thus to use GFP_KERNEL permissive parameters. I.e.
that allows us to avoid any allocation (if we stay on the same CPU)
when we are in atomic context doing the splitting.

If we have migrate_disable/enable, then, i think preempt_enable/disable
should be replaced by it and not the way how it has been proposed
in the patch.

--
Vlad Rezki
On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> On Fri, Oct 04, 2019 at 05:37:28PM +0200, Sebastian Andrzej Siewior wrote:
> > If you post something that is related to PREEMPT_RT please keep tglx and
> > me in Cc.
> >
> > On 2019-10-03 11:09:06 [+0200], Daniel Wagner wrote:
> > > Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> > > spin_lock instead. Calling spin_lock() with preempt disabled is
> > > illegal for -rt. Furthermore, enabling preemption inside the
> >
> > Looking at it again, I have reasonable doubt that this
> > preempt_disable() is needed.
> >
> The intention was to preload a current CPU with one extra object in
> non-atomic context, thus to use GFP_KERNEL permissive parameters. I.e.
> that allows us to avoid any allocation (if we stay on the same CPU)
> when we are in atomic context doing the splitting.

You could have been migrated to another CPU before the first
preempt_disable().
You could have been migrated to another CPU while memory has been
allocated.
I don't really see the point of that preempt_disable() besides keeping
debug code quiet.

> If we have migrate_disable/enable, then, i think preempt_enable/disable
> should be replaced by it and not the way how it has been proposed
> in the patch.

I don't think this patch is appropriate for upstream.

Sebastian
>
> You could have been migrated to another CPU while
> memory has been allocated.
>
That is true that we can migrate since we allow preemption when we
allocate. But it does not really matter on which CPU an allocation
occurs and whether we migrate or not. If we land on another CPU or
still stay on the same, we will check anyway one more time if it
(another/same CPU) is preloaded or not:

<snip>
    preempt_disable();
    if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)
<snip>

if another, we can free the object allocated on previous step if
it already has it. If another CPU does not have it, save it in
ne_fit_preload_node for another current CPU to reuse later. Further
we can not migrate because of:

<snip>
    spin_lock(&vmap_area_lock);
    preempt_enable();
<snip>

--
Vlad Rezki
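For readers following the argument, the mainline preload path under
discussion looks roughly like this: a sketch reconstructed from the
hunks quoted in this thread, with comments added here to summarize the
migration reasoning (the comments are annotations, not kernel source).

```c
/* Sketch of the preload path in alloc_vmap_area(), as quoted in this thread. */
preempt_disable();
if (!__this_cpu_read(ne_fit_preload_node)) {
	/* Not preloaded: drop preemption and allocate with GFP_KERNEL. */
	preempt_enable();
	pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
	preempt_disable();

	/*
	 * We may have migrated while allocating, so publish the object
	 * on whichever CPU we run on now; if that CPU is already
	 * preloaded, just free the spare object.
	 */
	if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
		if (pva)
			kmem_cache_free(vmap_area_cachep, pva);
	}
}

/*
 * The lock is taken before preemption is re-enabled, so the task can
 * no longer migrate until the preloaded object is either consumed or
 * left in place under vmap_area_lock.
 */
spin_lock(&vmap_area_lock);
preempt_enable();
```

In other words, a migration between the two preempt_disable() sections
costs at worst one extra kmem_cache_free() of the spare object.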
On 2019-10-04 19:04:11 [+0200], Uladzislau Rezki wrote:
> if another, we can free the object allocated on previous step if
> it already has it. If another CPU does not have it, save it in
> ne_fit_preload_node for another current CPU to reuse later. Further
> we can not migrate because of:
>
> <snip>
>     spin_lock(&vmap_area_lock);
>     preempt_enable();
> <snip>

ah right. So you keep the lock later on, I somehow missed that part.
That way it actually makes sense.

Sebastian
On Fri, Oct 04, 2019 at 05:37:28PM +0200, Sebastian Andrzej Siewior wrote:
> If you post something that is related to PREEMPT_RT please keep tglx and
> me in Cc.

Sure, just forgot to add it this time. My email setup needs a bit more
love. Sorry.
Hi,

On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > should be replaced by it and not the way how it has been proposed
> > in the patch.
>
> I don't think this patch is appropriate for upstream.

Yes, I agree. The discussion made this clear, this is only for -rt
trees. Initially I thought this should be in mainline too.

Thanks,
Daniel
On 2019-10-07 10:30:37 [+0200], Daniel Wagner wrote:
> Hi,
Hi Daniel,

> On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > should be replaced by it and not the way how it has been proposed
> > > in the patch.
> >
> > I don't think this patch is appropriate for upstream.
>
> Yes, I agree. The discussion made this clear, this is only for -rt
> trees. Initially I thought this should be in mainline too.

Sorry, this was _before_ Uladzislau pointed out that you *just* moved
the lock that was there from the beginning. I missed that while looking
over the patch. Based on that I don't think that this patch is not
appropriate for upstream.

> Thanks,
> Daniel

Sebastian
Hello, Daniel, Sebastian.

> > On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > > should be replaced by it and not the way how it has been proposed
> > > > in the patch.
> > >
> > > I don't think this patch is appropriate for upstream.
> >
> > Yes, I agree. The discussion made this clear, this is only for -rt
> > trees. Initially I thought this should be in mainline too.
>
> Sorry, this was _before_ Uladzislau pointed out that you *just* moved
> the lock that was there from the beginning. I missed that while looking
> over the patch. Based on that I don't think that this patch is not
> appropriate for upstream.
>
Yes that is a bit messy :) Then i do not see what that patch fixes in
mainline? Instead it will just add an extra blocking, i did not want that
therefore used preempt_enable/disable. But, when i saw this patch i got it
as a preparation of PREEMPT_RT merging work.

--
Vlad Rezki
On Mon, Oct 07, 2019 at 06:23:30PM +0200, Uladzislau Rezki wrote:
> Hello, Daniel, Sebastian.
>
> > > On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > > > should be replaced by it and not the way how it has been proposed
> > > > > in the patch.
> > > >
> > > > I don't think this patch is appropriate for upstream.
> > >
> > > Yes, I agree. The discussion made this clear, this is only for -rt
> > > trees. Initially I thought this should be in mainline too.
> >
> > Sorry, this was _before_ Uladzislau pointed out that you *just* moved
> > the lock that was there from the beginning. I missed that while looking
> > over the patch. Based on that I don't think that this patch is not
> > appropriate for upstream.
> >
> Yes that is a bit messy :) Then i do not see what that patch fixes in
> mainline? Instead it will just add an extra blocking, i did not want that
> therefore used preempt_enable/disable. But, when i saw this patch i got it
> as a preparation of PREEMPT_RT merging work.

Maybe I should add some background info here as well. Currently, I am
creating an -rt tree on v5.3 for which I need this patch (or a
migrate_disable() version of it). So this is slightly independent of
the work Sebastian is doing. Though the mainline effort of PREEMPT_RT
will hit this problem as well.

I understood Sebastian wrong above. I thought he suggested to use the
migrate_disable() approach even for mainline.

I suppose, one thing which would help in this discussion, is what do
you gain by using preempt_disable() instead of moving the lock up?
Do you have performance numbers which could justify the code?
On Mon, Oct 07, 2019 at 06:34:43PM +0200, Daniel Wagner wrote:
> On Mon, Oct 07, 2019 at 06:23:30PM +0200, Uladzislau Rezki wrote:
> > Hello, Daniel, Sebastian.
> >
> > > > On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > > > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > > > > should be replaced by it and not the way how it has been proposed
> > > > > > in the patch.
> > > > >
> > > > > I don't think this patch is appropriate for upstream.
> > > >
> > > > Yes, I agree. The discussion made this clear, this is only for -rt
> > > > trees. Initially I thought this should be in mainline too.
> > >
> > > Sorry, this was _before_ Uladzislau pointed out that you *just* moved
> > > the lock that was there from the beginning. I missed that while looking
> > > over the patch. Based on that I don't think that this patch is not
> > > appropriate for upstream.
> > >
> > Yes that is a bit messy :) Then i do not see what that patch fixes in
> > mainline? Instead it will just add an extra blocking, i did not want that
> > therefore used preempt_enable/disable. But, when i saw this patch i got it
> > as a preparation of PREEMPT_RT merging work.
>
> Maybe I should add some background info here as well. Currently, I am
> creating an -rt tree on v5.3 for which I need this patch (or a
> migrate_disable() version of it). So this is slightly independent of
> the work Sebastian is doing. Though the mainline effort of PREEMPT_RT
> will hit this problem as well.
>
> I understood Sebastian wrong above. I thought he suggested to use the
> migrate_disable() approach even for mainline.
>
> I suppose, one thing which would help in this discussion, is what do
> you gain by using preempt_disable() instead of moving the lock up?
> Do you have performance numbers which could justify the code?
>
Actually there is a high lock contention on vmap_area_lock, because it
is still global. You can have a look at last slide:

https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf

so this change will make it a bit higher. From the other hand i agree
that for rt it should be fixed, probably it could be done like:

ifdef PREEMPT_RT
migrate_disable()
#else
preempt_disable()
...

but i am not sure it is good either.

--
Vlad Rezki
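Spelled out, the conditional sketched in the mail above would look
roughly like the following. This is only an illustration of the idea,
not something proposed in the thread; the vmap_preload_*() helper names
are invented for the sketch.

```c
/*
 * Hypothetical illustration of the "ifdef PREEMPT_RT" idea above; the
 * vmap_preload_*() helpers are made up for this sketch and are not
 * proposed anywhere in the thread.
 */
#ifdef CONFIG_PREEMPT_RT
/* Stay on this CPU but remain preemptible, so sleeping locks stay legal. */
# define vmap_preload_disable()	migrate_disable()
# define vmap_preload_enable()	migrate_enable()
#else
/* !RT: disabling preemption is cheap and already pins the CPU. */
# define vmap_preload_disable()	preempt_disable()
# define vmap_preload_enable()	preempt_enable()
#endif
```

Whether hiding the distinction behind such helpers is worth it is
exactly what the thread leaves open.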
On Mon, Oct 07, 2019 at 06:56:11PM +0200, Uladzislau Rezki wrote:
> On Mon, Oct 07, 2019 at 06:34:43PM +0200, Daniel Wagner wrote:
> > I suppose, one thing which would help in this discussion, is what do
> > you gain by using preempt_disable() instead of moving the lock up?
> > Do you have performance numbers which could justify the code?
> >
> Actually there is a high lock contention on vmap_area_lock, because it
> is still global. You can have a look at last slide:
>
> https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
>
> so this change will make it a bit higher.

Thanks! I suspected something like this :( On the todo-list page you
state that vmap_area_lock could be split and therefore reduce the
contention. If you could avoid those preempt_disable() tricks and just
use plain spin_locks() to protect it, that would be really helpful.

> From the other hand i agree
> that for rt it should be fixed, probably it could be done like:
>
> ifdef PREEMPT_RT
> migrate_disable()
> #else
> preempt_disable()
> ...
>
> but i am not sure it is good either.

I don't think this is the way to go. I guess Sebastian and Thomas have
a better idea how to address this for PREEMPT_RT.
On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> Actually there is a high lock contention on vmap_area_lock, because it
> is still global. You can have a look at last slide:
>
> https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
>
> so this change will make it a bit higher. From the other hand i agree
> that for rt it should be fixed, probably it could be done like:
>
> ifdef PREEMPT_RT
> migrate_disable()
> #else
> preempt_disable()
> ...
>
> but i am not sure it is good either.

What is to be expected on average? Is the lock acquired and then
released again because the slot is empty and memory needs to be
allocated or can it be assumed that this hardly happens?

Sebastian
On Mon, Oct 07, 2019 at 07:36:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> > Actually there is a high lock contention on vmap_area_lock, because it
> > is still global. You can have a look at last slide:
> >
> > https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> >
> > so this change will make it a bit higher. From the other hand i agree
> > that for rt it should be fixed, probably it could be done like:
> >
> > ifdef PREEMPT_RT
> > migrate_disable()
> > #else
> > preempt_disable()
> > ...
> >
> > but i am not sure it is good either.
>
> What is to be expected on average? Is the lock acquired and then
> released again because the slot is empty and memory needs to be
> allocated or can it be assumed that this hardly happens?
>
The lock is not released (we are not allowed), instead we just try
to allocate with GFP_NOWAIT flag. It can happen if preallocation
has been failed with GFP_KERNEL flag earlier:

<snip>
...
	} else if (type == NE_FIT_TYPE) {
		/*
		 * Split no edge of fit VA.
		 *
		 *     |       |
		 *   L V  NVA  V R
		 *   |---|-------|---|
		 */
		lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
		if (unlikely(!lva)) {
			...
			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
			...
		}
...
<snip>

How often we need an extra object for split purpose, the answer
is it depends on. For example fork() path falls to that pattern.

I think we can assume that migration can hardly ever happen and
that should be considered as rare case. Thus we can do preloading
without worrying much if it occurs:

<snip>
urezki@pc636:~/data/ssd/coding/linux-stable$ git diff
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..bc782edcd1fd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1089,20 +1089,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * Even if it fails we do not really care about that. Just proceed
 	 * as it is. "overflow" path will refill the cache we allocate from.
 	 */
-	preempt_disable();
-	if (!__this_cpu_read(ne_fit_preload_node)) {
-		preempt_enable();
+	if (!this_cpu_read(ne_fit_preload_node)) {
 		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-		preempt_disable();

-		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
+		if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
 			if (pva)
 				kmem_cache_free(vmap_area_cachep, pva);
 		}
 	}

 	spin_lock(&vmap_area_lock);
-	preempt_enable();

 	/*
 	 * If an allocation fails, the "vend" address is
urezki@pc636:~/data/ssd/coding/linux-stable$
<snip>

so, we do not guarantee, instead we minimize number of allocations
with GFP_NOWAIT flag. For example on my 4xCPUs i am not able to
even trigger the case when CPU is not preloaded.

I can test it tomorrow on my 12xCPUs to see its behavior there.

--
Vlad Rezki
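A side note on the API switch in the diff above: the __this_cpu_*()
accessors expect the caller to have preemption disabled (with
CONFIG_DEBUG_PREEMPT the kernel complains otherwise), while the plain
this_cpu_*() forms are safe against preemption on their own, which is
what allows the preempt_disable()/preempt_enable() pair to go away. A
minimal sketch of the difference; do_preload() is a hypothetical
placeholder, not a real function.

```c
/* __this_cpu_*() assumes the caller keeps the task on this CPU ... */
preempt_disable();
if (!__this_cpu_read(ne_fit_preload_node))
	do_preload();		/* hypothetical helper */
preempt_enable();

/* ... while this_cpu_*() is preemption-safe by itself. */
if (!this_cpu_read(ne_fit_preload_node))
	do_preload();		/* hypothetical helper */
```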
On Mon, Oct 07, 2019 at 11:44:20PM +0200, Uladzislau Rezki wrote:
> On Mon, Oct 07, 2019 at 07:36:44PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> > > Actually there is a high lock contention on vmap_area_lock, because it
> > > is still global. You can have a look at last slide:
> > >
> > > https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > >
> > > so this change will make it a bit higher. From the other hand i agree
> > > that for rt it should be fixed, probably it could be done like:
> > >
> > > ifdef PREEMPT_RT
> > > migrate_disable()
> > > #else
> > > preempt_disable()
> > > ...
> > >
> > > but i am not sure it is good either.
> >
> > What is to be expected on average? Is the lock acquired and then
> > released again because the slot is empty and memory needs to be
> > allocated or can it be assumed that this hardly happens?
> >
> The lock is not released (we are not allowed), instead we just try
> to allocate with GFP_NOWAIT flag. It can happen if preallocation
> has been failed with GFP_KERNEL flag earlier:
>
> <snip>
> ...
> 	} else if (type == NE_FIT_TYPE) {
> 		/*
> 		 * Split no edge of fit VA.
> 		 *
> 		 *     |       |
> 		 *   L V  NVA  V R
> 		 *   |---|-------|---|
> 		 */
> 		lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
> 		if (unlikely(!lva)) {
> 			...
> 			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
> 			...
> 		}
> ...
> <snip>
>
> How often we need an extra object for split purpose, the answer
> is it depends on. For example fork() path falls to that pattern.
>
> I think we can assume that migration can hardly ever happen and
> that should be considered as rare case. Thus we can do preloading
> without worrying much if it occurs:
>
> <snip>
> urezki@pc636:~/data/ssd/coding/linux-stable$ git diff
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e92ff5f7dd8b..bc782edcd1fd 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1089,20 +1089,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 * Even if it fails we do not really care about that. Just proceed
>  	 * as it is. "overflow" path will refill the cache we allocate from.
>  	 */
> -	preempt_disable();
> -	if (!__this_cpu_read(ne_fit_preload_node)) {
> -		preempt_enable();
> +	if (!this_cpu_read(ne_fit_preload_node)) {
>  		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -		preempt_disable();
>
> -		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
> +		if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
>  			if (pva)
>  				kmem_cache_free(vmap_area_cachep, pva);
>  		}
>  	}
>
>  	spin_lock(&vmap_area_lock);
> -	preempt_enable();
>
>  	/*
>  	 * If an allocation fails, the "vend" address is
> urezki@pc636:~/data/ssd/coding/linux-stable$
> <snip>
>
> so, we do not guarantee, instead we minimize number of allocations
> with GFP_NOWAIT flag. For example on my 4xCPUs i am not able to
> even trigger the case when CPU is not preloaded.
>
> I can test it tomorrow on my 12xCPUs to see its behavior there.
>
Tested it on different systems. For example on my 8xCPUs system that
runs PREEMPT kernel i see only few GFP_NOWAIT allocations, i.e. it
happens when we land to another CPU that was not preloaded.

I run the special test case that follows the preload pattern and path.
So 20 "unbind" threads run it and each does 1000000 allocations. As a
result only 3.5 times among 1000000, during splitting, CPU was not
preloaded thus, GFP_NOWAIT was used to obtain an extra object.

It is obvious that slightly modified approach still minimizes allocations
in atomic context, so it can happen but the number is negligible and can
be ignored, i think.

--
Vlad Rezki
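The test tool itself is not shown in the thread. Purely as an
illustration of the pattern described (unbound threads hammering
vmalloc()/vfree() so that alloc_vmap_area() runs constantly and,
depending on the free-space layout, the NE_FIT split path is hit), a
minimal kernel-module sketch could look like the following; the module
name, thread count, and iteration count are assumptions, not the actual
test used above.

```c
/*
 * Hypothetical stress sketch, NOT the test used in this thread: a
 * number of unbound kthreads each perform many vmalloc()/vfree()
 * cycles, exercising alloc_vmap_area() and (layout permitting) the
 * NE_FIT_TYPE split path.
 */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/vmalloc.h>
#include <linux/sched.h>
#include <linux/mm.h>

#define NR_THREADS	20
#define NR_ALLOCS	1000000

static int vmalloc_stress_fn(void *unused)
{
	int i;

	for (i = 0; i < NR_ALLOCS; i++) {
		void *p = vmalloc(PAGE_SIZE);	/* goes through alloc_vmap_area() */

		if (p)
			vfree(p);
		cond_resched();			/* allow the scheduler to migrate us */
	}
	return 0;				/* kthread exits on its own */
}

static int __init vmalloc_stress_init(void)
{
	int i;

	/* kthread_run() leaves the threads unbound, so they may migrate. */
	for (i = 0; i < NR_THREADS; i++)
		kthread_run(vmalloc_stress_fn, NULL, "vmalloc-stress/%d", i);
	return 0;
}

static void __exit vmalloc_stress_exit(void)
{
	/* A real test would track the threads and kthread_stop() them here. */
}

module_init(vmalloc_stress_init);
module_exit(vmalloc_stress_exit);
MODULE_LICENSE("GPL");
```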
On Tue, Oct 08, 2019 at 06:04:59PM +0200, Uladzislau Rezki wrote:
> > so, we do not guarantee, instead we minimize number of allocations
> > with GFP_NOWAIT flag. For example on my 4xCPUs i am not able to
> > even trigger the case when CPU is not preloaded.
> >
> > I can test it tomorrow on my 12xCPUs to see its behavior there.
> >
> Tested it on different systems. For example on my 8xCPUs system that
> runs PREEMPT kernel i see only few GFP_NOWAIT allocations, i.e. it
> happens when we land to another CPU that was not preloaded.
>
> I run the special test case that follows the preload pattern and path.
> So 20 "unbind" threads run it and each does 1000000 allocations. As a
> result only 3.5 times among 1000000, during splitting, CPU was not
> preloaded thus, GFP_NOWAIT was used to obtain an extra object.
>
> It is obvious that slightly modified approach still minimizes allocations
> in atomic context, so it can happen but the number is negligible and can
> be ignored, i think.

Thanks for doing the tests. In this case I would suggest to get rid of
the preempt_disable() micro optimization, since there is almost no
gain in doing so. Will you send a patch? :)

Thanks,
Daniel
Hello, Daniel.

On Wed, Oct 09, 2019 at 08:05:39AM +0200, Daniel Wagner wrote:
> On Tue, Oct 08, 2019 at 06:04:59PM +0200, Uladzislau Rezki wrote:
> > > so, we do not guarantee, instead we minimize number of allocations
> > > with GFP_NOWAIT flag. For example on my 4xCPUs i am not able to
> > > even trigger the case when CPU is not preloaded.
> > >
> > > I can test it tomorrow on my 12xCPUs to see its behavior there.
> > >
> > Tested it on different systems. For example on my 8xCPUs system that
> > runs PREEMPT kernel i see only few GFP_NOWAIT allocations, i.e. it
> > happens when we land to another CPU that was not preloaded.
> >
> > I run the special test case that follows the preload pattern and path.
> > So 20 "unbind" threads run it and each does 1000000 allocations. As a
> > result only 3.5 times among 1000000, during splitting, CPU was not
> > preloaded thus, GFP_NOWAIT was used to obtain an extra object.
> >
> > It is obvious that slightly modified approach still minimizes allocations
> > in atomic context, so it can happen but the number is negligible and can
> > be ignored, i think.
>
> Thanks for doing the tests. In this case I would suggest to get rid of
> the preempt_disable() micro optimization, since there is almost no
> gain in doing so. Will you send a patch? :)
>
I can do it, for sure, in case you do not mind, since it was your
initiative. Otherwise you can upload v2.

--
Vlad Rezki
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 08c134aa7ff3..0d1175673583 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1091,11 +1091,11 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * Even if it fails we do not really care about that. Just proceed
 	 * as it is. "overflow" path will refill the cache we allocate from.
 	 */
-	preempt_disable();
+	spin_lock(&vmap_area_lock);
 	if (!__this_cpu_read(ne_fit_preload_node)) {
-		preempt_enable();
+		spin_unlock(&vmap_area_lock);
 		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-		preempt_disable();
+		spin_lock(&vmap_area_lock);

 		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
 			if (pva)
@@ -1103,9 +1103,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 		}
 	}

-	spin_lock(&vmap_area_lock);
-	preempt_enable();
-
 	/*
 	 * If an allocation fails, the "vend" address is
 	 * returned. Therefore trigger the overflow path.
Replace preempt_enable() and preempt_disable() with the vmap_area_lock
spin_lock instead. Calling spin_lock() with preempt disabled is
illegal for -rt. Furthermore, enabling preemption inside the
spin_lock() doesn't really make sense.

Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for
split purpose")
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 mm/vmalloc.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)