Message ID | 20250415112646.113091-1-urezki@gmail.com (mailing list archive) |
---|---|
State | New |
Series | vmalloc: Use atomic_long_add_return_relaxed() |
On 04/15/25 at 01:26pm, Uladzislau Rezki (Sony) wrote:
> Switch from the atomic_long_add_return() to its relaxed version.
>
> We do not need a full memory barrier or any memory ordering during
> increasing the "vmap_lazy_nr" variable. What we only need is to do it
> atomically. This is what atomic_long_add_return_relaxed() guarantees.
>
> AARCH64:
>
> <snip>
> Default:
>     40ec: d34cfe94  lsr     x20, x20, #12
>     40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
>     40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
>     40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
>     40fc: 91000000  add     x0, x0, #0x0
>     4100: f8f40016  ldaddal x20, x22, [x0]
>     4104: 8b160296  add     x22, x20, x22
>
> Relaxed:
>     40ec: d34cfe94  lsr     x20, x20, #12
>     40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
>     40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
>     40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
>     40fc: 91000000  add     x0, x0, #0x0
>     4100: f8340016  ldadd   x20, x22, [x0]
>     4104: 8b160296  add     x22, x20, x22
> <snip>
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7bb32f498d39..9d4027041a3f 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2370,7 +2370,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
>  	if (WARN_ON_ONCE(!list_empty(&va->list)))
>  		return;
>
> -	nr_lazy = atomic_long_add_return(va_size(va) >> PAGE_SHIFT,
> +	nr_lazy = atomic_long_add_return_relaxed(va_size(va) >> PAGE_SHIFT,
>  					&vmap_lazy_nr);

LGTM,

Reviewed-by: Baoquan He <bhe@redhat.com>
On Tue, Apr 15, 2025 at 01:26:46PM +0200, Uladzislau Rezki (Sony) wrote:
> Switch from the atomic_long_add_return() to its relaxed version.
>
> We do not need a full memory barrier or any memory ordering during
> increasing the "vmap_lazy_nr" variable. What we only need is to do it
> atomically. This is what atomic_long_add_return_relaxed() guarantees.
>
> AARCH64:
>
> <snip>
> Default:
>     40ec: d34cfe94  lsr     x20, x20, #12
>     40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
>     40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
>     40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
>     40fc: 91000000  add     x0, x0, #0x0
>     4100: f8f40016  ldaddal x20, x22, [x0]
>     4104: 8b160296  add     x22, x20, x22
>
> Relaxed:
>     40ec: d34cfe94  lsr     x20, x20, #12
>     40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
>     40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
>     40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
>     40fc: 91000000  add     x0, x0, #0x0
>     4100: f8340016  ldadd   x20, x22, [x0]
>     4104: 8b160296  add     x22, x20, x22
> <snip>
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7bb32f498d39..9d4027041a3f 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2370,7 +2370,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
>  	if (WARN_ON_ONCE(!list_empty(&va->list)))
>  		return;
>
> -	nr_lazy = atomic_long_add_return(va_size(va) >> PAGE_SHIFT,
> +	nr_lazy = atomic_long_add_return_relaxed(va_size(va) >> PAGE_SHIFT,
>  					&vmap_lazy_nr);
>
>  	/*

If touching this, maybe a step further -- I see false sharing with:
nm vmlinux | sort -k 1

[snip]
ffffffff845cb240 b vmap_lazy_nr
ffffffff845cb248 b nr_vmalloc_pages
ffffffff845cb260 b single
[/snip]

If this is used enough to warrant messing with eliding a fence, then it
probably also wants to elide some false sharing.

So at least the first two should be annotated with __cacheline_aligned_in_smp?

I am not going to push though, feel free to ignore.
On Tue, Apr 15, 2025 at 07:37:31PM +0200, Mateusz Guzik wrote:
> On Tue, Apr 15, 2025 at 01:26:46PM +0200, Uladzislau Rezki (Sony) wrote:
> > Switch from the atomic_long_add_return() to its relaxed version.
> >
> > We do not need a full memory barrier or any memory ordering during
> > increasing the "vmap_lazy_nr" variable. What we only need is to do it
> > atomically. This is what atomic_long_add_return_relaxed() guarantees.
> >
> > AARCH64:
> >
> > <snip>
> > Default:
> >     40ec: d34cfe94  lsr     x20, x20, #12
> >     40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
> >     40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
> >     40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
> >     40fc: 91000000  add     x0, x0, #0x0
> >     4100: f8f40016  ldaddal x20, x22, [x0]
> >     4104: 8b160296  add     x22, x20, x22
> >
> > Relaxed:
> >     40ec: d34cfe94  lsr     x20, x20, #12
> >     40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
> >     40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
> >     40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
> >     40fc: 91000000  add     x0, x0, #0x0
> >     4100: f8340016  ldadd   x20, x22, [x0]
> >     4104: 8b160296  add     x22, x20, x22
> > <snip>
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  mm/vmalloc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 7bb32f498d39..9d4027041a3f 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2370,7 +2370,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
> >  	if (WARN_ON_ONCE(!list_empty(&va->list)))
> >  		return;
> >
> > -	nr_lazy = atomic_long_add_return(va_size(va) >> PAGE_SHIFT,
> > +	nr_lazy = atomic_long_add_return_relaxed(va_size(va) >> PAGE_SHIFT,
> >  					&vmap_lazy_nr);
> >
> >  	/*
>
> If touching this, maybe a step further -- I see false sharing with:
> nm vmlinux | sort -k 1
>
> [snip]
> ffffffff845cb240 b vmap_lazy_nr
> ffffffff845cb248 b nr_vmalloc_pages
> ffffffff845cb260 b single
> [/snip]
>
> If this is used enough to warrant messing with eliding a fence, then it
> probably also wants to elide some false sharing.
>
> So at least the first two should be annotated with __cacheline_aligned_in_smp?
>
> I am not going to push though, feel free to ignore.
>
OK, i see your point. I will check closer. We can place both into two
different cache-lines to prevent interference between each other. But
that will be a different patch.

Thank you for the nit!

--
Uladzislau Rezki
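For readers unfamiliar with the suggestion above, a minimal sketch of what such a follow-up could look like is shown below: annotate both hot counters so each one starts on its own cache line. The declarations mirror the existing ones in mm/vmalloc.c, but the annotation shown here is an assumption about a possible future patch, not merged code.

<snip>
/*
 * Hypothetical follow-up sketch (not a merged patch): keep the two
 * frequently updated counters on separate cache lines so that writes
 * to vmap_lazy_nr do not keep invalidating the line that also holds
 * nr_vmalloc_pages, and vice versa.
 */
static atomic_long_t vmap_lazy_nr __cacheline_aligned_in_smp = ATOMIC_LONG_INIT(0);
static atomic_long_t nr_vmalloc_pages __cacheline_aligned_in_smp;
<snip>

On SMP builds __cacheline_aligned_in_smp aligns each variable to the cache-line size, trading a little data-section space for fewer cache-line bounces between CPUs updating the two counters concurrently.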
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7bb32f498d39..9d4027041a3f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2370,7 +2370,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 	if (WARN_ON_ONCE(!list_empty(&va->list)))
 		return;

-	nr_lazy = atomic_long_add_return(va_size(va) >> PAGE_SHIFT,
+	nr_lazy = atomic_long_add_return_relaxed(va_size(va) >> PAGE_SHIFT,
 					&vmap_lazy_nr);

 	/*
Switch from the atomic_long_add_return() to its relaxed version.

We do not need a full memory barrier or any memory ordering during
increasing the "vmap_lazy_nr" variable. What we only need is to do it
atomically. This is what atomic_long_add_return_relaxed() guarantees.

AARCH64:

<snip>
Default:
    40ec: d34cfe94  lsr     x20, x20, #12
    40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
    40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
    40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
    40fc: 91000000  add     x0, x0, #0x0
    4100: f8f40016  ldaddal x20, x22, [x0]
    4104: 8b160296  add     x22, x20, x22

Relaxed:
    40ec: d34cfe94  lsr     x20, x20, #12
    40f0: 14000044  b       4200 <free_vmap_area_noflush+0x19c>
    40f4: 94000000  bl      0 <__sanitizer_cov_trace_pc>
    40f8: 90000000  adrp    x0, 0 <__traceiter_alloc_vmap_area>
    40fc: 91000000  add     x0, x0, #0x0
    4100: f8340016  ldadd   x20, x22, [x0]
    4104: 8b160296  add     x22, x20, x22
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
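To make the reasoning in the changelog concrete, below is a condensed sketch of the pattern at the call site: the counter update only needs to be atomic, and the returned value is only compared against a threshold, so no acquire/release ordering with surrounding loads and stores is required. The helper names (va_size(), lazy_max_pages(), drain_vmap_work) follow mm/vmalloc.c, but the function below is a simplified illustration rather than the actual kernel code; among other things, the real free_vmap_area_noflush() also queues the area for purging between these two steps.

<snip>
/* Simplified illustration of the call site; not the actual function body. */
static void free_vmap_area_noflush_sketch(struct vmap_area *va)
{
	unsigned long nr_lazy;

	/*
	 * Only atomicity of the read-modify-write is needed: the relaxed
	 * variant lets arm64 emit ldadd instead of ldaddal, dropping the
	 * acquire/release semantics the full-barrier version implies.
	 */
	nr_lazy = atomic_long_add_return_relaxed(va_size(va) >> PAGE_SHIFT,
						 &vmap_lazy_nr);

	/* The returned value is only used for a threshold check. */
	if (unlikely(nr_lazy > lazy_max_pages()))
		schedule_work(&drain_vmap_work);
}
<snip>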