
[0/5] Reduce a vmalloc internal lock contention preparation work

Message ID: 20220607093449.3100-1-urezki@gmail.com

Message

Uladzislau Rezki June 7, 2022, 9:34 a.m. UTC
Hi.

This small series is preparation work for implementing per-cpu vmalloc
allocation in order to reduce the high internal lock contention. The
series does not introduce any functional changes; it is preparation
only. A sketch of one of the refactorings is shown after the diffstat
below.

Uladzislau Rezki (Sony) (5):
  mm/vmalloc: Make link_va()/unlink_va() common to different rb_root
  mm/vmalloc: Extend __alloc_vmap_area() with extra arguments
  mm/vmalloc: Initialize VA's list node after unlink
  mm/vmalloc: Extend __find_vmap_area() with one more argument
  lib/test_vmalloc: Switch to prandom_u32()

 lib/test_vmalloc.c | 15 +++----
 mm/vmalloc.c       | 98 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 76 insertions(+), 37 deletions(-)
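
As an illustration of the fourth patch in the list above, here is a
minimal sketch (not the actual patch) of what extending
__find_vmap_area() with one more argument amounts to: the rb_root is
passed in rather than hard-coded to the global vmap_area_root, so the
same lookup helper can later serve other trees as well:

#include <linux/rbtree.h>
#include <linux/vmalloc.h>

static struct vmap_area *__find_vmap_area(unsigned long addr,
					  struct rb_root *root)
{
	struct rb_node *n = root->rb_node;

	while (n) {
		struct vmap_area *va;

		va = rb_entry(n, struct vmap_area, rb_node);
		if (addr < va->va_start)
			n = n->rb_left;
		else if (addr >= va->va_end)
			n = n->rb_right;
		else
			return va;
	}

	return NULL;
}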

Comments

Andrew Morton June 7, 2022, 10:35 p.m. UTC | #1
On Tue,  7 Jun 2022 11:34:44 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:

> This small series is preparation work for implementing per-cpu vmalloc
> allocation in order to reduce the high internal lock contention. The
> series does not introduce any functional changes; it is preparation
> only.

I can toss it in for some runtime testing, but...

What lock are we talking about here, what is the magnitude of the
performance issues it is causing and what is the status of the patch
which uses all this preparation?

Uladzislau Rezki June 8, 2022, 10:05 a.m. UTC | #2
>
> I can toss it in for some runtime testing, but...
>
> What lock are we talking about here, what is the magnitude of the
> performance issues it is causing and what is the status of the patch
> which uses all this preparation?
>
1.
vmalloc still uses a global lock to serialize access to the global
vmap space. As for the magnitude, it depends on the number of CPUs:
the more CPUs, the higher the contention. The dependence is linear.
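
A simplified sketch of that pattern (the names mirror mm/vmalloc.c,
the body is elided, and the locking is pulled inline purely for
illustration):

#include <linux/spinlock.h>
#include <linux/rbtree.h>
#include <linux/vmalloc.h>

/* One global lock and one global tree: every allocation and free,
 * from any CPU, serializes here, so waiting grows with CPU count. */
static DEFINE_SPINLOCK(vmap_area_lock);
static struct rb_root vmap_area_root = RB_ROOT;

static void insert_vmap_area(struct vmap_area *va)
{
	spin_lock(&vmap_area_lock);
	/* ... link va into vmap_area_root and the global list ... */
	spin_unlock(&vmap_area_lock);
}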

2.
I have not run into performance issues on my own setup. On the other
hand, there is a "per-cpu kva allocator" built on top of vmalloc; see
vm_map_ram()/vm_unmap_ram(). With a per-CPU vmalloc we could get rid
of it.

It is used by XFS, f2fs and some drivers. The reason is that vmalloc
is costly due to its internal global lock, which is why those users
go with the per-cpu kva allocator to accelerate their workloads.
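
For reference, this is roughly how that interface is used today (the
helper name is made up for the example; error handling is trimmed):

#include <linux/mm.h>
#include <linux/numa.h>
#include <linux/string.h>
#include <linux/vmalloc.h>

/* Hypothetical helper: map nr pre-allocated pages contiguously via
 * the per-cpu kva allocator, touch them, then drop the mapping. */
static int touch_pages_mapped(struct page **pages, unsigned int nr)
{
	void *addr = vm_map_ram(pages, nr, NUMA_NO_NODE);

	if (!addr)
		return -ENOMEM;

	memset(addr, 0, (size_t)nr * PAGE_SIZE);  /* use the mapping */
	vm_unmap_ram(addr, nr);

	return 0;
}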

3.
My synthetic test shows a big difference between the per-CPU vmalloc
patches and the default variant. I have several prototypes based on
various ways of making it per-CPU. I do not yet have a complete
solution that satisfies all the needs, but I do not think one is
possible given the many constraints.

4.
This series is not tied to the future per-cpu-vmalloc patches; rather,
it makes the vmalloc code more generic, and with such common code in
place it becomes easier to extend it to a per-cpu variant.

This means that if per-cpu never materializes, nothing here needs to
be reverted.

That is the status.