Message ID | 20220607093449.3100-1-urezki@gmail.com (mailing list archive) |
---|---|
Series | Reduce a vmalloc internal lock contention preparation work |
On Tue, 7 Jun 2022 11:34:44 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:

> This small series is a preparation work to implement per-cpu vmalloc
> allocation in order to reduce a high internal lock contention. This
> series does not introduce any functional changes, it is only about
> preparation.

I can toss it in for some runtime testing, but...

What lock are we talking about here, what is the magnitude of the
performance issues it is causing and what is the status of the patch
which uses all this preparation?
> I can toss it in for some runtime testing, but...
>
> What lock are we talking about here, what is the magnitude of the
> performance issues it is causing and what is the status of the patch
> which uses all this preparation?

1. vmalloc still uses a global lock to access the global vmap space. The magnitude depends on the number of CPUs: the more CPUs, the higher the contention. The dependence is linear.

2. I am not aware of any performance issues on my own setup. On the other hand, there is a "Per cpu kva allocator" built on top of vmalloc; see vm_map_ram() and vm_unmap_ram(). With a per-CPU vmalloc we could get rid of it. It is used by XFS, f2fs and some drivers. The reason is that vmalloc is costly due to its internal global lock. That is why those users go with the "Per cpu kva allocator" to accelerate their workloads.

3. My synthetic test shows a big difference between the per-CPU vmalloc patches and the default variant. I have different prototypes based on various ways of making it per-CPU. I still do not have a complete solution that satisfies all the needs. But I do not think it is possible due to many constraints.

4. This series is not tied to the future per-cpu-vmalloc patches. Rather, it makes the vmalloc code more generic, and with such common code it would be easier to extend it to a per-cpu variant. It means that if per-cpu is not in place, the series does not need to be reverted.

That is the status.
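For readers who want the contention argument in concrete form, here is a minimal userspace sketch. It is not the kernel's vmalloc code and not any of the prototypes mentioned above. It assumes a toy bump allocator over page numbers, and every name in it (`global_alloc`, `percpu_alloc`, `struct cpu_cache`, `NR_CPUS`, `CHUNK`) is hypothetical. The point it illustrates is only that a per-CPU front end takes the shared lock once per refill instead of once per allocation.

```c
#include <assert.h>
#include <pthread.h>

#define NR_CPUS 4      /* hypothetical CPU count for this sketch */
#define CHUNK   16UL   /* pages moved into a per-CPU cache per refill */

/* Global address space: a bump pointer guarded by a single lock. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long global_next;        /* next free page number */
static unsigned long global_lock_taken;  /* times the global lock was taken */

/* Default path: every allocation serializes on the global lock. */
static unsigned long global_alloc(unsigned long nr)
{
    pthread_mutex_lock(&global_lock);
    unsigned long start = global_next;
    global_next += nr;
    global_lock_taken++;
    pthread_mutex_unlock(&global_lock);
    return start;
}

/* Per-CPU cache: a private range [next, end) carved from the global space. */
struct cpu_cache {
    pthread_mutex_t lock;
    unsigned long next;
    unsigned long end;
};

static struct cpu_cache caches[NR_CPUS];

static void caches_init(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        pthread_mutex_init(&caches[i].lock, NULL);
        caches[i].next = 0;
        caches[i].end = 0;   /* empty: first allocation triggers a refill */
    }
}

/* Per-CPU path: the global lock is taken only when the local range is empty. */
static unsigned long percpu_alloc(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct cpu_cache *c = &caches[cpu];

    pthread_mutex_lock(&c->lock);
    if (c->next == c->end) {
        c->next = global_alloc(CHUNK);
        c->end = c->next + CHUNK;
    }
    unsigned long page = c->next++;
    pthread_mutex_unlock(&c->lock);
    return page;
}
```

In this toy model, `CHUNK` single-page allocations on one CPU cost one global lock acquisition, whereas calling `global_alloc(1)` directly would cost `CHUNK` of them. Freeing, fragmentation and allocations larger than a chunk are left out, and those are among the constraints a real per-CPU design has to handle.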