Message ID: 20230522110849.2921-1-urezki@gmail.com (mailing list archive)
Series: Mitigate a vmap lock contention
On Mon, May 22, 2023 at 01:08:40PM +0200, Uladzislau Rezki (Sony) wrote:
> Hello, folks.
>
> 1. This is a followup of the vmap topic that was highlighted at the LSFMMBPF-2023
> conference. This small series attempts to mitigate the contention across the
> vmap/vmalloc code. The problem is described here:
>

Hello Uladzislau, thank you for the work!

> wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf

I ran exactly the same command but couldn't download the file, did I
miss something?

$ wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf
[...]
==> PASV ... done.    ==> RETR Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf ...
No such file `Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf'.

> The material is tagged as a v2 version. It contains extra slides about testing
> the throughput, steps and comparison with the current approach.
>
> 2. Motivation.
>
> - The vmap code is not scaled to the number of CPUs and this should be fixed;
> - XFS folks have complained several times that vmalloc might be contended on
>   their workloads:
>
> <snip>
> commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> Author: Dave Chinner <dchinner@redhat.com>
> Date:   Tue Jan 4 17:22:18 2022 -0800
>
>     xfs: reduce kvmalloc overhead for CIL shadow buffers
>
>     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
>
>     The problem is when we are logging lots of large objects, we hit
>     kvmalloc really damn hard with costly order allocations, and
>     behaviour utterly sucks:

Based on the commit I guess the reason xfs uses vmalloc/kvmalloc is that
it allocates large buffers; how large could they be?

> 3. Test
>
> On my AMD Ryzen Threadripper 3970X 32-Core Processor, I have the below figures:
>
>     1-page     1-page-this-patch
> 1   0.576131   vs   0.555889
> 2   2.68376    vs   1.07895
> 3   4.26502    vs   1.01739
> 4   6.04306    vs   1.28924
> 5   8.04786    vs   1.57616
> 6   9.38844    vs   1.78142

<snip>

> 29  20.06      vs   3.59869
> 30  20.4353    vs   3.6991
> 31  20.9082    vs   3.73028
> 32  21.0865    vs   3.82904
>
> 1..32 is the number of jobs. The results are in usec and are the vmalloc()/vfree()
> pair throughput.

I would be more interested in real numbers than synthetic benchmarks.
Maybe XFS folks could help by performing profiling similar to commit 8dc9384b7d750
with and without this patchset?

By the way, looking at the commit, teaching __p?d_alloc() about gfp
context (that I'm _slowly_ working on...) could be nice for allowing
non-GFP_KERNEL kvmalloc allocations, as Matthew mentioned. [1]

Thanks!

[1] https://lore.kernel.org/linux-mm/Y%2FOHC33YLedMXTlD@casper.infradead.org
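(The table above reports the average cost of one vmalloc()/vfree() pair per
worker thread. A minimal sketch of that kind of measurement loop follows; it
assumes a kernel-thread context and 1-page allocations, and is only an
illustration of the methodology, not the test harness actually used for the
numbers above.)

#include <linux/ktime.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Illustrative only: time a batch of 1-page vmalloc()/vfree() pairs and
 * return the average cost of one pair in nanoseconds. Running one instance
 * per "job" shows how the cost grows with concurrency.
 */
static u64 time_vmalloc_vfree_pairs(unsigned long iterations)
{
	ktime_t start = ktime_get();
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		void *p = vmalloc(PAGE_SIZE);

		if (!p)
			break;
		vfree(p);
	}

	/* Average cost per completed pair; 0 if nothing completed. */
	return i ? ktime_to_ns(ktime_sub(ktime_get(), start)) / i : 0;
}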
On Tue, May 23, 2023 at 08:59:05PM +0900, Hyeonggon Yoo wrote:
> On Mon, May 22, 2023 at 01:08:40PM +0200, Uladzislau Rezki (Sony) wrote:
> > Hello, folks.
> >
> > 1. This is a followup of the vmap topic that was highlighted at the LSFMMBPF-2023
> > conference. This small series attempts to mitigate the contention across the
> > vmap/vmalloc code. The problem is described here:
> >
>
> Hello Uladzislau, thank you for the work!
>
> > wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf
>
> I ran exactly the same command but couldn't download the file, did I
> miss something?
>
wget ftp://vps418301.ovh.net/incoming/Mitigate_a_vmalloc_lock_contention_in_SMP_env_v2.pdf

Sorry, I renamed the file :)

> $ wget ftp://vps418301.ovh.net/incoming/Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf
> [...]
> ==> PASV ... done.    ==> RETR Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf ...
> No such file `Fix_a_vmalloc_lock_contention_in_SMP_env_v2.pdf'.
>
> > The material is tagged as a v2 version. It contains extra slides about testing
> > the throughput, steps and comparison with the current approach.
> >
> > 2. Motivation.
> >
> > - The vmap code is not scaled to the number of CPUs and this should be fixed;
> > - XFS folks have complained several times that vmalloc might be contended on
> >   their workloads:
> >
> > <snip>
> > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > Author: Dave Chinner <dchinner@redhat.com>
> > Date:   Tue Jan 4 17:22:18 2022 -0800
> >
> >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> >
> >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> >
> >     The problem is when we are logging lots of large objects, we hit
> >     kvmalloc really damn hard with costly order allocations, and
> >     behaviour utterly sucks:
>
> Based on the commit I guess the reason xfs uses vmalloc/kvmalloc is that
> it allocates large buffers; how large could they be?
>
They use kvmalloc(). When the page allocator is not able to serve a
request they fall back to vmalloc. At least from what I see, the sizes
range from 73728 up to 1048576, i.e. 18 pages up to 256 pages.

> > 3. Test
> >
> > On my AMD Ryzen Threadripper 3970X 32-Core Processor, I have the below figures:
> >
> >     1-page     1-page-this-patch
> > 1   0.576131   vs   0.555889
> > 2   2.68376    vs   1.07895
> > 3   4.26502    vs   1.01739
> > 4   6.04306    vs   1.28924
> > 5   8.04786    vs   1.57616
> > 6   9.38844    vs   1.78142
>
> <snip>
>
> > 29  20.06      vs   3.59869
> > 30  20.4353    vs   3.6991
> > 31  20.9082    vs   3.73028
> > 32  21.0865    vs   3.82904
> >
> > 1..32 is the number of jobs. The results are in usec and are the vmalloc()/vfree()
> > pair throughput.
>
> I would be more interested in real numbers than synthetic benchmarks.
> Maybe XFS folks could help by performing profiling similar to commit 8dc9384b7d750
> with and without this patchset?
>
I added Dave Chinner <david@fromorbit.com> to this thread.
But the contention exists.
Apart from that, the per-cpu-KVA allocator can go away if we make it generic instead.

> By the way, looking at the commit, teaching __p?d_alloc() about gfp
> context (that I'm _slowly_ working on...) could be nice for allowing
> non-GFP_KERNEL kvmalloc allocations, as Matthew mentioned. [1]
>
> Thanks!
>
> [1] https://lore.kernel.org/linux-mm/Y%2FOHC33YLedMXTlD@casper.infradead.org
>
Thanks!

--
Uladzislau Rezki
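(To make the fallback described above concrete: kvmalloc() first asks the page
allocator for physically contiguous memory and only falls back to vmalloc()
when that fails. The sketch below is a simplified illustration with a made-up
helper name; the real implementation in mm/util.c is more careful about which
gfp flags permit the vmalloc fallback.)

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Simplified illustration of the kvmalloc() pattern: try the page
 * allocator first, but do not retry hard for large requests; if that
 * fails, fall back to vmalloc(), which only needs order-0 pages. The
 * 18..256 page CIL shadow buffers mentioned above are exactly the kind
 * of request that ends up on the vmalloc path under memory pressure.
 */
static void *kvmalloc_like(size_t size, gfp_t flags)
{
	void *p;

	/* Give up quickly on large physically contiguous allocations. */
	if (size > PAGE_SIZE)
		flags |= __GFP_NORETRY | __GFP_NOWARN;

	p = kmalloc(size, flags);
	if (p)
		return p;

	/* Fall back to virtually contiguous memory. */
	return __vmalloc(size, flags);
}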
On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > 2. Motivation.
> > >
> > > - The vmap code is not scaled to the number of CPUs and this should be fixed;
> > > - XFS folks have complained several times that vmalloc might be contended on
> > >   their workloads:
> > >
> > > <snip>
> > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > Author: Dave Chinner <dchinner@redhat.com>
> > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > >
> > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > >
> > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > >
> > >     The problem is when we are logging lots of large objects, we hit
> > >     kvmalloc really damn hard with costly order allocations, and
> > >     behaviour utterly sucks:
> >
> > Based on the commit I guess the reason xfs uses vmalloc/kvmalloc is that
> > it allocates large buffers; how large could they be?
> >
> They use kvmalloc(). When the page allocator is not able to serve a
> request they fall back to vmalloc. At least from what I see, the sizes
> range from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
>
> > > 3. Test
> > >
> > > On my AMD Ryzen Threadripper 3970X 32-Core Processor, I have the below figures:
> > >
> > >     1-page     1-page-this-patch
> > > 1   0.576131   vs   0.555889
> > > 2   2.68376    vs   1.07895
> > > 3   4.26502    vs   1.01739
> > > 4   6.04306    vs   1.28924
> > > 5   8.04786    vs   1.57616
> > > 6   9.38844    vs   1.78142
> >
> > <snip>
> >
> > > 29  20.06      vs   3.59869
> > > 30  20.4353    vs   3.6991
> > > 31  20.9082    vs   3.73028
> > > 32  21.0865    vs   3.82904
> > >
> > > 1..32 is the number of jobs. The results are in usec and are the vmalloc()/vfree()
> > > pair throughput.
> >
> > I would be more interested in real numbers than synthetic benchmarks.
> > Maybe XFS folks could help by performing profiling similar to commit 8dc9384b7d750
> > with and without this patchset?
> >
> I added Dave Chinner <david@fromorbit.com> to this thread.

Oh, I missed that, and it would be better to [+Cc linux-xfs]

> But the contention exists.

I think "theoretically can be contended" doesn't necessarily mean it's actually
contended in the real world.

Also I find it difficult to imagine vmalloc being highly contended because it was
historically considered slow and thus discouraged when performance is important.

IOW vmalloc would not be contended when the allocation size is small because we have
the kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
buffers and at the same time allocating very frequently, thus being performance-sensitive.

I am not against this series, but wondering which workloads would benefit ;)

> Apart from that, the per-cpu-KVA allocator can go away if we make it generic instead.

Not sure I understand your point, can you elaborate please?

And I would like to ask some side questions:

1. Is the vm_[un]map_ram() API still worthwhile with this patchset?

2. How does this patchset deal with 32-bit machines where
the vmalloc address space is limited?

Thanks!

> > By the way, looking at the commit, teaching __p?d_alloc() about gfp
> > context (that I'm _slowly_ working on...) could be nice for allowing
> > non-GFP_KERNEL kvmalloc allocations, as Matthew mentioned. [1]
> >
> > Thanks!
> >
> > [1] https://lore.kernel.org/linux-mm/Y%2FOHC33YLedMXTlD@casper.infradead.org
> >
On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > > 2. Motivation.
> > > >
> > > > - The vmap code is not scaled to the number of CPUs and this should be fixed;
> > > > - XFS folks have complained several times that vmalloc might be contended on
> > > >   their workloads:
> > > >
> > > > <snip>
> > > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > > Author: Dave Chinner <dchinner@redhat.com>
> > > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > > >
> > > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > > >
> > > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > > >
> > > >     The problem is when we are logging lots of large objects, we hit
> > > >     kvmalloc really damn hard with costly order allocations, and
> > > >     behaviour utterly sucks:
> > >
> > > Based on the commit I guess the reason xfs uses vmalloc/kvmalloc is that
> > > it allocates large buffers; how large could they be?
> > >
> > They use kvmalloc(). When the page allocator is not able to serve a
> > request they fall back to vmalloc. At least from what I see, the sizes
> > range from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
> >
> > > > 3. Test
> > > >
> > > > On my AMD Ryzen Threadripper 3970X 32-Core Processor, I have the below figures:
> > > >
> > > >     1-page     1-page-this-patch
> > > > 1   0.576131   vs   0.555889
> > > > 2   2.68376    vs   1.07895
> > > > 3   4.26502    vs   1.01739
> > > > 4   6.04306    vs   1.28924
> > > > 5   8.04786    vs   1.57616
> > > > 6   9.38844    vs   1.78142
> > >
> > > <snip>
> > >
> > > > 29  20.06      vs   3.59869
> > > > 30  20.4353    vs   3.6991
> > > > 31  20.9082    vs   3.73028
> > > > 32  21.0865    vs   3.82904
> > > >
> > > > 1..32 is the number of jobs. The results are in usec and are the vmalloc()/vfree()
> > > > pair throughput.
> > >
> > > I would be more interested in real numbers than synthetic benchmarks.
> > > Maybe XFS folks could help by performing profiling similar to commit 8dc9384b7d750
> > > with and without this patchset?
> > >
> > I added Dave Chinner <david@fromorbit.com> to this thread.
>
> Oh, I missed that, and it would be better to [+Cc linux-xfs]
>
> > But the contention exists.
>
> I think "theoretically can be contended" doesn't necessarily mean it's actually
> contended in the real world.

Did you not read the commit message for the XFS commit documented
above? vmalloc lock contention most certainly does exist in the
real world and the profiles in commit 8dc9384b7d75 ("xfs: reduce
kvmalloc overhead for CIL shadow buffers") document it clearly.

> Also I find it difficult to imagine vmalloc being highly contended because it was
> historically considered slow and thus discouraged when performance is important.

Read the above XFS commit.

We use vmalloc in critical high performance fast paths that cannot
tolerate high order memory allocation failure. XFS runs this
fast path millions of times a second, and will call into
vmalloc() several hundred thousand times a second with machine wide
concurrency under certain types of workloads.

> IOW vmalloc would not be contended when the allocation size is small because we have
> the kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
> buffers and at the same time allocating very frequently, thus being performance-sensitive.
>
> I am not against this series, but wondering which workloads would benefit ;)

Yup, you need to read the XFS commit message. If you understand what
is in that commit message, then you wouldn't be doubting that
vmalloc contention is real and that it is used in high performance
fast paths that are traversed millions of times a second....

> > Apart from that, the per-cpu-KVA allocator can go away if we make it generic instead.
>
> Not sure I understand your point, can you elaborate please?
>
> And I would like to ask some side questions:
>
> 1. Is the vm_[un]map_ram() API still worthwhile with this patchset?

XFS also uses this interface for mapping multi-page buffers in the
XFS buffer cache. These are the items that also require the high
order costly kvmalloc allocations in the transaction commit path
when they are modified.

So, yes, we need these mapping interfaces to scale just as well as
vmalloc itself....

> 2. How does this patchset deal with 32-bit machines where
> the vmalloc address space is limited?

From the XFS side, we just don't care about 32 bit machines at all.
XFS is aimed at server and HPC environments which have been entirely
64 bit for a long, long time now...

-Dave.
On Wed, May 24, 2023 at 07:43:38AM +1000, Dave Chinner wrote:
> On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> > On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > > > 2. Motivation.
> > > > >
> > > > > - The vmap code is not scaled to the number of CPUs and this should be fixed;
> > > > > - XFS folks have complained several times that vmalloc might be contended on
> > > > >   their workloads:
> > > > >
> > > > > <snip>
> > > > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > > > Author: Dave Chinner <dchinner@redhat.com>
> > > > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > > > >
> > > > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > > > >
> > > > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > > > >
> > > > >     The problem is when we are logging lots of large objects, we hit
> > > > >     kvmalloc really damn hard with costly order allocations, and
> > > > >     behaviour utterly sucks:
> > > >
> > > > Based on the commit I guess the reason xfs uses vmalloc/kvmalloc is that
> > > > it allocates large buffers; how large could they be?
> > > >
> > > They use kvmalloc(). When the page allocator is not able to serve a
> > > request they fall back to vmalloc. At least from what I see, the sizes
> > > range from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
> > >
> > > > > 3. Test
> > > > >
> > > > > On my AMD Ryzen Threadripper 3970X 32-Core Processor, I have the below figures:
> > > > >
> > > > >     1-page     1-page-this-patch
> > > > > 1   0.576131   vs   0.555889
> > > > > 2   2.68376    vs   1.07895
> > > > > 3   4.26502    vs   1.01739
> > > > > 4   6.04306    vs   1.28924
> > > > > 5   8.04786    vs   1.57616
> > > > > 6   9.38844    vs   1.78142
> > > >
> > > > <snip>
> > > >
> > > > > 29  20.06      vs   3.59869
> > > > > 30  20.4353    vs   3.6991
> > > > > 31  20.9082    vs   3.73028
> > > > > 32  21.0865    vs   3.82904
> > > > >
> > > > > 1..32 is the number of jobs. The results are in usec and are the vmalloc()/vfree()
> > > > > pair throughput.
> > > >
> > > > I would be more interested in real numbers than synthetic benchmarks.
> > > > Maybe XFS folks could help by performing profiling similar to commit 8dc9384b7d750
> > > > with and without this patchset?
> > > >
> > > I added Dave Chinner <david@fromorbit.com> to this thread.
> >
> > Oh, I missed that, and it would be better to [+Cc linux-xfs]
> >
> > > But the contention exists.
> >
> > I think "theoretically can be contended" doesn't necessarily mean it's actually
> > contended in the real world.
>
> Did you not read the commit message for the XFS commit documented
> above? vmalloc lock contention most certainly does exist in the
> real world and the profiles in commit 8dc9384b7d75 ("xfs: reduce
> kvmalloc overhead for CIL shadow buffers") document it clearly.
>
> > Also I find it difficult to imagine vmalloc being highly contended because it was
> > historically considered slow and thus discouraged when performance is important.
>
> Read the above XFS commit.
>
> We use vmalloc in critical high performance fast paths that cannot
> tolerate high order memory allocation failure. XFS runs this
> fast path millions of times a second, and will call into
> vmalloc() several hundred thousand times a second with machine wide
> concurrency under certain types of workloads.
>
> > IOW vmalloc would not be contended when the allocation size is small because we have
> > the kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
> > buffers and at the same time allocating very frequently, thus being performance-sensitive.
> >
> > I am not against this series, but wondering which workloads would benefit ;)
>
> Yup, you need to read the XFS commit message. If you understand what
> is in that commit message, then you wouldn't be doubting that
> vmalloc contention is real and that it is used in high performance
> fast paths that are traversed millions of times a second....

Oh, I read the commit but it seems to have slipped my mind while reading it -
sorry for such a dumb question; now I get it, and thank you so much.
In any case I didn't mean to offend, I should've read and thought more
before asking.

> > > Apart from that, the per-cpu-KVA allocator can go away if we make it generic instead.
> >
> > Not sure I understand your point, can you elaborate please?
> >
> > And I would like to ask some side questions:
> >
> > 1. Is the vm_[un]map_ram() API still worthwhile with this patchset?
>
> XFS also uses this interface for mapping multi-page buffers in the
> XFS buffer cache. These are the items that also require the high
> order costly kvmalloc allocations in the transaction commit path
> when they are modified.
>
> So, yes, we need these mapping interfaces to scale just as well as
> vmalloc itself....

I mean, even before this series, vm_[un]map_ram() caches vmap_blocks
per CPU, but it has a limitation on the size that can be cached per cpu.
But now that vmap() itself becomes scalable with this series, I wonder
if they are still worthwhile; why not replace them with v[un]map()?

> > 2. How does this patchset deal with 32-bit machines where
> > the vmalloc address space is limited?
>
> From the XFS side, we just don't care about 32 bit machines at all.
> XFS is aimed at server and HPC environments which have been entirely
> 64 bit for a long, long time now...

But Linux still supports 32 bit machines and is not going to drop
support for them anytime soon, so I think there should be at least
a way to disable this feature.

Thanks!
On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > > 2. Motivation.
> > > >
> > > > - The vmap code is not scaled to the number of CPUs and this should be fixed;
> > > > - XFS folks have complained several times that vmalloc might be contended on
> > > >   their workloads:
> > > >
> > > > <snip>
> > > > commit 8dc9384b7d75012856b02ff44c37566a55fc2abf
> > > > Author: Dave Chinner <dchinner@redhat.com>
> > > > Date:   Tue Jan 4 17:22:18 2022 -0800
> > > >
> > > >     xfs: reduce kvmalloc overhead for CIL shadow buffers
> > > >
> > > >     Oh, let me count the ways that the kvmalloc API sucks dog eggs.
> > > >
> > > >     The problem is when we are logging lots of large objects, we hit
> > > >     kvmalloc really damn hard with costly order allocations, and
> > > >     behaviour utterly sucks:
> > >
> > > Based on the commit I guess the reason xfs uses vmalloc/kvmalloc is that
> > > it allocates large buffers; how large could they be?
> > >
> > They use kvmalloc(). When the page allocator is not able to serve a
> > request they fall back to vmalloc. At least from what I see, the sizes
> > range from 73728 up to 1048576, i.e. 18 pages up to 256 pages.
> >
> > > > 3. Test
> > > >
> > > > On my AMD Ryzen Threadripper 3970X 32-Core Processor, I have the below figures:
> > > >
> > > >     1-page     1-page-this-patch
> > > > 1   0.576131   vs   0.555889
> > > > 2   2.68376    vs   1.07895
> > > > 3   4.26502    vs   1.01739
> > > > 4   6.04306    vs   1.28924
> > > > 5   8.04786    vs   1.57616
> > > > 6   9.38844    vs   1.78142
> > >
> > > <snip>
> > >
> > > > 29  20.06      vs   3.59869
> > > > 30  20.4353    vs   3.6991
> > > > 31  20.9082    vs   3.73028
> > > > 32  21.0865    vs   3.82904
> > > >
> > > > 1..32 is the number of jobs. The results are in usec and are the vmalloc()/vfree()
> > > > pair throughput.
> > >
> > > I would be more interested in real numbers than synthetic benchmarks.
> > > Maybe XFS folks could help by performing profiling similar to commit 8dc9384b7d750
> > > with and without this patchset?
> > >
> > I added Dave Chinner <david@fromorbit.com> to this thread.
>
> Oh, I missed that, and it would be better to [+Cc linux-xfs]
>
> > But the contention exists.
>
> I think "theoretically can be contended" doesn't necessarily mean it's actually
> contended in the real world.
>
> Also I find it difficult to imagine vmalloc being highly contended because it was
> historically considered slow and thus discouraged when performance is important.
>
> IOW vmalloc would not be contended when the allocation size is small because we have
> the kmalloc/buddy API, and therefore I wonder which workloads are allocating very large
> buffers and at the same time allocating very frequently, thus being performance-sensitive.
>
> I am not against this series, but wondering which workloads would benefit ;)
>
> > Apart from that, the per-cpu-KVA allocator can go away if we make it generic instead.
>
> Not sure I understand your point, can you elaborate please?
>
> And I would like to ask some side questions:
>
> 1. Is the vm_[un]map_ram() API still worthwhile with this patchset?
>
It is up to the community to decide. As I see it, XFS needs it also. Maybe in
the future it can be removed (who knows), if the vmalloc code itself can
deliver the same performance as the vm_map* APIs.

> 2. How does this patchset deal with 32-bit machines where
> the vmalloc address space is limited?
>
It can deal with them without any problems. Though I am not sure it is
needed for 32-bit systems. The reason is that the vmalloc code was a bit
slow when it comes to lookup time; it used to be O(n). After that it was
improved to O(log n).

vm_map_ram() and friends were added because of vmalloc drawbacks.

I am not sure that there are 32-bit systems with 10/20/30... CPUs
on board. In that case it would be worth caring about contention.

--
Uladzislau Rezki
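(Background for the O(n) to O(log n) remark above: vmap areas are kept in a
tree ordered by address, so finding the area that covers a given address is a
tree descent rather than a list scan. The sketch below is illustrative only;
the field and function names are not the exact mm/vmalloc.c layout.)

#include <linux/rbtree.h>

/* Illustrative stand-in for the real vmap_area structure. */
struct demo_vmap_area {
	unsigned long va_start;
	unsigned long va_end;
	struct rb_node rb_node;
};

/* O(log n): descend the rbtree to find the area containing 'addr'. */
static struct demo_vmap_area *demo_find_vmap_area(struct rb_root *root,
						  unsigned long addr)
{
	struct rb_node *n = root->rb_node;

	while (n) {
		struct demo_vmap_area *va =
			rb_entry(n, struct demo_vmap_area, rb_node);

		if (addr < va->va_start)
			n = n->rb_left;
		else if (addr >= va->va_end)
			n = n->rb_right;
		else
			return va;	/* addr falls inside this area */
	}

	return NULL;
}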
On Wed, May 24, 2023 at 11:50:12AM +0200, Uladzislau Rezki wrote:
> On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> > On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > And I would like to ask some side questions:
> >
> > 1. Is the vm_[un]map_ram() API still worthwhile with this patchset?
> >
> It is up to the community to decide. As I see it, XFS needs it also. Maybe in
> the future it can be removed (who knows), if the vmalloc code itself can
> deliver the same performance as the vm_map* APIs.

vm_map* APIs cannot be replaced with vmalloc, they cover a very
different use case. i.e. vmalloc allocates mapped memory,
vm_map_ram() maps allocated memory....

> vm_map_ram() and friends were added because of vmalloc drawbacks.

No. vm_map*() were scalability improvements added in 2009 to replace
vmap() and vunmap(), to avoid global lock contention in the vmap
allocator that XFS had been working around for years with its own
internal vmap cache....

commit 95f8e302c04c0b0c6de35ab399a5551605eeb006
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Tue Jan 6 14:43:09 2009 +1100

    [XFS] use scalable vmap API

    Implement XFS's large buffer support with the new vmap APIs. See the vmap
    rewrite (db64fe02) for some numbers. The biggest improvement that comes from
    using the new APIs is avoiding the global KVA allocation lock on every call.

    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Reviewed-by: Christoph Hellwig <hch@infradead.org>
    Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>

vmap/vunmap() themselves were introduced in 2.5.32 (2002) and before
that XFS was using remap_page_array() and vfree() in exactly the
same way it uses vm_map_ram() and vm_unmap_ram() today....

XFS has a long, long history of causing virtual memory allocator
scalability and contention problems. As you can see, this isn't our
first rodeo...

Cheers,

Dave.
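(The distinction Dave draws can be written down in a few lines. The sketch
below assumes the current three-argument vm_map_ram() prototype and is an
illustration only, not code from the patchset.)

#include <linux/mm.h>
#include <linux/numa.h>
#include <linux/vmalloc.h>

/* vmalloc(): allocates the backing pages and maps them in one call. */
static void *alloc_and_map(size_t size)
{
	return vmalloc(size);
}

/*
 * vm_map_ram(): only creates a virtually contiguous mapping for pages
 * the caller already owns (e.g. the pages backing an xfs_buf).
 */
static void *map_existing_pages(struct page **pages, unsigned int count)
{
	return vm_map_ram(pages, count, NUMA_NO_NODE);
}

static void unmap_existing_pages(void *addr, unsigned int count)
{
	vm_unmap_ram(addr, count);
}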
On Thu, May 25, 2023 at 07:56:56AM +1000, Dave Chinner wrote:
> > It is up to the community to decide. As I see it, XFS needs it also. Maybe in
> > the future it can be removed (who knows), if the vmalloc code itself can
> > deliver the same performance as the vm_map* APIs.
>
> vm_map* APIs cannot be replaced with vmalloc, they cover a very
> different use case. i.e. vmalloc allocates mapped memory,
> vm_map_ram() maps allocated memory....
>
> > vm_map_ram() and friends were added because of vmalloc drawbacks.
>
> No. vm_map*() were scalability improvements added in 2009 to replace
> vmap() and vunmap(), to avoid global lock contention in the vmap
> allocator that XFS had been working around for years with its own
> internal vmap cache....

All of that is true. At the same time XFS could very much switch to
vmalloc for !XBF_UNMAPPED && size > PAGE_SIZE buffers IFF that provided
an advantage.

The need for vmap and then vm_map_* initially came from the fact that
we were using the page cache to back the xfs_buf (or page_buf back
then). With your work on getting rid of the pagecache usage we could
just use vmalloc now, if we wanted to and it improved performance.

Or at some point we could even look into just using large folios for
that, with the infrastructure for that improving a lot (but I suspect
we're not quite there yet).

But there are other uses of vm_map_* that can't do that, and they will
benefit from the additional scalability as well, even if just using
vmalloc was fine for XFS now (I don't know, I haven't actually looked
into it).
On Thu, May 25, 2023 at 07:56:56AM +1000, Dave Chinner wrote:
> On Wed, May 24, 2023 at 11:50:12AM +0200, Uladzislau Rezki wrote:
> > On Wed, May 24, 2023 at 03:04:28AM +0900, Hyeonggon Yoo wrote:
> > > On Tue, May 23, 2023 at 05:12:30PM +0200, Uladzislau Rezki wrote:
> > > And I would like to ask some side questions:
> > >
> > > 1. Is the vm_[un]map_ram() API still worthwhile with this patchset?
> > >
> > It is up to the community to decide. As I see it, XFS needs it also. Maybe in
> > the future it can be removed (who knows), if the vmalloc code itself can
> > deliver the same performance as the vm_map* APIs.
>
> vm_map* APIs cannot be replaced with vmalloc, they cover a very
> different use case. i.e. vmalloc allocates mapped memory,
> vm_map_ram() maps allocated memory....
>
> > vm_map_ram() and friends were added because of vmalloc drawbacks.
>
> No. vm_map*() were scalability improvements added in 2009 to replace
> vmap() and vunmap(), to avoid global lock contention in the vmap
> allocator that XFS had been working around for years with its own
> internal vmap cache....
>
> commit 95f8e302c04c0b0c6de35ab399a5551605eeb006
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Tue Jan 6 14:43:09 2009 +1100
>
>     [XFS] use scalable vmap API
>
>     Implement XFS's large buffer support with the new vmap APIs. See the vmap
>     rewrite (db64fe02) for some numbers. The biggest improvement that comes from
>     using the new APIs is avoiding the global KVA allocation lock on every call.
>
>     Signed-off-by: Nick Piggin <npiggin@suse.de>
>     Reviewed-by: Christoph Hellwig <hch@infradead.org>
>     Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
>
> vmap/vunmap() themselves were introduced in 2.5.32 (2002) and before
> that XFS was using remap_page_array() and vfree() in exactly the
> same way it uses vm_map_ram() and vm_unmap_ram() today....
>
> XFS has a long, long history of causing virtual memory allocator
> scalability and contention problems. As you can see, this isn't our
> first rodeo...
>
Let me be more specific, sorry, it looks like there is a misunderstanding.
I am talking about removing the vb_alloc()/vb_free() per-cpu stuff, if
alloc_vmap_area() gives the same performance:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d50c551592fc..a1687bbdad30 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2503,12 +2503,6 @@ void vm_unmap_ram(const void *mem, unsigned int count)

        kasan_poison_vmalloc(mem, size);

-       if (likely(count <= VMAP_MAX_ALLOC)) {
-               debug_check_no_locks_freed(mem, size);
-               vb_free(addr, size);
-               return;
-       }
-
        va = find_unlink_vmap_area(addr);
        if (WARN_ON_ONCE(!va))
                return;
@@ -2539,12 +2533,6 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
        unsigned long addr;
        void *mem;

-       if (likely(count <= VMAP_MAX_ALLOC)) {
-               mem = vb_alloc(size, GFP_KERNEL);
-               if (IS_ERR(mem))
-                       return NULL;
-               addr = (unsigned long)mem;
-       } else {
                struct vmap_area *va;

                va = alloc_vmap_area(size, PAGE_SIZE,
                                VMALLOC_START, VMALLOC_END,
@@ -2554,7 +2542,6 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)

                addr = va->va_start;
                mem = (void *)addr;
-       }

        if (vmap_pages_range(addr, addr + size, PAGE_KERNEL,
                        pages, PAGE_SHIFT) < 0) {

+ other related parts.

--
Uladzislau Rezki
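(What the quoted fast path buys today, in broad strokes: requests of up to
VMAP_MAX_ALLOC pages are carved out of per-CPU "vmap blocks" so the global
vmap locks are not taken on every small vm_map_ram() call. The sketch below
is a generic illustration of such a per-CPU front-end cache with made-up
names and setup omitted; it is not the mm/vmalloc.c implementation.)

#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Illustrative per-CPU front end sitting in front of a slower global allocator. */
struct demo_cpu_cache {
	spinlock_t lock;	/* protects 'used'; initialised at boot (not shown) */
	unsigned long base;	/* start of this CPU's pre-reserved block */
	unsigned long size;	/* size of the pre-reserved block */
	unsigned long used;	/* bytes already handed out from the block */
};

static DEFINE_PER_CPU(struct demo_cpu_cache, demo_cache);

/* Fast path: take a small chunk from this CPU's block, no global lock. */
static unsigned long demo_alloc_small(unsigned long size)
{
	struct demo_cpu_cache *c = raw_cpu_ptr(&demo_cache);
	unsigned long addr = 0;

	spin_lock(&c->lock);
	if (c->used + size <= c->size) {
		addr = c->base + c->used;
		c->used += size;
	}
	spin_unlock(&c->lock);

	/* On failure the caller falls back to the global allocator. */
	return addr;
}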