Message ID | 20190410024714.26607-1-tobin@kernel.org (mailing list archive) |
---|---|
Series | mm: Remove the SLAB allocator |
On 4/10/19 4:47 AM, Tobin C. Harding wrote:
> Recently a 2 year old bug was found in the SLAB allocator that crashes
> the kernel. This seems to imply that not that many people are using the
> SLAB allocator.

AFAIK that bug required CONFIG_DEBUG_SLAB_LEAK, not just SLAB. That
seems to imply not that many people are using SLAB when debugging and
yeah, SLUB has better debugging support. But I wouldn't dare to make the
broader implication :)

> Currently we have 3 slab allocators. Two is company, three is a crowd -
> let's get rid of one.
>
> - The SLUB allocator has been the default since 2.6.23

Yeah, with sophisticated reasoning :)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a0acd820807680d2ccc4ef3448387fcdbf152c73

> - The SLOB allocator is kinda sexy. It's only 664 LOC, the general
> design is outlined in KnR, and there is an optimisation taken from
> Knuth - say no more.
>
> If you are using the SLAB allocator please speak now or forever hold your peace ...

FWIW, our enterprise kernels use it (latest is 4.12 based), and openSUSE
kernels as well (with openSUSE Tumbleweed that includes latest
kernel.org stables). AFAIK we don't enable DEBUG_SLAB even in general
debug kernel flavours as it's just too slow.

IIRC last time Mel evaluated switching to SLUB, it wasn't a clear
winner, but I'll just CC him for details :)

> Testing:
>
> Build kernel with `make defconfig` (on x86_64 machine) followed by `make
> kvmconfig`. Then do the same and manually select SLOB. Boot both
> kernels in Qemu.
>
>
> thanks,
> Tobin.
>
>
> Tobin C. Harding (1):
> mm: Remove SLAB allocator
>
> include/linux/slab.h | 26 -
> kernel/cpu.c | 5 -
> mm/slab.c | 4493 ------------------------------------------
> mm/slab.h | 31 +-
> mm/slab_common.c | 20 +-
> 5 files changed, 5 insertions(+), 4570 deletions(-)
> delete mode 100644 mm/slab.c
>
On Wed, Apr 10, 2019 at 10:02:36AM +0200, Vlastimil Babka wrote:
> On 4/10/19 4:47 AM, Tobin C. Harding wrote:
> > Recently a 2 year old bug was found in the SLAB allocator that crashes
> > the kernel. This seems to imply that not that many people are using the
> > SLAB allocator.
>
> AFAIK that bug required CONFIG_DEBUG_SLAB_LEAK, not just SLAB. That
> seems to imply not that many people are using SLAB when debugging and
> yeah, SLUB has better debugging support. But I wouldn't dare to make the
> broader implication :)

Point noted.

> > Currently we have 3 slab allocators. Two is company, three is a crowd -
> > let's get rid of one.
> >
> > - The SLUB allocator has been the default since 2.6.23
>
> Yeah, with sophisticated reasoning :)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a0acd820807680d2ccc4ef3448387fcdbf152c73
>
> > - The SLOB allocator is kinda sexy. It's only 664 LOC, the general
> > design is outlined in KnR, and there is an optimisation taken from
> > Knuth - say no more.
> >
> > If you are using the SLAB allocator please speak now or forever hold your peace ...
>
> FWIW, our enterprise kernels use it (latest is 4.12 based), and openSUSE
> kernels as well (with openSUSE Tumbleweed that includes latest
> kernel.org stables). AFAIK we don't enable DEBUG_SLAB even in general
> debug kernel flavours as it's just too slow.

Ok, so that probably already kills this. Thanks for the response. No
flaming, no swearing, man! And they said LKML was a harsh environment ...

> IIRC last time Mel evaluated switching to SLUB, it wasn't a clear
> winner, but I'll just CC him for details :)

Probably don't need to take up too much of Mel's time, if we have one
user in production we have to keep it, right.

Thanks for your time Vlastimil.

Tobin
On Wed, 10 Apr 2019, Vlastimil Babka wrote:

> On 4/10/19 4:47 AM, Tobin C. Harding wrote:
> > Recently a 2 year old bug was found in the SLAB allocator that crashes
> > the kernel. This seems to imply that not that many people are using the
> > SLAB allocator.
>
> AFAIK that bug required CONFIG_DEBUG_SLAB_LEAK, not just SLAB. That
> seems to imply not that many people are using SLAB when debugging and
> yeah, SLUB has better debugging support. But I wouldn't dare to make the
> broader implication :)
>
> > Currently we have 3 slab allocators. Two is company, three is a crowd -
> > let's get rid of one.
> >
> > - The SLUB allocator has been the default since 2.6.23
>
> Yeah, with sophisticated reasoning :)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a0acd820807680d2ccc4ef3448387fcdbf152c73
>
> > - The SLOB allocator is kinda sexy. It's only 664 LOC, the general
> > design is outlined in KnR, and there is an optimisation taken from
> > Knuth - say no more.
> >
> > If you are using the SLAB allocator please speak now or forever hold your peace ...
>
> FWIW, our enterprise kernels use it (latest is 4.12 based), and openSUSE
> kernels as well (with openSUSE Tumbleweed that includes latest
> kernel.org stables). AFAIK we don't enable DEBUG_SLAB even in general
> debug kernel flavours as it's just too slow.
>
> IIRC last time Mel evaluated switching to SLUB, it wasn't a clear
> winner, but I'll just CC him for details :)
>

We also use CONFIG_SLAB and disable CONFIG_DEBUG_SLAB for the same reason.
On Wed 10-04-19 18:16:18, Tobin C. Harding wrote:
> On Wed, Apr 10, 2019 at 10:02:36AM +0200, Vlastimil Babka wrote:
> > On 4/10/19 4:47 AM, Tobin C. Harding wrote:
> > > Recently a 2 year old bug was found in the SLAB allocator that crashes
> > > the kernel. This seems to imply that not that many people are using the
> > > SLAB allocator.
> >
> > AFAIK that bug required CONFIG_DEBUG_SLAB_LEAK, not just SLAB. That
> > seems to imply not that many people are using SLAB when debugging and
> > yeah, SLUB has better debugging support. But I wouldn't dare to make the
> > broader implication :)
>
> Point noted.
>
> > > Currently we have 3 slab allocators. Two is company, three is a crowd -
> > > let's get rid of one.
> > >
> > > - The SLUB allocator has been the default since 2.6.23
> >
> > Yeah, with sophisticated reasoning :)
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a0acd820807680d2ccc4ef3448387fcdbf152c73
> >
> > > - The SLOB allocator is kinda sexy. It's only 664 LOC, the general
> > > design is outlined in KnR, and there is an optimisation taken from
> > > Knuth - say no more.
> > >
> > > If you are using the SLAB allocator please speak now or forever hold your peace ...
> >
> > FWIW, our enterprise kernels use it (latest is 4.12 based), and openSUSE
> > kernels as well (with openSUSE Tumbleweed that includes latest
> > kernel.org stables). AFAIK we don't enable DEBUG_SLAB even in general
> > debug kernel flavours as it's just too slow.
>
> Ok, so that probably already kills this. Thanks for the response. No
> flaming, no swearing, man! And they said LKML was a harsh environment ...
>
> > IIRC last time Mel evaluated switching to SLUB, it wasn't a clear
> > winner, but I'll just CC him for details :)
>
> Probably don't need to take up too much of Mel's time, if we have one
> user in production we have to keep it, right.

Well, I wouldn't be opposed to dropping SLAB.
Especially when this is not a long-term stable kmalloc implementation
anymore. It turned out that people want to push features from SLUB back
to SLAB, and then we just end up with two full-featured allocators and
double the maintenance cost.

So as long as the performance gap is no longer there and the last data
from Mel (I am sorry but I cannot find a link handy) suggests that there
is no overall winner in benchmarks, then why keep them both?

That being said, if somebody is willing to go and benchmark both
allocators to confirm Mel's observations, and current users of SLAB
can confirm their workloads do not regress either, then let's just drop
it.

Please please have it more rigorous than what happened when SLUB was
forced to become the default.
Hi,

On 4/11/19 10:55 AM, Michal Hocko wrote:
> Please please have it more rigorous than what happened when SLUB was
> forced to become the default

This is the hard part.

Even if you are able to show that SLUB is as fast as SLAB for all the
benchmarks you run, there's bound to be that one workload where SLUB
regresses. You will then have people complaining about that (rightly so)
and you're again stuck with two allocators.

To move forward, I think we should look at possible *pathological* cases
where we think SLAB might have an advantage. For example, SLUB had much
more difficulty with remote CPU frees than SLAB. I don't know if
this is still the case, but it should be easy to construct a synthetic
benchmark to measure this.

For example, have a userspace process that does networking, which is
often memory allocation intensive, so that we know that SKBs traverse
between CPUs. You can do this by making sure that the NIC queues are
mapped to CPU N (so that network softirqs have to run on that CPU) but
the process is pinned to CPU M.

It's, of course, worth thinking about other pathological cases too.
Workloads that cause large allocations are one. Workloads that cause lots
of slab cache shrinking are another.

- Pekka
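The setup described above (process pinned to CPU M, NIC interrupts steered to CPU N) could be sketched from userspace roughly as follows. This is only an illustration: the helper names `pin_self_to_cpu` and `irq_affinity_mask` are made up here, and the sketch assumes a Linux system where the NIC queue's IRQ affinity is set by writing a hex CPU mask to `/proc/irq/<irq>/smp_affinity` as root. Looking up the IRQ number for a given NIC queue is not shown.

```python
import os


def pin_self_to_cpu(cpu):
    """Pin the calling process to one CPU (the "CPU M" side).

    Returns the affinity set the kernel reports afterwards.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


def irq_affinity_mask(cpu):
    """Build the hex CPU mask for the "CPU N" side.

    The mask is formatted as comma-separated 32-bit groups. Writing it to
    /proc/irq/<irq>/smp_affinity needs root and is left out of this sketch.
    """
    digits = format(1 << cpu, "x")
    width = -(-len(digits) // 8) * 8  # round up to a multiple of 8 hex digits
    digits = digits.zfill(width)
    return ",".join(digits[i:i + 8] for i in range(0, len(digits), 8))
```

For example, calling `pin_self_to_cpu(2)` in the benchmark process and writing `irq_affinity_mask(0)` to the NIC queue's `smp_affinity` file should make SKBs tend to be allocated on one CPU and freed on the other. A running irqbalance daemon may undo the IRQ part, so it would likely need to be stopped for the test.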
On Thu, Apr 11, 2019 at 09:55:56AM +0200, Michal Hocko wrote:
> > > FWIW, our enterprise kernels use it (latest is 4.12 based), and openSUSE
> > > kernels as well (with openSUSE Tumbleweed that includes latest
> > > kernel.org stables). AFAIK we don't enable DEBUG_SLAB even in general
> > > debug kernel flavours as it's just too slow.
> >
> > Ok, so that probably already kills this. Thanks for the response. No
> > flaming, no swearing, man! And they said LKML was a harsh environment ...
> >
> > > IIRC last time Mel evaluated switching to SLUB, it wasn't a clear
> > > winner, but I'll just CC him for details :)
> >
> > Probably don't need to take up too much of Mel's time, if we have one
> > user in production we have to keep it, right.
>
> Well, I wouldn't be opposed to dropping SLAB. Especially when this is
> not a long-term stable kmalloc implementation anymore. It turned out that
> people want to push features from SLUB back to SLAB, and then we just
> end up with two full-featured allocators and double the maintenance cost.
>

Indeed.

> So as long as the performance gap is no longer there and the last data
> from Mel (I am sorry but I cannot find a link handy) suggests that there
> is no overall winner in benchmarks, then why keep them both?
>

The link isn't public. It was based on kernel 5.0 but I still haven't
gotten around to doing a proper writeup. The very short summary is that
with the defaults, SLUB is either performance-neutral or a win versus
SLAB, which is a big improvement over a few years ago. It's worth noting
that there is still a partial reliance on it using high-order pages to
get that performance. If the max order is 0 then there are cases where
SLUB is a loss, *but* even that is not universal. hackbench using
processes and sockets to communicate seems to be the hardest hit when
SLUB is not using high-order pages.
This still allows the possibility that SLUB can degrade over time if the
system gets fragmented badly enough, and there are cases where kcompactd
and fragmentation avoidance will be more active than they would be with
SLAB. Again, this is much better than it was a few years ago and I'm not
aware of bug reports that point to compaction overhead due to SLUB.

> That being said, if somebody is willing to go and benchmark both
> allocators to confirm Mel's observations, and current users of SLAB
> can confirm their workloads do not regress either, then let's just drop
> it.
>

Independent verification would be nice. Of particular interest would be
a real set of networking tests on a high-speed network. The hardware in
the test grid I use doesn't have a fast enough network for me to draw a
reliable conclusion.
On Wed, Apr 10, 2019 at 02:53:34PM -0700, David Rientjes wrote:
> > FWIW, our enterprise kernels use it (latest is 4.12 based), and openSUSE
> > kernels as well (with openSUSE Tumbleweed that includes latest
> > kernel.org stables). AFAIK we don't enable DEBUG_SLAB even in general
> > debug kernel flavours as it's just too slow.
> >
> > IIRC last time Mel evaluated switching to SLUB, it wasn't a clear
> > winner, but I'll just CC him for details :)
> >
>
> We also use CONFIG_SLAB and disable CONFIG_DEBUG_SLAB for the same reason.

Would it be possible to re-evaluate using mainline kernel 5.0?
On Fri, 12 Apr 2019 12:28:16 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:

> On Wed, Apr 10, 2019 at 02:53:34PM -0700, David Rientjes wrote:
> > > FWIW, our enterprise kernels use it (latest is 4.12 based), and openSUSE
> > > kernels as well (with openSUSE Tumbleweed that includes latest
> > > kernel.org stables). AFAIK we don't enable DEBUG_SLAB even in general
> > > debug kernel flavours as it's just too slow.
> > >
> > > IIRC last time Mel evaluated switching to SLUB, it wasn't a clear
> > > winner, but I'll just CC him for details :)
> > >
> >
> > We also use CONFIG_SLAB and disable CONFIG_DEBUG_SLAB for the same reason.
>
> Would it be possible to re-evaluate using mainline kernel 5.0?

I have vague memories that SLAB outperforms SLUB for some networking
loads. Could the net folks please comment?
On Thu, 11 Apr 2019 11:27:26 +0300
Pekka Enberg <penberg@iki.fi> wrote:

> Hi,
>
> On 4/11/19 10:55 AM, Michal Hocko wrote:
> > Please please have it more rigorous than what happened when SLUB was
> > forced to become the default
>
> This is the hard part.
>
> Even if you are able to show that SLUB is as fast as SLAB for all the
> benchmarks you run, there's bound to be that one workload where SLUB
> regresses. You will then have people complaining about that (rightly so)
> and you're again stuck with two allocators.
>
> To move forward, I think we should look at possible *pathological* cases
> where we think SLAB might have an advantage. For example, SLUB had much
> more difficulty with remote CPU frees than SLAB. I don't know if
> this is still the case, but it should be easy to construct a synthetic
> benchmark to measure this.

I do think SLUB has a number of pathological cases where SLAB is
faster. It was significantly more difficult to get good bulk-free
performance for SLUB. SLUB is only fast as long as objects belong to
the same page. To get good bulk-free performance if objects are
"mixed", I coded this[1] way-too-complex fast-path code to counteract
this (joint work with Alex Duyck).

[1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113

> For example, have a userspace process that does networking, which is
> often memory allocation intensive, so that we know that SKBs traverse
> between CPUs. You can do this by making sure that the NIC queues are
> mapped to CPU N (so that network softirqs have to run on that CPU) but
> the process is pinned to CPU M.

If someone wants to test this with SKBs then be aware that we netdev guys
have a number of optimizations where we try to counteract this. (As a
minimum, disable TSO and GRO).
It might also be possible for people to get inspired by and adapt the
micro-benchmarking[2] kernel modules that I wrote when developing the
SLUB and SLAB optimizations:

[2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm

> It's, of course, worth thinking about other pathological cases too.
> Workloads that cause large allocations are one. Workloads that cause lots
> of slab cache shrinking are another.

I also worry about long uptimes when SLUB objects/pages get too
fragmented... as I said SLUB is only efficient when objects are
returned to the same page, while SLAB does not have that limitation.

I did a comparison of bulk FREE performance here (where SLAB is
slightly faster):
 Commit ca257195511d ("mm: new API kfree_bulk() for SLAB+SLUB allocators")
 [3] https://git.kernel.org/torvalds/c/ca257195511d

You might also notice how simple the SLAB code is:
 Commit e6cdb58d1c83 ("slab: implement bulk free in SLAB allocator")
 [4] https://git.kernel.org/torvalds/c/e6cdb58d1c83
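The "mixed pages" effect can be illustrated with a small userspace model. This is a simplified sketch, not the kernel code linked in [1]: objects are plain integer addresses, a "page" is `address // slab_bytes`, and the default `lookahead` of 3 is an assumption made for the illustration. It counts how many same-page batches one bulk free breaks into, on the premise that each batch costs one per-page free operation.

```python
def bulk_free_batches(objs, slab_bytes=8192, lookahead=3):
    """Split a bulk-free array into same-page batches, scanning from the end.

    Each pass takes the page of the last pending object, collects further
    objects from that page, and stops once `lookahead` objects from other
    pages have been seen. Returns the list of batches.
    """
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")
    pending = list(objs)
    batches = []
    while pending:
        page = pending[-1] // slab_bytes
        batch, skipped, misses = [], [], 0
        i = len(pending) - 1
        while i >= 0 and misses < lookahead:
            if pending[i] // slab_bytes == page:
                batch.append(pending[i])
            else:
                skipped.append(pending[i])
                misses += 1
            i -= 1
        # Keep the unscanned prefix plus the skipped objects, in original order.
        pending = pending[:i + 1] + skipped[::-1]
        batches.append(batch)
    return batches


OBJ = 256  # hypothetical object size in bytes
same_page = [k * OBJ for k in range(32)]
mixed = [(k % 4) * 8192 + (k // 4) * OBJ for k in range(32)]
```

In this model `bulk_free_batches(same_page)` yields a single batch of 32 objects, while `bulk_free_batches(mixed)`, the same number of objects interleaved across four pages, degenerates into 32 batches of one object each.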
On Wed, 17 Apr 2019, Jesper Dangaard Brouer wrote:

> I do think SLUB has a number of pathological cases where SLAB is
> faster. It was significantly more difficult to get good bulk-free
> performance for SLUB. SLUB is only fast as long as objects belong to
> the same page. To get good bulk-free performance if objects are
> "mixed", I coded this[1] way-too-complex fast-path code to counteract
> this (joint work with Alex Duyck).

Right. SLUB usually compensates for that with superior allocation
performance.

> > It's, of course, worth thinking about other pathological cases too.
> > Workloads that cause large allocations are one. Workloads that cause lots
> > of slab cache shrinking are another.
>
> I also worry about long uptimes when SLUB objects/pages get too
> fragmented... as I said SLUB is only efficient when objects are
> returned to the same page, while SLAB does not have that limitation.

??? Why would SLUB pages get more fragmented? SLUB has fragmentation
prevention methods that SLAB does not have.
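Whether slab pages on a given system are in fact sparsely used can be estimated coarsely from userspace by comparing allocated object slots with objects in use, per cache. The sketch below assumes the version 2.1 `/proc/slabinfo` column layout (name, active_objs, num_objs, objsize, ...), that the file is readable (typically root only), and that unused slots are an acceptable proxy for fragmentation. `slab_utilization` is a name made up for this example, and the counters are approximate, so the result is a hint rather than a measurement.

```python
def slab_utilization(slabinfo_text):
    """Return {cache_name: (utilization, unused_bytes)} from /proc/slabinfo text."""
    result = {}
    for line in slabinfo_text.splitlines():
        if not line.strip() or line.startswith(("slabinfo - version", "#")):
            continue  # skip blanks, the version line and the column header
        fields = line.split()
        name = fields[0]
        active_objs, num_objs, objsize = (int(f) for f in fields[1:4])
        if num_objs == 0:
            continue  # empty cache, nothing to report
        unused_bytes = (num_objs - active_objs) * objsize
        result[name] = (active_objs / num_objs, unused_bytes)
    return result
```

Feeding it the contents of `/proc/slabinfo` and sorting the result by `unused_bytes` would list the caches that hold the most allocated-but-unused object slots.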
On Wed 17-04-19 10:50:18, Jesper Dangaard Brouer wrote:
> On Thu, 11 Apr 2019 11:27:26 +0300
> Pekka Enberg <penberg@iki.fi> wrote:
>
> > Hi,
> >
> > On 4/11/19 10:55 AM, Michal Hocko wrote:
> > > Please please have it more rigorous than what happened when SLUB was
> > > forced to become the default
> >
> > This is the hard part.
> >
> > Even if you are able to show that SLUB is as fast as SLAB for all the
> > benchmarks you run, there's bound to be that one workload where SLUB
> > regresses. You will then have people complaining about that (rightly so)
> > and you're again stuck with two allocators.
> >
> > To move forward, I think we should look at possible *pathological* cases
> > where we think SLAB might have an advantage. For example, SLUB had much
> > more difficulty with remote CPU frees than SLAB. I don't know if
> > this is still the case, but it should be easy to construct a synthetic
> > benchmark to measure this.
>
> I do think SLUB has a number of pathological cases where SLAB is
> faster. It was significantly more difficult to get good bulk-free
> performance for SLUB. SLUB is only fast as long as objects belong to
> the same page. To get good bulk-free performance if objects are
> "mixed", I coded this[1] way-too-complex fast-path code to counteract
> this (joint work with Alex Duyck).
>
> [1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113

How often is this a real problem for real workloads?

> > For example, have a userspace process that does networking, which is
> > often memory allocation intensive, so that we know that SKBs traverse
> > between CPUs. You can do this by making sure that the NIC queues are
> > mapped to CPU N (so that network softirqs have to run on that CPU) but
> > the process is pinned to CPU M.
>
> If someone wants to test this with SKBs then be aware that we netdev guys
> have a number of optimizations where we try to counteract this. (As a
> minimum, disable TSO and GRO).
> > It might also be possible for people to get inspired by and adapt the > micro benchmarking[2] kernel modules that I wrote when developing the > SLUB and SLAB optimizations: > > [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm While microbenchmarks are good to see pathological behavior, I would be really interested to see some numbers for real world usecases. > > It's, of course, worth thinking about other pathological cases too. > > Workloads that cause large allocations is one. Workloads that cause lots > > of slab cache shrinking is another. > > I also worry about long uptimes when SLUB objects/pages gets too > fragmented... as I said SLUB is only efficient when objects are > returned to the same page, while SLAB is not. Is this something that has been actually measured in a real deployment?
On Wed, 17 Apr 2019 15:38:52 +0200
Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 17-04-19 10:50:18, Jesper Dangaard Brouer wrote:
> > On Thu, 11 Apr 2019 11:27:26 +0300
> > Pekka Enberg <penberg@iki.fi> wrote:
> >
> > > Hi,
> > >
> > > On 4/11/19 10:55 AM, Michal Hocko wrote:
> > > > Please please have it more rigorous than what happened when SLUB was
> > > > forced to become the default
> > >
> > > This is the hard part.
> > >
> > > Even if you are able to show that SLUB is as fast as SLAB for all the
> > > benchmarks you run, there's bound to be that one workload where SLUB
> > > regresses. You will then have people complaining about that (rightly so)
> > > and you're again stuck with two allocators.
> > >
> > > To move forward, I think we should look at possible *pathological* cases
> > > where we think SLAB might have an advantage. For example, SLUB had much
> > > more difficulty with remote CPU frees than SLAB. I don't know if
> > > this is still the case, but it should be easy to construct a synthetic
> > > benchmark to measure this.
> >
> > I do think SLUB has a number of pathological cases where SLAB is
> > faster. It was significantly more difficult to get good bulk-free
> > performance for SLUB. SLUB is only fast as long as objects belong to
> > the same page. To get good bulk-free performance if objects are
> > "mixed", I coded this[1] way-too-complex fast-path code to counteract
> > this (joint work with Alex Duyck).
> >
> > [1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113
>
> How often is this a real problem for real workloads?

First let me point out that I have a benchmark[2] that tests this
worst-case behavior, and micro-benchmark wise it was a big win. I did
limit the "lookahead" based on this benchmark, to balance/bound the
worst-case behavior.

[2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test03.c#L4-L8

Second, I do think this happens for real workloads.
Production systems will have many sockets where SKBs (SLAB objects) can
be queued, plus an unpredictable traffic pattern, which could cause this
"mixed" set of SLAB objects from different pages. The skbuff_head_cache
object size is 256 bytes and it uses order-1 pages, i.e. 8192/256 = 32
objects per page. Netstack bulk free mostly happens from (DMA) TX
completion, which has ring sizes usually between 512 and 1024 packets,
although we do limit bulk free to 64 objects.

> > > For example, have a userspace process that does networking, which is
> > > often memory allocation intensive, so that we know that SKBs traverse
> > > between CPUs. You can do this by making sure that the NIC queues are
> > > mapped to CPU N (so that network softirqs have to run on that CPU) but
> > > the process is pinned to CPU M.
> >
> > If someone wants to test this with SKBs then be aware that we netdev guys
> > have a number of optimizations where we try to counteract this. (As a
> > minimum, disable TSO and GRO).
> >
> > It might also be possible for people to get inspired by and adapt the
> > micro-benchmarking[2] kernel modules that I wrote when developing the
> > SLUB and SLAB optimizations:
> >
> > [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm
>
> While microbenchmarks are good to see pathological behavior, I would be
> really interested to see some numbers for real world usecases.

Yes, I would love to see that too, but there is a gap between kernel
developers with the knowledge to diagnose/make sense of this, and people
running production systems...

(Cc Brendan Gregg)

Maybe we should create some tracepoints that make it possible to
measure, e.g. how often the SLUB fast-path vs slow-path is hit (or other
behavior _you_ want to know about), and then create some easy-to-use
trace tools that sysadmins can run. I bet Brendan could write some
bpftrace[3] script that does this, if someone can describe what we want
to measure...
[3] https://github.com/iovisor/bpftrace

> > > It's, of course, worth thinking about other pathological cases too.
> > > Workloads that cause large allocations are one. Workloads that cause lots
> > > of slab cache shrinking are another.
> >
> > I also worry about long uptimes when SLUB objects/pages get too
> > fragmented... as I said SLUB is only efficient when objects are
> > returned to the same page, while SLAB does not have that limitation.
>
> Is this something that has been actually measured in a real deployment?

This is also something that would be interesting to have a tool for, one
that can answer: how fragmented are the SLUB objects in my production
system?
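The skbuff_head_cache arithmetic given earlier in this message can be written out as a short calculation. It is a back-of-the-envelope sketch that assumes 4096-byte base pages and ignores any per-slab metadata or debug padding; the function names are made up for the example.

```python
PAGE_SIZE = 4096  # assumed base page size in bytes


def objects_per_slab(order, obj_size, page_size=PAGE_SIZE):
    """Objects that fit in one slab of 2**order pages, ignoring metadata."""
    return (page_size << order) // obj_size


def min_slabs_touched(nr_objects, per_slab):
    """Lower bound on the distinct slabs a bulk free of nr_objects touches."""
    return -(-nr_objects // per_slab)  # ceiling division


per_slab = objects_per_slab(order=1, obj_size=256)
best_case = min_slabs_touched(64, per_slab)
```

Under these assumptions an order-1 slab holds 32 objects of 256 bytes, so even a perfectly ordered bulk free of 64 objects spans at least two slabs, and in the worst case each of the 64 objects comes from a different one.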