Message ID | 20210729132132.19691-1-vbabka@suse.cz (mailing list archive)
---|---
Series | SLUB: reduce irq disabled scope and make it RT compatible
On 2021-07-29 15:20:57 [+0200], Vlastimil Babka wrote:
> Changes since v2 [5]:
With PARTIAL enabled on top of -rc3:
| root@debpg:~# grep ^kmalloc-512 /proc/slabinfo
| kmalloc-512 3552 3552 512 32 4 : tunables 0 0 0 : slabdata 111 111 0
| root@debpg:~# hackbench -g80
| Running in process mode with 80 groups using 40 file descriptors each (== 3200 tasks)
| Each sender will pass 100 messages of 100 bytes
| Time: 0.643
| root@debpg:~# grep ^kmalloc-512 /proc/slabinfo
| kmalloc-512 954080 954080 512 32 4 : tunables 0 0 0 : slabdata 29815 29815 0
| root@debpg:~# hackbench -g80
| Running in process mode with 80 groups using 40 file descriptors each (== 3200 tasks)
| Each sender will pass 100 messages of 100 bytes
| Time: 0.604
| root@debpg:~# grep ^kmalloc-512 /proc/slabinfo
| kmalloc-512 1647904 1647904 512 32 4 : tunables 0 0 0 : slabdata 51497 51497 0
| root@debpg:~# echo 1 > /sys/kernel/slab/kmalloc-512/shrink
| root@debpg:~# grep ^kmalloc-512 /proc/slabinfo
| kmalloc-512 640 1120 512 32 4 : tunables 0 0 0 : slabdata 35 35 0
Otherwise, a few more hackbench invocations without a manual shrink lead to
the OOM killer:
| oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=systemd-logind,pid=1713,uid=0
| Out of memory: Killed process 1713 (systemd-logind) total-vm:15720kB, anon-rss:956kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_adj:0
| Mem-Info:
| active_anon:56 inactive_anon:24782 isolated_anon:0
| active_file:13 inactive_file:45 isolated_file:0
| unevictable:0 dirty:0 writeback:0
| slab_reclaimable:8749 slab_unreclaimable:894017
| mapped:68 shmem:118 pagetables:28612 bounce:0
| free:8407 free_pcp:36 free_cma:0
| Node 0 active_anon:224kB inactive_anon:99128kB active_file:260kB inactive_file:712kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:764kB dirty:0kB …
| Node 0 DMA free:15360kB min:28kB low:40kB high:52kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB …
| lowmem_reserve[]: 0 1939 3915 3915
| Node 0 DMA32 free:11696kB min:3960kB low:5944kB high:7928kB reserved_highatomic:0KB active_anon:0kB inactive_anon:40740kB active_file:0kB inactive_file:4kB unevictable:0kB
| lowmem_reserve[]: 0 0 1975 1975
| Node 0 Normal free:5692kB min:4032kB low:6052kB high:8072kB reserved_highatomic:0KB active_anon:224kB inactive_anon:58440kB active_file:440kB inactive_file:100kB …
| lowmem_reserve[]: 0 0 0 0
| Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
| Node 0 DMA32: 11*4kB (UM) 15*8kB (M) 20*16kB (UME) 12*32kB (UME) 7*64kB (ME) 5*128kB (UME) 4*256kB (UM) 6*512kB (ME) 4*1024kB (M) 1*2048kB (M) 0*4096kB = 12196kB
| Node 0 Normal: 324*4kB (UME) 221*8kB (UME) 60*16kB (UM) 24*32kB (UME) 5*64kB (UM) 2*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5368kB
| Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
| 189 total pagecache pages
| 0 pages in swap cache
| Swap cache stats: add 0, delete 0, find 0/0
| Free swap = 0kB
| Total swap = 0kB
| 1048432 pages RAM
| 0 pages HighMem/MovableOnly
| 41108 pages reserved
| Unreclaimable slab info:
…
| kmalloc-512 2144352KB 2144352KB
This does not happen if I disable SLUB_CPU_PARTIAL.
Sebastian
On 7/29/21 5:24 PM, Sebastian Andrzej Siewior wrote:
> On 2021-07-29 15:20:57 [+0200], Vlastimil Babka wrote:
>> Changes since v2 [5]:
>
> With PARTIAL enabled on top of -rc3:

Is that also PREEMPT_RT? Interesting...

> …
>
> This does not happen if I disable SLUB_CPU_PARTIAL.
>
> Sebastian
On 2021-07-29 17:27:18 [+0200], Vlastimil Babka wrote:
> On 7/29/21 5:24 PM, Sebastian Andrzej Siewior wrote:
> > On 2021-07-29 15:20:57 [+0200], Vlastimil Babka wrote:
> >> Changes since v2 [5]:
> >
> > With PARTIAL enabled on top of -rc3:
>
> Is that also PREEMPT_RT? Interesting...

No, plain -rc3.

Sebastian
On 2021-07-29 17:29:02 [+0200], To Vlastimil Babka wrote:
> On 2021-07-29 17:27:18 [+0200], Vlastimil Babka wrote:
> > On 7/29/21 5:24 PM, Sebastian Andrzej Siewior wrote:
> > > On 2021-07-29 15:20:57 [+0200], Vlastimil Babka wrote:
> > >> Changes since v2 [5]:
> > >
> > > With PARTIAL enabled on top of -rc3:
> >
> > Is that also PREEMPT_RT? Interesting...
>
> No, plain -rc3.

but it also happens with PREEMPT_RT. Just wanted to make sure that it
happens without RT before I report :)

Sebastian
On 7/29/21 5:29 PM, Sebastian Andrzej Siewior wrote:
> On 2021-07-29 17:27:18 [+0200], Vlastimil Babka wrote:
>> On 7/29/21 5:24 PM, Sebastian Andrzej Siewior wrote:
>> > On 2021-07-29 15:20:57 [+0200], Vlastimil Babka wrote:
>> >> Changes since v2 [5]:
>> >
>> > With PARTIAL enabled on top of -rc3:
>>
>> Is that also PREEMPT_RT? Interesting...
>
> No, plain -rc3.

Thanks, probably screwed up put_cpu_partial() with my cleanups, will check.

> Sebastian
On 7/29/21 3:20 PM, Vlastimil Babka wrote:
> Changes since v2 [5]:
> * Rebase to 5.14-rc3
> * A number of fixes to the RT parts, big thanks to Mike Galbraith for testing
>   and debugging!
> * The largest fix is to protect kmem_cache_cpu->partial by local_lock instead
>   of cmpxchg tricks, which are insufficient on RT. To avoid divergence
>   between RT and !RT, just do it everywhere. Affected mainly patch 25 and a
>   new patch 33. This also addresses a theoretical race raised earlier by Jann
>   Horn.
> * Smaller fixes reported by Sebastian Andrzej Siewior and Cyrill Gorcunov
>
> Changes since RFC v1 [1]:
> * Addressed feedback from Christoph and Mel, added their acks.
> * Finished RT conversion, adopting 2 patches from the RT tree.
> * The local_lock conversion has to sacrifice lockless fast paths on PREEMPT_RT
> * Added some more cleanup patches to the front.
>
> This series was initially inspired by Mel's pcplist local_lock rewrite, and
> also by an interest to better understand SLUB's locking and the new primitives
> and their RT variants and implications. It should make SLUB more
> preemption-friendly, especially for RT, hopefully without noticeable
> regressions, as the fast paths are not affected.
>
> Series is based on 5.14-rc3 and also available as a git branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r1

Branch with fixed memory leak in patch 33:

https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r2
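[Editor's note] For context on the local_lock conversion described in the changelog
above, below is a minimal, hedged sketch of the pattern it refers to: a per-CPU
structure whose contents are guarded by a local_lock_t instead of cmpxchg-based
lockless updates. This is not the actual SLUB code from the series; the names
my_pcpu, my_obj and my_put_partial, and the single-linked "partial" chain, are
invented for illustration (loosely analogous to kmem_cache_cpu->partial).

#include <linux/local_lock.h>
#include <linux/percpu.h>

/* Hypothetical object with an embedded single-linked list pointer. */
struct my_obj {
        struct my_obj *next;
};

/* Per-CPU state: the local_lock_t guards the other fields of the struct. */
struct my_pcpu {
        local_lock_t lock;
        struct my_obj *partial;         /* loosely analogous to kmem_cache_cpu->partial */
};

static DEFINE_PER_CPU(struct my_pcpu, my_pcpu) = {
        .lock = INIT_LOCAL_LOCK(lock),
};

/* Push an object onto this CPU's partial chain under the local lock. */
static void my_put_partial(struct my_obj *obj)
{
        struct my_pcpu *p;
        unsigned long flags;

        /*
         * On !PREEMPT_RT this boils down to local_irq_save(); on PREEMPT_RT
         * it takes a per-CPU spinlock, so the section stays preemptible while
         * still serializing against other tasks on this CPU.
         */
        local_lock_irqsave(&my_pcpu.lock, flags);
        p = this_cpu_ptr(&my_pcpu);
        obj->next = p->partial;
        p->partial = obj;
        local_unlock_irqrestore(&my_pcpu.lock, flags);
}

As the changelog notes, having every update path take the same per-CPU lock is
what keeps the list manipulation correct on PREEMPT_RT, where the previous
cmpxchg tricks are insufficient.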
On 2021-07-29 17:47:20 [+0200], Vlastimil Babka wrote:
> Branch with fixed memory leak in patch 33:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r2

This looks stable, nothing happened during the night.

Sebastian
On Thu, Jul 29, 2021 at 03:20:57PM +0200, Vlastimil Babka wrote:
> Changes since v2 [5]:
> …
>
> Series is based on 5.14-rc3 and also available as a git branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r1
>

FWIW, I ran a corrected version of this series through a few tests. Some
small gains, no major regressions in terms of performance on a !PREEMPT_RT
configuration across 6 different machines.
On 8/4/21 2:05 PM, Mel Gorman wrote:
> On Thu, Jul 29, 2021 at 03:20:57PM +0200, Vlastimil Babka wrote:
>> Series is based on 5.14-rc3 and also available as a git branch:
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v3r1
>>
>
> FWIW, I ran a corrected version of this series through a few tests. Some
> small gains, no major regressions in terms of performance on a !PREEMPT_RT
> configuration across 6 different machines.

Thanks a lot, Mel!