Message ID | 153320759911.18959.8842396230157677671.stgit@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab() | expand |
On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote: > In case of shrink_slab_memcg() we do not zero nid, when shrinker > is not numa-aware. This is not a real problem, since currently > all memcg-aware shrinkers are numa-aware too (we have two: Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware. deferred_split_shrinker is numa-aware. Thanks, Yang > super_block shrinker and workingset shrinker), but something may > change in the future. > > (Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab) > > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> > --- > mm/vmscan.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index ea0a46166e8e..0d980e801b8a 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -455,6 +455,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, > : SHRINK_BATCH; > long scanned = 0, next_deferred; > > + if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) > + nid = 0; > + > freeable = shrinker->count_objects(shrinker, shrinkctl); > if (freeable == 0 || freeable == SHRINK_EMPTY) > return freeable; > @@ -680,9 +683,6 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, > .memcg = memcg, > }; > > - if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) > - sc.nid = 0; > - > ret = do_shrink_slab(&sc, shrinker, priority); > if (ret == SHRINK_EMPTY) > ret = 0; >
On Thu, Aug 2, 2018 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote: > > In case of shrink_slab_memcg() we do not zero nid, when shrinker > > is not numa-aware. This is not a real problem, since currently > > all memcg-aware shrinkers are numa-aware too (we have two: > > Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware. > deferred_split_shrinker is numa-aware. > But both huge_zero_page_shrinker and huge_zero_page_shrinker are not memcg-aware shrinkers. I think Kirill is saying all memcg-aware shrinkers are also numa-aware shrinkers. Shakeel
On Thu, Aug 2, 2018 at 9:54 AM, Shakeel Butt <shakeelb@google.com> wrote: > On Thu, Aug 2, 2018 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: >> >> On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote: >> > In case of shrink_slab_memcg() we do not zero nid, when shrinker >> > is not numa-aware. This is not a real problem, since currently >> > all memcg-aware shrinkers are numa-aware too (we have two: >> >> Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware. >> deferred_split_shrinker is numa-aware. >> > > But both huge_zero_page_shrinker and huge_zero_page_shrinker are not > memcg-aware shrinkers. I think Kirill is saying all memcg-aware > shrinkers are also numa-aware shrinkers. Aha, thanks for reminding. Yes, I missed that memcg-aware part. > > Shakeel
On Thu, 02 Aug 2018 14:00:52 +0300 Kirill Tkhai <ktkhai@virtuozzo.com> wrote: > In case of shrink_slab_memcg() we do not zero nid, when shrinker > is not numa-aware. This is not a real problem, since currently > all memcg-aware shrinkers are numa-aware too (we have two: > super_block shrinker and workingset shrinker), but something may > change in the future. Fair enough. > (Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab) It got a bit messy so I got lazy and queued it as a separate patch. btw, I have a note that https://lkml.org/lkml/2018/7/7/32 was caused by this patch series. Is that the case and do you know if this was addressed?
On 02.08.2018 20:26, Yang Shi wrote: > On Thu, Aug 2, 2018 at 9:54 AM, Shakeel Butt <shakeelb@google.com> wrote: >> On Thu, Aug 2, 2018 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: >>> >>> On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote: >>>> In case of shrink_slab_memcg() we do not zero nid, when shrinker >>>> is not numa-aware. This is not a real problem, since currently >>>> all memcg-aware shrinkers are numa-aware too (we have two: >>> >>> Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware. >>> deferred_split_shrinker is numa-aware. >>> >> >> But both huge_zero_page_shrinker and huge_zero_page_shrinker are not >> memcg-aware shrinkers. I think Kirill is saying all memcg-aware >> shrinkers are also numa-aware shrinkers. > > Aha, thanks for reminding. Yes, I missed that memcg-aware part. Yes, I mean workingset_shadow_shrinker.
On 02.08.2018 23:47, Andrew Morton wrote: > On Thu, 02 Aug 2018 14:00:52 +0300 Kirill Tkhai <ktkhai@virtuozzo.com> wrote: > >> In case of shrink_slab_memcg() we do not zero nid, when shrinker >> is not numa-aware. This is not a real problem, since currently >> all memcg-aware shrinkers are numa-aware too (we have two: >> super_block shrinker and workingset shrinker), but something may >> change in the future. > > Fair enough. > >> (Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab) > > It got a bit messy so I got lazy and queued it as a separate patch. > > btw, I have a note that https://lkml.org/lkml/2018/7/7/32 was caused by > this patch series. Is that the case and do you know if this was > addressed? It's not related to the patchset. Bisect leads to: commit c6aeb9d4c351 (HEAD, refs/bisect/bad) Author: David Howells <dhowells@redhat.com> Date: Sun Jun 24 00:20:10 2018 +0100 kernfs, sysfs, cgroup, intel_rdt: Support fs_context CC David. David, please see reproducer at https://lkml.org/lkml/2018/7/7/32 Kirill
The reproducer can be reduced to: #define _GNU_SOURCE #include <endian.h> #include <stdint.h> #include <string.h> #include <stdio.h> #include <sys/syscall.h> #include <sys/stat.h> #include <sys/mount.h> #include <unistd.h> #include <fcntl.h> const char path[] = "./file0"; int main() { mkdir(path, 0); mount(path, path, "cgroup2", 0, 0); chroot(path); umount2(path, 0); return 0; } and I've found two bugs (see attached patch). The issue is that do_remount_sb() is called with fc == NULL from umount(), but both cgroup_reconfigure() and do_remount_sb() dereference fc unconditionally. But! I'm not sure why the reproducer works at all because the umount2() call is *after* the chroot, so should fail on ENOENT before it even gets that far. In fact, umount2() can be called multiple times, apparently successfully, and doesn't actually unmount anything. David --- diff --git a/fs/super.c b/fs/super.c index 3fe5d12b7697..321fbc244570 100644 --- a/fs/super.c +++ b/fs/super.c @@ -978,7 +978,10 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data, sb->s_op->remount_fs) { if (sb->s_op->reconfigure) { retval = sb->s_op->reconfigure(sb, fc); - sb_flags = fc->sb_flags; + if (fc) + sb_flags = fc->sb_flags; + else + sb_flags = sb->s_flags; if (retval == 0) security_sb_reconfigure(fc); } else { diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index f3238f38d152..48275fdce053 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1796,9 +1796,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags) static int cgroup_reconfigure(struct kernfs_root *kf_root, struct fs_context *fc) { - struct cgroup_fs_context *ctx = cgroup_fc2context(fc); + if (fc) { + struct cgroup_fs_context *ctx = cgroup_fc2context(fc); - apply_cgroup_root_flags(ctx->flags); + apply_cgroup_root_flags(ctx->flags); + } return 0; }
On 03.08.2018 13:31, David Howells wrote: > The reproducer can be reduced to: > > #define _GNU_SOURCE > #include <endian.h> > #include <stdint.h> > #include <string.h> > #include <stdio.h> > #include <sys/syscall.h> > #include <sys/stat.h> > #include <sys/mount.h> > #include <unistd.h> > #include <fcntl.h> > > const char path[] = "./file0"; > > int main() > { > mkdir(path, 0); > mount(path, path, "cgroup2", 0, 0); > chroot(path); > umount2(path, 0); > return 0; > } > > and I've found two bugs (see attached patch). The issue is that > do_remount_sb() is called with fc == NULL from umount(), but both > cgroup_reconfigure() and do_remount_sb() dereference fc unconditionally. > > But! I'm not sure why the reproducer works at all because the umount2() call > is *after* the chroot, so should fail on ENOENT before it even gets that far. > In fact, umount2() can be called multiple times, apparently successfully, and > doesn't actually unmount anything. Before I also try to check why it works; just reporting you that the patch works the problem in my environment. Thanks, David. > --- > diff --git a/fs/super.c b/fs/super.c > index 3fe5d12b7697..321fbc244570 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -978,7 +978,10 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data, > sb->s_op->remount_fs) { > if (sb->s_op->reconfigure) { > retval = sb->s_op->reconfigure(sb, fc); > - sb_flags = fc->sb_flags; > + if (fc) > + sb_flags = fc->sb_flags; > + else > + sb_flags = sb->s_flags; > if (retval == 0) > security_sb_reconfigure(fc); > } else { > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > index f3238f38d152..48275fdce053 100644 > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -1796,9 +1796,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags) > > static int cgroup_reconfigure(struct kernfs_root *kf_root, struct fs_context *fc) > { > - struct cgroup_fs_context *ctx = cgroup_fc2context(fc); > + if (fc) { > + struct cgroup_fs_context *ctx = cgroup_fc2context(fc); > > - apply_cgroup_root_flags(ctx->flags); > + apply_cgroup_root_flags(ctx->flags); > + } > return 0; > } > >
On 03.08.2018 13:59, Kirill Tkhai wrote: > On 03.08.2018 13:31, David Howells wrote: >> The reproducer can be reduced to: >> >> #define _GNU_SOURCE >> #include <endian.h> >> #include <stdint.h> >> #include <string.h> >> #include <stdio.h> >> #include <sys/syscall.h> >> #include <sys/stat.h> >> #include <sys/mount.h> >> #include <unistd.h> >> #include <fcntl.h> >> >> const char path[] = "./file0"; >> >> int main() >> { >> mkdir(path, 0); >> mount(path, path, "cgroup2", 0, 0); >> chroot(path); >> umount2(path, 0); >> return 0; >> } >> >> and I've found two bugs (see attached patch). The issue is that >> do_remount_sb() is called with fc == NULL from umount(), but both >> cgroup_reconfigure() and do_remount_sb() dereference fc unconditionally. >> >> But! I'm not sure why the reproducer works at all because the umount2() call >> is *after* the chroot, so should fail on ENOENT before it even gets that far. >> In fact, umount2() can be called multiple times, apparently successfully, and >> doesn't actually unmount anything. > > Before I also try to check why it works; just reporting you that the patch > works the problem in my environment. Thanks, David. patch *fixes* the problem > >> --- >> diff --git a/fs/super.c b/fs/super.c >> index 3fe5d12b7697..321fbc244570 100644 >> --- a/fs/super.c >> +++ b/fs/super.c >> @@ -978,7 +978,10 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data, >> sb->s_op->remount_fs) { >> if (sb->s_op->reconfigure) { >> retval = sb->s_op->reconfigure(sb, fc); >> - sb_flags = fc->sb_flags; >> + if (fc) >> + sb_flags = fc->sb_flags; >> + else >> + sb_flags = sb->s_flags; >> if (retval == 0) >> security_sb_reconfigure(fc); >> } else { >> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c >> index f3238f38d152..48275fdce053 100644 >> --- a/kernel/cgroup/cgroup.c >> +++ b/kernel/cgroup/cgroup.c >> @@ -1796,9 +1796,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags) >> >> static int cgroup_reconfigure(struct kernfs_root *kf_root, struct fs_context *fc) >> { >> - struct cgroup_fs_context *ctx = cgroup_fc2context(fc); >> + if (fc) { >> + struct cgroup_fs_context *ctx = cgroup_fc2context(fc); >> >> - apply_cgroup_root_flags(ctx->flags); >> + apply_cgroup_root_flags(ctx->flags); >> + } >> return 0; >> } >> >>
David Howells <dhowells@redhat.com> wrote: > But! I'm not sure why the reproducer works at all because the umount2() call > is *after* the chroot, so should fail on ENOENT before it even gets that > far. No, it shouldn't. It did chroot() not chdir(). > In fact, umount2() can be called multiple times, apparently successfully, and > doesn't actually unmount anything. Okay, because it chroot'd into the directory. Should it return EBUSY though? David
Kirill Tkhai <ktkhai@virtuozzo.com> wrote: > > Before I also try to check why it works; just reporting you that the patch > > works the problem in my environment. Thanks, David. > > patch *fixes* the problem Thanks. I've folded the patch in. David
diff --git a/mm/vmscan.c b/mm/vmscan.c index ea0a46166e8e..0d980e801b8a 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -455,6 +455,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl, : SHRINK_BATCH; long scanned = 0, next_deferred; + if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) + nid = 0; + freeable = shrinker->count_objects(shrinker, shrinkctl); if (freeable == 0 || freeable == SHRINK_EMPTY) return freeable; @@ -680,9 +683,6 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, .memcg = memcg, }; - if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) - sc.nid = 0; - ret = do_shrink_slab(&sc, shrinker, priority); if (ret == SHRINK_EMPTY) ret = 0;
In case of shrink_slab_memcg() we do not zero nid, when shrinker is not numa-aware. This is not a real problem, since currently all memcg-aware shrinkers are numa-aware too (we have two: super_block shrinker and workingset shrinker), but something may change in the future. (Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab) Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> --- mm/vmscan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)