Message ID: 20181229015524.222741-1-shakeelb@google.com (mailing list archive)
State: New, archived
Series: netfilter: account ebt_table_info to kmemcg
On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> memory is already accounted to kmemcg. Do the same for ebtables. The
> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> whole system from a restricted memcg, a potential DoS.

What is the lifetime of these objects? Are they bound to any process?

> Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  net/bridge/netfilter/ebtables.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> index 491828713e0b..5e55cef0cec3 100644
> --- a/net/bridge/netfilter/ebtables.c
> +++ b/net/bridge/netfilter/ebtables.c
> @@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
>  	tmp.name[sizeof(tmp.name) - 1] = 0;
>
>  	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
> -	newinfo = vmalloc(sizeof(*newinfo) + countersize);
> +	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
> +			    PAGE_KERNEL);
>  	if (!newinfo)
>  		return -ENOMEM;
>
>  	if (countersize)
>  		memset(newinfo->counters, 0, countersize);
>
> -	newinfo->entries = vmalloc(tmp.entries_size);
> +	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
> +				     PAGE_KERNEL);
>  	if (!newinfo->entries) {
>  		ret = -ENOMEM;
>  		goto free_newinfo;
> --
> 2.20.1.415.g653613c723-goog
Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
> > memory is already accounted to kmemcg. Do the same for ebtables. The
> > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
> > whole system from a restricted memcg, a potential DoS.
>
> What is the lifetime of these objects? Are they bound to any process?

No, they are not.
They are free'd only when userspace requests it or the netns is
destroyed.
Hi, Michal!

On 29.12.2018 10:33, Michal Hocko wrote:
> On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
>> The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
>> memory is already accounted to kmemcg. Do the same for ebtables. The
>> syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
>> whole system from a restricted memcg, a potential DoS.
>
> What is the lifetime of these objects? Are they bound to any process?

These are lists of ebtables rules, which may be displayed with the
$ebtables-save command. If we do not account them, a low-priority
container may eat all the memory, and the OOM killer in berserk mode
will kill all the processes on the machine. They are not bound to any
process, but they are bound to a network namespace.

The OOM killer does not analyze such memcg-related allocations, since
it is task-aware only. Maybe we should make it namespace-aware too...

Kirill

>> [...]
On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
> > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > [...]
> >
> > What is the lifetime of these objects? Are they bound to any process?
>
> No, they are not.
> They are free'd only when userspace requests it or the netns is
> destroyed.

Then this is problematic, because the oom killer is not able to
guarantee the hard limit, so the excessive memory consumption cannot
really be contained. As a result the memcg will be basically useless
until somebody tears down the charged objects by other means. The memcg
oom killer will surely kill all the existing tasks in the cgroup, and
this could somewhat reduce the problem. Maybe this is sufficient for
some usecases, but that should be properly analyzed and described in
the changelog.
On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > Michal Hocko <mhocko@kernel.org> wrote:
> > > What is the lifetime of these objects? Are they bound to any process?
> >
> > No, they are not.
> > They are free'd only when userspace requests it or the netns is
> > destroyed.
>
> Then this is problematic, because the oom killer is not able to
> guarantee the hard limit and so the excessive memory consumption cannot
> be really contained. [...]

Can you explain why you think the memcg hard limit will not be
enforced? From what I understand, the memcg oom-killer will kill the
allocating processes, as you have mentioned. We do force charging under
very limited conditions, but here the memcg oom-killer will take care
of it.

Anyways, the kernel is already charging the memory for
[ip,ip6,arp]_tables, and this patch adds the charging for ebtables.
Without this patch, as Kirill has described and syzbot has shown, a low
priority memcg can force a system OOM.

Shakeel
Hi Kirill,

On Sat, Dec 29, 2018 at 1:52 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> [...]
>
> OOM killer does not analyze such the memory cgroup-related allocations,
> since it is task-aware only. Maybe we should do it namespace-aware too...

This is a good idea. I am already brainstorming a somewhat similar idea
to make shmem/tmpfs files oom-killable. I will share once I have
something more concrete, and will think about the namespace angle too.

thanks,
Shakeel
On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > [...]
> >
> > Then this is problematic, because the oom killer is not able to
> > guarantee the hard limit and so the excessive memory consumption cannot
> > be really contained. [...]
>
> Can you explain why you think the memcg hard limit will not be
> enforced? From what I understand, the memcg oom-killer will kill the
> allocating processes as you have mentioned. We do force charging for
> very limited conditions but here the memcg oom-killer will take care
> of

I was talking about the force charge part. Depending on the specific
allocation and its lifetime, this can gradually get us over the hard
limit without any bound, theoretically.

> Anyways, the kernel is already charging the memory for
> [ip,ip6,arp]_tables and this patch adds the charging for ebtables.
> Without this patch, as Kirill has described and shown by syzbot, a low
> priority memcg can force system OOM.

I am not opposing the patch per se. I would just like the changelog to
be more descriptive about the lifetime and consequences.
On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > [...]
> >
> > Can you explain why you think the memcg hard limit will not be
> > enforced? [...]
>
> I was talking about the force charge part. Depending on a specific
> allocation and its life time this can gradually get us over hard limit
> without any bound theoretically.

Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
the current task is killed"") there is no way to bail out from the
vmalloc allocation loop, so if the request is really large then the
memcg oom will not help. Is that a problem here?

Maybe it is time to revisit the fatal_signal_pending check.
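[Editor's note: for context, the bail-out Michal refers to sat in the vmalloc page-allocation loop before the revert. A rough sketch of what reintroducing it could look like follows; the function shape and variable names are paraphrased from that era's mm/vmalloc.c and should be treated as assumptions, not the exact upstream source.]

```
/* Sketch: re-adding the reverted check in the vmalloc page loop
 * (approximate shape of __vmalloc_area_node() in mm/vmalloc.c). */
for (i = 0; i < area->nr_pages; i++) {
	struct page *page;

	/* Bail out instead of looping if this task has already been
	 * killed (e.g. selected by the OOM killer). */
	if (fatal_signal_pending(current))
		goto fail;

	page = alloc_page(alloc_mask);
	if (!page)
		goto fail;
	area->pages[i] = page;
}
```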
On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> > [...]
> >
> > I was talking about the force charge part. Depending on a specific
> > allocation and its life time this can gradually get us over hard limit
> > without any bound theoretically.
>
> Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
> the current task is killed"") there is no way to bail out from the
> vmalloc allocation loop so if the request is really large then the memcg
> oom will not help. Is that a problem here?

Yes, I think it will be an issue here.

> Maybe it is time to revisit fatal_signal_pending check.

Yes, we will need something to handle the memcg OOM. I will think more
on that front, or if you have any ideas, please do propose them.

thanks,
Shakeel
On Sat, Dec 29, 2018 at 11:45 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > [...]
> >
> > Anyways, the kernel is already charging the memory for
> > [ip,ip6,arp]_tables and this patch adds the charging for ebtables.
> > Without this patch, as Kirill has described and shown by syzbot, a low
> > priority memcg can force system OOM.
>
> I am not opposing the patch per-se. I would just like the changelog to
> be more descriptive about the life time and consequences.

I will resend the patch with a more detailed change log.

thanks,
Shakeel
On Sun 30-12-18 19:59:53, Shakeel Butt wrote:
> On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > [...]
> >
> > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
> > the current task is killed"") there is no way to bail out from the
> > vmalloc allocation loop so if the request is really large then the memcg
> > oom will not help. Is that a problem here?
>
> Yes, I think it will be an issue here.
>
> > Maybe it is time to revisit fatal_signal_pending check.
>
> Yes, we will need something to handle the memcg OOM. I will think more
> on that front or if you have any ideas, please do propose.

I can see three options here:
 - do not force charge on memcg oom, or introduce a limited charge
   overflow (reserves, basically)
 - revert the revert and reintroduce the fatal_signal_pending check
   into vmalloc
 - be more specific and check tsk_is_oom_victim in vmalloc and fail
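[Editor's note: the third option could be sketched as below. This assumes tsk_is_oom_victim() from <linux/oom.h> is usable at this point in the vmalloc path, and the surrounding loop is the same paraphrased page-allocation loop, so it is an illustration rather than a proposed patch.]

```
#include <linux/oom.h>

/* Sketch of option 3: fail the vmalloc page loop only for tasks the
 * OOM killer has already selected, rather than on any fatal signal. */
for (i = 0; i < area->nr_pages; i++) {
	/* An OOM victim should be releasing memory, not pinning more;
	 * failing here lets the memcg OOM kill actually contain usage. */
	if (tsk_is_oom_victim(current))
		goto fail;

	area->pages[i] = alloc_page(alloc_mask);
	if (!area->pages[i])
		goto fail;
}
```

Compared with option 2, this is narrower: a task that merely received SIGKILL still completes its allocation, while a task chosen by the OOM killer fails fast.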
On Mon, Dec 31, 2018 at 2:12 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sun 30-12-18 19:59:53, Shakeel Butt wrote:
> > [...]
> >
> > Yes, we will need something to handle the memcg OOM. I will think more
> > on that front or if you have any ideas, please do propose.
>
> I can see three options here:
>  - do not force charge on memcg oom or introduce a limited charge
>    overflow (reserves basically).
>  - revert the revert and reintroduce the fatal_signal_pending
>    check into vmalloc
>  - be more specific and check tsk_is_oom_victim in vmalloc and
>    fail

I think for the long term solution we might need something similar to
memcg oom reserves (1), but for a quick fix I think we can do the
combination of (2) and (3).

Shakeel
On Thu 03-01-19 12:52:54, Shakeel Butt wrote:
> On Mon, Dec 31, 2018 at 2:12 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > [...]
> >
> > I can see three options here:
> >  - do not force charge on memcg oom or introduce a limited charge
> >    overflow (reserves basically).
> >  - revert the revert and reintroduce the fatal_signal_pending
> >    check into vmalloc
> >  - be more specific and check tsk_is_oom_victim in vmalloc and
> >    fail
>
> I think for the long term solution we might need something similar to
> memcg oom reserves (1) but for quick fix I think we can do the
> combination of (2) and (3).

Johannes argued that fatal_signal_pending is too general a check for
vmalloc. I would argue that we already break out of some operations on
fatal signals. tsk_is_oom_victim is more subtle but much more targeted,
on the other hand. I do not have any strong preference, to be honest,
but I agree that some limited reserves would be the best long-term
solution. I just do not have any idea how to scale those reserves to be
meaningful.
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 491828713e0b..5e55cef0cec3 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1137,14 +1137,16 @@ static int do_replace(struct net *net, const void __user *user,
 	tmp.name[sizeof(tmp.name) - 1] = 0;
 
 	countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
-	newinfo = vmalloc(sizeof(*newinfo) + countersize);
+	newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
+			    PAGE_KERNEL);
 	if (!newinfo)
 		return -ENOMEM;
 
 	if (countersize)
 		memset(newinfo->counters, 0, countersize);
 
-	newinfo->entries = vmalloc(tmp.entries_size);
+	newinfo->entries = __vmalloc(tmp.entries_size, GFP_KERNEL_ACCOUNT,
+				     PAGE_KERNEL);
 	if (!newinfo->entries) {
 		ret = -ENOMEM;
 		goto free_newinfo;
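[Editor's note: the substantive change in this diff is the gfp mask. GFP_KERNEL_ACCOUNT is GFP_KERNEL with __GFP_ACCOUNT added, which makes the allocator charge the memory to the current task's memory cgroup. In outline, with the before/after lines taken from the diff and the #define matching include/linux/gfp.h:]

```
/* include/linux/gfp.h: allocations that should be charged to kmemcg. */
#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)

/* Before: plain vmalloc, invisible to the caller's memcg limit. */
newinfo = vmalloc(sizeof(*newinfo) + countersize);

/* After: the same allocation, but charged to the caller's memcg, so a
 * restricted container hits its own limit instead of OOMing the host.
 * (__vmalloc at this kernel version takes an explicit gfp mask and page
 * protection; plain vmalloc hardcodes GFP_KERNEL.) */
newinfo = __vmalloc(sizeof(*newinfo) + countersize, GFP_KERNEL_ACCOUNT,
		    PAGE_KERNEL);
```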
The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
memory is already accounted to kmemcg. Do the same for ebtables. The
syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
whole system from a restricted memcg, a potential DoS.

Reported-by: syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 net/bridge/netfilter/ebtables.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)