Message ID | 1586192163-20099-1-git-send-email-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] mm, memcg: fix error return value of mem_cgroup_css_alloc() | expand |
On Tue, 7 Apr 2020 00:56:03 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > When I run my memcg testcase which creates lots of memcgs, I found > there're unexpected out of memory logs while there're still enough > available free memory. The error log is, > mkdir: cannot create directory 'foo.65533': Cannot allocate memory > > The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an > -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno > should be -EBUSY "Device or resource busy". That is same with > memcg_alloc_cache_id(). > > As the errno really misled me, we should make it right. After this patch, > the error log will be, > mkdir: cannot create directory 'foo.65533': Device or resource busy Thanks. Was a -stable backport considered?
On Tue, Apr 7, 2020 at 7:23 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 7 Apr 2020 00:56:03 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > When I run my memcg testcase which creates lots of memcgs, I found > > there're unexpected out of memory logs while there're still enough > > available free memory. The error log is, > > mkdir: cannot create directory 'foo.65533': Cannot allocate memory > > > > The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an > > -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno > > should be -EBUSY "Device or resource busy". That is same with > > memcg_alloc_cache_id(). > > > > As the errno really misled me, we should make it right. After this patch, > > the error log will be, > > mkdir: cannot create directory 'foo.65533': Device or resource busy > > Thanks. > > Was a -stable backport considered? I only backported to our kernel version 4.18, but I'm not sure whether it will apply to -stable or not. Seems this issue is introduced long time ago. I will try to backport it to -stable. Should I try it based on 5.5.y and 5.6.y only ? Or all the LTS kernel versions ? Thanks Yafang
On Tue, 7 Apr 2020 11:02:31 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > On Tue, Apr 7, 2020 at 7:23 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Tue, 7 Apr 2020 00:56:03 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > When I run my memcg testcase which creates lots of memcgs, I found > > > there're unexpected out of memory logs while there're still enough > > > available free memory. The error log is, > > > mkdir: cannot create directory 'foo.65533': Cannot allocate memory > > > > > > The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an > > > -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno > > > should be -EBUSY "Device or resource busy". That is same with > > > memcg_alloc_cache_id(). > > > > > > As the errno really misled me, we should make it right. After this patch, > > > the error log will be, > > > mkdir: cannot create directory 'foo.65533': Device or resource busy > > > > Thanks. > > > > Was a -stable backport considered? > > I only backported to our kernel version 4.18, but I'm not sure whether > it will apply to -stable or not. > Seems this issue is introduced long time ago. I will try to backport > it to -stable. > Should I try it based on 5.5.y and 5.6.y only ? Or all the LTS kernel > versions ? What I'm asking (of you and of reviewers) is whether this issue is sufficiently serious to require fixing in -stable kernels. The patch merging details can be sorted out later.
On Tue, Apr 7, 2020 at 11:09 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 7 Apr 2020 11:02:31 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > On Tue, Apr 7, 2020 at 7:23 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Tue, 7 Apr 2020 00:56:03 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > When I run my memcg testcase which creates lots of memcgs, I found > > > > there're unexpected out of memory logs while there're still enough > > > > available free memory. The error log is, > > > > mkdir: cannot create directory 'foo.65533': Cannot allocate memory > > > > > > > > The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an > > > > -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno > > > > should be -EBUSY "Device or resource busy". That is same with > > > > memcg_alloc_cache_id(). > > > > > > > > As the errno really misled me, we should make it right. After this patch, > > > > the error log will be, > > > > mkdir: cannot create directory 'foo.65533': Device or resource busy > > > > > > Thanks. > > > > > > Was a -stable backport considered? > > > > I only backported to our kernel version 4.18, but I'm not sure whether > > it will apply to -stable or not. > > Seems this issue is introduced long time ago. I will try to backport > > it to -stable. > > Should I try it based on 5.5.y and 5.6.y only ? Or all the LTS kernel > > versions ? > > What I'm asking (of you and of reviewers) is whether this issue is > sufficiently serious to require fixing in -stable kernels. The > patch merging details can be sorted out later. > > I think it is worth to cc stable, as this errno will mislead the user. Thanks Yafang
[I have only now noticed there was another version posted. Please try to wait a bit longer for other feedback before reposting a newer version] On Tue 07-04-20 11:11:13, Yafang Shao wrote: > On Tue, Apr 7, 2020 at 11:09 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Tue, 7 Apr 2020 11:02:31 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > On Tue, Apr 7, 2020 at 7:23 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > On Tue, 7 Apr 2020 00:56:03 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > When I run my memcg testcase which creates lots of memcgs, I found > > > > > there're unexpected out of memory logs while there're still enough > > > > > available free memory. The error log is, > > > > > mkdir: cannot create directory 'foo.65533': Cannot allocate memory > > > > > > > > > > The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an > > > > > -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno > > > > > should be -EBUSY "Device or resource busy". That is same with > > > > > memcg_alloc_cache_id(). See my comment about EBUSY in the previous version http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz > > > > > As the errno really misled me, we should make it right. After this patch, > > > > > the error log will be, > > > > > mkdir: cannot create directory 'foo.65533': Device or resource busy > > > > > > > > Thanks. > > > > > > > > Was a -stable backport considered? > > > > > > I only backported to our kernel version 4.18, but I'm not sure whether > > > it will apply to -stable or not. > > > Seems this issue is introduced long time ago. I will try to backport > > > it to -stable. > > > Should I try it based on 5.5.y and 5.6.y only ? Or all the LTS kernel > > > versions ? > > > > What I'm asking (of you and of reviewers) is whether this issue is > > sufficiently serious to require fixing in -stable kernels. The > > patch merging details can be sorted out later. > > > > > > I think it is worth to cc stable, as this errno will mislead the user. We have idr failure path since 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") which is more than four years ago without anybody ever noticing. While I do agree that the existing behavior might be confusing I am not really sure this qualifies as stable material TBH.
On Tue, Apr 7, 2020 at 2:43 PM Michal Hocko <mhocko@kernel.org> wrote: > > [I have only now noticed there was another version posted. Please try to > wait a bit longer for other feedback before reposting a newer version] > Sure I will. sorry about that. But after we send a patch, we always don't know in which state this patch is, e.g. Changes Requested, Not Applicable, Rejected, Needs Review / ACK and etc, as listed in netdev list[1]. That could give us an explicit instruction on what to do next. [1]. https://patchwork.ozlabs.org/project/netdev/list/?state=*&page=2¶m=3 > On Tue 07-04-20 11:11:13, Yafang Shao wrote: > > On Tue, Apr 7, 2020 at 11:09 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Tue, 7 Apr 2020 11:02:31 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > On Tue, Apr 7, 2020 at 7:23 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > On Tue, 7 Apr 2020 00:56:03 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > When I run my memcg testcase which creates lots of memcgs, I found > > > > > > there're unexpected out of memory logs while there're still enough > > > > > > available free memory. The error log is, > > > > > > mkdir: cannot create directory 'foo.65533': Cannot allocate memory > > > > > > > > > > > > The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an > > > > > > -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno > > > > > > should be -EBUSY "Device or resource busy". That is same with > > > > > > memcg_alloc_cache_id(). > > See my comment about EBUSY in the previous version > http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz > From the perspective of mkdir(2), it is better to use ENOSPC here. But I'm not sure whether it is better to update the man page of mkdir(2) or not. I checked the explaination about ENOSPC in 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") carefully, but I don't a clear idea which one is better now. Per Matthew, EBUSY is better, while per you, ENOSPC is better. Matthew, would you mind to give more detailed explanation ? > > > > > > As the errno really misled me, we should make it right. After this patch, > > > > > > the error log will be, > > > > > > mkdir: cannot create directory 'foo.65533': Device or resource busy > > > > > > > > > > Thanks. > > > > > > > > > > Was a -stable backport considered? > > > > > > > > I only backported to our kernel version 4.18, but I'm not sure whether > > > > it will apply to -stable or not. > > > > Seems this issue is introduced long time ago. I will try to backport > > > > it to -stable. > > > > Should I try it based on 5.5.y and 5.6.y only ? Or all the LTS kernel > > > > versions ? > > > > > > What I'm asking (of you and of reviewers) is whether this issue is > > > sufficiently serious to require fixing in -stable kernels. The > > > patch merging details can be sorted out later. > > > > > > > > > > I think it is worth to cc stable, as this errno will mislead the user. > > We have idr failure path since 73f576c04b94 ("mm: memcontrol: fix cgroup > creation failure after many small jobs") which is more than four years > ago without anybody ever noticing. While I do agree that the existing > behavior might be confusing I am not really sure this qualifies as > stable material TBH. OK Thanks Yafang
On Tue 07-04-20 17:31:33, Yafang Shao wrote: > On Tue, Apr 7, 2020 at 2:43 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > [I have only now noticed there was another version posted. Please try to > > wait a bit longer for other feedback before reposting a newer version] > > > > Sure I will. sorry about that. > But after we send a patch, we always don't know in which state this > patch is, e.g. Changes Requested, Not Applicable, Rejected, Needs > Review / ACK and etc, as listed in netdev list[1]. That could give us > an explicit instruction on what to do next. Yes I can see how that can be unclear or consuming at times. But a general rule of thumb I would suggest is to simply wait few days before reposting a new version. If you need to discuss an incremental change based on the review feedback you can always do that in the original email thread and repost the new version after the review feedback settles. > [1]. https://patchwork.ozlabs.org/project/netdev/list/?state=*&page=2¶m=3 > > > On Tue 07-04-20 11:11:13, Yafang Shao wrote: [...] > > See my comment about EBUSY in the previous version > > http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz > > > > >From the perspective of mkdir(2), it is better to use ENOSPC here. But > I'm not sure whether it is better to update the man page of mkdir(2) > or not. Well, I do not know whether EBUSY missing in the man page is an omission or an intention. But adding it only for this single operation failure would sound like an overkill to me. > I checked the explaination about ENOSPC in 73f576c04b94 ("mm: > memcontrol: fix cgroup creation failure after many small jobs") > carefully, but I don't a clear idea which one is better now. The changelog simply mentioned that without the additional id tracking the ENOSPC would have been returned from elsewhere. I haven't checked but I suspect it would be from the cgroup core. This patch just didn't propagate the idr failure specifically and hid it under the ENOMEM failure path. I can only speculate why that was the case but I suspect that Johannes simply didn't consider the distinction important enough. All I am saying is that preserving the failure mode would be preferable. Especially unless there a strong reason to use EBUSY which then should be carefully explained in the changelog.
On Tue, Apr 07, 2020 at 01:10:17PM +0200, Michal Hocko wrote: > On Tue 07-04-20 17:31:33, Yafang Shao wrote: > > I checked the explaination about ENOSPC in 73f576c04b94 ("mm: > > memcontrol: fix cgroup creation failure after many small jobs") > > carefully, but I don't a clear idea which one is better now. > > The changelog simply mentioned that without the additional id tracking > the ENOSPC would have been returned from elsewhere. I haven't checked > but I suspect it would be from the cgroup core. This patch just didn't > propagate the idr failure specifically and hid it under the ENOMEM > failure path. I can only speculate why that was the case but I suspect > that Johannes simply didn't consider the distinction important enough. The old -ENOSPC came directly from the memcg css online callback: - if (css->id > MEM_CGROUP_ID_MAX) - return -ENOSPC; And it only became a problem because, on big memory machines (128+G) with a high rate of short-lived jobs, lazily freed cgroups piled up over the course of multiple days and clogged up the ID space. Nobody actually tried to create 64k user-visible cgroups concurrently. It's hard to imagine any machine running that many meaningfully distinct memory consumers in parallel. So I didn't think (and still don't tbh) it matters all that much in practice what we return here. I'm not against changing it back to -ENOSPC. And I agree with Michal it might be better than -EBUSY because of the mkdir() interface. Given how unlikely this is to affect real setups, I don't think this patch is stable material.
On Tue, 7 Apr 2020 14:10:12 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote: > Given how unlikely this is to affect real setups, I don't think this > patch is stable material. OK, thanks, done. I didn't notice any acks - is this patch otherwise good to go upstream?
On Wed 08-04-20 18:29:56, Andrew Morton wrote: > On Tue, 7 Apr 2020 14:10:12 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote: > > > Given how unlikely this is to affect real setups, I don't think this > > patch is stable material. > > OK, thanks, done. > > I didn't notice any acks - is this patch otherwise good to go upstream? I will ack it if s@EBUSY@ENOSPC@
On Thu, Apr 9, 2020 at 2:57 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Wed 08-04-20 18:29:56, Andrew Morton wrote: > > On Tue, 7 Apr 2020 14:10:12 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote: > > > > > Given how unlikely this is to affect real setups, I don't think this > > > patch is stable material. > > > > OK, thanks, done. > > > > I didn't notice any acks - is this patch otherwise good to go upstream? > > I will ack it if s@EBUSY@ENOSPC@ > BTW, there's no EBUSY in man page of write(2) neither. But when we echo some procs into cgroup.subtree_control of a non-root cgroup, it will return EBUSY. See also cgroup_subtree_control_write(). Pls. correct me if I missed something. We have to admit that the man page is out of date. Thanks Yafang
On Thu 09-04-20 21:59:42, Yafang Shao wrote: > On Thu, Apr 9, 2020 at 2:57 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Wed 08-04-20 18:29:56, Andrew Morton wrote: > > > On Tue, 7 Apr 2020 14:10:12 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote: > > > > > > > Given how unlikely this is to affect real setups, I don't think this > > > > patch is stable material. > > > > > > OK, thanks, done. > > > > > > I didn't notice any acks - is this patch otherwise good to go upstream? > > > > I will ack it if s@EBUSY@ENOSPC@ > > > > BTW, there's no EBUSY in man page of write(2) neither. > But when we echo some procs into cgroup.subtree_control of a non-root > cgroup, it will return EBUSY. See also cgroup_subtree_control_write(). > Pls. correct me if I missed something. > > We have to admit that the man page is out of date. Or the code is returning something unexpected but nobody has noticed so far. Please talk to cgroup maintainers on how to address that.
On Thu, 9 Apr 2020 16:07:29 +0200 Michal Hocko <mhocko@kernel.org> wrote: > On Thu 09-04-20 21:59:42, Yafang Shao wrote: > > On Thu, Apr 9, 2020 at 2:57 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > On Wed 08-04-20 18:29:56, Andrew Morton wrote: > > > > On Tue, 7 Apr 2020 14:10:12 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote: > > > > > > > > > Given how unlikely this is to affect real setups, I don't think this > > > > > patch is stable material. > > > > > > > > OK, thanks, done. > > > > > > > > I didn't notice any acks - is this patch otherwise good to go upstream? > > > > > > I will ack it if s@EBUSY@ENOSPC@ > > > > > > > BTW, there's no EBUSY in man page of write(2) neither. > > But when we echo some procs into cgroup.subtree_control of a non-root > > cgroup, it will return EBUSY. See also cgroup_subtree_control_write(). > > Pls. correct me if I missed something. > > > > We have to admit that the man page is out of date. > > Or the code is returning something unexpected but nobody has noticed so > far. Please talk to cgroup maintainers on how to address that. I agree with your ENOSPC reasoning in http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz That means this change, methinks: --- a/mm/memcontrol.c~mm-memcg-fix-error-return-value-of-mem_cgroup_css_alloc-fix +++ a/mm/memcontrol.c @@ -2721,8 +2721,6 @@ static int memcg_alloc_cache_id(void) id = ida_simple_get(&memcg_cache_ida, 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL); - if (id == -ENOSPC) - return -EBUSY; if (id < 0) return id; @@ -5004,10 +5002,6 @@ static struct mem_cgroup *mem_cgroup_all memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX, GFP_KERNEL); - if (memcg->id.id == -ENOSPC) { - error = -EBUSY; - goto fail; - } if (memcg->id.id < 0) { error = memcg->id.id; goto fail;
On Mon, Apr 20, 2020 at 04:44:37PM -0700, Andrew Morton wrote: > From: Yafang Shao <laoar.shao@gmail.com> > Subject: mm, memcg: fix error return value of mem_cgroup_css_alloc() > > When I run my memcg testcase which creates lots of memcgs, I found > there're unexpected out of memory logs while there're still enough > available free memory. The error log is, mkdir: cannot create directory > 'foo.65533': Cannot allocate memory > > The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an > -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno > should be -ENOSPC "No space left on device", which is an appropriate errno > for userspace's failed mkdir. > > As the errno really misled me, we should make it right. After this patch, > the error log will be "mkdir: cannot create directory 'foo.65533': No > space left on device" > > [akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal] > Link: http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz > Link: http://lkml.kernel.org/r/1586192163-20099-1-git-send-email-laoar.shao@gmail.com > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Suggested-by: Matthew Wilcox <willy@infradead.org> > Acked-by: Michal Hocko <mhocko@kernel.org> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Vladimir Davydov <vdavydov.dev@gmail.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Acked-by: Johannes Weiner <hannes@cmpxchg.org>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ca194864d802..042af0bc4a05 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2717,6 +2717,8 @@ static int memcg_alloc_cache_id(void) id = ida_simple_get(&memcg_cache_ida, 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL); + if (id == -ENOSPC) + return -EBUSY; if (id < 0) return id; @@ -4986,19 +4988,26 @@ static struct mem_cgroup *mem_cgroup_alloc(void) unsigned int size; int node; int __maybe_unused i; + long error = -ENOMEM; size = sizeof(struct mem_cgroup); size += nr_node_ids * sizeof(struct mem_cgroup_per_node *); memcg = kzalloc(size, GFP_KERNEL); if (!memcg) - return NULL; + return ERR_PTR(error); memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX, GFP_KERNEL); - if (memcg->id.id < 0) + if (memcg->id.id == -ENOSPC) { + error = -EBUSY; + goto fail; + } + if (memcg->id.id < 0) { + error = memcg->id.id; goto fail; + } memcg->vmstats_local = alloc_percpu(struct memcg_vmstats_percpu); if (!memcg->vmstats_local) @@ -5042,7 +5051,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void) fail: mem_cgroup_id_remove(memcg); __mem_cgroup_free(memcg); - return NULL; + return ERR_PTR(error); } static struct cgroup_subsys_state * __ref @@ -5053,8 +5062,8 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) long error = -ENOMEM; memcg = mem_cgroup_alloc(); - if (!memcg) - return ERR_PTR(error); + if (IS_ERR(memcg)) + return ERR_CAST(memcg); WRITE_ONCE(memcg->high, PAGE_COUNTER_MAX); memcg->soft_limit = PAGE_COUNTER_MAX; @@ -5104,7 +5113,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) fail: mem_cgroup_id_remove(memcg); mem_cgroup_free(memcg); - return ERR_PTR(-ENOMEM); + return ERR_PTR(error); } static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
When I run my memcg testcase which creates lots of memcgs, I found there're unexpected out of memory logs while there're still enough available free memory. The error log is, mkdir: cannot create directory 'foo.65533': Cannot allocate memory The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno should be -EBUSY "Device or resource busy". That is same with memcg_alloc_cache_id(). As the errno really misled me, we should make it right. After this patch, the error log will be, mkdir: cannot create directory 'foo.65533': Device or resource busy Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- mm/memcontrol.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)