[3/3] mm/memcg: add next_mz back if not reclaimed yet

Message ID: 20220308012047.26638-3-richard.weiyang@gmail.com
State: New
Series: [1/3] mm/memcg: mz already removed from rb_tree in mem_cgroup_largest_soft_limit_node()

Commit Message

Wei Yang March 8, 2022, 1:20 a.m. UTC
next_mz is removed from rb_tree, let's add it back if no reclaim has
been tried.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/memcontrol.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Michal Hocko March 8, 2022, 8:17 a.m. UTC | #1
On Tue 08-03-22 01:20:47, Wei Yang wrote:
> next_mz is removed from rb_tree, let's add it back if no reclaim has
> been tried.

Could you elaborate more why we need/want this?

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  mm/memcontrol.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 344a7e891bc5..e803ff02aae2 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3493,8 +3493,13 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
>  			loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
>  			break;
>  	} while (!nr_reclaimed);
> -	if (next_mz)
> +	if (next_mz) {
> +		spin_lock_irq(&mctz->lock);
> +		excess = soft_limit_excess(next_mz->memcg);
> +		__mem_cgroup_insert_exceeded(next_mz, mctz, excess);
> +		spin_unlock_irq(&mctz->lock);
>  		css_put(&next_mz->memcg->css);
> +	}
>  	return nr_reclaimed;
>  }
>  
> -- 
> 2.33.1

Wei Yang March 9, 2022, 12:46 a.m. UTC | #2
On Tue, Mar 08, 2022 at 09:17:58AM +0100, Michal Hocko wrote:
>On Tue 08-03-22 01:20:47, Wei Yang wrote:
>> next_mz is removed from rb_tree, let's add it back if no reclaim has
>> been tried.
>
>Could you elaborate more why we need/want this?
>

Per my understanding, we add back the rightmost node even if reclaim makes no
progress, so it is reasonable to add back a node if we didn't get a chance to
do reclaim on it.

It looks like we forgot to add it back to the tree. Maybe Johannes knows some
background on why we don't add it back?

Michal Hocko March 9, 2022, 1:48 p.m. UTC | #3
[Cc Tim - the patch is http://lkml.kernel.org/r/20220308012047.26638-3-richard.weiyang@gmail.com]

On Wed 09-03-22 00:46:20, Wei Yang wrote:
> On Tue, Mar 08, 2022 at 09:17:58AM +0100, Michal Hocko wrote:
> >On Tue 08-03-22 01:20:47, Wei Yang wrote:
> >> next_mz is removed from rb_tree, let's add it back if no reclaim has
> >> been tried.
> >
> >Could you elaborate more why we need/want this?
> >
> 
> Per my understanding, we add back the right most node even reclaim makes no
> progress, so it is reasonable to add back a node if we didn't get a chance to
> do reclaim on it.

Your patch sounded familiar and I can remember now. The same fix has
been posted by Tim last year
https://lore.kernel.org/linux-mm/8d35206601ccf0e1fe021d24405b2a0c2f4e052f.1613584277.git.tim.c.chen@linux.intel.com/
It was posted with other changes to the soft limit code which I didn't
like but I have acked this particular one. Not sure what has happened
with it afterwards.

Wei Yang March 10, 2022, 1:13 a.m. UTC | #4
On Wed, Mar 09, 2022 at 02:48:45PM +0100, Michal Hocko wrote:
>[Cc Tim - the patch is http://lkml.kernel.org/r/20220308012047.26638-3-richard.weiyang@gmail.com]
>
>On Wed 09-03-22 00:46:20, Wei Yang wrote:
>> On Tue, Mar 08, 2022 at 09:17:58AM +0100, Michal Hocko wrote:
>> >On Tue 08-03-22 01:20:47, Wei Yang wrote:
>> >> next_mz is removed from rb_tree, let's add it back if no reclaim has
>> >> been tried.
>> >
>> >Could you elaborate more why we need/want this?
>> >
>> 
>> Per my understanding, we add back the right most node even reclaim makes no
>> progress, so it is reasonable to add back a node if we didn't get a chance to
>> do reclaim on it.
>
>Your patch sounded familiar and I can remember now. The same fix has
>been posted by Tim last year
>https://lore.kernel.org/linux-mm/8d35206601ccf0e1fe021d24405b2a0c2f4e052f.1613584277.git.tim.c.chen@linux.intel.com/
>It was posted with other changes to the soft limit code which I didn't
>like but I have acked this particular one. Not sure what has happened
>with it afterwards.

Because of this ?
4f09feb8bf:  vm-scalability.throughput -4.3% regression
https://lore.kernel.org/linux-mm/20210302062521.GB23892@xsang-OptiPlex-9020/

>-- 
>Michal Hocko
>SUSE Labs

Michal Hocko March 10, 2022, 8:53 a.m. UTC | #5
On Thu 10-03-22 01:13:50, Wei Yang wrote:
> On Wed, Mar 09, 2022 at 02:48:45PM +0100, Michal Hocko wrote:
> >[Cc Tim - the patch is http://lkml.kernel.org/r/20220308012047.26638-3-richard.weiyang@gmail.com]
> >
> >On Wed 09-03-22 00:46:20, Wei Yang wrote:
> >> On Tue, Mar 08, 2022 at 09:17:58AM +0100, Michal Hocko wrote:
> >> >On Tue 08-03-22 01:20:47, Wei Yang wrote:
> >> >> next_mz is removed from rb_tree, let's add it back if no reclaim has
> >> >> been tried.
> >> >
> >> >Could you elaborate more why we need/want this?
> >> >
> >> 
> >> Per my understanding, we add back the right most node even reclaim makes no
> >> progress, so it is reasonable to add back a node if we didn't get a chance to
> >> do reclaim on it.
> >
> >Your patch sounded familiar and I can remember now. The same fix has
> >been posted by Tim last year
> >https://lore.kernel.org/linux-mm/8d35206601ccf0e1fe021d24405b2a0c2f4e052f.1613584277.git.tim.c.chen@linux.intel.com/
> >It was posted with other changes to the soft limit code which I didn't
> >like but I have acked this particular one. Not sure what has happened
> >with it afterwards.
> 
> Because of this ?
> 4f09feb8bf:  vm-scalability.throughput -4.3% regression
> https://lore.kernel.org/linux-mm/20210302062521.GB23892@xsang-OptiPlex-9020/

That was a regression for a different patch in the series AFAICS:
: FYI, we noticed a -4.3% regression of vm-scalability.throughput due to commit:
: 
: commit: 4f09feb8bf083be3834080ddf3782aee12a7c3f7 ("mm: Force update of mem cgroup soft limit tree on usage excess")

That patch changed how often memcg_check_events() is called, and
that can lead to a visible performance difference.

Michal Hocko March 10, 2022, 8:59 a.m. UTC | #6
On Wed 09-03-22 14:48:46, Michal Hocko wrote:
> [Cc Tim - the patch is http://lkml.kernel.org/r/20220308012047.26638-3-richard.weiyang@gmail.com]
> 
> On Wed 09-03-22 00:46:20, Wei Yang wrote:
> > On Tue, Mar 08, 2022 at 09:17:58AM +0100, Michal Hocko wrote:
> > >On Tue 08-03-22 01:20:47, Wei Yang wrote:
> > >> next_mz is removed from rb_tree, let's add it back if no reclaim has
> > >> been tried.
> > >
> > >Could you elaborate more why we need/want this?
> > >
> > 
> > Per my understanding, we add back the right most node even reclaim makes no
> > progress, so it is reasonable to add back a node if we didn't get a chance to
> > do reclaim on it.
> 
> Your patch sounded familiar and I can remember now. The same fix has
> been posted by Tim last year
> https://lore.kernel.org/linux-mm/8d35206601ccf0e1fe021d24405b2a0c2f4e052f.1613584277.git.tim.c.chen@linux.intel.com/

Btw. I forgot to mention this yesterday. Whatever the reason this
slipped through the cracks, it would be great if you could reuse the changelog
of the original patch, which was more verbose and explicit about the
underlying problem. The only remaining part I would add is a description
of how serious the problem is. The removed memcg would be out of the
excess tree until further memory charges get it back. But that can
take an arbitrary amount of time. Whether that is a real problem would
depend on the workload, of course, but considering how coarse a tool
the soft limit is, it is possible that this is not something most users
would even notice.
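
To make the "arbitrary amount of time" point concrete, here is a rough userspace sketch. It is not kernel code and not part of the original message. It assumes, for illustration only, that a memcg dropped from the excess tree is put back solely from its own charge path, and that this check runs once every TOY_EVENTS_TARGET charge events. All toy_* names and the interval value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed check interval, in charge events (illustrative value only). */
#define TOY_EVENTS_TARGET 1024UL

/* Hypothetical stand-in for a memcg and its soft-limit bookkeeping. */
struct toy_memcg {
	unsigned long usage;       /* pages currently charged */
	unsigned long soft_limit;  /* soft limit in pages */
	unsigned long nr_events;   /* charge events seen so far */
	unsigned long next_check;  /* event count at which the tree is refreshed */
	bool on_tree;              /* is the group on the excess tree? */
};

/* Usage above the soft limit, or 0 when the group is within its limit. */
static unsigned long toy_excess(const struct toy_memcg *m)
{
	return m->usage > m->soft_limit ? m->usage - m->soft_limit : 0;
}

/*
 * Charge nr_pages and, once enough events have accumulated, refresh the
 * group's tree membership. Returns true when a refresh actually happened.
 */
static bool toy_charge(struct toy_memcg *m, unsigned long nr_pages)
{
	assert(m != NULL);
	m->usage += nr_pages;
	m->nr_events++;
	if (m->nr_events < m->next_check)
		return false;           /* rate limited: membership untouched */
	m->next_check = m->nr_events + TOY_EVENTS_TARGET;
	m->on_tree = toy_excess(m) > 0;
	return true;
}
```

In this model a group that was dropped while over its soft limit stays off the tree until it has charged TOY_EVENTS_TARGET more times; a group that goes idle right after being dropped never reaches the check at all.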

Wei Yang March 10, 2022, 11:55 p.m. UTC | #7
On Thu, Mar 10, 2022 at 09:59:30AM +0100, Michal Hocko wrote:
>On Wed 09-03-22 14:48:46, Michal Hocko wrote:
>> [Cc Tim - the patch is http://lkml.kernel.org/r/20220308012047.26638-3-richard.weiyang@gmail.com]
>> 
>> On Wed 09-03-22 00:46:20, Wei Yang wrote:
>> > On Tue, Mar 08, 2022 at 09:17:58AM +0100, Michal Hocko wrote:
>> > >On Tue 08-03-22 01:20:47, Wei Yang wrote:
>> > >> next_mz is removed from rb_tree, let's add it back if no reclaim has
>> > >> been tried.
>> > >
>> > >Could you elaborate more why we need/want this?
>> > >
>> > 
>> > Per my understanding, we add back the right most node even reclaim makes no
>> > progress, so it is reasonable to add back a node if we didn't get a chance to
>> > do reclaim on it.
>> 
>> Your patch sounded familiar and I can remember now. The same fix has
>> been posted by Tim last year
>> https://lore.kernel.org/linux-mm/8d35206601ccf0e1fe021d24405b2a0c2f4e052f.1613584277.git.tim.c.chen@linux.intel.com/
>
>Btw. I forgot to mention yesterday. Whatever was the reason this has
>slipped through cracks it would great if you could reuse the changelog
>of the original patch which was more verbose and explicit about the
>underlying problem. The only remaining part I would add is a description
>of how serious the problem is. The removed memcg would be out of the
>excess tree until further memory charges would get it back. But that can
>take arbitrary amount of time. Whether that is a real problem would
>depend on the workload of course but considering how coarse of a tool
>the soft limit is it is possible that this is not something most users
>would even notice.

Got it, I will send a v2.

>-- 
>Michal Hocko
>SUSE Labs

Wei Yang March 10, 2022, 11:57 p.m. UTC | #8
On Thu, Mar 10, 2022 at 09:53:59AM +0100, Michal Hocko wrote:
>On Thu 10-03-22 01:13:50, Wei Yang wrote:
>> On Wed, Mar 09, 2022 at 02:48:45PM +0100, Michal Hocko wrote:
>> >[Cc Tim - the patch is http://lkml.kernel.org/r/20220308012047.26638-3-richard.weiyang@gmail.com]
>> >
>> >On Wed 09-03-22 00:46:20, Wei Yang wrote:
>> >> On Tue, Mar 08, 2022 at 09:17:58AM +0100, Michal Hocko wrote:
>> >> >On Tue 08-03-22 01:20:47, Wei Yang wrote:
>> >> >> next_mz is removed from rb_tree, let's add it back if no reclaim has
>> >> >> been tried.
>> >> >
>> >> >Could you elaborate more why we need/want this?
>> >> >
>> >> 
>> >> Per my understanding, we add back the right most node even reclaim makes no
>> >> progress, so it is reasonable to add back a node if we didn't get a chance to
>> >> do reclaim on it.
>> >
>> >Your patch sounded familiar and I can remember now. The same fix has
>> >been posted by Tim last year
>> >https://lore.kernel.org/linux-mm/8d35206601ccf0e1fe021d24405b2a0c2f4e052f.1613584277.git.tim.c.chen@linux.intel.com/
>> >It was posted with other changes to the soft limit code which I didn't
>> >like but I have acked this particular one. Not sure what has happened
>> >with it afterwards.
>> 
>> Because of this ?
>> 4f09feb8bf:  vm-scalability.throughput -4.3% regression
>> https://lore.kernel.org/linux-mm/20210302062521.GB23892@xsang-OptiPlex-9020/
>
>That was a regression for a different patch in the series AFAICS:
>: FYI, we noticed a -4.3% regression of vm-scalability.throughput due to commit:
>: 
>: commit: 4f09feb8bf083be3834080ddf3782aee12a7c3f7 ("mm: Force update of mem cgroup soft limit tree on usage excess")
>
>That patch has played with how often memcg_check_events is called and
>that can lead to a visible performance difference.

Yes, I mean that maybe the whole patch set was dropped because of this regression.

>-- 
>Michal Hocko
>SUSE Labs

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 344a7e891bc5..e803ff02aae2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3493,8 +3493,13 @@  unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 			loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
 			break;
 	} while (!nr_reclaimed);
-	if (next_mz)
+	if (next_mz) {
+		spin_lock_irq(&mctz->lock);
+		excess = soft_limit_excess(next_mz->memcg);
+		__mem_cgroup_insert_exceeded(next_mz, mctz, excess);
+		spin_unlock_irq(&mctz->lock);
 		css_put(&next_mz->memcg->css);
+	}
 	return nr_reclaimed;
 }
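
The effect of the hunk above can be approximated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: the rb-tree is replaced by a plain array, locking and css reference counting are omitted, and every toy_* name is hypothetical. The put_back flag switches between the old behaviour (next_mz is left off the tree when the loop gives up) and the patched behaviour (next_mz is reinserted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES 8
#define TOY_MAX_LOOPS 2 /* stands in for MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS */

struct toy_mz {
	unsigned long excess;      /* usage above the soft limit */
	unsigned long reclaimable; /* what one reclaim attempt can free */
	bool on_tree;
};

struct toy_tree {
	struct toy_mz *nodes[TOY_MAX_NODES];
	size_t count;
};

/* Insert only if the node is off the tree and still exceeds its limit. */
static void toy_insert(struct toy_tree *t, struct toy_mz *mz)
{
	if (mz->on_tree || mz->excess == 0)
		return;
	assert(t->count < TOY_MAX_NODES);
	t->nodes[t->count++] = mz;
	mz->on_tree = true;
}

/* Remove and return the node with the largest excess, or NULL if empty. */
static struct toy_mz *toy_remove_largest(struct toy_tree *t)
{
	size_t best = 0;

	if (t->count == 0)
		return NULL;
	for (size_t i = 1; i < t->count; i++)
		if (t->nodes[i]->excess > t->nodes[best]->excess)
			best = i;
	struct toy_mz *mz = t->nodes[best];
	t->nodes[best] = t->nodes[--t->count];
	mz->on_tree = false;
	return mz;
}

/* One reclaim attempt: frees at most 'reclaimable' pages of the excess. */
static unsigned long toy_reclaim(struct toy_mz *mz)
{
	unsigned long freed =
		mz->reclaimable < mz->excess ? mz->reclaimable : mz->excess;

	mz->excess -= freed;
	return freed;
}

/* Mirrors the shape of the loop in the patch context, heavily simplified. */
static unsigned long toy_soft_limit_reclaim(struct toy_tree *t, bool put_back)
{
	struct toy_mz *mz, *next_mz = NULL;
	unsigned long nr_reclaimed = 0;
	int loop = 0;

	do {
		mz = next_mz ? next_mz : toy_remove_largest(t);
		if (!mz)
			break;
		unsigned long reclaimed = toy_reclaim(mz);
		nr_reclaimed += reclaimed;
		next_mz = NULL;
		if (!reclaimed)
			next_mz = toy_remove_largest(t); /* now off the tree */
		toy_insert(t, mz); /* mz goes back even without progress */
		loop++;
		if (!nr_reclaimed && (next_mz == NULL || loop > TOY_MAX_LOOPS))
			break;
	} while (!nr_reclaimed);

	if (next_mz && put_back)
		toy_insert(t, next_mz); /* the behaviour the patch adds */
	return nr_reclaimed;
}
```

With two groups that both exceed their limit but yield nothing on reclaim, the loop gives up after TOY_MAX_LOOPS + 1 passes while holding one group in next_mz. Calling toy_soft_limit_reclaim() with put_back set to false leaves that group off the tree; with put_back set to true both groups remain on it.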