
mm: Fix protection usage propagation

Message ID: 20200803153231.15477-1-mhocko@kernel.org (mailing list archive)
State: New, archived
Series: mm: Fix protection usage propagation

Commit Message

Michal Hocko Aug. 3, 2020, 3:32 p.m. UTC
From: Michal Koutný <mkoutny@suse.com>

When a workload runs in cgroups that aren't directly below the root
cgroup and their parent specifies reclaim protection, that protection
may end up ineffective.

The reason is that propagate_protected_usage() is not called all the
way up the hierarchy. All the protected usage is incorrectly accumulated
in the workload's parent. This means that siblings_low_usage is
overestimated and the effective protection underestimated. Even though
this is a transitional phenomenon (the uncharge path does the correct
propagation and fixes the wrong children_low_usage), it can undermine
the intended protection unexpectedly.

The fix is simply to update children_low_usage in the respective
ancestors on the charging path as well.
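
For illustration, a condensed sketch of the ancestor walk in
page_counter_charge() (paraphrased; the real function also tracks the
watermark). Each iteration charges one level c, so it has to propagate
the protected usage of that level, not of the leaf counter:

	void page_counter_charge(struct page_counter *counter,
				 unsigned long nr_pages)
	{
		struct page_counter *c;

		for (c = counter; c; c = c->parent) {
			long new;

			new = atomic_long_add_return(nr_pages, &c->usage);
			/*
			 * The old code passed counter here, so the deltas
			 * computed for every level c were all accumulated
			 * in the leaf's parent instead of in c's parent.
			 */
			propagate_protected_usage(c, new);
		}
	}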

Fixes: 230671533d64 ("mm: memory.low hierarchical behavior")
Cc: stable # 4.18+
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---

Hi,
I am sending this patch on behalf of Michal Koutny, who is currently
on vacation and didn't get to post it before he left.

We noticed this problem when we saw swap out in a descendant of a
protected memcg (an intermediate node) while the parent was comfortably
under its protection limit and the memory pressure was external to that
hierarchy. Michal pinpointed this to the wrong siblings_low_usage,
which led to the unwanted reclaim.
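
To make this concrete with hypothetical numbers, using the proportional
formula from 230671533d64 (roughly elow = min(low, parent_elow *
low_usage / siblings_low_usage), where siblings_low_usage is read from
the parent's children_low_usage): take a memcg with low = 5G and
usage = 4G (low_usage = 4G) under a parent with parent_elow = 10G. With
the correct siblings_low_usage of 8G, its effective low is
min(5G, 10G * 4G / 8G) = 5G and all 4G of usage are protected. If the
charge path inflates siblings_low_usage to 16G, the effective low drops
to min(5G, 10G * 4G / 16G) = 2.5G and 1.5G of the usage becomes
reclaimable despite the configured protection.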

I am adding my ack directly in this submission.

 mm/page_counter.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Roman Gushchin Aug. 3, 2020, 3:39 p.m. UTC | #1
On Mon, Aug 03, 2020 at 05:32:31PM +0200, Michal Hocko wrote:
> From: Michal Koutný <mkoutny@suse.com>
> 
> When a workload runs in cgroups that aren't directly below the root
> cgroup and their parent specifies reclaim protection, that protection
> may end up ineffective.
> 
> The reason is that propagate_protected_usage() is not called all the
> way up the hierarchy. All the protected usage is incorrectly
> accumulated in the workload's parent. This means that
> siblings_low_usage is overestimated and the effective protection
> underestimated. Even though this is a transitional phenomenon (the
> uncharge path does the correct propagation and fixes the wrong
> children_low_usage), it can undermine the intended protection
> unexpectedly.

Indeed, good catch!

> 
> The fix is simply to update children_low_usage in the respective
> ancestors on the charging path as well.
> 
> Fixes: 230671533d64 ("mm: memory.low hierarchical behavior")
> Cc: stable # 4.18+
> Signed-off-by: Michal Koutný <mkoutny@suse.com>
> Acked-by: Michal Hocko <mhocko@suse.com>

Acked-by: Roman Gushchin <guro@fb.com>

Thank you!

> ---
> 
> Hi,
> I am sending this patch on behalf of Michal Koutny, who is currently
> on vacation and didn't get to post it before he left.
> 
> We noticed this problem when we saw swap out in a descendant of a
> protected memcg (an intermediate node) while the parent was comfortably
> under its protection limit and the memory pressure was external to that
> hierarchy. Michal pinpointed this to the wrong siblings_low_usage,
> which led to the unwanted reclaim.
> 
> I am adding my ack directly in this submission.
> 
>  mm/page_counter.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_counter.c b/mm/page_counter.c
> index c56db2d5e159..b4663844c9b3 100644
> --- a/mm/page_counter.c
> +++ b/mm/page_counter.c
> @@ -72,7 +72,7 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
>  		long new;
>  
>  		new = atomic_long_add_return(nr_pages, &c->usage);
> -		propagate_protected_usage(counter, new);
> +		propagate_protected_usage(c, new);
>  		/*
>  		 * This is indeed racy, but we can live with some
>  		 * inaccuracy in the watermark.
> @@ -116,7 +116,7 @@ bool page_counter_try_charge(struct page_counter *counter,
>  		new = atomic_long_add_return(nr_pages, &c->usage);
>  		if (new > c->max) {
>  			atomic_long_sub(nr_pages, &c->usage);
> -			propagate_protected_usage(counter, new);
> +			propagate_protected_usage(c, new);
>  			/*
>  			 * This is racy, but we can live with some
>  			 * inaccuracy in the failcnt.
> @@ -125,7 +125,7 @@ bool page_counter_try_charge(struct page_counter *counter,
>  			*fail = c;
>  			goto failed;
>  		}
> -		propagate_protected_usage(counter, new);
> +		propagate_protected_usage(c, new);
>  		/*
>  		 * Just like with failcnt, we can live with some
>  		 * inaccuracy in the watermark.
> -- 
> 2.27.0
>

Patch

diff --git a/mm/page_counter.c b/mm/page_counter.c
index c56db2d5e159..b4663844c9b3 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -72,7 +72,7 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
 		long new;
 
 		new = atomic_long_add_return(nr_pages, &c->usage);
-		propagate_protected_usage(counter, new);
+		propagate_protected_usage(c, new);
 		/*
 		 * This is indeed racy, but we can live with some
 		 * inaccuracy in the watermark.
@@ -116,7 +116,7 @@ bool page_counter_try_charge(struct page_counter *counter,
 		new = atomic_long_add_return(nr_pages, &c->usage);
 		if (new > c->max) {
 			atomic_long_sub(nr_pages, &c->usage);
-			propagate_protected_usage(counter, new);
+			propagate_protected_usage(c, new);
 			/*
 			 * This is racy, but we can live with some
 			 * inaccuracy in the failcnt.
@@ -125,7 +125,7 @@ bool page_counter_try_charge(struct page_counter *counter,
 			*fail = c;
 			goto failed;
 		}
-		propagate_protected_usage(counter, new);
+		propagate_protected_usage(c, new);
 		/*
 		 * Just like with failcnt, we can live with some
 		 * inaccuracy in the watermark.
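
For reference, a paraphrased sketch of propagate_protected_usage()
(memory.low half only; the memory.min half is analogous). It records the
protected usage of level c in c's parent, which is why each level of the
walk above must pass itself:

	static void propagate_protected_usage(struct page_counter *c,
					      unsigned long usage)
	{
		unsigned long protected, old_protected;
		long delta;

		if (!c->parent)
			return;

		if (c->low || atomic_long_read(&c->low_usage)) {
			/* Usage counts as protected only while under low. */
			protected = usage <= c->low ? usage : 0;

			old_protected = atomic_long_xchg(&c->low_usage,
							 protected);
			delta = protected - old_protected;
			/* Accumulate in the parent of *this* level. */
			if (delta)
				atomic_long_add(delta,
						&c->parent->children_low_usage);
		}
	}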