Message ID | 20240118184235.618164-1-shakeelb@google.com |
---|---|
State | New |
Series | mm: writeback: ratelimit stat flush from mem_cgroup_wb_stats |

On Thu, Jan 18, 2024 at 06:42:35PM +0000, Shakeel Butt wrote:
> One of our workloads (Postgres 14) has regressed when migrated from 5.10
> to 6.1 upstream kernel. The regression can be reproduced by sysbench's
> oltp_write_only benchmark. It seems like the always on rstat flush in
> mem_cgroup_wb_stats() is causing the regression. So, rate limit that
> specific rstat flush. One potential consequence would be the dirty
> throttling might be decided on stale memcg stats. However from our
> benchmarks and production traffic we have not observed any change in the
> dirty throttling behavior of the application.
>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

On Thu, Jan 18, 2024 at 06:42:35PM +0000, Shakeel Butt wrote:
> One of our workloads (Postgres 14) has regressed when migrated from 5.10
> to 6.1 upstream kernel. The regression can be reproduced by sysbench's
> oltp_write_only benchmark. It seems like the always on rstat flush in
> mem_cgroup_wb_stats() is causing the regression. So, rate limit that
> specific rstat flush. One potential consequence would be the dirty
> throttling might be decided on stale memcg stats. However from our
> benchmarks and production traffic we have not observed any change in the
> dirty throttling behavior of the application.
>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

Hello.

On Thu, Jan 18, 2024 at 06:42:35PM +0000, Shakeel Butt <shakeelb@google.com> wrote:
> One of our workloads (Postgres 14) has regressed when migrated from 5.10
> to 6.1 upstream kernel. The regression can be reproduced by sysbench's
> oltp_write_only benchmark.

> It seems like the always on rstat flush in
> mem_cgroup_wb_stats() is causing the regression.

Is the affected benchmark running in a non-root cgroup?

I'm asking whether this would warrant a
  Fixes: fd25a9e0e23b ("memcg: unify memcg stat flushing")
that introduced the global flush (in v6.1) but it was removed later in
  7d7ef0a4686a ("mm: memcg: restore subtree stats flushing")
(so v6.8 could be possibly unaffected).

Thanks,
Michal

On Mon, Jan 22, 2024 at 7:20 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello.
>
> On Thu, Jan 18, 2024 at 06:42:35PM +0000, Shakeel Butt <shakeelb@google.com> wrote:
> > One of our workloads (Postgres 14) has regressed when migrated from 5.10
> > to 6.1 upstream kernel. The regression can be reproduced by sysbench's
> > oltp_write_only benchmark.
>
> > It seems like the always on rstat flush in
> > mem_cgroup_wb_stats() is causing the regression.
>
> Is the affected benchmark running in a non-root cgroup?
>
> I'm asking whether this would warrant a
>   Fixes: fd25a9e0e23b ("memcg: unify memcg stat flushing")
> that introduced the global flush (in v6.1) but it was removed later in
>   7d7ef0a4686a ("mm: memcg: restore subtree stats flushing")
> (so v6.8 could be possibly unaffected).
>

Yes, the benchmark and the workload were running in non-root cgroups.

Regarding the Fixes, please note that the regression was still there
with 7d7ef0a4686a ("mm: memcg: restore subtree stats flushing"), so I
would say that our first conversion to rstat infra would most probably
have the issue as well which was 2d146aa3aa84 ("mm: memcontrol: switch
to rstat"). So, the following fixes tag makes sense to me:

Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 935f48c4d399..2474c8382e6f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4776,7 +4776,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;

-	mem_cgroup_flush_stats(memcg);
+	mem_cgroup_flush_stats_ratelimited(memcg);

 	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
 	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);

One of our workloads (Postgres 14) has regressed when migrated from 5.10
to 6.1 upstream kernel. The regression can be reproduced by sysbench's
oltp_write_only benchmark. It seems like the always on rstat flush in
mem_cgroup_wb_stats() is causing the regression. So, rate limit that
specific rstat flush. One potential consequence would be the dirty
throttling might be decided on stale memcg stats. However from our
benchmarks and production traffic we have not observed any change in the
dirty throttling behavior of the application.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
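
For readers unfamiliar with the trade-off described in the commit message, here is a minimal userspace sketch of a rate-limited flush. It is not the kernel implementation: struct stats_cache, stats_flush(), stats_flush_ratelimited() and FLUSH_INTERVAL_TICKS are hypothetical names invented for illustration, and the "skip unless the last flush is more than two intervals old" threshold is an assumption of this sketch. It only shows why a reader on the rate-limited path avoids the expensive flush most of the time, at the cost of possibly reading a stale total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flush interval, in abstract monotonic ticks. */
#define FLUSH_INTERVAL_TICKS 2000u

/* Hypothetical stand-in for a stats tree with deferred aggregation. */
struct stats_cache {
	uint64_t last_flush_tick;  /* tick at which the last flush ran */
	unsigned long pending;     /* updates not yet folded into total */
	unsigned long total;       /* the value readers observe */
	unsigned long flush_count; /* number of expensive flushes so far */
};

/* Expensive path: always fold pending updates into the visible total. */
void stats_flush(struct stats_cache *s, uint64_t now)
{
	assert(s != NULL);
	s->total += s->pending;
	s->pending = 0;
	s->last_flush_tick = now;
	s->flush_count++;
}

/*
 * Rate-limited path: flush only when the last flush is more than two
 * intervals old (an assumed threshold). Returns true if a flush ran.
 * 'now' is assumed to be monotonic, so the subtraction cannot wrap.
 */
bool stats_flush_ratelimited(struct stats_cache *s, uint64_t now)
{
	assert(s != NULL);
	if (now - s->last_flush_tick <= 2 * (uint64_t)FLUSH_INTERVAL_TICKS)
		return false; /* recent enough: accept possibly stale stats */
	stats_flush(s, now);
	return true;
}
```

A caller on a hot path, analogous to the writeback path in the patch, would call stats_flush_ratelimited() before reading total and accept that pending updates may not be reflected yet. That is the staleness the commit message mentions as a potential consequence for dirty throttling.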