| Message ID | 20230831165611.2610118-4-yosryahmed@google.com (mailing list archive) |
| --- | --- |
| State | New |
| Series | memcg: non-unified flushing for userspace stats |
On Thu 31-08-23 16:56:10, Yosry Ahmed wrote:
> Unified flushing of memcg stats keeps track of the magnitude of pending
> updates, and only allows a flush if that magnitude exceeds a threshold.
> It also keeps track of the time at which ratelimited flushing should be
> allowed as flush_next_time.
>
> A non-unified flush on the root memcg has the same effect as a unified
> flush, so let it help unified flushing by resetting pending updates and
> kicking flush_next_time forward. Move the logic into the common
> do_stats_flush() helper, and do it for all root flushes, unified or
> not.

I have a hard time following why we really want/need this. Does this
cause any observable changes to the behavior?

>
> There is a subtle change here: we reset stats_flush_threshold before a
> flush rather than after a flush. This is probably okay because:
>
> (a) For flushers: only unified flushers check stats_flush_threshold, and
> those flushers skip anyway if there is another unified flush ongoing.
> Having them also skip if there is an ongoing non-unified root flush is
> actually more consistent.
>
> (b) For updaters: Resetting stats_flush_threshold early may lead to more
> atomic updates of stats_flush_threshold, as we start updating it
> earlier. This should not be significant in practice because we stop
> updating stats_flush_threshold when it reaches the threshold anyway. If
> we start early and stop early, the number of atomic updates remains the
> same. The only difference is the scenario where we reset
> stats_flush_threshold early, start doing atomic updates early, and then
> the periodic flusher kicks in before we reach the threshold. In this
> case, we will have done more atomic updates. However, since the
> threshold wasn't reached, we did not do a lot of updates anyway.
>
> Suggested-by: Michal Koutný <mkoutny@suse.com>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  mm/memcontrol.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8c046feeaae7..94d5a6751a9e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -647,6 +647,11 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
>   */
>  static void do_stats_flush(struct mem_cgroup *memcg)
>  {
> +        /* for unified flushing, root non-unified flushing can help as well */
> +        if (mem_cgroup_is_root(memcg)) {
> +                WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
> +                atomic_set(&stats_flush_threshold, 0);
> +        }
>          cgroup_rstat_flush(memcg->css.cgroup);
>  }
>
> @@ -665,11 +670,8 @@ static void do_unified_stats_flush(void)
>              atomic_xchg(&stats_unified_flush_ongoing, 1))
>                  return;
>
> -        WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
> -
>          do_stats_flush(root_mem_cgroup);
>
> -        atomic_set(&stats_flush_threshold, 0);
>          atomic_set(&stats_unified_flush_ongoing, 0);
>  }
>
> --
> 2.42.0.rc2.253.gd59a3bf2b4-goog
Hello.

On Mon, Sep 04, 2023 at 04:50:15PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> I have a hard time following why we really want/need this. Does this
> cause any observable changes to the behavior?

Behavior change depends on how much userspace triggers the root memcg
flush, from nothing to effectively offloading flushing to userspace tasks.
(Theory^^^)

It keeps stats_flush_threshold up to date, representing the global error
estimate, so that error-tolerant readers may save their time, and it
keeps the reasoning about the stats_flush_threshold effect simple.

Michal
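To make the "error-tolerant readers may save their time" point concrete, here is a minimal sketch of such a reader. Only stats_flush_threshold and do_stats_flush() are taken from the series under discussion; the wrapper name and the exact tolerance are hypothetical.

/*
 * Minimal sketch of a hypothetical error-tolerant reader; the function
 * name and tolerance are illustrative, not from the series itself.
 * stats_flush_threshold accumulates roughly one unit per CPU-sized batch
 * of pending updates, so while it is still small the reader can accept
 * the possibly-stale stats and skip the expensive rstat flush.
 */
static void mem_cgroup_maybe_flush_stats(struct mem_cgroup *memcg)
{
        if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
                return;                 /* global error still small, skip */

        do_stats_flush(memcg);          /* worth paying for a flush */
}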
On Mon 04-09-23 17:29:14, Michal Koutny wrote:
> Hello.
>
> On Mon, Sep 04, 2023 at 04:50:15PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> > I have a hard time following why we really want/need this. Does this
> > cause any observable changes to the behavior?
>
> Behavior change depends on how much userspace triggers the root memcg
> flush, from nothing to effectively offloading flushing to userspace tasks.
> (Theory^^^)
>
> It keeps stats_flush_threshold up to date, representing the global error
> estimate, so that error-tolerant readers may save their time, and it
> keeps the reasoning about the stats_flush_threshold effect simple.

So it also creates an undocumented but userspace-visible behavior,
something that userspace might start depending on, right?
On Mon, Sep 04, 2023 at 05:41:10PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> So it also creates an undocumented but userspace-visible behavior,
> something that userspace might start depending on, right?

Yes, but:
- depending on undocumented behavior is a mistake,
- breaking the dependency would manifest (in the case I imagine) as a
  performance regression (and if there are some users, the future can
  allow them to configure a periodic kernel flush to compensate for
  that).

Or do you suggest these effects should be documented (that would require
deeper analysis of the actual effect)?

Thanks,
Michal
On Tue, Sep 5, 2023 at 7:10 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Mon, Sep 04, 2023 at 05:41:10PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> > So it also creates an undocumented but userspace-visible behavior,
> > something that userspace might start depending on, right?
>
> Yes, but:
> - depending on undocumented behavior is a mistake,
> - breaking the dependency would manifest (in the case I imagine) as a
>   performance regression (and if there are some users, the future can
>   allow them to configure a periodic kernel flush to compensate for
>   that).

I think I am missing something. This change basically makes userspace
readers (for the root memcg) help out unified flushers, which are
in-kernel readers (e.g. reclaim) -- not the other way around.

How would that create a userspace-visible behavior that a dependency
can be formed on? Users expecting reclaim to be faster right after
reading root stats? I would guess that would be too flaky to cause a
behavior that people can depend on, tbh.
On Tue, Sep 05, 2023 at 08:54:46AM -0700, Yosry Ahmed <yosryahmed@google.com> wrote:
> How would that create a userspace-visible behavior that a dependency
> can be formed on?

A userspace process reading out root memory.stat more frequently than
the in-kernel periodic flusher.

> Users expecting reclaim to be faster right after reading root stats?

Yes, that is what I had in mind.

> I would guess that would be too flaky to cause a behavior that people
> can depend on, tbh.

I agree it's a weird dependency. As I wrote, it is nothing that would be
hard to take away.

Michal
On Tue 05-09-23 08:54:46, Yosry Ahmed wrote:
> On Tue, Sep 5, 2023 at 7:10 AM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > On Mon, Sep 04, 2023 at 05:41:10PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> > > So it also creates an undocumented but userspace-visible behavior,
> > > something that userspace might start depending on, right?
> >
> > Yes, but:
> > - depending on undocumented behavior is a mistake,
> > - breaking the dependency would manifest (in the case I imagine) as a
> >   performance regression (and if there are some users, the future can
> >   allow them to configure a periodic kernel flush to compensate for
> >   that).
>
> I think I am missing something. This change basically makes userspace
> readers (for the root memcg) help out unified flushers, which are
> in-kernel readers (e.g. reclaim) -- not the other way around.
>
> How would that create a userspace-visible behavior that a dependency
> can be formed on? Users expecting reclaim to be faster right after
> reading root stats? I would guess that would be too flaky to cause a
> behavior that people can depend on, tbh.

Flaky or not, it might cause a behavior difference, and a subtle one. I
can imagine nohz and similar workloads wanting to (ab)use this to reduce
kernel footprint. If we really need this, then at least make it obvious
in the changelog.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8c046feeaae7..94d5a6751a9e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -647,6 +647,11 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
  */
 static void do_stats_flush(struct mem_cgroup *memcg)
 {
+        /* for unified flushing, root non-unified flushing can help as well */
+        if (mem_cgroup_is_root(memcg)) {
+                WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
+                atomic_set(&stats_flush_threshold, 0);
+        }
         cgroup_rstat_flush(memcg->css.cgroup);
 }
 
@@ -665,11 +670,8 @@ static void do_unified_stats_flush(void)
             atomic_xchg(&stats_unified_flush_ongoing, 1))
                 return;
 
-        WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
-
         do_stats_flush(root_mem_cgroup);
 
-        atomic_set(&stats_flush_threshold, 0);
         atomic_set(&stats_unified_flush_ongoing, 0);
 }
 
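For context around these hunks, below is a condensed sketch of how the flush paths fit together. The wrapper names are illustrative, not the series' exact function names; only do_stats_flush(), do_unified_stats_flush(), stats_flush_threshold and flush_next_time are taken from the patch above. Userspace stat readers flush their memcg directly (non-unified), while ratelimited in-kernel readers honor flush_next_time, which do_stats_flush() now kicks forward on any root flush.

/*
 * Condensed sketch, not verbatim kernel source. Wrapper names are
 * illustrative; do_stats_flush(), do_unified_stats_flush(),
 * stats_flush_threshold and flush_next_time come from the patch above.
 */

/* Userspace readers (memory.stat and friends): targeted, non-unified. */
static void stats_flush_for_userspace_read(struct mem_cgroup *memcg)
{
        /*
         * For the root memcg this now also resets stats_flush_threshold
         * and kicks flush_next_time forward inside do_stats_flush(),
         * doing the unified flusher's bookkeeping for it.
         */
        do_stats_flush(memcg);
}

/* Ratelimited in-kernel readers: only flush once the deadline passed. */
static void stats_flush_for_kernel_read_ratelimited(void)
{
        if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
                do_unified_stats_flush();
}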
Unified flushing of memcg stats keeps track of the magnitude of pending
updates, and only allows a flush if that magnitude exceeds a threshold.
It also keeps track of the time at which ratelimited flushing should be
allowed as flush_next_time.

A non-unified flush on the root memcg has the same effect as a unified
flush, so let it help unified flushing by resetting pending updates and
kicking flush_next_time forward. Move the logic into the common
do_stats_flush() helper, and do it for all root flushes, unified or not.

There is a subtle change here: we reset stats_flush_threshold before a
flush rather than after a flush. This is probably okay because:

(a) For flushers: only unified flushers check stats_flush_threshold, and
those flushers skip anyway if there is another unified flush ongoing.
Having them also skip if there is an ongoing non-unified root flush is
actually more consistent.

(b) For updaters: Resetting stats_flush_threshold early may lead to more
atomic updates of stats_flush_threshold, as we start updating it
earlier. This should not be significant in practice because we stop
updating stats_flush_threshold when it reaches the threshold anyway. If
we start early and stop early, the number of atomic updates remains the
same. The only difference is the scenario where we reset
stats_flush_threshold early, start doing atomic updates early, and then
the periodic flusher kicks in before we reach the threshold. In this
case, we will have done more atomic updates. However, since the
threshold wasn't reached, we did not do a lot of updates anyway.

Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
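To make point (b) above concrete, here is a simplified sketch of the updater side, modeled on memcg_rstat_updated(); it is a sketch, not the exact upstream code. Each CPU batches its own updates locally, folds a batch into the shared atomic counter only occasionally, and stops contributing once the shared counter already exceeds the flush threshold, which is why starting the accumulation earlier mostly just shifts the same number of atomic updates earlier in time.

/*
 * Simplified sketch of the updater path, modeled on memcg_rstat_updated();
 * not the exact upstream code. Per-CPU error is batched locally and only
 * folded into the shared atomic once per MEMCG_CHARGE_BATCH updates, and
 * even then only while the shared counter is still below the point where
 * unified flushers would flush anyway.
 */
static DEFINE_PER_CPU(unsigned int, stats_updates);
static atomic_t stats_flush_threshold = ATOMIC_INIT(0);

static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
        unsigned int x;

        if (!val)
                return;

        /* mark this memcg/cpu as having pending rstat updates */
        cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());

        x = __this_cpu_add_return(stats_updates, abs(val));
        if (x < MEMCG_CHARGE_BATCH)
                return;

        /*
         * Once stats_flush_threshold is past num_online_cpus(), unified
         * flushers will flush regardless, so further atomic updates would
         * only add overhead.
         */
        if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
                atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
        __this_cpu_write(stats_updates, 0);
}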