| Message ID | 20230330191801.1967435-5-yosryahmed@google.com (mailing list archive) |
|---|---|
| State | Not Applicable |
| Delegated to: | Netdev Maintainers |
| Series | memcg: avoid flushing stats atomically where possible |
| Context | Check | Description |
|---|---|---|
| netdev/tree_selection | success | Not a local patch |
Hello.

On Thu, Mar 30, 2023 at 07:17:57PM +0000, Yosry Ahmed <yosryahmed@google.com> wrote:
>  static void __mem_cgroup_flush_stats(void)
>  {
> -	unsigned long flag;
> -
> -	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
> +	/*
> +	 * We always flush the entire tree, so concurrent flushers can just
> +	 * skip. This avoids a thundering herd problem on the rstat global lock
> +	 * from memcg flushers (e.g. reclaim, refault, etc).
> +	 */
> +	if (atomic_read(&stats_flush_ongoing) ||
> +	    atomic_xchg(&stats_flush_ongoing, 1))
>  		return;

I'm curious about why this instead of

	if (atomic_xchg(&stats_flush_ongoing, 1))
		return;

Is that some microarchitectural cleverness?

Thanks,
Michal
On Tue, Apr 4, 2023 at 9:53 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello.
>
> On Thu, Mar 30, 2023 at 07:17:57PM +0000, Yosry Ahmed <yosryahmed@google.com> wrote:
> >  static void __mem_cgroup_flush_stats(void)
> >  {
> > -	unsigned long flag;
> > -
> > -	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
> > +	/*
> > +	 * We always flush the entire tree, so concurrent flushers can just
> > +	 * skip. This avoids a thundering herd problem on the rstat global lock
> > +	 * from memcg flushers (e.g. reclaim, refault, etc).
> > +	 */
> > +	if (atomic_read(&stats_flush_ongoing) ||
> > +	    atomic_xchg(&stats_flush_ongoing, 1))
> >  		return;
>
> I'm curious about why this instead of
>
> 	if (atomic_xchg(&stats_flush_ongoing, 1))
> 		return;
>
> Is that some microarchitectural cleverness?
>

Yes indeed it is. Basically we want to avoid unconditional cache
dirtying. This pattern is also used at other places in the kernel like
qspinlock.
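To illustrate the point about unconditional cache dirtying, here is a minimal userspace sketch of the check-before-exchange pattern, written with C11 atomics and hypothetical names (the kernel code uses atomic_read()/atomic_xchg()); it is an analogy, not the patch itself:

```c
/*
 * Minimal userspace sketch (C11 atomics, hypothetical names) of the
 * check-before-exchange pattern: callers that see the flag already set
 * bail out after a plain load and never issue the read-modify-write
 * that would pull the cache line in exclusive state.
 */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int flush_ongoing;

static bool try_begin_flush(void)
{
	/* Cheap shared read first: non-zero means a flush is in progress. */
	if (atomic_load_explicit(&flush_ongoing, memory_order_relaxed))
		return false;

	/* Only now do the exchange; the old value tells us whether we won. */
	return atomic_exchange(&flush_ongoing, 1) == 0;
}

static void end_flush(void)
{
	atomic_store(&flush_ongoing, 0);
}

int main(void)
{
	if (try_begin_flush()) {	/* first caller wins */
		/* ... flush the whole tree here ... */
		end_flush();
	}
	return 0;
}
```

The same idea underlies test-and-test-and-set locks such as qspinlock: spin (or bail) on a plain load and only attempt the atomic read-modify-write when the load suggests it can succeed.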
On Tue, Apr 4, 2023 at 10:13 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Tue, Apr 4, 2023 at 9:53 AM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > Hello.
> >
> > On Thu, Mar 30, 2023 at 07:17:57PM +0000, Yosry Ahmed <yosryahmed@google.com> wrote:
> > >  static void __mem_cgroup_flush_stats(void)
> > >  {
> > > -	unsigned long flag;
> > > -
> > > -	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
> > > +	/*
> > > +	 * We always flush the entire tree, so concurrent flushers can just
> > > +	 * skip. This avoids a thundering herd problem on the rstat global lock
> > > +	 * from memcg flushers (e.g. reclaim, refault, etc).
> > > +	 */
> > > +	if (atomic_read(&stats_flush_ongoing) ||
> > > +	    atomic_xchg(&stats_flush_ongoing, 1))
> > >  		return;
> >
> > I'm curious about why this instead of
> >
> > 	if (atomic_xchg(&stats_flush_ongoing, 1))
> > 		return;
> >
> > Is that some microarchitectural cleverness?
> >
>
> Yes indeed it is. Basically we want to avoid unconditional cache
> dirtying. This pattern is also used at other places in the kernel like
> qspinlock.

Oh also take a look at
https://lore.kernel.org/all/20230404052228.15788-1-feng.tang@intel.com/
On Tue, Apr 04, 2023 at 10:21:33AM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> > Yes indeed it is. Basically we want to avoid unconditional cache
> > dirtying. This pattern is also used at other places in the kernel like
> > qspinlock.

Thanks for confirmation.

(I remembered the commit 873f64b791a2 ("mm/memcontrol.c: remove the
redundant updating of stats_flush_threshold"). But was slightly confused
why would it be open-coded every time.)

> Oh also take a look at
> https://lore.kernel.org/all/20230404052228.15788-1-feng.tang@intel.com/

Thanks for the link.

Michal
```diff
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ff39f78f962e..65750f8b8259 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -585,8 +585,8 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
+static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 static u64 flush_next_time;
 
@@ -636,15 +636,19 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 
 static void __mem_cgroup_flush_stats(void)
 {
-	unsigned long flag;
-
-	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
+	/*
+	 * We always flush the entire tree, so concurrent flushers can just
+	 * skip. This avoids a thundering herd problem on the rstat global lock
+	 * from memcg flushers (e.g. reclaim, refault, etc).
+	 */
+	if (atomic_read(&stats_flush_ongoing) ||
+	    atomic_xchg(&stats_flush_ongoing, 1))
 		return;
 
-	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
+	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
 	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
-	spin_unlock_irqrestore(&stats_flush_lock, flag);
+	atomic_set(&stats_flush_ongoing, 0);
 }
 
 void mem_cgroup_flush_stats(void)
@@ -655,7 +659,7 @@ void mem_cgroup_flush_stats(void)
 
 void mem_cgroup_flush_stats_ratelimited(void)
 {
-	if (time_after64(jiffies_64, flush_next_time))
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
 		mem_cgroup_flush_stats();
 }
```
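For readers wondering about the READ_ONCE()/WRITE_ONCE() pair in the hunk above: once the spinlock is removed, flush_next_time is written by the winning flusher while mem_cgroup_flush_stats_ratelimited() reads it concurrently, so the accesses are marked to avoid a data race. Below is a rough userspace analogue using C11 relaxed atomics and hypothetical names; it is a sketch of the idea, not the kernel API, and uses plain `>` where the kernel's time_after64() also handles counter wraparound.

```c
/*
 * Rough userspace analogue (C11 relaxed atomics, hypothetical names) of the
 * READ_ONCE()/WRITE_ONCE() pairing on flush_next_time: one task stores the
 * next deadline while other tasks load it concurrently, so the accesses are
 * marked rather than being plain loads/stores.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t flush_next_time;

/* Analogue of WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME). */
static void arm_next_flush(uint64_t now, uint64_t period)
{
	atomic_store_explicit(&flush_next_time, now + 2 * period,
			      memory_order_relaxed);
}

/* Analogue of time_after64(jiffies_64, READ_ONCE(flush_next_time)). */
static bool flush_is_due(uint64_t now)
{
	return now > atomic_load_explicit(&flush_next_time,
					  memory_order_relaxed);
}

int main(void)
{
	arm_next_flush(1000, 50);		/* e.g. "jiffies" 1000, period 50 */
	return flush_is_due(1200) ? 0 : 1;	/* 1200 > 1100: a flush would run */
}
```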