Message ID | 20230328061638.203420-8-yosryahmed@google.com (mailing list archive)
---|---
State | Not Applicable
Delegated to | Netdev Maintainers
Series | memcg: make rstat flushing irq and sleep friendly

Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
On Mon, Mar 27, 2023 at 11:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> In workingset_refault(), we call mem_cgroup_flush_stats_ratelimited()
> to flush stats within an RCU read section and with sleeping disallowed.
> Move the call to mem_cgroup_flush_stats_ratelimited() above the RCU read
> section and allow sleeping to avoid unnecessarily performing a lot of
> work without sleeping.
>
> Since workingset_refault() is the only caller of
> mem_cgroup_flush_stats_ratelimited(), just make it call the non-atomic
> mem_cgroup_flush_stats().
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

A nit below:

Acked-by: Shakeel Butt <shakeelb@google.com>

> ---
>  mm/memcontrol.c | 12 ++++++------
>  mm/workingset.c |  4 ++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 57e8cbf701f3..0c0e74188e90 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -674,12 +674,6 @@ void mem_cgroup_flush_stats_atomic(void)
>  		__mem_cgroup_flush_stats_atomic();
>  }
>
> -void mem_cgroup_flush_stats_ratelimited(void)
> -{
> -	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> -		mem_cgroup_flush_stats_atomic();
> -}
> -
>  /* non-atomic functions, only safe from sleepable contexts */
>  static void __mem_cgroup_flush_stats(void)
>  {
> @@ -695,6 +689,12 @@ void mem_cgroup_flush_stats(void)
>  		__mem_cgroup_flush_stats();
>  }
>
> +void mem_cgroup_flush_stats_ratelimited(void)
> +{
> +	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> +		mem_cgroup_flush_stats();
> +}
> +
>  static void flush_memcg_stats_dwork(struct work_struct *w)
>  {
>  	__mem_cgroup_flush_stats();
> diff --git a/mm/workingset.c b/mm/workingset.c
> index af862c6738c3..7d7ecc46521c 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
>  	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
>  	eviction <<= bucket_order;
>
> +	/* Flush stats (and potentially sleep) before holding RCU read lock */

I think the only reason we use rcu lock is due to
mem_cgroup_from_id(). Maybe we should add mem_cgroup_tryget_from_id().
The other caller of mem_cgroup_from_id() in vmscan is already doing
the same and could use mem_cgroup_tryget_from_id().

Though this can be done separately to this series (if we decide to do
it at all).
On Tue, Mar 28, 2023 at 06:16:36AM +0000, Yosry Ahmed wrote:
> @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
>  	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
>  	eviction <<= bucket_order;
>
> +	/* Flush stats (and potentially sleep) before holding RCU read lock */
> +	mem_cgroup_flush_stats_ratelimited();
>  	rcu_read_lock();

Minor nit, but please keep the lock section visually separated by an
empty line between the flush and the rcu lock.

Other than that,
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
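For illustration, the spacing being requested would look like this in workingset_refault() (a sketch of the suggested layout only, not a hunk from an actual later revision of the patch):

	/* Flush stats (and potentially sleep) before holding RCU read lock */
	mem_cgroup_flush_stats_ratelimited();

	rcu_read_lock();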
On Tue, Mar 28, 2023 at 08:18:11AM -0700, Shakeel Butt wrote:
> > @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
> >  	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
> >  	eviction <<= bucket_order;
> >
> > +	/* Flush stats (and potentially sleep) before holding RCU read lock */
>
> I think the only reason we use rcu lock is due to
> mem_cgroup_from_id(). Maybe we should add mem_cgroup_tryget_from_id().
> The other caller of mem_cgroup_from_id() in vmscan is already doing
> the same and could use mem_cgroup_tryget_from_id().

Good catch. Nothing else in there is protected by RCU. We can just
hold the ref instead.

> Though this can be done separately to this series (if we decide to do
> it at all).

Agreed
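Sketched concretely, "holding the ref instead" could look roughly like this in workingset_refault(), assuming a hypothetical mem_cgroup_tryget_from_id() helper along the lines proposed above (a possible body for it is sketched after the next message):

	struct mem_cgroup *eviction_memcg;

	/*
	 * Sketch only: pin the memcg with a reference up front instead of
	 * holding rcu_read_lock() across the whole refault path.
	 * mem_cgroup_tryget_from_id() is a hypothetical helper here.
	 */
	eviction_memcg = mem_cgroup_tryget_from_id(memcgid);
	if (!eviction_memcg)
		return;

	/* ... refault distance logic, with no RCU read section needed ... */

	mem_cgroup_put(eviction_memcg);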
On Tue, Mar 28, 2023 at 8:18 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Mon, Mar 27, 2023 at 11:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > In workingset_refault(), we call mem_cgroup_flush_stats_ratelimited()
> > to flush stats within an RCU read section and with sleeping disallowed.
> > Move the call to mem_cgroup_flush_stats_ratelimited() above the RCU read
> > section and allow sleeping to avoid unnecessarily performing a lot of
> > work without sleeping.
> >
> > Since workingset_refault() is the only caller of
> > mem_cgroup_flush_stats_ratelimited(), just make it call the non-atomic
> > mem_cgroup_flush_stats().
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> A nit below:
>
> Acked-by: Shakeel Butt <shakeelb@google.com>
>
> > ---
> >  mm/memcontrol.c | 12 ++++++------
> >  mm/workingset.c |  4 ++--
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 57e8cbf701f3..0c0e74188e90 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -674,12 +674,6 @@ void mem_cgroup_flush_stats_atomic(void)
> >  		__mem_cgroup_flush_stats_atomic();
> >  }
> >
> > -void mem_cgroup_flush_stats_ratelimited(void)
> > -{
> > -	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> > -		mem_cgroup_flush_stats_atomic();
> > -}
> > -
> >  /* non-atomic functions, only safe from sleepable contexts */
> >  static void __mem_cgroup_flush_stats(void)
> >  {
> > @@ -695,6 +689,12 @@ void mem_cgroup_flush_stats(void)
> >  		__mem_cgroup_flush_stats();
> >  }
> >
> > +void mem_cgroup_flush_stats_ratelimited(void)
> > +{
> > +	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> > +		mem_cgroup_flush_stats();
> > +}
> > +
> >  static void flush_memcg_stats_dwork(struct work_struct *w)
> >  {
> >  	__mem_cgroup_flush_stats();
> > diff --git a/mm/workingset.c b/mm/workingset.c
> > index af862c6738c3..7d7ecc46521c 100644
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
> >  	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
> >  	eviction <<= bucket_order;
> >
> > +	/* Flush stats (and potentially sleep) before holding RCU read lock */
>
> I think the only reason we use rcu lock is due to
> mem_cgroup_from_id(). Maybe we should add mem_cgroup_tryget_from_id().
> The other caller of mem_cgroup_from_id() in vmscan is already doing
> the same and could use mem_cgroup_tryget_from_id().

I think different callers of mem_cgroup_from_id() want different things:

(a) workingset_refault() reads the memcg from the id and doesn't really
care if the memcg is online or not.

(b) __mem_cgroup_uncharge_swap() reads the memcg from the id and drops
refs acquired on the swapout path. It doesn't need tryget as we should
know for a fact that we are holding refs from the swapout path. It
doesn't care if the memcg is online or not.

(c) mem_cgroup_swapin_charge_folio() reads the memcg from the id and
then gets a ref with css_tryget_online() -- so only if the refcount is
non-zero and the memcg is online.

So we would at least need mem_cgroup_tryget_from_id() and
mem_cgroup_tryget_online_from_id() to eliminate all direct calls of
mem_cgroup_from_id(). I am hesitant about (b) because if we use
mem_cgroup_tryget_from_id() the code will be getting a ref, then
dropping the ref we have been carrying from swapout, then dropping the
ref we just acquired.

WDYT?

>
> Though this can be done separately to this series (if we decide to do
> it at all).
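For reference, minimal sketches of the two helpers named above. Both mem_cgroup_tryget_from_id() and mem_cgroup_tryget_online_from_id() are proposals in this thread rather than existing API at this point, and the bodies below are an editorial guess based on how mem_cgroup_from_id(), css_tryget(), and css_tryget_online() are used elsewhere in mm:

/* Hypothetical: look up a memcg by ID and take a reference, if possible. */
static inline struct mem_cgroup *mem_cgroup_tryget_from_id(unsigned short id)
{
	struct mem_cgroup *memcg;

	rcu_read_lock();
	memcg = mem_cgroup_from_id(id);
	/* Return the memcg only if a reference was actually acquired. */
	if (memcg && !css_tryget(&memcg->css))
		memcg = NULL;
	rcu_read_unlock();

	return memcg;
}

/* Hypothetical: as above, but additionally require the memcg to be online. */
static inline struct mem_cgroup *mem_cgroup_tryget_online_from_id(unsigned short id)
{
	struct mem_cgroup *memcg;

	rcu_read_lock();
	memcg = mem_cgroup_from_id(id);
	if (memcg && !css_tryget_online(&memcg->css))
		memcg = NULL;
	rcu_read_unlock();

	return memcg;
}

Under this split, case (b) would keep calling mem_cgroup_from_id() directly, avoiding the get-then-double-put churn described above.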
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 57e8cbf701f3..0c0e74188e90 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -674,12 +674,6 @@ void mem_cgroup_flush_stats_atomic(void)
 		__mem_cgroup_flush_stats_atomic();
 }
 
-void mem_cgroup_flush_stats_ratelimited(void)
-{
-	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
-		mem_cgroup_flush_stats_atomic();
-}
-
 /* non-atomic functions, only safe from sleepable contexts */
 static void __mem_cgroup_flush_stats(void)
 {
@@ -695,6 +689,12 @@ void mem_cgroup_flush_stats(void)
 		__mem_cgroup_flush_stats();
 }
 
+void mem_cgroup_flush_stats_ratelimited(void)
+{
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
+		mem_cgroup_flush_stats();
+}
+
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	__mem_cgroup_flush_stats();
diff --git a/mm/workingset.c b/mm/workingset.c
index af862c6738c3..7d7ecc46521c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 	eviction <<= bucket_order;
 
+	/* Flush stats (and potentially sleep) before holding RCU read lock */
+	mem_cgroup_flush_stats_ratelimited();
 	rcu_read_lock();
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -461,8 +463,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
-
-	mem_cgroup_flush_stats_ratelimited();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if
In workingset_refault(), we call mem_cgroup_flush_stats_ratelimited()
to flush stats within an RCU read section and with sleeping disallowed.
Move the call to mem_cgroup_flush_stats_ratelimited() above the RCU read
section and allow sleeping to avoid unnecessarily performing a lot of
work without sleeping.

Since workingset_refault() is the only caller of
mem_cgroup_flush_stats_ratelimited(), just make it call the non-atomic
mem_cgroup_flush_stats().

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 12 ++++++------
 mm/workingset.c |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)