
[v2] memcg: simple cleanup of stats update functions

Message ID 20240420232505.2768428-1-shakeel.butt@linux.dev (mailing list archive)
State New
Series [v2] memcg: simple cleanup of stats update functions

Commit Message

Shakeel Butt April 20, 2024, 11:25 p.m. UTC
mod_memcg_lruvec_state() is never called from outside of memcontrol.c
and is always called with irqs disabled. So, replace it with the
irq-disabled version and add an assert that irqs are disabled in the
caller.

Similarly, mod_objcg_state() is not called from outside of memcontrol.c,
so simply make it static and change its name to __mod_objcg_state().

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---

Changes since v1:
- Change mod_objcg_state to __mod_objcg_state as per the naming
  convention (suggested by Johannes).

 include/linux/memcontrol.h | 17 -----------------
 mm/memcontrol.c            | 31 +++++++++++++++----------------
 mm/slab.h                  |  2 --
 3 files changed, 15 insertions(+), 35 deletions(-)
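
The gist, condensed from the diff at the bottom of this page: the
wrapper below is deleted, __mod_memcg_lruvec_state() becomes static,
and the renamed __mod_objcg_mlstate() asserts the precondition instead
of re-disabling irqs on every call:

/* Deleted: irqs were saved/restored around every stats update. */
static inline void mod_memcg_lruvec_state(struct lruvec *lruvec,
					  enum node_stat_item idx, int val)
{
	unsigned long flags;

	local_irq_save(flags);
	__mod_memcg_lruvec_state(lruvec, idx, val);
	local_irq_restore(flags);
}

/* New shape: callers are expected to enter with irqs disabled. */
static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
				       struct pglist_data *pgdat,
				       enum node_stat_item idx, int nr)
{
	struct mem_cgroup *memcg;
	struct lruvec *lruvec;

	lockdep_assert_irqs_disabled();

	rcu_read_lock();
	memcg = obj_cgroup_memcg(objcg);
	lruvec = mem_cgroup_lruvec(memcg, pgdat);
	__mod_memcg_lruvec_state(lruvec, idx, nr);
	rcu_read_unlock();
}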

Comments

Sebastian Andrzej Siewior May 27, 2024, 3:22 p.m. UTC | #1
On 2024-04-20 16:25:05 [-0700], Shakeel Butt wrote:
> mod_memcg_lruvec_state() is never called from outside of memcontrol.c
> and is always called with irqs disabled. So, replace it with the
> irq-disabled version and add an assert that irqs are disabled in the
> caller.

unless PREEMPT_RT is enabled. In that case IRQs are not disabled as part
of local_lock_irqsave(&memcg_stock.stock_lock, …) leading to:

| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:3150 __mod_objcg_mlstate+0xc2/0x110
| CPU: 0 PID: 1 Comm: systemd Not tainted 6.10.0-rc1-rt0+ #17
| Call Trace:
|  <TASK>
|  mod_objcg_state+0x2b3/0x320
|  __memcg_slab_post_alloc_hook+0x13c/0x340
|  kmem_cache_alloc_lru_noprof+0x2bd/0x2e0
|  alloc_inode+0x59/0xc0
|  iget_locked+0xf0/0x290

suggestions?

Sebastian
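
For context: on a PREEMPT_RT kernel a local_lock_t is a per-CPU
sleeping lock, so taking it leaves interrupts enabled. Very roughly,
simplified from the local_lock implementation (not the exact kernel
definitions):

/* !PREEMPT_RT: "locking" is disabling interrupts on this CPU, so
 * lockdep_assert_irqs_disabled() in the callee is satisfied. */
#define local_lock_irqsave(lock, flags)	\
	local_irq_save(flags)

/* PREEMPT_RT: the lock is a per-CPU spinlock_t (a sleeping lock);
 * interrupts stay enabled and flags carries nothing, which is why
 * the new assert in __mod_objcg_mlstate() fires. */
#define local_lock_irqsave(lock, flags)		\
	do {					\
		flags = 0;			\
		spin_lock(this_cpu_ptr(lock));	\
	} while (0)
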
Vlastimil Babka (SUSE) May 27, 2024, 4:34 p.m. UTC | #2
On 5/27/24 5:22 PM, Sebastian Andrzej Siewior wrote:
> On 2024-04-20 16:25:05 [-0700], Shakeel Butt wrote:
>> mod_memcg_lruvec_state() is never called from outside of memcontrol.c
>> and is always called with irqs disabled. So, replace it with the
>> irq-disabled version and add an assert that irqs are disabled in the
>> caller.
> 
> unless PREEMPT_RT is enabled. In that case IRQs are not disabled as part
> of local_lock_irqsave(&memcg_stock.stock_lock, …) leading to:

But then the "interrupts are handled by a kernel thread that can sleep" part
of RT also means it's ok to just have the stock_lock taken with no
interrupts disabled as no actual raw interrupt handler will interrupt the
holder and deadlock, right?

> | ------------[ cut here ]------------
> | WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:3150 __mod_objcg_mlstate+0xc2/0x110
> | CPU: 0 PID: 1 Comm: systemd Not tainted 6.10.0-rc1-rt0+ #17
> | Call Trace:
> |  <TASK>
> |  mod_objcg_state+0x2b3/0x320
> |  __memcg_slab_post_alloc_hook+0x13c/0x340
> |  kmem_cache_alloc_lru_noprof+0x2bd/0x2e0
> |  alloc_inode+0x59/0xc0
> |  iget_locked+0xf0/0x290
> 
> suggestions?

So in that case the appropriate thing would be to replace the assert with
lockdep_assert_held(&memcg_stock.stock_lock);
?

It seems all the code paths leading here have that one.

> Sebastian
>
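
As a sketch, the suggestion amounts to this change in
__mod_objcg_mlstate() (memcg_stock is per-CPU, so the check needs the
per-CPU pointer, a refinement that comes up later in the thread):

-	lockdep_assert_irqs_disabled();
+	/* Every caller enters with the per-CPU stock_lock held; this
+	 * holds on both RT and !RT, unlike the irqs-off assert. */
+	lockdep_assert_held(this_cpu_ptr(&memcg_stock.stock_lock));
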
Shakeel Butt May 28, 2024, 5:16 a.m. UTC | #3
On Mon, May 27, 2024 at 06:34:24PM GMT, Vlastimil Babka (SUSE) wrote:
> On 5/27/24 5:22 PM, Sebastian Andrzej Siewior wrote:
> > On 2024-04-20 16:25:05 [-0700], Shakeel Butt wrote:
> >> mod_memcg_lruvec_state() is never called from outside of memcontrol.c
> >> and is always called with irqs disabled. So, replace it with the
> >> irq-disabled version and add an assert that irqs are disabled in the
> >> caller.
> > 
> > unless PREEMPT_RT is enabled. In that case IRQs are not disabled as part
> > of local_lock_irqsave(&memcg_stock.stock_lock, …) leading to:

Sorry about that and thanks for the report.

> 
> But then the "interrupts are handled by a kernel thread that can sleep" part
> of RT also means it's ok to just have the stock_lock taken with no
> interrupts disabled as no actual raw interrupt handler will interrupt the
> holder and deadlock, right?
> 

Thanks Vlastimil for jolting my memory on RT reasoning.

> > | ------------[ cut here ]------------
> > | WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:3150 __mod_objcg_mlstate+0xc2/0x110
> > | CPU: 0 PID: 1 Comm: systemd Not tainted 6.10.0-rc1-rt0+ #17
> > | Call Trace:
> > |  <TASK>
> > |  mod_objcg_state+0x2b3/0x320
> > |  __memcg_slab_post_alloc_hook+0x13c/0x340
> > |  kmem_cache_alloc_lru_noprof+0x2bd/0x2e0
> > |  alloc_inode+0x59/0xc0
> > |  iget_locked+0xf0/0x290
> > 
> > suggestions?
> 
> So in that case the appropriate thing would be to replace the assert with
> lockdep_assert_held(&memcg_stock.stock_lock);
> ?
> 
> It seems all the code paths leading here have that one.
> 

Yeah, this seems right and reasonable. Should I send a fix or do you
want to send it?
Sebastian Andrzej Siewior May 28, 2024, 7:56 a.m. UTC | #4
On 2024-05-27 22:16:41 [-0700], Shakeel Butt wrote:
> On Mon, May 27, 2024 at 06:34:24PM GMT, Vlastimil Babka (SUSE) wrote:
> > On 5/27/24 5:22 PM, Sebastian Andrzej Siewior wrote:
> > > On 2024-04-20 16:25:05 [-0700], Shakeel Butt wrote:
> > >> mod_memcg_lruvec_state() is never called from outside of memcontrol.c
> > >> and is always called with irqs disabled. So, replace it with the
> > >> irq-disabled version and add an assert that irqs are disabled in the
> > >> caller.
> > > 
> > > unless PREEMPT_RT is enabled. In that case IRQs are not disabled as part
> > > of local_lock_irqsave(&memcg_stock.stock_lock, …) leading to:
> 
> Sorry about that and thanks for the report.

no worries.

> > 
> > But then the "interrupts are handled by a kernel thread that can sleep" part
> > of RT also means it's ok to just have the stock_lock taken with no
> > interrupts disabled as no actual raw interrupt handler will interrupt the
> > holder and deadlock, right?

I *don't* know why the interrupts-disabled check is here. The
memcg_stock.stock_lock is acquired on RT with interrupts enabled and
never disables interrupts. The lock is never acquired in a hard
interrupt (as opposed to a threaded interrupt) context, so there is
never a deadlock.

Originally the interrupts were disabled in mod_memcg_lruvec_state()
because the counter it operates on is per-CPU and relies on disabled
interrupts: the operation is not atomic and the code can run in
interrupt context (on !RT). The __mod_memcg_lruvec_state() variant of
it relied on interrupts being disabled by the caller. This "relying on"
came from a spinlock_t being held (or from being invoked in an
interrupt handler; the memory is slowly fading away), which does not
disable interrupts on PREEMPT_RT.
So for that reason we ended up with __memcg_stats_lock(), which
disables preemption only on PREEMPT_RT to achieve the same level of
"atomic" update.

> Thanks Vlastimil for jolting my memory on RT reasoning.
> 
> > > suggestions?
> > 
> > So in that case the appropriate thing would be to replace the assert with
> > lockdep_assert_held(&memcg_stock.stock_lock);
> > ?
> > 
> > It seems all the code paths leading here have that one.
> > 
> 
> Yeah, this seems right and reasonable. Should I send a fix or do you
> want to send it?

I don't mind sending a patch. I'm just not sure if the lock is the right
thing to do. However it should ensure that interrupts are disabled on
!RT for the sake of the counter update (if observed in IRQ context).

Yeah, let me prepare something.

Sebastian
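
For reference, the stats-lock helpers mentioned above look roughly
like this in mm/memcontrol.c of that era (paraphrased; see the tree
for the exact definitions):

static void memcg_stats_lock(void)
{
	/* Disables preemption only on PREEMPT_RT (nop elsewhere). */
	preempt_disable_nested();
	/* Compiled out on PREEMPT_RT, where irqs may stay enabled. */
	VM_WARN_ON_IRQS_ENABLED();
}

static void __memcg_stats_lock(void)
{
	preempt_disable_nested();
}

static void memcg_stats_unlock(void)
{
	preempt_enable_nested();
}
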
Vlastimil Babka (SUSE) May 28, 2024, 8:10 a.m. UTC | #5
On 5/28/24 9:56 AM, Sebastian Andrzej Siewior wrote:
> On 2024-05-27 22:16:41 [-0700], Shakeel Butt wrote:
>> On Mon, May 27, 2024 at 06:34:24PM GMT, Vlastimil Babka (SUSE) wrote:
>> > On 5/27/24 5:22 PM, Sebastian Andrzej Siewior wrote:
>> > > On 2024-04-20 16:25:05 [-0700], Shakeel Butt wrote:
>> > >> mod_memcg_lruvec_state() is never called from outside of memcontrol.c
>> > >> and is always called with irqs disabled. So, replace it with the
>> > >> irq-disabled version and add an assert that irqs are disabled in the
>> > >> caller.
>> > > 
>> > > unless PREEMPT_RT is enabled. In that case IRQs are not disabled as part
>> > > of local_lock_irqsave(&memcg_stock.stock_lock, …) leading to:
>> 
>> Sorry about that and thanks for the report.
> 
> no worries.
> 
>> > 
>> > But then the "interrupts are handled by a kernel thread that can sleep" part
>> > of RT also means it's ok to just have the stock_lock taken with no
>> > interrupts disabled as no actual raw interrupt handler will interrupt the
>> > holder and deadlock, right?
> 
> I *don't* know why the interrupts-disabled check is here. The
> memcg_stock.stock_lock is acquired on RT with interrupts enabled and
> never disables interrupts. The lock is never acquired in a hard
> interrupt (as opposed to a threaded interrupt) context, so there is
> never a deadlock.
> 
> Originally the interrupts were disabled in mod_memcg_lruvec_state()
> because the counter it operates on is per-CPU and relies on disabled
> interrupts: the operation is not atomic and the code can run in
> interrupt context (on !RT). The __mod_memcg_lruvec_state() variant of
> it relied on interrupts being disabled by the caller. This "relying on"
> came from a spinlock_t being held (or from being invoked in an
> interrupt handler; the memory is slowly fading away), which does not
> disable interrupts on PREEMPT_RT.
> So for that reason we ended up with __memcg_stats_lock(), which
> disables preemption only on PREEMPT_RT to achieve the same level of
> "atomic" update.
> 
>> Thanks Vlastimil for jolting my memory on RT reasoning.
>> 
>> > > suggestions?
>> > 
>> > So in that case the appropriate thing would be to replace the assert with
>> > lockdep_assert_held(&memcg_stock.stock_lock);
>> > ?
>> > 
>> > It seems all the code paths leading here have that one.
>> > 
>> 
>> Yeah, this seems right and reasonable. Should I send a fix or do you
>> want to send it?
> 
> I don't mind sending a patch. I'm just not sure if the lock is the right
> thing to do. However it should ensure that interrupts are disabled on
> !RT for the sake of the counter update (if observed in IRQ context).

Looks like some places there use VM_WARN_ON_IRQS_ENABLED() that's turned off
for PREEMPT_RT, so maybe that's what should replace the current
lockdep_assert, perhaps together with
lockdep_assert_held(this_cpu_ptr(&memcg_stock.stock_lock));

But also __mod_memcg_lruvec_state() already has that VM_WARN_ON.

> Yeah, let me prepare something.
> 
> Sebastian
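
Spelled out, the combination suggested above would be something like
(illustrative only, not a posted patch):

	/* Yells even without lockdep; compiled out on PREEMPT_RT,
	 * where irqs legitimately stay enabled. */
	VM_WARN_ON_IRQS_ENABLED();
	/* With lockdep, also pin the lock that actually protects the
	 * per-CPU stock, on both RT and !RT. */
	lockdep_assert_held(this_cpu_ptr(&memcg_stock.stock_lock));
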
Sebastian Andrzej Siewior May 28, 2024, 10:19 a.m. UTC | #6
On 2024-05-28 10:10:50 [+0200], Vlastimil Babka (SUSE) wrote:
> > I don't mind sending a patch. I'm just not sure if the lock is the right
> > thing to do. However it should ensure that interrupts are disabled on
> > !RT for the sake of the counter update (if observed in IRQ context).
> 
> Looks like some places there use VM_WARN_ON_IRQS_ENABLED() that's turned off
> for PREEMPT_RT, so maybe that's what should replace the current
> lockdep_assert, perhaps together with
> lockdep_assert_held(this_cpu_ptr(&memcg_stock.stock_lock));
> 
> But also __mod_memcg_lruvec_state() already has that VM_WARN_ON.

This "VM_WARN_ON_IRQS_ENABLED" is the initial assert for "interrupts
must be disabled while changing the counter".
You want to replace it with lockdep? Part of its requirement was that it
yells even with lockdep disabled.

Currently I am leaning towards removing the
lockdep_assert_irqs_disabled() from __mod_objcg_mlstate(). Nothing but
the counter updates need it and they have their own check. So?

Sebastian
Vlastimil Babka (SUSE) May 28, 2024, 12:17 p.m. UTC | #7
On 5/28/24 12:19 PM, Sebastian Andrzej Siewior wrote:
> On 2024-05-28 10:10:50 [+0200], Vlastimil Babka (SUSE) wrote:
>> > I don't mind sending a patch. I'm just not sure if the lock is the right
>> > thing to do. However it should ensure that interrupts are disabled on
>> > !RT for the sake of the counter update (if observed in IRQ context).
>> 
>> Looks like some places there use VM_WARN_ON_IRQS_ENABLED() that's turned off
>> for PREEMPT_RT, so maybe that's what should replace the current
>> lockdep_assert, perhaps together with
>> lockdep_assert_held(this_cpu_ptr(&memcg_stock.stock_lock));
>> 
>> But also __mod_memcg_lruvec_state() already has that VM_WARN_ON.
> 
> This "VM_WARN_ON_IRQS_ENABLED" is the initial assert for "interrupts
> must be disabled while changing the counter".
> You want to replace it with lockdep? Part of its requirement was that it
> yells even with lockdep disabled.

No, I meant the other way around: that we'd use VM_WARN_ON_IRQS_ENABLED
instead of the current lockdep_assert(), which doesn't work on RT.

> Currently I am leaning towards removing the
> lockdep_assert_irqs_disabled() from __mod_objcg_mlstate(). Nothing but
> the counter updates need it and they have their own check. So?

Yeah that might be sufficient.

> Sebastian
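
The direction agreed on here amounts to dropping the assert this patch
introduced; as a sketch against the hunk below (not the follow-up
patch as actually posted):

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	lockdep_assert_irqs_disabled();
-
 	rcu_read_lock();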

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8f332b4ae84c..9aba0d0462ca 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1077,8 +1077,6 @@  static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 void mem_cgroup_flush_stats(struct mem_cgroup *memcg);
 void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg);
 
-void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
-			      int val);
 void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
 
 static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
@@ -1091,16 +1089,6 @@  static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 	local_irq_restore(flags);
 }
 
-static inline void mod_memcg_lruvec_state(struct lruvec *lruvec,
-					  enum node_stat_item idx, int val)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	__mod_memcg_lruvec_state(lruvec, idx, val);
-	local_irq_restore(flags);
-}
-
 void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 			  unsigned long count);
 
@@ -1594,11 +1582,6 @@  static inline void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg)
 {
 }
 
-static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec,
-					    enum node_stat_item idx, int val)
-{
-}
-
 static inline void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 					   int val)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7703ced535a3..833d09c1d523 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -837,8 +837,9 @@  static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 	return x;
 }
 
-void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
-			      int val)
+static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
+				     enum node_stat_item idx,
+				     int val)
 {
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup *memcg;
@@ -2983,21 +2984,19 @@  void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 
 #ifdef CONFIG_MEMCG_KMEM
 
-/*
- * mod_objcg_mlstate() may be called with irq enabled, so
- * mod_memcg_lruvec_state() should be used.
- */
-static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
-				     struct pglist_data *pgdat,
-				     enum node_stat_item idx, int nr)
+static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
+				       struct pglist_data *pgdat,
+				       enum node_stat_item idx, int nr)
 {
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
+	lockdep_assert_irqs_disabled();
+
 	rcu_read_lock();
 	memcg = obj_cgroup_memcg(objcg);
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
-	mod_memcg_lruvec_state(lruvec, idx, nr);
+	__mod_memcg_lruvec_state(lruvec, idx, nr);
 	rcu_read_unlock();
 }
 
@@ -3317,7 +3316,7 @@  void __memcg_kmem_uncharge_page(struct page *page, int order)
 	obj_cgroup_put(objcg);
 }
 
-void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
+static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr)
 {
 	struct memcg_stock_pcp *stock;
@@ -3345,12 +3344,12 @@  void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		struct pglist_data *oldpg = stock->cached_pgdat;
 
 		if (stock->nr_slab_reclaimable_b) {
-			mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
+			__mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
 					  stock->nr_slab_reclaimable_b);
 			stock->nr_slab_reclaimable_b = 0;
 		}
 		if (stock->nr_slab_unreclaimable_b) {
-			mod_objcg_mlstate(objcg, oldpg, NR_SLAB_UNRECLAIMABLE_B,
+			__mod_objcg_mlstate(objcg, oldpg, NR_SLAB_UNRECLAIMABLE_B,
 					  stock->nr_slab_unreclaimable_b);
 			stock->nr_slab_unreclaimable_b = 0;
 		}
@@ -3376,7 +3375,7 @@  void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		}
 	}
 	if (nr)
-		mod_objcg_mlstate(objcg, pgdat, idx, nr);
+		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
 
 	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 	obj_cgroup_put(old);
@@ -3442,13 +3441,13 @@  static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 	 */
 	if (stock->nr_slab_reclaimable_b || stock->nr_slab_unreclaimable_b) {
 		if (stock->nr_slab_reclaimable_b) {
-			mod_objcg_mlstate(old, stock->cached_pgdat,
+			__mod_objcg_mlstate(old, stock->cached_pgdat,
 					  NR_SLAB_RECLAIMABLE_B,
 					  stock->nr_slab_reclaimable_b);
 			stock->nr_slab_reclaimable_b = 0;
 		}
 		if (stock->nr_slab_unreclaimable_b) {
-			mod_objcg_mlstate(old, stock->cached_pgdat,
+			__mod_objcg_mlstate(old, stock->cached_pgdat,
 					  NR_SLAB_UNRECLAIMABLE_B,
 					  stock->nr_slab_unreclaimable_b);
 			stock->nr_slab_unreclaimable_b = 0;
diff --git a/mm/slab.h b/mm/slab.h
index e32d9cf1077a..5f8f47c5bee0 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -578,8 +578,6 @@  bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 				  gfp_t flags, size_t size, void **p);
 void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 			    void **p, int objects, struct slabobj_ext *obj_exts);
-void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
-		     enum node_stat_item idx, int nr);
 #endif
 
 size_t __ksize(const void *objp);