
mm/memcg: Do not check v1 event counter when not needed

Message ID 20220118182600.15007-1-mkoutny@suse.com (mailing list archive)
State New
Series mm/memcg: Do not check v1 event counter when not needed

Commit Message

Michal Koutný Jan. 18, 2022, 6:26 p.m. UTC
The function memcg_check_events() is called to trigger possible event
notifications or soft limit updates when the page event "clock" moves
sufficiently. This tracking is not needed when neither the soft limit nor
(v1) event notifications are configured; the tracking can catch up
with the clock at any time once thresholds are configured.
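
For reference, the "clock" is the per-CPU nr_page_events counter. A sketch
of the ratelimit check, closely following mem_cgroup_event_ratelimit() in
mm/memcontrol.c of this era (details may differ between trees):

  static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
                                         enum mem_cgroup_events_target target)
  {
          unsigned long val, next;

          /* The event "clock": pages charged on this CPU so far. */
          val = __this_cpu_read(memcg->vmstats_percpu->nr_page_events);
          /* Clock value at which this target should fire next. */
          next = __this_cpu_read(memcg->vmstats_percpu->targets[target]);
          /* Wrap-safe comparison, as in time_after() in jiffies.h. */
          if ((long)(next - val) < 0) {
                  switch (target) {
                  case MEM_CGROUP_TARGET_THRESH:
                          next = val + THRESHOLDS_EVENTS_TARGET;
                          break;
                  case MEM_CGROUP_TARGET_SOFTLIMIT:
                          next = val + SOFTLIMIT_EVENTS_TARGET;
                          break;
                  default:
                          break;
                  }
                  /* Catch up: rearm relative to the current clock. */
                  __this_cpu_write(memcg->vmstats_percpu->targets[target], next);
                  return true;
          }
          return false;
  }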

Guard this functionality behind an unlikely static branch (the soft limit
and event notifications are presumably unused more often than used).
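
A minimal sketch of the static-branch pattern (mirroring the patch below;
the key name is from the patch, the surrounding functions are illustrative
only):

  #include <linux/jump_label.h>

  DEFINE_STATIC_KEY_FALSE(memcg_v1_events_enabled_key);

  static void hot_path(void)
  {
          /* Compiled to a NOP until the key is flipped, so the
           * disabled (expected) case is nearly free per call. */
          if (!static_branch_unlikely(&memcg_v1_events_enabled_key))
                  return;
          /* ... rarely configured v1 event/soft-limit work ... */
  }

  static void on_v1_configuration(void)
  {
          /* One-way flip once a soft limit or an event threshold is
           * set; the guarded path stays enabled from then on. */
          static_branch_enable(&memcg_v1_events_enabled_key);
  }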

This yields a slight but insignificant performance gain in a page-fault
specific benchmark; overall, no performance impact is expected. The goal
is rather to partition the charging code per the user functionality
provided.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 mm/memcontrol.c | 8 ++++++++
 1 file changed, 8 insertions(+)


On Fri, Jan 14, 2022 at 10:09:35AM +0100, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> So avoiding these two also avoids memcg_check_events()?

I've made the matter explicit with the surrounding patch.

[
The performance "gain" is negligible (the differences in pft [1] are
dominated by non-root memcg classification):

                         nocg, nopatch            cg, nopatch              nocg, patch              cg, patch
Hmean     faults/sec-2   273366.6312 (   0.00%)   243573.3767 * -10.90%*   273901.9709 *   0.20%*   247702.4104 *  -9.39%*
CoeffVar  faults/sec-2        3.8771 (   0.00%)        3.8396 (   0.97%)        3.1400 (  19.01%)        4.1188 (  -6.24%)

                                                  cg, nopatch                                       cg, patch
Hmean     faults/sec-2                            243573.3767 (   0.00%)                            247702.4104 *   1.70%*
CoeffVar  faults/sec-2                                 3.8396 (   0.00%)                                 4.1188 (  -7.27%)

On less targeted benchmarks it's well below noise.
]

I think it would make sense to insert this patch into your series and to
subsequently reject enabling these features on PREEMPT_RT -- provided this
patch makes sense to others too. The justification is rather the
functionality split needed for this PREEMPT_RT effort.

> Are there plans to remove v1 or is this part of "we must not break
> userland"?

It's part of that mantra, so v1 can't simply be removed. OTOH, my sense is
that this change also fits under not extending v1 (to avoid duplicating
effort on everything).

Michal

[1] https://github.com/gormanm/mmtests/blob/6bcb8b301a48386e0cc63a21f7642048a3ceaed5/configs/config-pagealloc-performance#L6

Comments

Sebastian Andrzej Siewior Jan. 18, 2022, 7:57 p.m. UTC | #1
On 2022-01-18 19:26:00 [+0100], Michal Koutný wrote:
> I think it would make sense inserting the patch into your series and
> subsequently reject enabling on PREEMPT_RT -- provided this patch makes sense
> to others too -- the justification is rather functionality splitting for
> this PREEMPT_RT effort.

Interesting. While looking at this today I came up with the patch at the
bottom. The other approaches I tried looked way uglier, and since probably
nobody will use it anyway…
Let me know how you want it to be integrated.

------>8------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Tue, 18 Jan 2022 17:28:07 +0100
Subject: [PATCH] mm/memcg: Disable threshold event handlers on PREEMPT_RT
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During the integration of PREEMPT_RT support, the code flow around
memcg_check_events() resulted in `twisted code'. Moving the code around to
avoid that would in turn lead to an additional local-irq-save section
within memcg_check_events(). While that looks better, it adds a
local-irq-save section to a code flow which usually already runs within a
local-irq-save block.
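
Illustrative only, not code from any tree -- the shape of the extra
section described above would be roughly:

  static void memcg_check_events(struct mem_cgroup *memcg, int nid)
  {
          unsigned long flags;

          /* On PREEMPT_RT this would often nest inside callers that
           * have already disabled interrupts themselves. */
          local_irq_save(flags);
          /* ... per-CPU event counters and threshold checks ... */
          local_irq_restore(flags);
  }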

The threshold event handler is a deprecated memcg v1 feature. Instead of
trying to get it to work under PREEMPT_RT, just disable it. There should
not have been any users on PREEMPT_RT so far, which makes it even less
sensible to get it to work under PREEMPT_RT while having zero users.

Make memory.soft_limit_in_bytes and cgroup.event_control return
-EOPNOTSUPP on PREEMPT_RT. Make memcg_check_events() empty on PREEMPT_RT
since it won't do anything. Document that the two knobs are disabled on
PREEMPT_RT.

Suggested-by: Michal Hocko <mhocko@kernel.org>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 Documentation/admin-guide/cgroup-v1/memory.rst |  2 ++
 mm/memcontrol.c                                | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index faac50149a222..2cc502a75ef64 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -64,6 +64,7 @@ Brief summary of control files.
 				     threads
  cgroup.procs			     show list of processes
  cgroup.event_control		     an interface for event_fd()
+				     This knob is not available on CONFIG_PREEMPT_RT systems.
  memory.usage_in_bytes		     show current usage for memory
 				     (See 5.5 for details)
  memory.memsw.usage_in_bytes	     show current usage for memory+Swap
@@ -75,6 +76,7 @@ Brief summary of control files.
  memory.max_usage_in_bytes	     show max memory usage recorded
  memory.memsw.max_usage_in_bytes     show max memory+Swap usage recorded
  memory.soft_limit_in_bytes	     set/show soft limit of memory usage
+				     This knob is not available on CONFIG_PREEMPT_RT systems.
  memory.stat			     show various statistics
  memory.use_hierarchy		     set/show hierarchical account enabled
                                      This knob is deprecated and shouldn't be
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ed5f2a0879d3..3c4f7a0fd0039 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -821,6 +821,7 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 	__this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
 }
 
+#ifndef CONFIG_PREEMPT_RT
 static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				       enum mem_cgroup_events_target target)
 {
@@ -864,6 +865,9 @@ static void memcg_check_events(struct mem_cgroup *memcg, int nid)
 			mem_cgroup_update_tree(memcg, nid);
 	}
 }
+#else
+static void memcg_check_events(struct mem_cgroup *memcg, int nid) { }
+#endif
 
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
 {
@@ -3751,8 +3755,12 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		}
 		break;
 	case RES_SOFT_LIMIT:
+#ifndef CONFIG_PREEMPT_RT
 		memcg->soft_limit = nr_pages;
 		ret = 0;
+#else
+		ret = -EOPNOTSUPP;
+#endif
 		break;
 	}
 	return ret ?: nbytes;
@@ -4717,6 +4725,7 @@ static void memcg_event_ptable_queue_proc(struct file *file,
 static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 					 char *buf, size_t nbytes, loff_t off)
 {
+#ifndef CONFIG_PREEMPT_RT
 	struct cgroup_subsys_state *css = of_css(of);
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_event *event;
@@ -4843,6 +4852,9 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	kfree(event);
 
 	return ret;
+#else
+	return -EOPNOTSUPP;
+#endif
 }
 
 static struct cftype mem_cgroup_legacy_files[] = {

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4a7b3ebf8e48..7f64ce33d137 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -106,6 +106,8 @@  static bool do_memsw_account(void)
 #define THRESHOLDS_EVENTS_TARGET 128
 #define SOFTLIMIT_EVENTS_TARGET 1024
 
+DEFINE_STATIC_KEY_FALSE(memcg_v1_events_enabled_key);
+
 /*
  * Cgroups above their limits are maintained in a RB-Tree, independent of
  * their hierarchy representation
@@ -852,6 +854,9 @@  static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
  */
 static void memcg_check_events(struct mem_cgroup *memcg, int nid)
 {
+	if (!static_branch_unlikely(&memcg_v1_events_enabled_key))
+		return;
+
 	/* threshold event is triggered in finer grain than soft limit */
 	if (unlikely(mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_THRESH))) {
@@ -3757,6 +3762,7 @@  static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		break;
 	case RES_SOFT_LIMIT:
 		memcg->soft_limit = nr_pages;
+		static_branch_enable(&memcg_v1_events_enabled_key);
 		ret = 0;
 		break;
 	}
@@ -4831,6 +4837,8 @@  static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	list_add(&event->list, &memcg->event_list);
 	spin_unlock_irq(&memcg->event_list_lock);
 
+	static_branch_enable(&memcg_v1_events_enabled_key);
+
 	fdput(cfile);
 	fdput(efile);