[2/2] rseq/selftests: Add test for mm_cid compaction

Message ID 20250217112317.258716-3-gmonaco@redhat.com (mailing list archive)
State New

Commit Message

Gabriele Monaco Feb. 17, 2025, 11:23 a.m. UTC
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and in a timely manner.

The test spawns one thread pinned to each CPU; each thread, including
the main one, then runs in short bursts for some time. During this
period, the mm_cids should span all values from 0 to nproc - 1.

At the end of this phase, a thread with a high enough mm_cid (>= nproc/2)
is selected to be the new leader and all other threads terminate.

After some time, the only remaining thread should see 0 as its mm_cid.
If that doesn't happen, the compaction mechanism didn't work and the
test fails.

Since mm_cid compaction is less likely for tasks running in short
bursts, we increase the likelihood by just running a busy loop at every
iteration. Compaction is a best-effort mechanism, so this behaviour is
currently acceptable.

The test never fails if only one CPU is available: in that case there is
nothing to test, as the only available mm_cid is 0.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 208 ++++++++++++++++++
 3 files changed, 210 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c

Comments

Mathieu Desnoyers Feb. 17, 2025, 7:59 p.m. UTC | #1
On 2025-02-17 06:23, Gabriele Monaco wrote:
> A task in the kernel (task_mm_cid_work) runs somewhat periodically to
> compact the mm_cid for each process. Add a test to validate that it runs
> correctly and in a timely manner.
> 
> The test spawns one thread pinned to each CPU; each thread, including
> the main one, then runs in short bursts for some time. During this
> period, the mm_cids should span all values from 0 to nproc - 1.
> 
> At the end of this phase, a thread with a high enough mm_cid (>= nproc/2)
> is selected to be the new leader and all other threads terminate.
> 
> After some time, the only remaining thread should see 0 as its mm_cid.
> If that doesn't happen, the compaction mechanism didn't work and the
> test fails.
> 
> Since mm_cid compaction is less likely for tasks running in short
> bursts, we increase the likelihood by just running a busy loop at every
> iteration. Compaction is a best-effort mechanism, so this behaviour is
> currently acceptable.

I'm wondering what we can do to make this compaction scheme more
predictable.

The situation here is caused by the fact that the CID compaction
only happens on the scheduler tick. If the workload is periodic and
runs in short bursts, chances are that the scheduler tick never
issues task_tick_mm_cid() for a given process, so no compaction happens.

So task_tick_mm_cid() basically does:

void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
{
         struct callback_head *work = &curr->cid_work;
         unsigned long now = jiffies;

         if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) ||
             work->next != work)
                 return;
         if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
                 return;

         /* No page allocation under rq lock */
         task_work_add(curr, work, TWA_RESUME | TWAF_NO_ALLOC);
}

So typically we have a "time_before()" check that is hit and
paces the execution of this task_work every 100ms or so.

If we have periodic tasks, that means those tasks are necessarily
preempted so they are not current when the tick happens. If the
task cares about compaction of mm_cid, it means it has returned
to userspace after that preemption.

Sooo, we happen to have code in kernel/rseq.c called exactly at
that point:

__rseq_handle_notify_resume()

I wonder if we could perhaps just call task_tick_mm_cid() (or a version
of it renamed to something more meaningful) from
__rseq_handle_notify_resume()? By combining time_before() checks from
the scheduler tick and at return to userspace after preemption, AFAIU
we'd be handling the periodic workload correctly, and therefore this
test for mm_cid compaction could check for more robust guarantees.

Thoughts?

Thanks,

Mathieu

> 
> The test never fails if only one CPU is available: in that case there is
> nothing to test, as the only available mm_cid is 0.
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>   tools/testing/selftests/rseq/.gitignore       |   1 +
>   tools/testing/selftests/rseq/Makefile         |   2 +-
>   .../selftests/rseq/mm_cid_compaction_test.c   | 208 ++++++++++++++++++
>   3 files changed, 210 insertions(+), 1 deletion(-)
>   create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
> 
> diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
> index 16496de5f6ce4..2c89f97e4f737 100644
> --- a/tools/testing/selftests/rseq/.gitignore
> +++ b/tools/testing/selftests/rseq/.gitignore
> @@ -3,6 +3,7 @@ basic_percpu_ops_test
>   basic_percpu_ops_mm_cid_test
>   basic_test
>   basic_rseq_op_test
> +mm_cid_compaction_test
>   param_test
>   param_test_benchmark
>   param_test_compare_twice
> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> index 5a3432fceb586..ce1b38f46a355 100644
> --- a/tools/testing/selftests/rseq/Makefile
> +++ b/tools/testing/selftests/rseq/Makefile
> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>   
>   TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>   		param_test_benchmark param_test_compare_twice param_test_mm_cid \
> -		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> +		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
>   
>   TEST_GEN_PROGS_EXTENDED = librseq.so
>   
> diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
> new file mode 100644
> index 0000000000000..8808500466d02
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: LGPL-2.1
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stddef.h>
> +
> +#include "../kselftest.h"
> +#include "rseq.h"
> +
> +#define VERBOSE 0
> +#define printf_verbose(fmt, ...)                    \
> +	do {                                        \
> +		if (VERBOSE)                        \
> +			printf(fmt, ##__VA_ARGS__); \
> +	} while (0)
> +
> +/* 0.5 s */
> +#define RUNNER_PERIOD 500000
> +/* Number of runs before we terminate or get the token */
> +#define THREAD_RUNS 5
> +
> +/*
> + * Number of times we check that the mm_cid were compacted.
> + * Checks are repeated every RUNNER_PERIOD.
> + */
> +#define MM_CID_COMPACT_TIMEOUT 10
> +
> +struct thread_args {
> +	int cpu;
> +	int num_cpus;
> +	pthread_mutex_t *token;
> +	pthread_barrier_t *barrier;
> +	pthread_t *tinfo;
> +	struct thread_args *args_head;
> +};
> +
> +static void __noreturn *thread_runner(void *arg)
> +{
> +	struct thread_args *args = arg;
> +	int i, ret, curr_mm_cid;
> +	cpu_set_t cpumask;
> +
> +	CPU_ZERO(&cpumask);
> +	CPU_SET(args->cpu, &cpumask);
> +	ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
> +	if (ret) {
> +		errno = ret;
> +		perror("Error: failed to set affinity");
> +		abort();
> +	}
> +	pthread_barrier_wait(args->barrier);
> +
> +	for (i = 0; i < THREAD_RUNS; i++)
> +		usleep(RUNNER_PERIOD);
> +	curr_mm_cid = rseq_current_mm_cid();
> +	/*
> +	 * We select one thread with high enough mm_cid to be the new leader.
> +	 * All other threads (including the main thread) will terminate.
> +	 * After some time, the mm_cid of the only remaining thread should
> +	 * converge to 0, if not, the test fails.
> +	 */
> +	if (curr_mm_cid >= args->num_cpus / 2 &&
> +	    !pthread_mutex_trylock(args->token)) {
> +		printf_verbose(
> +			"cpu%d has mm_cid=%d and will be the new leader.\n",
> +			sched_getcpu(), curr_mm_cid);
> +		for (i = 0; i < args->num_cpus; i++) {
> +			if (args->tinfo[i] == pthread_self())
> +				continue;
> +			ret = pthread_join(args->tinfo[i], NULL);
> +			if (ret) {
> +				errno = ret;
> +				perror("Error: failed to join thread");
> +				abort();
> +			}
> +		}
> +		pthread_barrier_destroy(args->barrier);
> +		free(args->tinfo);
> +		free(args->token);
> +		free(args->barrier);
> +		free(args->args_head);
> +
> +		for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
> +			curr_mm_cid = rseq_current_mm_cid();
> +			printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
> +				       curr_mm_cid, sched_getcpu());
> +			if (curr_mm_cid == 0)
> +				exit(EXIT_SUCCESS);
> +			/*
> +			 * Currently mm_cid compaction is less likely for tasks
> +			 * running in short bursts: increase likelihood by just
> +			 * running for some time doing nothing.
> +			 */
> +			for (int j = 0; j < 0xffff; j++)
> +				for (int k = 0; k < 0xffff; k++)
> +					asm("");
> +			usleep(RUNNER_PERIOD);
> +		}
> +		exit(EXIT_FAILURE);
> +	}
> +	printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
> +		       sched_getcpu(), curr_mm_cid);
> +	pthread_exit(NULL);
> +}
> +
> +int test_mm_cid_compaction(void)
> +{
> +	cpu_set_t affinity;
> +	int i, j, ret = 0, num_threads;
> +	pthread_t *tinfo;
> +	pthread_mutex_t *token;
> +	pthread_barrier_t *barrier;
> +	struct thread_args *args;
> +
> +	sched_getaffinity(0, sizeof(affinity), &affinity);
> +	num_threads = CPU_COUNT(&affinity);
> +	tinfo = calloc(num_threads, sizeof(*tinfo));
> +	if (!tinfo) {
> +		perror("Error: failed to allocate tinfo");
> +		return -1;
> +	}
> +	args = calloc(num_threads, sizeof(*args));
> +	if (!args) {
> +		perror("Error: failed to allocate args");
> +		ret = -1;
> +		goto out_free_tinfo;
> +	}
> +	token = malloc(sizeof(*token));
> +	if (!token) {
> +		perror("Error: failed to allocate token");
> +		ret = -1;
> +		goto out_free_args;
> +	}
> +	barrier = malloc(sizeof(*barrier));
> +	if (!barrier) {
> +		perror("Error: failed to allocate barrier");
> +		ret = -1;
> +		goto out_free_token;
> +	}
> +	if (num_threads == 1) {
> +		fprintf(stderr, "Cannot test on a single cpu. "
> +				"Skipping mm_cid_compaction test.\n");
> +		/* only skipping the test, this is not a failure */
> +		goto out_free_barrier;
> +	}
> +	pthread_mutex_init(token, NULL);
> +	ret = pthread_barrier_init(barrier, NULL, num_threads);
> +	if (ret) {
> +		errno = ret;
> +		perror("Error: failed to initialise barrier");
> +		goto out_free_barrier;
> +	}
> +	for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
> +		if (!CPU_ISSET(i, &affinity))
> +			continue;
> +		args[j].num_cpus = num_threads;
> +		args[j].tinfo = tinfo;
> +		args[j].token = token;
> +		args[j].barrier = barrier;
> +		args[j].cpu = i;
> +		args[j].args_head = args;
> +		if (!j) {
> +			/* The first thread is the main one */
> +			tinfo[0] = pthread_self();
> +			++j;
> +			continue;
> +		}
> +		ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
> +		if (ret) {
> +			errno = ret;
> +			perror("Error: failed to create thread");
> +			abort();
> +		}
> +		++j;
> +	}
> +	printf_verbose("Started %d threads.\n", num_threads);
> +
> +	/* Also main thread will terminate if it is not selected as leader */
> +	thread_runner(&args[0]);
> +
> +	/* only reached in case of errors */
> +out_free_barrier:
> +	free(barrier);
> +out_free_token:
> +	free(token);
> +out_free_args:
> +	free(args);
> +out_free_tinfo:
> +	free(tinfo);
> +
> +	return ret;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	if (!rseq_mm_cid_available()) {
> +		fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
> +		return -1;
> +	}
> +	if (test_mm_cid_compaction())
> +		return -1;
> +	return 0;
> +}

Patch

diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@  basic_percpu_ops_test
 basic_percpu_ops_mm_cid_test
 basic_test
 basic_rseq_op_test
+mm_cid_compaction_test
 param_test
 param_test_benchmark
 param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@  OVERRIDE_TARGETS = 1
 
 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
-		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
 
 TEST_GEN_PROGS_EXTENDED = librseq.so
 
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..8808500466d02
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,208 @@ 
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...)                    \
+	do {                                        \
+		if (VERBOSE)                        \
+			printf(fmt, ##__VA_ARGS__); \
+	} while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+	int cpu;
+	int num_cpus;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	pthread_t *tinfo;
+	struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+	struct thread_args *args = arg;
+	int i, ret, curr_mm_cid;
+	cpu_set_t cpumask;
+
+	CPU_ZERO(&cpumask);
+	CPU_SET(args->cpu, &cpumask);
+	ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to set affinity");
+		abort();
+	}
+	pthread_barrier_wait(args->barrier);
+
+	for (i = 0; i < THREAD_RUNS; i++)
+		usleep(RUNNER_PERIOD);
+	curr_mm_cid = rseq_current_mm_cid();
+	/*
+	 * We select one thread with high enough mm_cid to be the new leader.
+	 * All other threads (including the main thread) will terminate.
+	 * After some time, the mm_cid of the only remaining thread should
+	 * converge to 0, if not, the test fails.
+	 */
+	if (curr_mm_cid >= args->num_cpus / 2 &&
+	    !pthread_mutex_trylock(args->token)) {
+		printf_verbose(
+			"cpu%d has mm_cid=%d and will be the new leader.\n",
+			sched_getcpu(), curr_mm_cid);
+		for (i = 0; i < args->num_cpus; i++) {
+			if (args->tinfo[i] == pthread_self())
+				continue;
+			ret = pthread_join(args->tinfo[i], NULL);
+			if (ret) {
+				errno = ret;
+				perror("Error: failed to join thread");
+				abort();
+			}
+		}
+		pthread_barrier_destroy(args->barrier);
+		free(args->tinfo);
+		free(args->token);
+		free(args->barrier);
+		free(args->args_head);
+
+		for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+			curr_mm_cid = rseq_current_mm_cid();
+			printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+				       curr_mm_cid, sched_getcpu());
+			if (curr_mm_cid == 0)
+				exit(EXIT_SUCCESS);
+			/*
+			 * Currently mm_cid compaction is less likely for tasks
+			 * running in short bursts: increase likelihood by just
+			 * running for some time doing nothing.
+			 */
+			for (int j = 0; j < 0xffff; j++)
+				for (int k = 0; k < 0xffff; k++)
+					asm("");
+			usleep(RUNNER_PERIOD);
+		}
+		exit(EXIT_FAILURE);
+	}
+	printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+		       sched_getcpu(), curr_mm_cid);
+	pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+	cpu_set_t affinity;
+	int i, j, ret = 0, num_threads;
+	pthread_t *tinfo;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	struct thread_args *args;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	num_threads = CPU_COUNT(&affinity);
+	tinfo = calloc(num_threads, sizeof(*tinfo));
+	if (!tinfo) {
+		perror("Error: failed to allocate tinfo");
+		return -1;
+	}
+	args = calloc(num_threads, sizeof(*args));
+	if (!args) {
+		perror("Error: failed to allocate args");
+		ret = -1;
+		goto out_free_tinfo;
+	}
+	token = malloc(sizeof(*token));
+	if (!token) {
+		perror("Error: failed to allocate token");
+		ret = -1;
+		goto out_free_args;
+	}
+	barrier = malloc(sizeof(*barrier));
+	if (!barrier) {
+		perror("Error: failed to allocate barrier");
+		ret = -1;
+		goto out_free_token;
+	}
+	if (num_threads == 1) {
+		fprintf(stderr, "Cannot test on a single cpu. "
+				"Skipping mm_cid_compaction test.\n");
+		/* only skipping the test, this is not a failure */
+		goto out_free_barrier;
+	}
+	pthread_mutex_init(token, NULL);
+	ret = pthread_barrier_init(barrier, NULL, num_threads);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to initialise barrier");
+		goto out_free_barrier;
+	}
+	for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+		if (!CPU_ISSET(i, &affinity))
+			continue;
+		args[j].num_cpus = num_threads;
+		args[j].tinfo = tinfo;
+		args[j].token = token;
+		args[j].barrier = barrier;
+		args[j].cpu = i;
+		args[j].args_head = args;
+		if (!j) {
+			/* The first thread is the main one */
+			tinfo[0] = pthread_self();
+			++j;
+			continue;
+		}
+		ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+		if (ret) {
+			errno = ret;
+			perror("Error: failed to create thread");
+			abort();
+		}
+		++j;
+	}
+	printf_verbose("Started %d threads.\n", num_threads);
+
+	/* Also main thread will terminate if it is not selected as leader */
+	thread_runner(&args[0]);
+
+	/* only reached in case of errors */
+out_free_barrier:
+	free(barrier);
+out_free_token:
+	free(token);
+out_free_args:
+	free(args);
+out_free_tinfo:
+	free(tinfo);
+
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	if (!rseq_mm_cid_available()) {
+		fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+		return -1;
+	}
+	if (test_mm_cid_compaction())
+		return -1;
+	return 0;
+}