Message ID | 20230310061048.1418400-1-void@manifault.com |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | [bpf-next] bpf/selftests: Fix send_signal tracepoint tests |
On Fri, Mar 10, 2023 at 12:10:48AM -0600, David Vernet wrote:
> The send_signal tracepoint tests are non-deterministically failing in
> CI. The test works as follows:
>
> 1. Two pairs of file descriptors are created using the pipe() function.
>    One pair is used to communicate between a parent process -> child
>    process, and the other for the reverse direction.
>
> 2. A child is fork()'ed. The child process registers a signal handler,
>    notifies its parent that the signal handler is registered, and then
>    waits for its parent to have enabled a BPF program that sends a
>    signal.
>
> 3. The parent opens and loads a BPF skeleton with programs that send
>    signals to the child process. The different programs are triggered by
>    different perf events (either NMI or normal perf), or by regular
>    tracepoints. The signal is delivered to the child whenever the child
>    triggers the program.
>
> 4. The child's signal handler is invoked, which sets a flag saying that
>    the signal handler was reached. The child then signals to the parent
>    that it received the signal, and the test ends.
>
> The perf testcases (send_signal_perf{_thread} and
> send_signal_nmi{_thread}) work 100% of the time, but the tracepoint
> testcases fail non-deterministically because the tracepoint is not
> always being fired for the child.
>
> There are two tracepoint programs registered in the test:
> 'tracepoint/sched/sched_switch', and
> 'tracepoint/syscalls/sys_enter_nanosleep'. The child never intentionally
> blocks, nor sleeps, so neither tracepoint is guaranteed to be triggered.
> To fix this, we can have the child trigger the nanosleep program with a
> usleep().
>
> Before this patch, the test would fail locally every 2-3 runs. Now, it
> doesn't fail after more than 1000 runs.
>
> Signed-off-by: David Vernet <void@manifault.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/send_signal.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index d63a20fbed33..61cc83fca53c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -64,8 +64,11 @@ static void test_send_signal_common(struct perf_event_attr *attr,
>  		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
>
>  		/* wait a little for signal handler */
> -		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
> +		for (int i = 0; i < 1000000000 && !sigusr1_received; i++) {
>  			j /= i + j + 1;
> +			if (!attr)
> +				ASSERT_EQ(usleep(1), 0, "nanosleep_tp");

As soon as I sent this out, it occurred to me that having an ASSERT_EQ
like this is not a good idea. usleep() could be interrupted by a signal
and fail with EINTR (returning -1), and the whole point of this test is
to send signals to the child. Let me resend this as v2 without the
ASSERT_EQ.

> +		}
>
>  		buf[0] = sigusr1_received ? '2' : '0';
>  		ASSERT_EQ(sigusr1_received, 1, "sigusr1_received");
> --
> 2.39.0
>
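The concern raised in the reply can be illustrated with a small standalone sketch. It is not the v2 patch and does not use the selftest's ASSERT_EQ macro; the helper name wait_for_sigusr1() and its max_iters parameter are hypothetical. The loop calls usleep() only to enter the sleep syscall path and deliberately ignores the return value, since a signal arriving during the sleep makes usleep() return -1 with errno set to EINTR.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Flag set by the handler; mirrors the role of sigusr1_received in the test. */
static volatile sig_atomic_t sigusr1_seen;

static void sigusr1_seen_handler(int signum)
{
	(void)signum;
	sigusr1_seen = 1;
}

/*
 * Hypothetical helper: poll until the handler has run or max_iters sleeps
 * have elapsed. The usleep() result is intentionally not checked, because
 * being interrupted by the very signal we are waiting for is expected.
 * Returns 1 if the signal was observed, 0 otherwise.
 */
static int wait_for_sigusr1(int max_iters)
{
	assert(max_iters > 0);

	for (int i = 0; i < max_iters && !sigusr1_seen; i++)
		(void)usleep(1);

	return sigusr1_seen;
}
```

Success is judged solely by the flag the handler sets, so an interrupted sleep cannot turn into a spurious test failure.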
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index d63a20fbed33..61cc83fca53c 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -64,8 +64,11 @@ static void test_send_signal_common(struct perf_event_attr *attr,
 		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
 
 		/* wait a little for signal handler */
-		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
+		for (int i = 0; i < 1000000000 && !sigusr1_received; i++) {
 			j /= i + j + 1;
+			if (!attr)
+				ASSERT_EQ(usleep(1), 0, "nanosleep_tp");
+		}
 
 		buf[0] = sigusr1_received ? '2' : '0';
 		ASSERT_EQ(sigusr1_received, 1, "sigusr1_received");
The send_signal tracepoint tests are non-deterministically failing in
CI. The test works as follows:

1. Two pairs of file descriptors are created using the pipe() function.
   One pair is used to communicate between a parent process -> child
   process, and the other for the reverse direction.

2. A child is fork()'ed. The child process registers a signal handler,
   notifies its parent that the signal handler is registered, and then
   waits for its parent to have enabled a BPF program that sends a
   signal.

3. The parent opens and loads a BPF skeleton with programs that send
   signals to the child process. The different programs are triggered by
   different perf events (either NMI or normal perf), or by regular
   tracepoints. The signal is delivered to the child whenever the child
   triggers the program.

4. The child's signal handler is invoked, which sets a flag saying that
   the signal handler was reached. The child then signals to the parent
   that it received the signal, and the test ends.

The perf testcases (send_signal_perf{_thread} and
send_signal_nmi{_thread}) work 100% of the time, but the tracepoint
testcases fail non-deterministically because the tracepoint is not
always being fired for the child.

There are two tracepoint programs registered in the test:
'tracepoint/sched/sched_switch', and
'tracepoint/syscalls/sys_enter_nanosleep'. The child never intentionally
blocks, nor sleeps, so neither tracepoint is guaranteed to be triggered.
To fix this, we can have the child trigger the nanosleep program with a
usleep().

Before this patch, the test would fail locally every 2-3 runs. Now, it
doesn't fail after more than 1000 runs.

Signed-off-by: David Vernet <void@manifault.com>
---
 tools/testing/selftests/bpf/prog_tests/send_signal.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
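The four-step handshake described in the commit message can be sketched in plain userspace C. This is only an approximation under stated assumptions: no BPF program is loaded, and the parent's kill() call is a stand-in for the signal that the BPF program would send when the child hits the tracepoint. The function run_handshake(), the handler, and the loop bounds are hypothetical names and values; only the pipe names pipe_c2p and pipe_p2c follow the selftest.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigusr1;

static void on_sigusr1(int signum)
{
	(void)signum;
	got_sigusr1 = 1;
}

/* Hypothetical helper: returns 0 if the child reported receiving the signal. */
static int run_handshake(void)
{
	int pipe_c2p[2], pipe_p2c[2];
	char buf[1];
	pid_t pid;

	/* Step 1: one pipe per direction. */
	if (pipe(pipe_c2p) || pipe(pipe_p2c))
		return -1;

	/* Step 2: fork the child. */
	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		close(pipe_c2p[0]);
		close(pipe_p2c[1]);
		signal(SIGUSR1, on_sigusr1);
		/* Tell the parent that the handler is installed. */
		if (write(pipe_c2p[1], "1", 1) != 1)
			_exit(2);
		/* Wait until the parent says the sender is armed. */
		if (read(pipe_p2c[0], buf, 1) != 1)
			_exit(2);
		/* Sleep in the wait loop; the result is ignored (EINTR is expected). */
		for (int i = 0; i < 1000 && !got_sigusr1; i++)
			(void)usleep(1000);
		/* Step 4: report back whether the handler ran. */
		buf[0] = got_sigusr1 ? '2' : '0';
		if (write(pipe_c2p[1], buf, 1) != 1)
			_exit(2);
		_exit(0);
	}

	assert(pid > 0);
	close(pipe_c2p[1]);
	close(pipe_p2c[0]);
	if (read(pipe_c2p[0], buf, 1) != 1)
		return -1;
	/* Step 3 stand-in: the real test attaches BPF programs here. */
	if (write(pipe_p2c[1], "1", 1) != 1)
		return -1;
	kill(pid, SIGUSR1);	/* assumption: replaces the BPF-sent signal */
	if (read(pipe_c2p[0], buf, 1) != 1)
		return -1;
	waitpid(pid, NULL, 0);
	close(pipe_c2p[0]);
	close(pipe_p2c[1]);
	return buf[0] == '2' ? 0 : -1;
}
```

In the real test the signal only arrives if the child actually triggers the attached program, which is why the sleep inside the child's wait loop matters there; in this sketch the parent sends the signal unconditionally, so the sleep only keeps the loop from spinning.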