Message ID | 20250109133852.360122-1-tglozar@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | tools/rtla: Add basic test suite | expand |
On Thu, 9 Jan 2025 14:38:52 +0100 Tomas Glozar <tglozar@redhat.com> wrote: > Implement a simple TAP-based test engine in bash and a few basic tests > using it, to be used to check for bugs and regressions. > > A new "check" target is added to the rtla Makefile that runs the test suite > using the "prove" command implemented by Test::Harness. > > The only test format currently supported is running rtla with defined > command arguments per test, checking its exit code. In case the exit > code is non-zero, the output of rtla is displayed. > > The test cases are adopted from rtla tests in the Continuous Kernel > Integration (CKI) project [1] with the authors' approval. > > [1] https://gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/blob/main/rt-tests/us/rtla/ > > Signed-off-by: Tomas Glozar <tglozar@redhat.com> I tested this out, and it failed the last test: make[1]: Leaving directory '/work/git/linux.git/tools/tracing/rtla' RTLA=/work/git/linux.git/tools/tracing/rtla/rtla prove -o -f tests/ tests/hwnoise.t ... ok tests/osnoise.t ... ok tests/timerlat.t .. 9/9 not ok 9 - hist test in nanoseconds # tests/timerlat.t .. Failed 1/9 subtests Test Summary Report ------------------- tests/timerlat.t (Wstat: 0 Tests: 9 Failed: 1) Failed test: 9 Files=3, Tests=20, 451 wallclock secs ( 0.07 usr 0.11 sys + 10.57 cusr 74.09 csys = 84.84 CPU) Result: FAIL make: *** [Makefile:89: check] Error 1 -- Steve
pá 17. 1. 2025 v 2:18 odesílatel Steven Rostedt <rostedt@goodmis.org> napsal: > I tested this out, and it failed the last test: > Thank you for testing this. > make[1]: Leaving directory '/work/git/linux.git/tools/tracing/rtla' > RTLA=/work/git/linux.git/tools/tracing/rtla/rtla prove -o -f tests/ > tests/hwnoise.t ... ok > tests/osnoise.t ... ok > tests/timerlat.t .. 9/9 > not ok 9 - hist test in nanoseconds > # > tests/timerlat.t .. Failed 1/9 subtests > > Test Summary Report > ------------------- > tests/timerlat.t (Wstat: 0 Tests: 9 Failed: 1) > Failed test: 9 > Files=3, Tests=20, 451 wallclock secs ( 0.07 usr 0.11 sys + 10.57 cusr 74.09 csys = 84.84 CPU) > Result: FAIL > make: *** [Makefile:89: check] Error 1 > > I have not seen this when testing the patch. It looks like rtla exits without any output, I will send a v2 that makes the test engine print the error code in that case, it will be useful not only for this. That being said, it's not like nanosecond resolution rtla-timerlat is without bugs: # RTLA timerlat histogram # Time unit is nanoseconds (ns) # Duration: 0 00:00:02 Index IRQ-000 Thr-000 Usr-000 over: 1 0 0 count: 1 0 0 min: 6102 - - avg: 6102 - - max: 6102 - - ALL: IRQ Thr Usr count: 1 0 0 min: 6102 - - avg: 6102 - - max: 6102 - - rtla timerlat hit stop tracing ## CPU 0 hit stop tracing, analyzing it ## IRQ handler delay: 1442069690324.42 us (23632738287847.00 %) IRQ latency: 6.10 us Blocking thread: :0 ------------------------------------------------------------------------ IRQ latency: 6.10 us (100%) Max timerlat IRQ latency from idle: 6.10 us in cpu 0 Maybe after enough time repeating the command I'm also going to hit the bug you saw. Tomas
On Fri, 17 Jan 2025 11:53:48 +0100 Tomas Glozar <tglozar@redhat.com> wrote: > Maybe after enough time repeating the command I'm also going to hit > the bug you saw. Note, I am running it on a VM, not necessarily the best place to run an RT app. Also, I'm in my BRANCH PROFILING phase, where my machines have the #define if enabled, and is running about 1/5th the normal speed. Thus, it will trigger large latency and delays. -- Steve
On Fri, 17 Jan 2025 11:53:48 +0100 Tomas Glozar <tglozar@redhat.com> wrote: > Maybe after enough time repeating the command I'm also going to hit > the bug you saw. Nevermind. The divide by zero fix went into my tools/urgent branch and I pushed it up to Linus, but I had already started the tools/core branch for the next merge window that I've been adding these patches to. It does not include the fix. The reason for that crash was because of the divide by zero bug. After cherry-picking it for testing, it works. Now, what I also noticed was. When it did crash and die. The osnoise tracer no longer creates any osnoise threads. Can you take a look. Remove the fix, and run that test: ~# echo osnoise > /sys/kernel/tracing/current_tracer ~# ps aux |grep osnois root 942 105 0.0 0 0 ? R 16:00 0:02 [osnoise/0] root 943 104 0.0 0 0 ? R 16:00 0:02 [osnoise/1] root 944 105 0.0 0 0 ? R 16:00 0:02 [osnoise/2] root 945 105 0.0 0 0 ? R 16:00 0:02 [osnoise/3] root 946 105 0.0 0 0 ? R 16:00 0:02 [osnoise/4] root 947 105 0.0 0 0 ? R 16:00 0:02 [osnoise/5] root 948 104 0.0 0 0 ? R 16:00 0:02 [osnoise/6] root 949 130 0.0 0 0 ? R 16:00 0:03 [osnoise/7] root 953 9.0 0.0 6468 2152 pts/0 S+ 16:00 0:00 grep osnois ~# echo nop > /sys/kernel/tracing/current_tracer ~# cd linux.git ~# git show cfff4d019b4c01 | patch -p1 -R ~# cd tools/tracing/rtla/ ~# make ~# ./rtla timerlat hist -i 2 -c 0 -n -d 30s # RTLA timerlat histogram # Time unit is nanoseconds (ns) # Duration: 0 00:00:02 Index IRQ-000 Thr-000 Usr-000 over: 1 0 0 count: 1 0 0 min: 1481365 - - avg: 1481365 - - max: 1481365 - - Floating point exception ~# rmdir /sys/kernel/tracing/instances/timerlat_aa/ ~# rmdir /sys/kernel/tracing/instances/timerlat_hist/ ~# echo osnoise > /sys/kernel/tracing/current_tracer ~# ps aux |grep osnois root 969 11.1 0.0 6468 2244 pts/0 S+ 15:54 0:00 grep osnois So that failure puts the system into a state that can no longer run osnoise. It takes a reboot to fix it. That should be fixed where a reboot is not needed. -- Steve
pá 17. 1. 2025 v 22:02 odesílatel Steven Rostedt <rostedt@goodmis.org> napsal: > > Now, what I also noticed was. When it did crash and die. The osnoise tracer > no longer creates any osnoise threads. Can you take a look. > It's likely that you have NO_OSNOISE_WORKLOAD set. The default value is OSNOISE_WORKLOAD; rtla relies on it not being set, and rtla timerlat -u resets it to the original value after it ends, so usually, everything works. But it breaks either when you set NO_OSNOISE_WORKLOAD manually or when rtla timerlat -u crashes (and fails to set it back). > So that failure puts the system into a state that can no longer run osnoise. > It takes a reboot to fix it. > > That should be fixed where a reboot is not needed. I sent patches to fix this for timerlat [1] and as I commented in the thread [2], the same has to be done for osnoise, too. I just haven't sent that yet since I was working on the rtla hang in sample collection bug [3]. I also discovered that OSNOISE_WORKLOAD is not the only option that rtla relies on being set properly: stop_tracing_us is also affected and possibly other options. rtla should set all osnoise options properly on every run, not relying on them being set by the cleanup of the previous rtla run. [1] https://lore.kernel.org/linux-trace-kernel/20250107144823.239782-1-tglozar@redhat.com/T/ [2[ https://lore.kernel.org/linux-trace-kernel/CAP4=nvRCw_2hKPUyd7obZv2Xho_mjdSC7EmozFsoeS0OZC9xaA@mail.gmail.com/ [3] https://lore.kernel.org/linux-trace-kernel/20250116144931.649593-1-tglozar@redhat.com/ Tomas
po 20. 1. 2025 v 8:38 odesílatel Tomas Glozar <tglozar@redhat.com> napsal: > It's likely that you have NO_OSNOISE_WORKLOAD set. The default value > is OSNOISE_WORKLOAD; rtla relies on it not being set, and rtla > timerlat -u resets it to the original value after it ends, so usually, > everything works. But it breaks either when you set > NO_OSNOISE_WORKLOAD manually or when rtla timerlat -u crashes (and > fails to set it back). > I just realized that the tests are not running rtla timerlat with -k and upstream, it has been defaulting to user threads for a long time, so that's probably not it.
diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile index a6a7dee16622..8b5101457c70 100644 --- a/tools/tracing/rtla/Makefile +++ b/tools/tracing/rtla/Makefile @@ -85,4 +85,6 @@ clean: doc_clean fixdep-clean $(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete $(Q)rm -f rtla rtla-static fixdep FEATURE-DUMP rtla-* $(Q)rm -rf feature -.PHONY: FORCE clean +check: $(RTLA) + RTLA=$(RTLA) prove -o -f tests/ +.PHONY: FORCE clean check diff --git a/tools/tracing/rtla/tests/engine.sh b/tools/tracing/rtla/tests/engine.sh new file mode 100644 index 000000000000..547490960a77 --- /dev/null +++ b/tools/tracing/rtla/tests/engine.sh @@ -0,0 +1,29 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +test_begin() { + # Count tests to allow the test harness to double-check if all were + # included correctly. + ctr=0 + [ -z "$RTLA" ] && RTLA="./rtla" + [ -n "$TEST_COUNT" ] && echo "1..$TEST_COUNT" +} + +check() { + # Simple check: run rtla with given arguments and test exit code. + # If TEST_COUNT is set, run the test. Otherwise, just count. + ctr=$(($ctr + 1)) + if [ -n "$TEST_COUNT" ] + then + # Run rtla; in case of failure, include its output as comment + # in the test results. + result=$("$RTLA" $2 2>&1) && echo "ok $ctr - $1" || echo "not ok $ctr - $1 +$(echo "$result" | while read line; do echo "# $line"; done) +" + fi +} + +test_end() { + # If running without TEST_COUNT, tests are not actually run, just + # counted. In that case, re-run the test with the correct count. + [ -z "$TEST_COUNT" ] && TEST_COUNT=$ctr exec bash $0 || true +} diff --git a/tools/tracing/rtla/tests/hwnoise.t b/tools/tracing/rtla/tests/hwnoise.t new file mode 100644 index 000000000000..491383e30373 --- /dev/null +++ b/tools/tracing/rtla/tests/hwnoise.t @@ -0,0 +1,19 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +source tests/engine.sh +test_begin + +check "verify help page" \ + "hwnoise --help" +check "detect noise higher than one microsecond" \ + "hwnoise -c 0 -T 1 -d 5s -q" +check "set the automatic trace mode" \ + "hwnoise -a 5 -d 30s" +check "set scheduling param to the osnoise tracer threads" \ + "hwnoise -P F:1 -c 0 -r 900000 -d 1M -q" +check "stop the trace if a single sample is higher than 1 us" \ + "hwnoise -s 1 -T 1 -t -d 30s" +check "enable a trace event trigger" \ + "hwnoise -t -e osnoise:irq_noise trigger=\"hist:key=desc,duration:sort=desc,duration:vals=hitcount\" -d 1m" + +test_end diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t new file mode 100644 index 000000000000..83a7625d2226 --- /dev/null +++ b/tools/tracing/rtla/tests/osnoise.t @@ -0,0 +1,17 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +source tests/engine.sh +test_begin + +check "verify help page" \ + "osnoise --help" +check "verify the --priority/-P param" \ + "osnoise top -P F:1 -c 0 -r 900000 -d 1M -q" +check "verify the --stop/-s param" \ + "osnoise top -s 30 -T 1 -t" +check "verify the --trace param" \ + "osnoise hist -s 30 -T 1 -t" +check "verify the --entries/-E param" \ + "osnoise hist -P F:1 -c 0 -r 900000 -d 1M -b 10 -E 25" + +test_end diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t new file mode 100644 index 000000000000..a17218a064bb --- /dev/null +++ b/tools/tracing/rtla/tests/timerlat.t @@ -0,0 +1,25 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +source tests/engine.sh +test_begin + +check "verify help page" \ + "timerlat --help" +check "verify -s/--stack" \ + "timerlat top -s 3 -T 10 -t" +check "verify -P/--priority" \ + "timerlat top -P F:1 -c 0 -d 1M -q" +check "test in nanoseconds" \ + "timerlat top -i 2 -c 0 -n -d 30s" +check "set the automatic trace mode" \ + "timerlat top -a 5 --dump-tasks" +check "print the auto-analysis if hits the stop tracing condition" \ + "timerlat top --aa-only 5" +check "disable auto-analysis" \ + "timerlat top -s 3 -T 10 -t --no-aa" +check "verify -c/--cpus" \ + "timerlat hist -c 0 -d 30s" +check "hist test in nanoseconds" \ + "timerlat hist -i 2 -c 0 -n -d 30s" + +test_end
Implement a simple TAP-based test engine in bash and a few basic tests using it, to be used to check for bugs and regressions. A new "check" target is added to the rtla Makefile that runs the test suite using the "prove" command implemented by Test::Harness. The only test format currently supported is running rtla with defined command arguments per test, checking its exit code. In case the exit code is non-zero, the output of rtla is displayed. The test cases are adopted from rtla tests in the Continuous Kernel Integration (CKI) project [1] with the authors' approval. [1] https://gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/blob/main/rt-tests/us/rtla/ Signed-off-by: Tomas Glozar <tglozar@redhat.com> --- tools/tracing/rtla/Makefile | 4 +++- tools/tracing/rtla/tests/engine.sh | 29 +++++++++++++++++++++++++++++ tools/tracing/rtla/tests/hwnoise.t | 19 +++++++++++++++++++ tools/tracing/rtla/tests/osnoise.t | 17 +++++++++++++++++ tools/tracing/rtla/tests/timerlat.t | 25 +++++++++++++++++++++++++ 5 files changed, 93 insertions(+), 1 deletion(-) create mode 100644 tools/tracing/rtla/tests/engine.sh create mode 100644 tools/tracing/rtla/tests/hwnoise.t create mode 100644 tools/tracing/rtla/tests/osnoise.t create mode 100644 tools/tracing/rtla/tests/timerlat.t