Message ID | ZYAimyPYhxVA9wKg@li-008a6a4c-3549-11b2-a85c-c5cc2836eea2.ibm.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | selftests/livepatch fails on s390 | expand |
On 12/18/23 05:44, Alexander Gordeev wrote: > Hi all, > > The livepatch selftest somehow fails in -next on s390 due to what > appears to me as 'comm' usage issue. E.g the removal of timestamp- > less line "with link type OSD_10GIG." in the below output forces > 'comm' to produce the correct result in check_result() function of > tools/testing/selftests/livepatch/functions.sh script: > > [ 11.229256] qeth 0.0.bd02: qdio: OSA on SC 2624 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W > [ 11.250189] systemd-journald[943]: Successfully sent stream file descriptor to service manager. > [ 11.258763] qeth 0.0.bd00: Device is a OSD Express card (level: 0165) > with link type OSD_10GIG. > [ 11.259261] qeth 0.0.bd00: The device represents a Bridge Capable Port > [ 11.262376] qeth 0.0.bd00: MAC address b2:96:9c:49:aa:e9 successfully registered > [ 11.269654] qeth 0.0.bd00: MAC address 06:c6:b5:7d:ee:63 successfully registered > > By contrast, using the 'diff' instead works as a charm. But it was > removed with commit 2f3f651f3756 ("selftests/livepatch: Use "comm" > instead of "diff" for dmesg"). > > I am attaching the contents of "$expect" and "$result" script > variables and the output of 'dmesg' before and after test run > dmesg-saved.txt and dmesg.txt. > > Another 'dmesg' output dmesg-saved1.txt and dmesg1.txt also > shows the same problem, which seems like something to do with > sorting. > > The minimal reproducer attached is dmesg-saved1-rep.txt and > dmesg1-rep.txt, that could be described as: > > --- dmesg-saved1-rep.txt 2023-12-17 21:08:14.171014218 +0100 > +++ dmesg1-rep.txt 2023-12-17 21:06:52.221014218 +0100 > @@ -1,3 +1,3 @@ > -[ 98.820331] livepatch: 'test_klp_state2': starting patching transition > [ 100.031067] livepatch: 'test_klp_state2': completing patching transition > [ 284.224335] livepatch: kernel.ftrace_enabled = 1 > +[ 284.232921] ===== TEST: basic shadow variable API ===== > > The culprit is the extra space in [ 98.820331] timestamp, that from > the script point of view produces the output with two extra lines: > > [ 100.031067] livepatch: 'test_klp_state2': completing patching transition > [ 284.224335] livepatch: kernel.ftrace_enabled = 1 > [ 284.232921] ===== TEST: basic shadow variable API ===== > > If the line with [ 98.820331] timestamp removed or changed to e.g > [ 100.031066] (aka 1 us less), then the result output is as expected: > > [ 284.232921] ===== TEST: basic shadow variable API ===== > > Thanks! Hi Alexander, You're right about the timestamp formatting. I can repro with a a slight variant on your example: Assume a pre-test dmesg log has three msgs (including one with timestamp of fewer digits): $ head /tmp/{A,B} ==> /tmp/A <== [ 1] message one [10] message two [11] message three during the rest, the first message rolls off the buffer post-test and a new fourth message is added: ==> /tmp/B <== [10] message two [11] message three [12] message four The test's comm invocation should be only printing "lines unique to FILE2", ie, the latest fourth message, but... $ comm --nocheck-order -13 /tmp/A /tmp/B [10] message two [11] message three [12] message four If we pre-trim the timestamps, the output is what we expect: $ comm --nocheck-order -13 \ <(sed 's/^\[[ 0-9.]*\] //' /tmp/A) \ <(sed 's/^\[[ 0-9.]*\] //' /tmp/B) message four however, I'm not sure if that fix would easily apply. It looks like I provided a disclaimer notice in check_result(): # Note: when comparing dmesg output, the kernel log timestamps # help differentiate repeated testing runs. Remove them with a # post-comparison sed filter. so I wonder if comm will get confused with repeated selftest runs? Using diff/comm was a trick that I surprised worked this long :) Maybe it can still hold, but I'll have to run a few experiements.
On Mon, Dec 18, 2023 at 11:44:23AM -0500, Joe Lawrence wrote: > > [ ... snip ... ] > > If we pre-trim the timestamps, the output is what we expect: > > $ comm --nocheck-order -13 \ > <(sed 's/^\[[ 0-9.]*\] //' /tmp/A) \ > <(sed 's/^\[[ 0-9.]*\] //' /tmp/B) > message four > > however, I'm not sure if that fix would easily apply. It looks like I > provided a disclaimer notice in check_result(): > > # Note: when comparing dmesg output, the kernel log timestamps > # help differentiate repeated testing runs. Remove them with a > # post-comparison sed filter. > > so I wonder if comm will get confused with repeated selftest runs? > Using diff/comm was a trick that I surprised worked this long :) Maybe > it can still hold, but I'll have to run a few experiements. Hi Alexander, I tested this idea and yeah, it breaks. Google brought me back to Petr's report from when way back when I tried to omit the timestamps: https://lore.kernel.org/all/20200617092352.GT31238@alley/ that was with diff and not comm, but I could trip up the latter command with ~22 iterations of the selftests. So I took a crack at refactoring: instead of saving intermediate dmesg.saved files, on starting a test, the script logs and remembers a LAST_DMESG entry. When it later checks the test result, it dumps dmesg starting from after LAST_DMESG entry. This is *very* lightly tested, but I thought maybe you could give it a spin. Hopefully it's less brittle than diff/comm strategy. -- Joe -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh index c8416c54b463..b012723e631a 100644 --- a/tools/testing/selftests/livepatch/functions.sh +++ b/tools/testing/selftests/livepatch/functions.sh @@ -42,17 +42,6 @@ function die() { exit 1 } -# save existing dmesg so we can detect new content -function save_dmesg() { - SAVED_DMESG=$(mktemp --tmpdir -t klp-dmesg-XXXXXX) - dmesg > "$SAVED_DMESG" -} - -# cleanup temporary dmesg file from save_dmesg() -function cleanup_dmesg_file() { - rm -f "$SAVED_DMESG" -} - function push_config() { DYNAMIC_DEBUG=$(grep '^kernel/livepatch' /sys/kernel/debug/dynamic_debug/control | \ awk -F'[: ]' '{print "file " $1 " line " $2 " " $4}') @@ -99,7 +88,6 @@ function set_ftrace_enabled() { function cleanup() { pop_config - cleanup_dmesg_file } # setup_config - save the current config and set a script exit trap that @@ -280,7 +268,13 @@ function set_pre_patch_ret { function start_test { local test="$1" - save_dmesg + # Dump something unique into the dmesg log, then stash the entry + # in LAST_DMESG. The check_result() function will use it to + # find new kernel messages since the test started. + local timestamp="$(date --rfc-3339=ns)" + log "livepatch kselftest timestamp: $timestamp" + LAST_DMESG=$(dmesg | grep "livepatch kselftest timestamp: $timestamp") + echo -n "TEST: $test ... " log "===== TEST: $test =====" } @@ -291,11 +285,11 @@ function check_result { local expect="$*" local result - # Note: when comparing dmesg output, the kernel log timestamps - # help differentiate repeated testing runs. Remove them with a - # post-comparison sed filter. - - result=$(dmesg | comm --nocheck-order -13 "$SAVED_DMESG" - | \ + # Test results include any new dmesg entry since LAST_DMESG, then: + # - include lines matching keywords + # - exclude lines matching keywords + # - filter out dmesg timestamp prefixes + result=$(dmesg | awk -v last_dmesg="$LAST_DMESG" 'p; $0 == last_dmesg { p=1 }' | \ grep -e 'livepatch:' -e 'test_klp' | \ grep -v '\(tainting\|taints\) kernel' | \ sed 's/^\[[ 0-9.]*\] //') @@ -306,8 +300,6 @@ function check_result { echo -e "not ok\n\n$(diff -upr --label expected --label result <(echo "$expect") <(echo "$result"))\n" die "livepatch kselftest(s) failed" fi - - cleanup_dmesg_file } # check_sysfs_rights(modname, rel_path, expected_rights) - check sysfs
On Mon, Dec 18, 2023 at 05:44:54PM -0500, Joe Lawrence wrote: Hi Joe! ... > > If we pre-trim the timestamps, the output is what we expect: > > > > $ comm --nocheck-order -13 \ > > <(sed 's/^\[[ 0-9.]*\] //' /tmp/A) \ > > <(sed 's/^\[[ 0-9.]*\] //' /tmp/B) > > message four > > > > however, I'm not sure if that fix would easily apply. It looks like I > > provided a disclaimer notice in check_result(): > > > > # Note: when comparing dmesg output, the kernel log timestamps > > # help differentiate repeated testing runs. Remove them with a > > # post-comparison sed filter. > > > > so I wonder if comm will get confused with repeated selftest runs? I would think so. AFAICT even a single run would produce duplicate lines. ... > This is *very* lightly tested, but I thought maybe you could give it a > spin. Hopefully it's less brittle than diff/comm strategy. That seems to be working. However, please see below. ... > @@ -280,7 +268,13 @@ function set_pre_patch_ret { > function start_test { > local test="$1" > > - save_dmesg > + # Dump something unique into the dmesg log, then stash the entry > + # in LAST_DMESG. The check_result() function will use it to > + # find new kernel messages since the test started. > + local timestamp="$(date --rfc-3339=ns)" > + log "livepatch kselftest timestamp: $timestamp" > + LAST_DMESG=$(dmesg | grep "livepatch kselftest timestamp: $timestamp") Not sure if it not paranoid mode, but still.. If the 'log' call is guaranteed synced? AKA would 'grep' always catch the line? And yeah.. if the log output is pushed away (e.g by a crazy-dumping concurrent logger), then nothing here would work. But this is not a new problem, so just my two cents. > echo -n "TEST: $test ... " > log "===== TEST: $test =====" > } Thanks!
On 12/19/23 04:45, Alexander Gordeev wrote: > On Mon, Dec 18, 2023 at 05:44:54PM -0500, Joe Lawrence wrote: > >> @@ -280,7 +268,13 @@ function set_pre_patch_ret { >> function start_test { >> local test="$1" >> >> - save_dmesg >> + # Dump something unique into the dmesg log, then stash the entry >> + # in LAST_DMESG. The check_result() function will use it to >> + # find new kernel messages since the test started. >> + local timestamp="$(date --rfc-3339=ns)" >> + log "livepatch kselftest timestamp: $timestamp" >> + LAST_DMESG=$(dmesg | grep "livepatch kselftest timestamp: $timestamp") > > Not sure if it not paranoid mode, but still.. > If the 'log' call is guaranteed synced? AKA would 'grep' always catch the line? > > And yeah.. if the log output is pushed away (e.g by a crazy-dumping concurrent > logger), then nothing here would work. But this is not a new problem, so just > my two cents. > Good question. I added the timestamp message to ensure there would be something in the log to grab onto... but I'm not sure what guarantees there are about syncing back out through dmesg. There is a loop_util() utility function in the script that could be used to wait, like: log "$last_dmesg_msg" loop_until "dmesg | grep -q '$last_dmesg_msg'" || die "Can't find canary dmesg entry, buffer overrun?" LAST_DMESG=$(dmesg | grep "$last_dmesg_msg") That should catch 1) short latencies and 2) buffer rollover. Maybe that is good enough?
On Tue, Dec 19, 2023 at 09:50:18AM -0500, Joe Lawrence wrote: > There is a loop_util() utility function in the script that could be used > to wait, like: > > log "$last_dmesg_msg" > loop_until "dmesg | grep -q '$last_dmesg_msg'" || > die "Can't find canary dmesg entry, buffer overrun?" > LAST_DMESG=$(dmesg | grep "$last_dmesg_msg") > > That should catch 1) short latencies and 2) buffer rollover. Maybe that > is good enough? I guess, if loop_until() would fail in case a message did not show up in meaningful time, that would be it. > -- > Joe >
On Tue 2023-12-19 09:50:18, Joe Lawrence wrote: > On 12/19/23 04:45, Alexander Gordeev wrote: > > On Mon, Dec 18, 2023 at 05:44:54PM -0500, Joe Lawrence wrote: > > > >> @@ -280,7 +268,13 @@ function set_pre_patch_ret { > >> function start_test { > >> local test="$1" > >> > >> - save_dmesg > >> + # Dump something unique into the dmesg log, then stash the entry > >> + # in LAST_DMESG. The check_result() function will use it to > >> + # find new kernel messages since the test started. > >> + local timestamp="$(date --rfc-3339=ns)" > >> + log "livepatch kselftest timestamp: $timestamp" > >> + LAST_DMESG=$(dmesg | grep "livepatch kselftest timestamp: $timestamp") I like this approach. > > Not sure if it not paranoid mode, but still.. > > If the 'log' call is guaranteed synced? AKA would 'grep' always catch the line? I believe that the write is synchronized. The "log" command does: function log() { echo "$1" > /dev/kmsg } The message is stored into the log buffer during the write to /dev/kmsg(). See devkmsg_emit() in devkmsg_write(), where devkmsg_write() is .write_iter callback in struct file_operations. const struct file_operations kmsg_fops = { .open = devkmsg_open, .read = devkmsg_read, .write_iter = devkmsg_write, .llseek = devkmsg_llseek, .poll = devkmsg_poll, .release = devkmsg_release, }; I would explect that the write() callback is called directly when shell is writing "echo" stdout to /dev/kmsg. I hope that all this is sychronous. Everything is in memory. I think that write operations are asynchronous only when the data are written to a disk or another "slow" medium. And dmesg reads the data from the ring buffer as well. It is actually using /dev/kmsg as well by default. Best Regards, Petr
--- dmesg-saved1-rep.txt 2023-12-17 21:08:14.171014218 +0100 +++ dmesg1-rep.txt 2023-12-17 21:06:52.221014218 +0100 @@ -1,3 +1,3 @@ -[ 98.820331] livepatch: 'test_klp_state2': starting patching transition [ 100.031067] livepatch: 'test_klp_state2': completing patching transition [ 284.224335] livepatch: kernel.ftrace_enabled = 1 +[ 284.232921] ===== TEST: basic shadow variable API =====