Message ID | 20240130-tcp-ao-test-key-mgmt-v2-0-d190430a6c60@arista.com (mailing list archive) |
---|---|
Headers | show |
Series | selftests/net: A couple of typos fixes in key-management/rst tests | expand |
On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote: > Two typo fixes, noticed by Mohammad's review. > And a fix for an issue that got uncovered. I can confirm that all tests pass now :) Thank you!
Hello: This series was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Tue, 30 Jan 2024 03:51:51 +0000 you wrote: > Changes in v2: > - Dropped "selftests/net: Clean-up double assignment", going to send it > to net-next with other changes (Simon) > - Added a patch to rectify RST selftests. > - Link to v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com > > Two typo fixes, noticed by Mohammad's review. > And a fix for an issue that got uncovered. > > [...] Here is the summary with links: - [v2,1/3] selftests/net: Argument value mismatch when calling verify_counters() https://git.kernel.org/netdev/net/c/d8f5df1fcea5 - [v2,2/3] selftests/net: Rectify key counters checks https://git.kernel.org/netdev/net/c/384aa16d3776 - [v2,3/3] selftests/net: Repair RST passive reset selftest https://git.kernel.org/netdev/net/c/6caf3adcc877 You are awesome, thank you!
On 2/1/24 00:36, Jakub Kicinski wrote: > On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote: >> Two typo fixes, noticed by Mohammad's review. >> And a fix for an issue that got uncovered. > > I can confirm that all tests pass now :) > Thank you! Thanks Jakub! Please, let me know if there will be other issues with tcp-ao tests :) Going to work on tracepoints and some other TCP-AO stuff for net-next. Thanks, Dmitry
On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote: > Please, let me know if there will be other issues with tcp-ao tests :) > > Going to work on tracepoints and some other TCP-AO stuff for net-next. Since you're being nice and helpful I figured I'll try testing TCP-AO with debug options enabled :) (kernel/configs/debug.config and kernel/configs/x86_debug.config included), that slows things down and causes a bit of flakiness in unsigned-md5-* tests: https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao This has links to outputs: https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0 If it's a timing thing - FWIW we started exporting KSFT_MACHINE_SLOW=yes on the slow runners.
Hi Jakub, On 2/1/24 21:21, Jakub Kicinski wrote: > On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote: >> Please, let me know if there will be other issues with tcp-ao tests :) >> >> Going to work on tracepoints and some other TCP-AO stuff for net-next. > > Since you're being nice and helpful I figured I'll try testing TCP-AO > with debug options enabled :) (kernel/configs/debug.config and > kernel/configs/x86_debug.config included), Haha :) > that slows things down > and causes a bit of flakiness in unsigned-md5-* tests: > > https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao > > This has links to outputs: > https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0 > > If it's a timing thing - FWIW we started exporting > KSFT_MACHINE_SLOW=yes on the slow runners. I think, I know what happens here: # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4 => 6 # ok 9 AO server (AO_REQUIRED): unsigned client # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired increased 1 => 2 # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter netns_ao_good was not expected to increase 7 => 8 for each of tests the server listens at a new port, but re-uses the same namespaces+veth. If the node/machine is quite slow, I guess a segment might have been retransmitted and the test that initiated it had already finished. And as result, the per-namespace counters are incremented, which makes the test fail (IOW, the test expects all segments in ns being dropped). So, I should do one of the options: 1. relax per-namespace checks (the per-socket and per-key counters are checked) 2. unshare(net) + veth setup for each test 3. split the selftest on smaller ones (as they create new net-ns in initialization) I'd probably prefer (2), albeit it slows down that slow machine even more, but I don't think creating 2 net-ns + veth pair per each test would add a lot more overhead even on some rpi board. But let's see, maybe I'll just go with (1) as that's really easy. I'll cook a patch this week. Thanks, Dmitry
On 2/1/24 22:25, Dmitry Safonov wrote: > Hi Jakub, > > On 2/1/24 21:21, Jakub Kicinski wrote: >> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote: >>> Please, let me know if there will be other issues with tcp-ao tests :) >>> >>> Going to work on tracepoints and some other TCP-AO stuff for net-next. >> >> Since you're being nice and helpful I figured I'll try testing TCP-AO >> with debug options enabled :) (kernel/configs/debug.config and >> kernel/configs/x86_debug.config included), > > Haha :) > >> that slows things down >> and causes a bit of flakiness in unsigned-md5-* tests: >> >> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao >> >> This has links to outputs: >> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0 >> >> If it's a timing thing - FWIW we started exporting >> KSFT_MACHINE_SLOW=yes on the slow runners. > > I think, I know what happens here: > > # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4 > => 6 > # ok 9 AO server (AO_REQUIRED): unsigned client > # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired > increased 1 => 2 > # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter > netns_ao_good was not expected to increase 7 => 8 > > for each of tests the server listens at a new port, but re-uses the same > namespaces+veth. If the node/machine is quite slow, I guess a segment > might have been retransmitted and the test that initiated it had already > finished. > And as result, the per-namespace counters are incremented, which makes > the test fail (IOW, the test expects all segments in ns being dropped). > > So, I should do one of the options: > > 1. relax per-namespace checks (the per-socket and per-key counters are > checked) > 2. unshare(net) + veth setup for each test > 3. split the selftest on smaller ones (as they create new net-ns in > initialization) Actually, I think there may be an easier fix: 4. Make sure that client close()s TCP-AO first, making it twsk. And also make sure that net-ns counters read post server's close(). Will do this, let's see if this fixes the flakiness on the netdev bot :) > I'd probably prefer (2), albeit it slows down that slow machine even > more, but I don't think creating 2 net-ns + veth pair per each test > would add a lot more overhead even on some rpi board. But let's see, > maybe I'll just go with (1) as that's really easy. > > I'll cook a patch this week. Thanks, Dmitry
On 2/1/24 23:37, Dmitry Safonov wrote: > On 2/1/24 22:25, Dmitry Safonov wrote: >> Hi Jakub, >> >> On 2/1/24 21:21, Jakub Kicinski wrote: >>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote: >>>> Please, let me know if there will be other issues with tcp-ao tests :) >>>> >>>> Going to work on tracepoints and some other TCP-AO stuff for net-next. >>> >>> Since you're being nice and helpful I figured I'll try testing TCP-AO >>> with debug options enabled :) (kernel/configs/debug.config and >>> kernel/configs/x86_debug.config included), >> >> Haha :) >> >>> that slows things down >>> and causes a bit of flakiness in unsigned-md5-* tests: >>> >>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao >>> >>> This has links to outputs: >>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0 >>> >>> If it's a timing thing - FWIW we started exporting >>> KSFT_MACHINE_SLOW=yes on the slow runners. >> >> I think, I know what happens here: >> >> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4 >> => 6 >> # ok 9 AO server (AO_REQUIRED): unsigned client >> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired >> increased 1 => 2 >> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter >> netns_ao_good was not expected to increase 7 => 8 >> >> for each of tests the server listens at a new port, but re-uses the same >> namespaces+veth. If the node/machine is quite slow, I guess a segment >> might have been retransmitted and the test that initiated it had already >> finished. >> And as result, the per-namespace counters are incremented, which makes >> the test fail (IOW, the test expects all segments in ns being dropped). >> >> So, I should do one of the options: >> >> 1. relax per-namespace checks (the per-socket and per-key counters are >> checked) >> 2. unshare(net) + veth setup for each test >> 3. split the selftest on smaller ones (as they create new net-ns in >> initialization) > > Actually, I think there may be an easier fix: > > 4. Make sure that client close()s TCP-AO first, making it twsk. > And also make sure that net-ns counters read post server's close(). > > Will do this, let's see if this fixes the flakiness on the netdev bot :) FWIW, I ended up with this: https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/ I reproduced the issue once, running unsigned-md5* in a loop, while in another terminal building linux-next with all cores. With the patch above, it survived 77 iterations of both ipv4/ipv6 tests so far. So, there is a chance it fixes the issue :) Thanks, Dmitry
On Fri, 2 Feb 2024 02:30:52 +0000 Dmitry Safonov wrote: > > Actually, I think there may be an easier fix: > > > > 4. Make sure that client close()s TCP-AO first, making it twsk. > > And also make sure that net-ns counters read post server's close(). > > > > Will do this, let's see if this fixes the flakiness on the netdev bot :) > > FWIW, I ended up with this: > https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/ > > I reproduced the issue once, running unsigned-md5* in a loop, while in > another terminal building linux-next with all cores. > With the patch above, it survived 77 iterations of both ipv4/ipv6 tests > so far. So, there is a chance it fixes the issue :) That was quick! Fingers crossed :)
Changes in v2: - Dropped "selftests/net: Clean-up double assignment", going to send it to net-next with other changes (Simon) - Added a patch to rectify RST selftests. - Link to v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com Two typo fixes, noticed by Mohammad's review. And a fix for an issue that got uncovered. Signed-off-by: Dmitry Safonov <dima@arista.com> --- Dmitry Safonov (2): selftests/net: Rectify key counters checks selftests/net: Repair RST passive reset selftest Mohammad Nassiri (1): selftests/net: Argument value mismatch when calling verify_counters() .../testing/selftests/net/tcp_ao/key-management.c | 46 ++++--- tools/testing/selftests/net/tcp_ao/lib/sock.c | 12 +- tools/testing/selftests/net/tcp_ao/rst.c | 138 ++++++++++++++------- 3 files changed, 124 insertions(+), 72 deletions(-) --- base-commit: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7 change-id: 20240118-tcp-ao-test-key-mgmt-bb51a5fe15a2 Best regards,