Message ID | 20240116154926.202164-1-bpoirier@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Commit | dd2d40acdbb2b9e6bcddd5424b0e00c1760ecf26 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] selftests: bonding: Add more missing config options | expand |
On Tue, 16 Jan 2024 10:49:26 -0500 Benjamin Poirier wrote: > As a followup to commit 03fb8565c880 ("selftests: bonding: add missing > build configs"), add more networking-specific config options which are > needed for bonding tests. > > For testing, I used the minimal config generated by virtme-ng and I added > the options in the config file. All bonding tests passed. > > Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6 > Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options > Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon > Suggested-by: Jakub Kicinski <kuba@kernel.org> > Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> With this applied the only remaining bonding test which fails in our CI is bond-options [1], good progress! :) Looks like it doesn't finish in time: not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds The tests run in a VM without HW virtualization support. Any opinions about bumping the timeout for bonding? If we enable KASAN etc. things will get even slower. [1] https://netdev-2.bots.linux.dev/vmksft-bonding/results/423800/7-bond-options-sh
Jakub Kicinski <kuba@kernel.org> wrote: >On Tue, 16 Jan 2024 10:49:26 -0500 Benjamin Poirier wrote: >> As a followup to commit 03fb8565c880 ("selftests: bonding: add missing >> build configs"), add more networking-specific config options which are >> needed for bonding tests. >> >> For testing, I used the minimal config generated by virtme-ng and I added >> the options in the config file. All bonding tests passed. >> >> Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6 >> Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options >> Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon >> Suggested-by: Jakub Kicinski <kuba@kernel.org> >> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> > >With this applied the only remaining bonding test which fails in our CI >is bond-options [1], good progress! :) Looks like it doesn't finish in >time: > >not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds > >The tests run in a VM without HW virtualization support. Any opinions >about bumping the timeout for bonding? If we enable KASAN etc. things >will get even slower. Reading the grat_arp test, it looks like it has long sleep times built into it: garp_test() { [...] exp_num=$(echo "${param}" | cut -f6 -d ' ') sleep $((exp_num + 2)) num_grat_arp() { local val for val in 10 20 30 50; do garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000" If I'm reading it right, this will sleep for 12, 22, 32 and 52 seconds for the passes through the loop in num_grat_arp(), so that would be 118 seconds just for that. -J >[1] >https://netdev-2.bots.linux.dev/vmksft-bonding/results/423800/7-bond-options-sh --- -Jay Vosburgh, jay.vosburgh@canonical.com
On Tue, 16 Jan 2024 11:03:30 -0800 Jay Vosburgh wrote: > If I'm reading it right, this will sleep for 12, 22, 32 and 52 > seconds for the passes through the loop in num_grat_arp(), so that would > be 118 seconds just for that. Hah, that's really tight if the default timeout is 120. Let me send a patch to bump it to 300..
On 2024-01-16 11:20 -0800, Jakub Kicinski wrote: > On Tue, 16 Jan 2024 11:03:30 -0800 Jay Vosburgh wrote: > > If I'm reading it right, this will sleep for 12, 22, 32 and 52 > > seconds for the passes through the loop in num_grat_arp(), so that would > > be 118 seconds just for that. > > Hah, that's really tight if the default timeout is 120. > Let me send a patch to bump it to 300.. I also ran into that timeout when using `make run_tests` so I executed the test script directly: # time ./bond_options.sh TEST: prio (active-backup miimon primary_reselect 0) [ OK ] TEST: prio (active-backup miimon primary_reselect 1) [ OK ] TEST: prio (active-backup miimon primary_reselect 2) [ OK ] TEST: prio (active-backup arp_ip_target primary_reselect 0) [ OK ] TEST: prio (active-backup arp_ip_target primary_reselect 1) [ OK ] TEST: prio (active-backup arp_ip_target primary_reselect 2) [ OK ] TEST: prio (active-backup ns_ip6_target primary_reselect 0) [ OK ] TEST: prio (active-backup ns_ip6_target primary_reselect 1) [ OK ] TEST: prio (active-backup ns_ip6_target primary_reselect 2) [ OK ] TEST: prio (balance-tlb miimon primary_reselect 0) [ OK ] TEST: prio (balance-tlb miimon primary_reselect 1) [ OK ] TEST: prio (balance-tlb miimon primary_reselect 2) [ OK ] TEST: prio (balance-tlb arp_ip_target primary_reselect 0) [ OK ] TEST: prio (balance-tlb arp_ip_target primary_reselect 1) [ OK ] TEST: prio (balance-tlb arp_ip_target primary_reselect 2) [ OK ] TEST: prio (balance-tlb ns_ip6_target primary_reselect 0) [ OK ] TEST: prio (balance-tlb ns_ip6_target primary_reselect 1) [ OK ] TEST: prio (balance-tlb ns_ip6_target primary_reselect 2) [ OK ] TEST: prio (balance-alb miimon primary_reselect 0) [ OK ] TEST: prio (balance-alb miimon primary_reselect 1) [ OK ] TEST: prio (balance-alb miimon primary_reselect 2) [ OK ] TEST: prio (balance-alb arp_ip_target primary_reselect 0) [ OK ] TEST: prio (balance-alb arp_ip_target primary_reselect 1) [ OK ] TEST: prio (balance-alb arp_ip_target primary_reselect 2) [ OK ] TEST: prio (balance-alb ns_ip6_target primary_reselect 0) [ OK ] TEST: prio (balance-alb ns_ip6_target primary_reselect 1) [ OK ] TEST: prio (balance-alb ns_ip6_target primary_reselect 2) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 0) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 1) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 2) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 3) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 4) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 5) [ OK ] TEST: arp_validate (active-backup arp_ip_target arp_validate 6) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 0) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 1) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 2) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 3) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 4) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 5) [ OK ] TEST: arp_validate (active-backup ns_ip6_target arp_validate 6) [ OK ] TEST: num_grat_arp (active-backup miimon num_grat_arp 10) [ OK ] TEST: num_grat_arp (active-backup miimon num_grat_arp 20) [ OK ] TEST: num_grat_arp (active-backup miimon num_grat_arp 30) [ OK ] TEST: num_grat_arp (active-backup miimon num_grat_arp 50) [ OK ] real 13m35.065s user 0m1.657s sys 0m27.918s The test is not cpu bound; as Jay pointed out, it spends most of its time sleeping.
On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote: > real 13m35.065s > user 0m1.657s > sys 0m27.918s > > The test is not cpu bound; as Jay pointed out, it spends most of its > time sleeping. Ugh, so it does multiple iterations of 118 sec? Could you send a patch to bump the timeout to 900 or 1200 in this case?
Jakub Kicinski <kuba@kernel.org> wrote: >On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote: >> real 13m35.065s >> user 0m1.657s >> sys 0m27.918s >> >> The test is not cpu bound; as Jay pointed out, it spends most of its >> time sleeping. > >Ugh, so it does multiple iterations of 118 sec? > >Could you send a patch to bump the timeout to 900 or 1200 in this case? We could also lower the interval or number of notifications; right now "peer_notif_delay 1000" puts 1 second between the ARPs in the num_grat_arp() test. I'm not sure why that value was chosen, but the peer_notify_delay is rounded to units of the miimon interval, and in this test miimon is 100 msec. I haven't tested this at all, but something like diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh index c54d1697f439..95eb77aebc3c 100755 --- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh +++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh @@ -277,7 +277,7 @@ garp_test() ip -n ${s_ns} link set ${active_slave} down exp_num=$(echo "${param}" | cut -f6 -d ' ') - sleep $((exp_num + 2)) + sleep $((exp_num / 5 + 2)) active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave") @@ -297,7 +297,7 @@ num_grat_arp() { local val for val in 10 20 30 50; do - garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000" + garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 200" log_test "num_grat_arp" "active-backup miimon num_grat_arp $val" done } could substantially reduce the time to run the test. It's kind of icky with magic numbers, but that could be cleaned up. -J --- -Jay Vosburgh, jay.vosburgh@canonical.com
On 2024-01-16 11:29 -0800, Jakub Kicinski wrote: > On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote: > > real 13m35.065s > > user 0m1.657s > > sys 0m27.918s > > > > The test is not cpu bound; as Jay pointed out, it spends most of its > > time sleeping. > > Ugh, so it does multiple iterations of 118 sec? There are other test functions in the script which include a lot of sleeping. > Could you send a patch to bump the timeout to 900 or 1200 in this case? Sure but I'd like to give a chance for Hangbin to reply first. Would the test be just as good if it was shortened by removing some cases or reducing the time intervals? Or is increasing the timeout the best approach?
On Tue, Jan 16, 2024 at 02:47:46PM -0500, Benjamin Poirier wrote: > On 2024-01-16 11:29 -0800, Jakub Kicinski wrote: > > On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote: > > > real 13m35.065s > > > user 0m1.657s > > > sys 0m27.918s > > > > > > The test is not cpu bound; as Jay pointed out, it spends most of its > > > time sleeping. > > > > Ugh, so it does multiple iterations of 118 sec? > > There are other test functions in the script which include a lot of > sleeping. The arp_validate_test need to check the mii_status, which sleep too much time. Maybe we can use busywait to save more time. > > > Could you send a patch to bump the timeout to 900 or 1200 in this case? > > Sure but I'd like to give a chance for Hangbin to reply first. Would the > test be just as good if it was shortened by removing some cases or > reducing the time intervals? Or is increasing the timeout the best > approach? The purpose of grat_arp is testing commit 9949e2efb54e ("bonding: fix send_peer_notif overflow"). As the send_peer_notif was defined to u8, to overflow it, we need to send_peer_notif = num_peer_notif * peer_notif_delay = num_grat_arp * peer_notify_delay / miimon > 255 (kernel) (kernel parameter) (user parameter) e.g. 30 (num_grat_arp) * 1000 (peer_notify_delay) / 100 (miimon) > 255. Which need 30s to complete sending garp messages. To save the testing time, the only way is reduce the miimon number. Something like 30 (num_grat_arp) * 500 (peer_notify_delay) / 50 (miimon) > 255. To save more time, we can remove the 50 num_grat_arp testing. The patch would like diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh index c54d1697f439..20c4d862c436 100755 --- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh +++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh @@ -277,7 +277,7 @@ garp_test() ip -n ${s_ns} link set ${active_slave} down exp_num=$(echo "${param}" | cut -f6 -d ' ') - sleep $((exp_num + 2)) + sleep $((exp_num / 2 + 2)) active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave") @@ -296,8 +296,8 @@ garp_test() num_grat_arp() { local val - for val in 10 20 30 50; do - garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000" + for val in 10 20 30; do + garp_test "mode active-backup miimon 50 num_grat_arp $val peer_notify_delay 500" log_test "num_grat_arp" "active-backup miimon num_grat_arp $val" done } With this we can save 100s. Thanks Hangbin
On 2024-01-17 11:15 +0800, Hangbin Liu wrote: > On Tue, Jan 16, 2024 at 02:47:46PM -0500, Benjamin Poirier wrote: > > On 2024-01-16 11:29 -0800, Jakub Kicinski wrote: > > > On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote: > > > > real 13m35.065s > > > > user 0m1.657s > > > > sys 0m27.918s > > > > > > > > The test is not cpu bound; as Jay pointed out, it spends most of its > > > > time sleeping. > > > > > > Ugh, so it does multiple iterations of 118 sec? > > > > There are other test functions in the script which include a lot of > > sleeping. > > The arp_validate_test need to check the mii_status, which sleep too much time. > Maybe we can use busywait to save more time. > > > > > > Could you send a patch to bump the timeout to 900 or 1200 in this case? > > > > Sure but I'd like to give a chance for Hangbin to reply first. Would the > > test be just as good if it was shortened by removing some cases or > > reducing the time intervals? Or is increasing the timeout the best > > approach? > > The purpose of grat_arp is testing commit 9949e2efb54e ("bonding: fix > send_peer_notif overflow"). As the send_peer_notif was defined to u8, > to overflow it, we need to > > send_peer_notif = num_peer_notif * peer_notif_delay = num_grat_arp * peer_notify_delay / miimon > 255 > (kernel) (kernel parameter) (user parameter) > > e.g. 30 (num_grat_arp) * 1000 (peer_notify_delay) / 100 (miimon) > 255. > > Which need 30s to complete sending garp messages. To save the testing time, > the only way is reduce the miimon number. Something like > 30 (num_grat_arp) * 500 (peer_notify_delay) / 50 (miimon) > 255. > > To save more time, we can remove the 50 num_grat_arp testing. The patch would > like > > diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh > index c54d1697f439..20c4d862c436 100755 > --- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh > +++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh > @@ -277,7 +277,7 @@ garp_test() > ip -n ${s_ns} link set ${active_slave} down > > exp_num=$(echo "${param}" | cut -f6 -d ' ') > - sleep $((exp_num + 2)) > + sleep $((exp_num / 2 + 2)) > > active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave") > > @@ -296,8 +296,8 @@ garp_test() > num_grat_arp() > { > local val > - for val in 10 20 30 50; do > - garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000" > + for val in 10 20 30; do > + garp_test "mode active-backup miimon 50 num_grat_arp $val peer_notify_delay 500" > log_test "num_grat_arp" "active-backup miimon num_grat_arp $val" > done > } > > With this we can save 100s. > Thanks for looking into it. This change got the runtime down from 13m35s to 12m7s on the same system I used to test yesterday. That's a start but since it's still well above the current timeout of 120s, I sent a patch to increase the timeout to 1200s.
Hello: This patch was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>: On Tue, 16 Jan 2024 10:49:26 -0500 you wrote: > As a followup to commit 03fb8565c880 ("selftests: bonding: add missing > build configs"), add more networking-specific config options which are > needed for bonding tests. > > For testing, I used the minimal config generated by virtme-ng and I added > the options in the config file. All bonding tests passed. > > [...] Here is the summary with links: - [net] selftests: bonding: Add more missing config options https://git.kernel.org/netdev/net/c/dd2d40acdbb2 You are awesome, thank you!
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config index f85b16fc5128..899d7fb6ea8e 100644 --- a/tools/testing/selftests/drivers/net/bonding/config +++ b/tools/testing/selftests/drivers/net/bonding/config @@ -1,5 +1,10 @@ CONFIG_BONDING=y CONFIG_BRIDGE=y CONFIG_DUMMY=y +CONFIG_IPV6=y CONFIG_MACVLAN=y +CONFIG_NET_ACT_GACT=y +CONFIG_NET_CLS_FLOWER=y +CONFIG_NET_SCH_INGRESS=y +CONFIG_NLMON=y CONFIG_VETH=y
As a followup to commit 03fb8565c880 ("selftests: bonding: add missing build configs"), add more networking-specific config options which are needed for bonding tests. For testing, I used the minimal config generated by virtme-ng and I added the options in the config file. All bonding tests passed. Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6 Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> --- tools/testing/selftests/drivers/net/bonding/config | 5 +++++ 1 file changed, 5 insertions(+)