Message ID | 53a7e56424756ef35434bc15a90b256bcf724651.1707407012.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] selftests: net: wait for receiver startup in so_txtime.sh |
Paolo Abeni wrote:
> The mentioned test is failing in slow environments:
>
> # SO_TXTIME ipv4 clock monotonic
> # ./so_txtime: recv: timeout: Resource temporarily unavailable
> not ok 1 selftests: net: so_txtime.sh # exit=1
>
> The receiver is started in background and the sender could end-up
> transmitting the packet before the receiver is ready, so that the
> later recv times out.
>
> Address the issue explicitly waiting for the socket being bound to
> the relevant port.
>
> Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>
On Thu, 2024-02-08 at 16:45 +0100, Paolo Abeni wrote:
> The mentioned test is failing in slow environments:
>
> # SO_TXTIME ipv4 clock monotonic
> # ./so_txtime: recv: timeout: Resource temporarily unavailable
> not ok 1 selftests: net: so_txtime.sh # exit=1
>
> The receiver is started in background and the sender could end-up
> transmitting the packet before the receiver is ready, so that the
> later recv times out.
>
> Address the issue explicitly waiting for the socket being bound to
> the relevant port.
>
> Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note that to really cope with slow env the mentioned self-tests also
> need net-next commit c41dfb0dfbec ("selftests/net: ignore timing
> errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to
> net-next, too

Oops... CI is saying the above is not enough...

> @@ -65,6 +70,7 @@ do_test() {
>
>  	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
>  	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
> +	wait_local_port_listen "${NS2}" 8000 "${PROTO}"
>  	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"

The binary explicitly waits up to $START time, and that conflicts with
the wait_local_port_listen; something different is needed. Apparently I
was just "lucky" during my local testing.

Cheers,

Paolo
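The conflict can be illustrated with a little arithmetic: START is fixed 0.1 s ahead before the receiver is launched, so any time spent waiting for the port eats into that budget. The sketch below is only an illustration. `slack_ns` is a hypothetical helper that is not part of so_txtime.sh, times are expressed relative to the moment START is computed, and the 150 ms wait is an assumed figure, not a measurement.

```shell
#!/bin/bash
# Hypothetical helper (not part of so_txtime.sh): nanoseconds left until
# the shared start time, given the current time. Negative means START has
# already passed.
slack_ns() {
	local start_ns="$1"
	local now_ns="$2"

	echo $(( start_ns - now_ns ))
}

# START is 0.1 s (100000000 ns) after the moment it is computed (t = 0).
start=100000000

# Assumption for illustration: the port wait returns 150 ms after t = 0.
after_wait=150000000

slack="$(slack_ns "${start}" "${after_wait}")"
if [ "${slack}" -lt 0 ]; then
	# prints: sender would start 50 ms after START
	echo "sender would start $(( -slack / 1000000 )) ms after START"
fi
```

With a negative slack the sender is launched after the agreed start time, so the timing assumptions shared with the receiver no longer hold.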
On Fri, 2024-02-09 at 15:51 +0100, Paolo Abeni wrote:
> On Thu, 2024-02-08 at 16:45 +0100, Paolo Abeni wrote:
> > The mentioned test is failing in slow environments:
> >
> > # SO_TXTIME ipv4 clock monotonic
> > # ./so_txtime: recv: timeout: Resource temporarily unavailable
> > not ok 1 selftests: net: so_txtime.sh # exit=1
> >
> > The receiver is started in background and the sender could end-up
> > transmitting the packet before the receiver is ready, so that the
> > later recv times out.
> >
> > Address the issue explicitly waiting for the socket being bound to
> > the relevant port.
> >
> > Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Note that to really cope with slow env the mentioned self-tests also
> > need net-next commit c41dfb0dfbec ("selftests/net: ignore timing
> > errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to
> > net-next, too
>
> Oops... CI is saying the above is not enough...
>
> > @@ -65,6 +70,7 @@ do_test() {
> >
> >  	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
> >  	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
> > +	wait_local_port_listen "${NS2}" 8000 "${PROTO}"
> >  	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
>
> The binary explicitly waits up to $START time, and that conflicts with
> the wait_local_port_listen, something different is needed. Apparently I
> was just "lucky" during my local testing.

I experimented with a few different solutions, and so far the only
option that gave some positive results is increasing the start delay and
the etf delta by an order of magnitude, see below.

But I'm pretty sure that even with that there will be sporadic failures
in slow enough environments.

When the host-induced jitter/delay is high enough, packets are dropped
and there are functional failures. I'm wondering if we should skip this
test entirely when KSFT_MACHINE_SLOW=yes.

Do you see any other options?

Paolo
---
diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh
index 3f06f4d286a9..6445580f0a66 100755
--- a/tools/testing/selftests/net/so_txtime.sh
+++ b/tools/testing/selftests/net/so_txtime.sh
@@ -63,7 +63,9 @@ do_test() {
 		exit 1
 	fi
 
-	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
+	local delta=0.1
+	[ -n "${KSFT_MACHINE_SLOW}" ] && delta=1
+	local readonly START="$(date +%s%N --date="+ ${delta} seconds")"
 	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
 	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
 	wait "$!"
@@ -76,7 +78,9 @@ do_test 6 mono a,10 a,10
 do_test 4 mono a,10,b,20 a,10,b,20
 do_test 6 mono a,20,b,10 b,20,a,20
 
-if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta 400000; then
+delta=400000
+[ -n "${KSFT_MACHINE_SLOW}" ] && delta=$((delta*10))
+if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta "${delta}"; then
 	! do_test 4 tai a,-1 a,-1
 	! do_test 6 tai a,0 a,0
 	do_test 6 tai a,10 a,10
On Fri, 09 Feb 2024 17:45:28 +0100 Paolo Abeni wrote:
> But I'm pretty sure that even with that there will be sporadic failures
> in slow enough environments.
>
> When the host-induced jitter/delay is high enough, packets are dropped
> and there are functional failures. I'm wondering if we should skip this
> test entirely when KSFT_MACHINE_SLOW=yes.

By skip do you mean the same approach as to the gro test?
Ignore errors? Because keeping the code coverage for KASAN etc.
would still be good (stating the obvious, sorry).
On Fri, 2024-02-09 at 11:17 -0800, Jakub Kicinski wrote:
> On Fri, 09 Feb 2024 17:45:28 +0100 Paolo Abeni wrote:
> > But I'm pretty sure that even with that there will be sporadic failures
> > in slow enough environments.
> >
> > When the host-induced jitter/delay is high enough, packets are dropped
> > and there are functional failures. I'm wondering if we should skip this
> > test entirely when KSFT_MACHINE_SLOW=yes.
>
> By skip do you mean the same approach as to the gro test?
> Ignore errors? Because keeping the code coverage for KASAN etc.
> would still be good (stating the obvious, sorry).

I see my wording was unclear and misleading, I'm sorry. Yes, I mean
checking KSFT_MACHINE_SLOW in the caller script and ignoring errors.

Cheers,

Paolo
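The "check KSFT_MACHINE_SLOW in the caller script and ignore errors" idea could look roughly like the sketch below. This is only an assumption about the shape of such a change, not code from the kernel tree: `run_slow_tolerant` is a hypothetical name, and the thread does not say how the final patch implements it.

```shell
#!/bin/bash
# Hypothetical wrapper (not from the kernel tree): run a test command and
# downgrade a failure to a warning when KSFT_MACHINE_SLOW is set, so the
# code paths are still exercised (e.g. for KASAN coverage).
run_slow_tolerant() {
	local rc=0

	# "|| rc=$?" keeps the script alive under "set -e".
	"$@" || rc=$?
	if [ "${rc}" -ne 0 ] && [ -n "${KSFT_MACHINE_SLOW:-}" ]; then
		echo "ignoring failure (rc=${rc}) on slow machine: $*" >&2
		return 0
	fi
	return "${rc}"
}
```

A call such as `run_slow_tolerant do_test 4 mono a,10 a,10` would then still run the test but not fail the script on a slow machine. Cases that are expected to fail (the `! do_test ...` lines) would need separate handling, since the wrapper turns a failure into success there as well.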
diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh
index 3f06f4d286a9..ade0e5755099 100755
--- a/tools/testing/selftests/net/so_txtime.sh
+++ b/tools/testing/selftests/net/so_txtime.sh
@@ -5,6 +5,8 @@
 
 set -e
 
+source net_helper.sh
+
 readonly DEV="veth0"
 readonly BIN="./so_txtime"
 
@@ -51,13 +53,16 @@ do_test() {
 	local readonly CLOCK="$2"
 	local readonly TXARGS="$3"
 	local readonly RXARGS="$4"
+	local PROTO
 
 	if [[ "${IP}" == "4" ]]; then
 		local readonly SADDR="${SADDR4}"
 		local readonly DADDR="${DADDR4}"
+		PROTO=udp
 	elif [[ "${IP}" == "6" ]]; then
 		local readonly SADDR="${SADDR6}"
 		local readonly DADDR="${DADDR6}"
+		PROTO=udp6
 	else
 		echo "Invalid IP version ${IP}"
 		exit 1
@@ -65,6 +70,7 @@ do_test() {
 
 	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
 	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
+	wait_local_port_listen "${NS2}" 8000 "${PROTO}"
 	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
 	wait "$!"
 }
The mentioned test is failing in slow environments:

  # SO_TXTIME ipv4 clock monotonic
  # ./so_txtime: recv: timeout: Resource temporarily unavailable
  not ok 1 selftests: net: so_txtime.sh # exit=1

The receiver is started in background and the sender could end-up
transmitting the packet before the receiver is ready, so that the
later recv times out.

Address the issue explicitly waiting for the socket being bound to
the relevant port.

Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note that to really cope with slow env the mentioned self-tests also
need net-next commit c41dfb0dfbec ("selftests/net: ignore timing
errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to
net-next, too
---
 tools/testing/selftests/net/so_txtime.sh | 6 ++++++
 1 file changed, 6 insertions(+)
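Waiting for a socket to be bound to a port can be approximated by polling a /proc/net/udp-style table for the port in hex. The sketch below is not the net_helper.sh implementation of `wait_local_port_listen`. The function names `port_in_table` and `wait_port_in_table` are hypothetical, and the table is passed in as a file so the logic can be tried without network namespaces.

```shell
#!/bin/bash
# Hypothetical sketch, not the net_helper.sh implementation.

# Succeed if any row of a /proc/net/udp-style table has the given local
# port. The local address is the second column, "HEXADDR:HEXPORT"; the
# first line of the table is a header and is skipped.
port_in_table() {
	local table="$1"
	local port_hex

	port_hex="$(printf '%04X' "$2")"
	awk -v p=":${port_hex}" '
		NR > 1 && substr($2, length($2) - 4) == p { found = 1 }
		END { exit !found }
	' "${table}"
}

# Poll until the port shows up or the attempts (default 10) run out.
wait_port_in_table() {
	local table="$1"
	local port="$2"
	local attempts="${3:-10}"
	local i

	for i in $(seq "${attempts}"); do
		port_in_table "${table}" "${port}" && return 0
		sleep 0.1
	done
	return 1
}
```

For example, `wait_port_in_table /proc/net/udp 8000` would poll the current namespace for up to about a second. To inspect another namespace, the table would have to be read through `ip netns exec`, for instance by piping `cat /proc/net/udp` into `port_in_table - 8000`.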