[net] selftests: net: wait for receiver startup in so_txtime.sh

Message ID 53a7e56424756ef35434bc15a90b256bcf724651.1707407012.git.pabeni@redhat.com
State New

Commit Message

Paolo Abeni Feb. 8, 2024, 3:45 p.m. UTC
The mentioned test is failing in slow environments:

  # SO_TXTIME ipv4 clock monotonic
  # ./so_txtime: recv: timeout: Resource temporarily unavailable
  not ok 1 selftests: net: so_txtime.sh # exit=1

The receiver is started in the background and the sender could end up
transmitting the packet before the receiver is ready, so that the
later recv times out.

Address the issue by explicitly waiting for the socket to be bound to
the relevant port.

Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note that to really cope with slow environments the mentioned self-test
also needs net-next commit c41dfb0dfbec ("selftests/net: ignore timing
errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to
net-next, too.
---
 tools/testing/selftests/net/so_txtime.sh | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Willem de Bruijn Feb. 8, 2024, 8:39 p.m. UTC | #1
Paolo Abeni wrote:
> The mentioned test is failing in slow environments:
> 
>   # SO_TXTIME ipv4 clock monotonic
>   # ./so_txtime: recv: timeout: Resource temporarily unavailable
>   not ok 1 selftests: net: so_txtime.sh # exit=1
> 
> The receiver is started in the background and the sender could end up
> transmitting the packet before the receiver is ready, so that the
> later recv times out.
> 
> Address the issue by explicitly waiting for the socket to be bound to
> the relevant port.
> 
> Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

Paolo Abeni Feb. 9, 2024, 2:51 p.m. UTC | #2
On Thu, 2024-02-08 at 16:45 +0100, Paolo Abeni wrote:
> The mentioned test is failing in slow environments:
> 
>   # SO_TXTIME ipv4 clock monotonic
>   # ./so_txtime: recv: timeout: Resource temporarily unavailable
>   not ok 1 selftests: net: so_txtime.sh # exit=1
> 
> The receiver is started in the background and the sender could end up
> transmitting the packet before the receiver is ready, so that the
> later recv times out.
> 
> Address the issue by explicitly waiting for the socket to be bound to
> the relevant port.
> 
> Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note that to really cope with slow env the mentioned self-tests also
> need net-next commit c41dfb0dfbec ("selftests/net: ignore timing
> errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to
> net-next, too

Oops... CI is saying the above is not enough...

> @@ -65,6 +70,7 @@ do_test() {
>  
>  	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
>  	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
> +	wait_local_port_listen "${NS2}" 8000 "${PROTO}"
>  	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"

The binary explicitly waits until the $START time, and that conflicts
with wait_local_port_listen; something different is needed. Apparently
I was just "lucky" during my local testing.

Cheers,

Paolo
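
One plausible reading of the conflict described above is that START is an
absolute timestamp computed before the receiver is launched, so any time
spent waiting for the receiver's port is taken out of the 0.1 second
budget. The sketch below only illustrates that arithmetic. The
start_passed helper and the 0.2 second sleep are assumptions made for the
example; they stand in for the real readiness wait and are not part of
so_txtime.sh or net_helper.sh.

```shell
#!/bin/bash
# Illustration only: a readiness wait can outlast a fixed start budget.

# Succeeds when now_ns is later than start_ns (both in nanoseconds).
# Hypothetical helper, not taken from the selftests.
start_passed() {
	local start_ns="$1"
	local now_ns="$2"

	[ "${now_ns}" -gt "${start_ns}" ]
}

# Absolute start time 0.1 seconds from now, as in so_txtime.sh (GNU date).
START="$(date +%s%N --date="+ 0.1 seconds")"

# Stand-in for the time spent waiting for the receiver to be ready.
# 0.2 seconds is an assumed figure chosen to exceed the budget.
sleep 0.2

if start_passed "${START}" "$(date +%s%N)"; then
	echo "start time has already passed; the sender would start late"
fi
```

With a wait longer than the budget, the sender is started after START,
which fits the observation that the wait alone is not enough.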

Paolo Abeni Feb. 9, 2024, 4:45 p.m. UTC | #3
On Fri, 2024-02-09 at 15:51 +0100, Paolo Abeni wrote:
> On Thu, 2024-02-08 at 16:45 +0100, Paolo Abeni wrote:
> > The mentioned test is failing in slow environments:
> > 
> >   # SO_TXTIME ipv4 clock monotonic
> >   # ./so_txtime: recv: timeout: Resource temporarily unavailable
> >   not ok 1 selftests: net: so_txtime.sh # exit=1
> > 
> > The receiver is started in the background and the sender could end up
> > transmitting the packet before the receiver is ready, so that the
> > later recv times out.
> > 
> > Address the issue by explicitly waiting for the socket to be bound to
> > the relevant port.
> > 
> > Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Note that to really cope with slow env the mentioned self-tests also
> > need net-next commit c41dfb0dfbec ("selftests/net: ignore timing
> > errors in so_txtime if KSFT_MACHINE_SLOW"), so this could be applied to
> > net-next, too
> 
> Oops... CI is saying the above is not enough...
> 
> > @@ -65,6 +70,7 @@ do_test() {
> >  
> >  	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
> >  	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
> > +	wait_local_port_listen "${NS2}" 8000 "${PROTO}"
> >  	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
> 
> The binary explicitly waits until the $START time, and that conflicts
> with wait_local_port_listen; something different is needed. Apparently
> I was just "lucky" during my local testing.

I experimented with a few different solutions and so far the only option
that gave some positive result is increasing the start delay and the etf
delta by an order of magnitude, see below.

But I'm pretty sure that even with that there will be sporadic failures
in slow enough environments.

When the host-induced jitter/delay is high enough, packets are dropped
and there are functional failures. I'm wondering if we should skip this
test entirely when KSFT_MACHINE_SLOW=yes.

Do you see any other options?

Paolo

---
diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh
index 3f06f4d286a9..6445580f0a66 100755
--- a/tools/testing/selftests/net/so_txtime.sh
+++ b/tools/testing/selftests/net/so_txtime.sh
@@ -63,7 +63,9 @@ do_test() {
 		exit 1
 	fi
 
-	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
+	local delta=0.1
+	[ -n "${KSFT_MACHINE_SLOW}" ] && delta=1
+	local readonly START="$(date +%s%N --date="+ ${delta} seconds")"
 	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
 	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
 	wait "$!"
@@ -76,7 +78,9 @@ do_test 6 mono a,10 a,10
 do_test 4 mono a,10,b,20 a,10,b,20
 do_test 6 mono a,20,b,10 b,20,a,20
 
-if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta 400000; then
+delta=400000
+[ -n "${KSFT_MACHINE_SLOW}" ] && delta=$((delta*10))
+if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta "${delta}"; then
 	! do_test 4 tai a,-1 a,-1
 	! do_test 6 tai a,0 a,0
 	do_test 6 tai a,10 a,10

Jakub Kicinski Feb. 9, 2024, 7:17 p.m. UTC | #4
On Fri, 09 Feb 2024 17:45:28 +0100 Paolo Abeni wrote:
> But I'm pretty sure that even with that there will be sporadic failures
> in slow enough environments.
> 
> When the host-induced jitter/delay is high enough, packets are dropped
> and there are functional failures. I'm wondering if we should skip this
> test entirely when KSFT_MACHINE_SLOW=yes.

By skip do you mean the same approach as for the gro test?
Ignore errors? Because keeping the code coverage for KASAN etc.
would still be good (stating the obvious, sorry).

Paolo Abeni Feb. 12, 2024, 7:36 a.m. UTC | #5
On Fri, 2024-02-09 at 11:17 -0800, Jakub Kicinski wrote:
> On Fri, 09 Feb 2024 17:45:28 +0100 Paolo Abeni wrote:
> > But I'm pretty sure that even with that there will be sporadic failures
> > in slow enough environments.
> > 
> > When the host-induced jitter/delay is high enough, packets are dropped
> > and there are functional failures. I'm wondering if we should skip this
> > test entirely when KSFT_MACHINE_SLOW=yes.
> 
> By skip do you mean the same approach as for the gro test?
> Ignore errors? Because keeping the code coverage for KASAN etc.
> would still be good (stating the obvious, sorry).

I see my wording was unclear and misleading, I'm sorry. Yes, I mean
checking KSFT_MACHINE_SLOW in the caller script and ignoring errors.

Cheers,

Paolo
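
The approach agreed on above (still run the test, but ignore its errors
when KSFT_MACHINE_SLOW is set) could be sketched roughly as follows. This
is a minimal sketch, not the change that was eventually applied:
run_timing_test is a hypothetical wrapper name, and treating any
non-empty KSFT_MACHINE_SLOW value as "slow" is an assumption that mirrors
the [ -n "${KSFT_MACHINE_SLOW}" ] check in the diff above.

```shell
#!/bin/bash
# Sketch: run a timing-sensitive command and tolerate its failure on
# slow machines, so the code under test is still exercised.

# run_timing_test is a hypothetical helper, not part of the selftests.
# It runs the command given as arguments and returns its exit status,
# except that failures are reported as success when KSFT_MACHINE_SLOW
# is set to a non-empty value.
run_timing_test() {
	local rc=0

	# "|| rc=$?" keeps a failure from aborting a caller using "set -e".
	"$@" || rc=$?

	if [ "${rc}" -ne 0 ] && [ -n "${KSFT_MACHINE_SLOW}" ]; then
		echo "Ignoring failure (rc=${rc}): KSFT_MACHINE_SLOW is set" >&2
		rc=0
	fi

	return "${rc}"
}

# Example call (do_test is the function from so_txtime.sh):
#   run_timing_test do_test 4 mono a,-1 a,-1
```

Because the command is always executed, coverage for KASAN and similar
tools is kept; only the verdict is relaxed on slow machines.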
Patch

diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh
index 3f06f4d286a9..ade0e5755099 100755
--- a/tools/testing/selftests/net/so_txtime.sh
+++ b/tools/testing/selftests/net/so_txtime.sh
@@ -5,6 +5,8 @@ 
 
 set -e
 
+source net_helper.sh
+
 readonly DEV="veth0"
 readonly BIN="./so_txtime"
 
@@ -51,13 +53,16 @@  do_test() {
 	local readonly CLOCK="$2"
 	local readonly TXARGS="$3"
 	local readonly RXARGS="$4"
+	local PROTO
 
 	if [[ "${IP}" == "4" ]]; then
 		local readonly SADDR="${SADDR4}"
 		local readonly DADDR="${DADDR4}"
+		PROTO=udp
 	elif [[ "${IP}" == "6" ]]; then
 		local readonly SADDR="${SADDR6}"
 		local readonly DADDR="${DADDR6}"
+		PROTO=udp6
 	else
 		echo "Invalid IP version ${IP}"
 		exit 1
@@ -65,6 +70,7 @@  do_test() {
 
 	local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
 	ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
+	wait_local_port_listen "${NS2}" 8000 "${PROTO}"
 	ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
 	wait "$!"
 }