Message ID | 117a20b1b09addb804b27167fafe1a47bfb2b18e.1707233152.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] selftests: net: cope with slow env in gro.sh test | expand |
Paolo Abeni wrote: > The gro self-tests sends the packets to be aggregated with > multiple write operations. > > When running is slow environment, it's hard to guarantee that > the GRO engine will wait for the last packet in an intended > train. > > The above causes almost deterministic failures in our CI for > the 'large' test-case. > > Address the issue explicitly ignoring failures for such case > in slow environments (KSFT_MACHINE_SLOW==true). > > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com>
Hi Paolo, On 06/02/2024 16:27, Paolo Abeni wrote: > The gro self-tests sends the packets to be aggregated with > multiple write operations. > > When running is slow environment, it's hard to guarantee that > the GRO engine will wait for the last packet in an intended > train. > > The above causes almost deterministic failures in our CI for > the 'large' test-case. > > Address the issue explicitly ignoring failures for such case > in slow environments (KSFT_MACHINE_SLOW==true). To what value is KSFT_MACHINE_SLOW set in the CI? Is it set to a different value if the machine is not slow? e.g. KSFT_MACHINE_SLOW == false (please see below) > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > Note that the fixes tag is there mainly to justify targeting the net > tree, and this is aiming at net to hopefully make the test more stable > ASAP for both trees. > > I experimented with a largish refactory replacing the multiple writes > with a single GSO packet, but exhausted by time budget before reaching > any good result. > --- > tools/testing/selftests/net/gro.sh | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh > index 19352f106c1d..114b5281a3f5 100755 > --- a/tools/testing/selftests/net/gro.sh > +++ b/tools/testing/selftests/net/gro.sh > @@ -31,6 +31,10 @@ run_test() { > 1>>log.txt > wait "${server_pid}" > exit_code=$? > + if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then Maybe best to avoid using: -n "${KSFT_MACHINE_SLOW}" Otherwise, we have the same behaviour if KSFT_MACHINE_SLOW is set to 1/yes/true or 0/no/false. But maybe it is fine like that, and what is just missing is adding somewhere how KSFT_MACHINE_SLOW is supposed to be set/used? :) Not linked to that, but a small detail about the new line, just in case you need to send a v2: it looks like it is better to avoid using '-a': https://www.shellcheck.net/wiki/SC2166 (but here, it looks like the usage is fine) > + echo "Ignoring errors due to slow environment" 1>&2 > + exit_code=0 > + fi > if [[ "${exit_code}" -eq 0 ]]; then > break; > fi Cheers, Matt
On Wed, 2024-02-07 at 12:16 +0100, Matthieu Baerts wrote: > Hi Paolo, > > On 06/02/2024 16:27, Paolo Abeni wrote: > > The gro self-tests sends the packets to be aggregated with > > multiple write operations. > > > > When running is slow environment, it's hard to guarantee that > > the GRO engine will wait for the last packet in an intended > > train. > > > > The above causes almost deterministic failures in our CI for > > the 'large' test-case. > > > > Address the issue explicitly ignoring failures for such case > > in slow environments (KSFT_MACHINE_SLOW==true). > > To what value is KSFT_MACHINE_SLOW set in the CI? AFAIK, the CI initialize KSFT_MACHINE_SLOW (to true) only on slow env. > Is it set to a different value if the machine is not slow? e.g. > > KSFT_MACHINE_SLOW == false > > (please see below) > > > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > Note that the fixes tag is there mainly to justify targeting the net > > tree, and this is aiming at net to hopefully make the test more stable > > ASAP for both trees. > > > > I experimented with a largish refactory replacing the multiple writes > > with a single GSO packet, but exhausted by time budget before reaching > > any good result. > > --- > > tools/testing/selftests/net/gro.sh | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh > > index 19352f106c1d..114b5281a3f5 100755 > > --- a/tools/testing/selftests/net/gro.sh > > +++ b/tools/testing/selftests/net/gro.sh > > @@ -31,6 +31,10 @@ run_test() { > > 1>>log.txt > > wait "${server_pid}" > > exit_code=$? > > + if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then > > Maybe best to avoid using: > > -n "${KSFT_MACHINE_SLOW}" > > Otherwise, we have the same behaviour if KSFT_MACHINE_SLOW is set to > 1/yes/true or 0/no/false. For consistency, I followed the logic already in place in commit c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW"). > But maybe it is fine like that, and what is just missing is adding > somewhere how KSFT_MACHINE_SLOW is supposed to be set/used? :) > > > Not linked to that, but a small detail about the new line, just in case > you need to send a v2: it looks like it is better to avoid using '-a': > > https://www.shellcheck.net/wiki/SC2166 Thank for the pointer, I was not aware of that. I guess a v2 dropping '-a' would be better. Thanks, Paolo
Hi Paolo, On 07/02/2024 15:35, Paolo Abeni wrote: > On Wed, 2024-02-07 at 12:16 +0100, Matthieu Baerts wrote: >> Hi Paolo, >> >> On 06/02/2024 16:27, Paolo Abeni wrote: >>> The gro self-tests sends the packets to be aggregated with >>> multiple write operations. >>> >>> When running is slow environment, it's hard to guarantee that >>> the GRO engine will wait for the last packet in an intended >>> train. >>> >>> The above causes almost deterministic failures in our CI for >>> the 'large' test-case. >>> >>> Address the issue explicitly ignoring failures for such case >>> in slow environments (KSFT_MACHINE_SLOW==true). >> >> To what value is KSFT_MACHINE_SLOW set in the CI? > > AFAIK, the CI initialize KSFT_MACHINE_SLOW (to true) only on slow env. Should be good, then! >> Is it set to a different value if the machine is not slow? e.g. >> >> KSFT_MACHINE_SLOW == false >> >> (please see below) >> >>> Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") >>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>> --- >>> Note that the fixes tag is there mainly to justify targeting the net >>> tree, and this is aiming at net to hopefully make the test more stable >>> ASAP for both trees. >>> >>> I experimented with a largish refactory replacing the multiple writes >>> with a single GSO packet, but exhausted by time budget before reaching >>> any good result. >>> --- >>> tools/testing/selftests/net/gro.sh | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh >>> index 19352f106c1d..114b5281a3f5 100755 >>> --- a/tools/testing/selftests/net/gro.sh >>> +++ b/tools/testing/selftests/net/gro.sh >>> @@ -31,6 +31,10 @@ run_test() { >>> 1>>log.txt >>> wait "${server_pid}" >>> exit_code=$? >>> + if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then >> >> Maybe best to avoid using: >> >> -n "${KSFT_MACHINE_SLOW}" >> >> Otherwise, we have the same behaviour if KSFT_MACHINE_SLOW is set to >> 1/yes/true or 0/no/false. > > For consistency, I followed the logic already in place in commit > c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if > KSFT_MACHINE_SLOW"). I only checked code in -net, I forgot to look at net-next. Thanks for the pointer! I thought it was "fragile", but if that's how we are supposed to use this env var, that's OK then :) >> But maybe it is fine like that, and what is just missing is adding >> somewhere how KSFT_MACHINE_SLOW is supposed to be set/used? :) >> >> >> Not linked to that, but a small detail about the new line, just in case >> you need to send a v2: it looks like it is better to avoid using '-a': >> >> https://www.shellcheck.net/wiki/SC2166 > > Thank for the pointer, I was not aware of that. > > I guess a v2 dropping '-a' would be better. I'm not even sure a v2 is really needed. "-a" seems OK if you don't use (or don't plan to use) "!" or "-" in the expression from what I read. Another way to fix this is to use [[ ]]: [[ ${test} == "large" && -n "${KSFT_MACHINE_SLOW}" ]] Cheers, Matt
On Tue, 6 Feb 2024 16:27:40 +0100 Paolo Abeni wrote: > The gro self-tests sends the packets to be aggregated with > multiple write operations. > > When running is slow environment, it's hard to guarantee that > the GRO engine will wait for the last packet in an intended > train. > > The above causes almost deterministic failures in our CI for > the 'large' test-case. > > Address the issue explicitly ignoring failures for such case > in slow environments (KSFT_MACHINE_SLOW==true). > > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > Note that the fixes tag is there mainly to justify targeting the net > tree, and this is aiming at net to hopefully make the test more stable > ASAP for both trees. > > I experimented with a largish refactory replacing the multiple writes > with a single GSO packet, but exhausted by time budget before reaching > any good result. It does make things a lot more stable, but there was still a failure recently: https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/455661/36-gro-sh/stdout :(
On Wed, 7 Feb 2024 18:42:52 -0800 Jakub Kicinski wrote: > On Tue, 6 Feb 2024 16:27:40 +0100 Paolo Abeni wrote: > > The gro self-tests sends the packets to be aggregated with > > multiple write operations. > > > > When running is slow environment, it's hard to guarantee that > > the GRO engine will wait for the last packet in an intended > > train. > > > > The above causes almost deterministic failures in our CI for > > the 'large' test-case. > > > > Address the issue explicitly ignoring failures for such case > > in slow environments (KSFT_MACHINE_SLOW==true). > > > > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > Note that the fixes tag is there mainly to justify targeting the net > > tree, and this is aiming at net to hopefully make the test more stable > > ASAP for both trees. > > > > I experimented with a largish refactory replacing the multiple writes > > with a single GSO packet, but exhausted by time budget before reaching > > any good result. > > It does make things a lot more stable, but there was still a failure > recently: > > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/455661/36-gro-sh/stdout > > :( Ah, sorry, I missed the v2. That must have been between v1 and v2.
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh index 19352f106c1d..114b5281a3f5 100755 --- a/tools/testing/selftests/net/gro.sh +++ b/tools/testing/selftests/net/gro.sh @@ -31,6 +31,10 @@ run_test() { 1>>log.txt wait "${server_pid}" exit_code=$? + if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then + echo "Ignoring errors due to slow environment" 1>&2 + exit_code=0 + fi if [[ "${exit_code}" -eq 0 ]]; then break; fi
The gro self-tests sends the packets to be aggregated with multiple write operations. When running is slow environment, it's hard to guarantee that the GRO engine will wait for the last packet in an intended train. The above causes almost deterministic failures in our CI for the 'large' test-case. Address the issue explicitly ignoring failures for such case in slow environments (KSFT_MACHINE_SLOW==true). Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- Note that the fixes tag is there mainly to justify targeting the net tree, and this is aiming at net to hopefully make the test more stable ASAP for both trees. I experimented with a largish refactory replacing the multiple writes with a single GSO packet, but exhausted by time budget before reaching any good result. --- tools/testing/selftests/net/gro.sh | 4 ++++ 1 file changed, 4 insertions(+)