diff mbox series

[net] selftests: net: cope with slow env in gro.sh test

Message ID 117a20b1b09addb804b27167fafe1a47bfb2b18e.1707233152.git.pabeni@redhat.com (mailing list archive)
State New
Headers show
Series [net] selftests: net: cope with slow env in gro.sh test | expand

Commit Message

Paolo Abeni Feb. 6, 2024, 3:27 p.m. UTC
The gro self-tests sends the packets to be aggregated with
multiple write operations.

When running is slow environment, it's hard to guarantee that
the GRO engine will wait for the last packet in an intended
train.

The above causes almost deterministic failures in our CI for
the 'large' test-case.

Address the issue explicitly ignoring failures for such case
in slow environments (KSFT_MACHINE_SLOW==true).

Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note that the fixes tag is there mainly to justify targeting the net
tree, and this is aiming at net to hopefully make the test more stable
ASAP for both trees.

I experimented with a largish refactory replacing the multiple writes
with a single GSO packet, but exhausted by time budget before reaching
any good result.
---
 tools/testing/selftests/net/gro.sh | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Willem de Bruijn Feb. 6, 2024, 3:46 p.m. UTC | #1
Paolo Abeni wrote:
> The gro self-tests sends the packets to be aggregated with
> multiple write operations.
> 
> When running is slow environment, it's hard to guarantee that
> the GRO engine will wait for the last packet in an intended
> train.
> 
> The above causes almost deterministic failures in our CI for
> the 'large' test-case.
> 
> Address the issue explicitly ignoring failures for such case
> in slow environments (KSFT_MACHINE_SLOW==true).
> 
> Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>
Matthieu Baerts Feb. 7, 2024, 11:16 a.m. UTC | #2
Hi Paolo,

On 06/02/2024 16:27, Paolo Abeni wrote:
> The gro self-tests sends the packets to be aggregated with
> multiple write operations.
> 
> When running is slow environment, it's hard to guarantee that
> the GRO engine will wait for the last packet in an intended
> train.
> 
> The above causes almost deterministic failures in our CI for
> the 'large' test-case.
> 
> Address the issue explicitly ignoring failures for such case
> in slow environments (KSFT_MACHINE_SLOW==true).

To what value is KSFT_MACHINE_SLOW set in the CI?

Is it set to a different value if the machine is not slow? e.g.

  KSFT_MACHINE_SLOW == false

(please see below)

> Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note that the fixes tag is there mainly to justify targeting the net
> tree, and this is aiming at net to hopefully make the test more stable
> ASAP for both trees.
> 
> I experimented with a largish refactory replacing the multiple writes
> with a single GSO packet, but exhausted by time budget before reaching
> any good result.
> ---
>  tools/testing/selftests/net/gro.sh | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
> index 19352f106c1d..114b5281a3f5 100755
> --- a/tools/testing/selftests/net/gro.sh
> +++ b/tools/testing/selftests/net/gro.sh
> @@ -31,6 +31,10 @@ run_test() {
>        1>>log.txt
>      wait "${server_pid}"
>      exit_code=$?
> +    if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then

Maybe best to avoid using:

  -n "${KSFT_MACHINE_SLOW}"

Otherwise, we have the same behaviour if KSFT_MACHINE_SLOW is set to
1/yes/true or 0/no/false.

But maybe it is fine like that, and what is just missing is adding
somewhere how KSFT_MACHINE_SLOW is supposed to be set/used? :)


Not linked to that, but a small detail about the new line, just in case
you need to send a v2: it looks like it is better to avoid using '-a':

  https://www.shellcheck.net/wiki/SC2166

(but here, it looks like the usage is fine)

> +        echo "Ignoring errors due to slow environment" 1>&2
> +        exit_code=0
> +    fi
>      if [[ "${exit_code}" -eq 0 ]]; then
>          break;
>      fi

Cheers,
Matt
Paolo Abeni Feb. 7, 2024, 2:35 p.m. UTC | #3
On Wed, 2024-02-07 at 12:16 +0100, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 06/02/2024 16:27, Paolo Abeni wrote:
> > The gro self-tests sends the packets to be aggregated with
> > multiple write operations.
> > 
> > When running is slow environment, it's hard to guarantee that
> > the GRO engine will wait for the last packet in an intended
> > train.
> > 
> > The above causes almost deterministic failures in our CI for
> > the 'large' test-case.
> > 
> > Address the issue explicitly ignoring failures for such case
> > in slow environments (KSFT_MACHINE_SLOW==true).
> 
> To what value is KSFT_MACHINE_SLOW set in the CI?

AFAIK, the CI initialize KSFT_MACHINE_SLOW (to true) only on slow env.

> Is it set to a different value if the machine is not slow? e.g.
> 
>   KSFT_MACHINE_SLOW == false
> 
> (please see below)
> 
> > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Note that the fixes tag is there mainly to justify targeting the net
> > tree, and this is aiming at net to hopefully make the test more stable
> > ASAP for both trees.
> > 
> > I experimented with a largish refactory replacing the multiple writes
> > with a single GSO packet, but exhausted by time budget before reaching
> > any good result.
> > ---
> >  tools/testing/selftests/net/gro.sh | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
> > index 19352f106c1d..114b5281a3f5 100755
> > --- a/tools/testing/selftests/net/gro.sh
> > +++ b/tools/testing/selftests/net/gro.sh
> > @@ -31,6 +31,10 @@ run_test() {
> >        1>>log.txt
> >      wait "${server_pid}"
> >      exit_code=$?
> > +    if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then
> 
> Maybe best to avoid using:
> 
>   -n "${KSFT_MACHINE_SLOW}"
> 
> Otherwise, we have the same behaviour if KSFT_MACHINE_SLOW is set to
> 1/yes/true or 0/no/false.

For consistency, I followed the logic already in place in commit
c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if
KSFT_MACHINE_SLOW").

> But maybe it is fine like that, and what is just missing is adding
> somewhere how KSFT_MACHINE_SLOW is supposed to be set/used? :)
> 
> 
> Not linked to that, but a small detail about the new line, just in case
> you need to send a v2: it looks like it is better to avoid using '-a':
> 
>   https://www.shellcheck.net/wiki/SC2166

Thank for the pointer, I was not aware of that. 

I guess a v2 dropping '-a' would be better.

Thanks,

Paolo
Matthieu Baerts Feb. 7, 2024, 2:45 p.m. UTC | #4
Hi Paolo,

On 07/02/2024 15:35, Paolo Abeni wrote:
> On Wed, 2024-02-07 at 12:16 +0100, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 06/02/2024 16:27, Paolo Abeni wrote:
>>> The gro self-tests sends the packets to be aggregated with
>>> multiple write operations.
>>>
>>> When running is slow environment, it's hard to guarantee that
>>> the GRO engine will wait for the last packet in an intended
>>> train.
>>>
>>> The above causes almost deterministic failures in our CI for
>>> the 'large' test-case.
>>>
>>> Address the issue explicitly ignoring failures for such case
>>> in slow environments (KSFT_MACHINE_SLOW==true).
>>
>> To what value is KSFT_MACHINE_SLOW set in the CI?
> 
> AFAIK, the CI initialize KSFT_MACHINE_SLOW (to true) only on slow env.

Should be good, then!

>> Is it set to a different value if the machine is not slow? e.g.
>>
>>   KSFT_MACHINE_SLOW == false
>>
>> (please see below)
>>
>>> Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>> Note that the fixes tag is there mainly to justify targeting the net
>>> tree, and this is aiming at net to hopefully make the test more stable
>>> ASAP for both trees.
>>>
>>> I experimented with a largish refactory replacing the multiple writes
>>> with a single GSO packet, but exhausted by time budget before reaching
>>> any good result.
>>> ---
>>>  tools/testing/selftests/net/gro.sh | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
>>> index 19352f106c1d..114b5281a3f5 100755
>>> --- a/tools/testing/selftests/net/gro.sh
>>> +++ b/tools/testing/selftests/net/gro.sh
>>> @@ -31,6 +31,10 @@ run_test() {
>>>        1>>log.txt
>>>      wait "${server_pid}"
>>>      exit_code=$?
>>> +    if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then
>>
>> Maybe best to avoid using:
>>
>>   -n "${KSFT_MACHINE_SLOW}"
>>
>> Otherwise, we have the same behaviour if KSFT_MACHINE_SLOW is set to
>> 1/yes/true or 0/no/false.
> 
> For consistency, I followed the logic already in place in commit
> c41dfb0dfbec ("selftests/net: ignore timing errors in so_txtime if
> KSFT_MACHINE_SLOW").

I only checked code in -net, I forgot to look at net-next. Thanks for
the pointer! I thought it was "fragile", but if that's how we are
supposed to use this env var, that's OK then :)

>> But maybe it is fine like that, and what is just missing is adding
>> somewhere how KSFT_MACHINE_SLOW is supposed to be set/used? :)
>>
>>
>> Not linked to that, but a small detail about the new line, just in case
>> you need to send a v2: it looks like it is better to avoid using '-a':
>>
>>   https://www.shellcheck.net/wiki/SC2166
> 
> Thank for the pointer, I was not aware of that. 
> 
> I guess a v2 dropping '-a' would be better.

I'm not even sure a v2 is really needed. "-a" seems OK if you don't use
(or don't plan to use) "!" or "-" in the expression from what I read.

Another way to fix this is to use [[ ]]:

  [[ ${test} == "large" && -n "${KSFT_MACHINE_SLOW}" ]]

Cheers,
Matt
Jakub Kicinski Feb. 8, 2024, 2:42 a.m. UTC | #5
On Tue,  6 Feb 2024 16:27:40 +0100 Paolo Abeni wrote:
> The gro self-tests sends the packets to be aggregated with
> multiple write operations.
> 
> When running is slow environment, it's hard to guarantee that
> the GRO engine will wait for the last packet in an intended
> train.
> 
> The above causes almost deterministic failures in our CI for
> the 'large' test-case.
> 
> Address the issue explicitly ignoring failures for such case
> in slow environments (KSFT_MACHINE_SLOW==true).
> 
> Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note that the fixes tag is there mainly to justify targeting the net
> tree, and this is aiming at net to hopefully make the test more stable
> ASAP for both trees.
> 
> I experimented with a largish refactory replacing the multiple writes
> with a single GSO packet, but exhausted by time budget before reaching
> any good result.

It does make things a lot more stable, but there was still a failure
recently:

https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/455661/36-gro-sh/stdout

:(
Jakub Kicinski Feb. 8, 2024, 2:51 a.m. UTC | #6
On Wed, 7 Feb 2024 18:42:52 -0800 Jakub Kicinski wrote:
> On Tue,  6 Feb 2024 16:27:40 +0100 Paolo Abeni wrote:
> > The gro self-tests sends the packets to be aggregated with
> > multiple write operations.
> > 
> > When running is slow environment, it's hard to guarantee that
> > the GRO engine will wait for the last packet in an intended
> > train.
> > 
> > The above causes almost deterministic failures in our CI for
> > the 'large' test-case.
> > 
> > Address the issue explicitly ignoring failures for such case
> > in slow environments (KSFT_MACHINE_SLOW==true).
> > 
> > Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Note that the fixes tag is there mainly to justify targeting the net
> > tree, and this is aiming at net to hopefully make the test more stable
> > ASAP for both trees.
> > 
> > I experimented with a largish refactory replacing the multiple writes
> > with a single GSO packet, but exhausted by time budget before reaching
> > any good result.  
> 
> It does make things a lot more stable, but there was still a failure
> recently:
> 
> https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/455661/36-gro-sh/stdout
> 
> :(

Ah, sorry, I missed the v2. That must have been between v1 and v2.
diff mbox series

Patch

diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 19352f106c1d..114b5281a3f5 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -31,6 +31,10 @@  run_test() {
       1>>log.txt
     wait "${server_pid}"
     exit_code=$?
+    if [ ${test} == "large" -a -n "${KSFT_MACHINE_SLOW}" ]; then
+        echo "Ignoring errors due to slow environment" 1>&2
+        exit_code=0
+    fi
     if [[ "${exit_code}" -eq 0 ]]; then
         break;
     fi