[net-next] selftests/net: calibrate txtimestamp

Message ID	20240127023212.3746239-1-willemdebruijn.kernel@gmail.com (mailing list archive)
State	Accepted
Commit	5264ab612e28058536de8069bcf83eb20fd65c29
Delegated to:	Netdev Maintainers
Headers	Received: from mail-qv1-f41.google.com (mail-qv1-f41.google.com [209.85.219.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A1A2B65D; Sat, 27 Jan 2024 02:32:23 +0000 (UTC)
	From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
	To: netdev@vger.kernel.org
	Cc: davem@davemloft.net, kuba@kernel.org, edumazet@google.com, pabeni@redhat.com, linux-kselftest@vger.kernel.org, Willem de Bruijn <willemb@google.com>
	Subject: [PATCH net-next] selftests/net: calibrate txtimestamp
	Date: Fri, 26 Jan 2024 21:31:51 -0500
	Message-ID: <20240127023212.3746239-1-willemdebruijn.kernel@gmail.com>
	Precedence: bulk
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
Series	[net-next] selftests/net: calibrate txtimestamp

Context	Check	Description
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 8 this patch: 8
netdev/build_tools	success	Errors and warnings before: 1 this patch: 0
netdev/cc_maintainers	success	CCed 0 of 0 maintainers
netdev/build_clang	success	Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	net selftest script(s) already in Makefile
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 8 this patch: 8
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 29 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-01-31--06-00 (tests: 715)

Willem de Bruijn Jan. 27, 2024, 2:31 a.m. UTC

From: Willem de Bruijn <willemb@google.com>

The test sends packets and compares enqueue, transmit and Ack
timestamps with expected values. It installs netem delays to increase
latency between these points.

The test proves flaky in virtual environment (vng). Increase the
delays to reduce variance. Scale measurement tolerance accordingly.

Time sensitive tests are difficult to calibrate. Increasing delays 10x
also increases runtime 10x, for one. And it may still prove flaky at
some rate.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/txtimestamp.sh | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Simon Horman Jan. 30, 2024, 2:54 p.m. UTC | #1

On Fri, Jan 26, 2024 at 09:31:51PM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> The test sends packets and compares enqueue, transmit and Ack
> timestamps with expected values. It installs netem delays to increase
> latency between these points.
> 
> The test proves flaky in virtual environment (vng). Increase the
> delays to reduce variance. Scale measurement tolerance accordingly.
> 
> Time sensitive tests are difficult to calibrate. Increasing delays 10x
> also increases runtime 10x, for one. And it may still prove flaky at
> some rate.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Reviewed-by: Simon Horman <horms@kernel.org>

Jakub Kicinski Jan. 31, 2024, 1:47 a.m. UTC | #2

On Fri, 26 Jan 2024 21:31:51 -0500 Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> The test sends packets and compares enqueue, transmit and Ack
> timestamps with expected values. It installs netem delays to increase
> latency between these points.
> 
> The test proves flaky in virtual environment (vng). Increase the
> delays to reduce variance. Scale measurement tolerance accordingly.
> 
> Time sensitive tests are difficult to calibrate. Increasing delays 10x
> also increases runtime 10x, for one. And it may still prove flaky at
> some rate.

Willem, do you still want us to apply this as is or should we do 
the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?
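
For illustration, the conditional variant asked about here could look roughly like the sketch below. It is not the applied patch: the base delay and tolerance values and the helper names scale_factor and scaled_us are hypothetical; only the KSFT_MACHINE_SLOW check comes from the thread.

```shell
#!/bin/bash
# Hypothetical baseline values in microseconds; the real ones are
# whatever txtimestamp.sh passes to netem and to the test binary.
BASE_DELAY_US=1000
BASE_TOLERANCE_US=500

# Print 10 when KSFT_MACHINE_SLOW is set to any non-empty value, else 1.
scale_factor() {
	if [ "x${KSFT_MACHINE_SLOW:-}" != "x" ]; then
		echo 10
	else
		echo 1
	fi
}

# Multiply a microsecond value by the current scale factor, so that
# delays and measurement tolerance grow together.
scaled_us() {
	echo $(( $1 * $(scale_factor) ))
}

echo "delay=$(scaled_us "$BASE_DELAY_US")us tolerance=$(scaled_us "$BASE_TOLERANCE_US")us"
```

Scaling the delay and the tolerance through the same helper keeps their ratio fixed, which is the "scale measurement tolerance accordingly" part of the commit message.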

Willem de Bruijn Jan. 31, 2024, 3:06 p.m. UTC | #3

Jakub Kicinski wrote:
> On Fri, 26 Jan 2024 21:31:51 -0500 Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> > 
> > The test sends packets and compares enqueue, transmit and Ack
> > timestamps with expected values. It installs netem delays to increase
> > latency between these points.
> > 
> > The test proves flaky in virtual environment (vng). Increase the
> > delays to reduce variance. Scale measurement tolerance accordingly.
> > 
> > Time sensitive tests are difficult to calibrate. Increasing delays 10x
> > also increases runtime 10x, for one. And it may still prove flaky at
> > some rate.
> 
> Willem, do you still want us to apply this as is or should we do 
> the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?

If the test passes on all platforms with this change, I think that's
still preferable.

The only downside is that it will take 10x runtime. But that will
continue on debug and virtualized builds anyway.

On the upside, the awesome dash does indicate that it passes as is on
non-debug metal instances:

https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh

Let me know if you want me to use this as a testcase for
$KSFT_MACHINE_SLOW.

Otherwise I'll start with the gro and so-txtime tests. They may not
be so easily calibrated, as we cannot control the gro timeout, nor
the FQ max horizon.

In such cases we can use the environment variable to either skip the
test entirely or --my preference-- run it to get code coverage, but
suppress a failure if due to timing (only). Sounds good?

Jakub Kicinski Jan. 31, 2024, 6:29 p.m. UTC | #4

On Wed, 31 Jan 2024 10:06:18 -0500 Willem de Bruijn wrote:
> > Willem, do you still want us to apply this as is or should we do 
> > the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?  
> 
> If the test passes on all platforms with this change, I think that's
> still preferable.
> 
> The only downside is that it will take 10x runtime. But that will
> continue on debug and virtualized builds anyway.
> 
> On the upside, the awesome dash does indicate that it passes as is on
> non-debug metal instances:
> 
> https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh
> 
> Let me know if you want me to use this as a testcase for
> $KSFT_MACHINE_SLOW.

Ah, all good, I thought you're increasing the acceptance criteria.

> Otherwise I'll start with the gro and so-txtime tests. They may not
> be so easily calibrated. As we cannot control the gro timeout, nor
> the FQ max horizon.

Paolo also mentioned working on GRO, maybe we need a spreadsheet
for people to "reserve" broken tests to avoid duplicating work? :S

> In such cases we can use the environment variable to either skip the
> test entirely or --my preference-- run it to get code coverage, but
> suppress a failure if due to timing (only). Sounds good?

+1 I also think we should run and ignore failure. I was wondering if we
can swap FAIL for XFAIL in those cases:

tools/testing/selftests/kselftest.h
#define KSFT_XFAIL 2

Documentation/dev-tools/ktap.rst
- "XFAIL", which indicates that a test is expected to fail. This
  is similar to "TODO", above, and is used by some kselftest tests.

IDK if that's a stretch or not. Or we can just return PASS with 
a comment?
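
A minimal sketch of the "run and swap FAIL for XFAIL" idea, assuming the test can tell whether its failure was timing-only. The function map_result and its timing_only argument are hypothetical; KSFT_XFAIL=2 is the value quoted from kselftest.h above, and 0 and 1 are the usual pass and fail exit codes.

```shell
#!/bin/bash
KSFT_PASS=0
KSFT_FAIL=1
KSFT_XFAIL=2	# value quoted from tools/testing/selftests/kselftest.h

# map_result RC TIMING_ONLY
#   RC           raw result of the test run (0 means pass)
#   TIMING_ONLY  "yes" if every failed check was a timing check
#                (hypothetical flag the test would have to track)
# Prints the exit code the script should finish with.
map_result() {
	local rc=$1 timing_only=$2

	if [ "$rc" -eq 0 ]; then
		echo "$KSFT_PASS"
	elif [ "$timing_only" = "yes" ] && [ -n "${KSFT_MACHINE_SLOW:-}" ]; then
		# Slow machine and only timing was off: expected failure.
		echo "$KSFT_XFAIL"
	else
		echo "$KSFT_FAIL"
	fi
}
```

A script would end with something like exit "$(map_result "$ret" "$timing_only")", so a non-timing failure still fails even on a slow machine.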

Paolo Abeni Jan. 31, 2024, 6:39 p.m. UTC | #5

On Wed, 2024-01-31 at 10:06 -0500, Willem de Bruijn wrote:
> Otherwise I'll start with the gro and so-txtime tests. They may not
> be so easily calibrated. As we cannot control the gro timeout, nor
> the FQ max horizon.

Note that we can control the GRO timeout to some extent, via 
gro_flush_timeout, see commit 89abe628375301fedb68770644df845d49018d8b.
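
As a rough sketch of what tuning that knob looks like (not a reproduction of the commit above): gro_flush_timeout is a per-device sysfs attribute under /sys/class/net/<dev>/ and takes a value in nanoseconds. The helper name, the device name in the usage note and the optional sysfs root argument are assumptions made for illustration.

```shell
#!/bin/bash
# set_gro_flush_timeout DEV NSEC [SYSFS_ROOT]
# Write NSEC (nanoseconds) to the device's gro_flush_timeout attribute.
# SYSFS_ROOT defaults to /sys; it is overridable only so the helper can
# be exercised without root (hypothetical convenience).
set_gro_flush_timeout() {
	local dev=$1 nsec=$2 root=${3:-/sys}
	local knob="$root/class/net/$dev/gro_flush_timeout"

	if [ ! -w "$knob" ]; then
		echo "cannot write $knob" >&2
		return 1
	fi
	echo "$nsec" > "$knob"
}
```

For example, set_gro_flush_timeout veth0 1000000 would ask for a 1 ms flush timeout on a device named veth0, run as root inside the relevant network namespace.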

Unfortunately that is not enough for 'large' gro tests. I think the
root cause is that the process sending the packets can be de-scheduled
- even the qemu VM from the hypervisor CPU - causing an extremely large
gap between consecutive pkts.

I guess/hope that replacing multiple sendmsg() calls with a single
sendmmsg() could improve the scenario a bit, but I fear it will not
solve the issue completely.

> In such cases we can use the environment variable to either skip the
> test entirely or --my preference-- run it to get code coverage, but
> suppress a failure if due to timing (only). Sounds good?

Sounds good to me! I was wondering about skipping the 'large' test
only, but suppressing the failure when KSFT_MACHINE_SLOW=yes only for
that test looks like a better option.

Thanks!

Paolo

patchwork-bot+netdevbpf@kernel.org Jan. 31, 2024, 6:40 p.m. UTC | #6

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 26 Jan 2024 21:31:51 -0500 you wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> The test sends packets and compares enqueue, transmit and Ack
> timestamps with expected values. It installs netem delays to increase
> latency between these points.
> 
> The test proves flaky in virtual environment (vng). Increase the
> delays to reduce variance. Scale measurement tolerance accordingly.
> 
> [...]

Here is the summary with links:
  - [net-next] selftests/net: calibrate txtimestamp
    https://git.kernel.org/netdev/net-next/c/5264ab612e28

You are awesome, thank you!

Willem de Bruijn Jan. 31, 2024, 8:27 p.m. UTC | #7

Jakub Kicinski wrote:
> On Wed, 31 Jan 2024 10:06:18 -0500 Willem de Bruijn wrote:
> > > Willem, do you still want us to apply this as is or should we do 
> > > the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?  
> > 
> > If the test passes on all platforms with this change, I think that's
> > still preferable.
> > 
> > The only downside is that it will take 10x runtime. But that will
> > continue on debug and virtualized builds anyway.
> > 
> > On the upside, the awesome dash does indicate that it passes as is on
> > non-debug metal instances:
> > 
> > https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh
> > 
> > Let me know if you want me to use this as a testcase for
> > $KSFT_MACHINE_SLOW.
> 
> Ah, all good, I thought you're increasing the acceptance criteria.
> 
> > Otherwise I'll start with the gro and so-txtime tests. They may not
> > be so easily calibrated. As we cannot control the gro timeout, nor
> > the FQ max horizon.
> 
> Paolo also mentioned working on GRO, maybe we need a spreadsheet
> for people to "reserve" broken tests to avoid duplicating work? :S
> 
> > In such cases we can use the environment variable to either skip the
> > test entirely or --my preference-- run it to get code coverage, but
> > suppress a failure if due to timing (only). Sounds good?
> 
> +1 I also think we should run and ignore failure. I was wondering if we
> can swap FAIL for XFAIL in those cases:
> 
> tools/testing/selftests/kselftest.h
> #define KSFT_XFAIL 2
> 
> Documentation/dev-tools/ktap.rst
> - "XFAIL", which indicates that a test is expected to fail. This
>   is similar to "TODO", above, and is used by some kselftest tests.
> 
> IDK if that's a stretch or not. Or we can just return PASS with 
> a comment?

Flaky tests will then report both pass and expected fail. That might
add noise to https://netdev.bots.linux.dev/flakes.html?

I initially considered returning skipped on timing failure. But that
has the same issue.

So perhaps just return pass?


Especially for flaky tests sometimes returning pass and sometimes
returning expected to fa red/green
dash such as

Jakub Kicinski Jan. 31, 2024, 8:58 p.m. UTC | #8

On Wed, 31 Jan 2024 15:27:34 -0500 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > +1 I also think we should run and ignore failure. I was wondering if we
> > can swap FAIL for XFAIL in those cases:
> > 
> > tools/testing/selftests/kselftest.h
> > #define KSFT_XFAIL 2
> > 
> > Documentation/dev-tools/ktap.rst
> > - "XFAIL", which indicates that a test is expected to fail. This
> >   is similar to "TODO", above, and is used by some kselftest tests.
> > 
> > IDK if that's a stretch or not. Or we can just return PASS with 
> > a comment?  
> 
> Flaky tests will then report both pass and expected fail. That might
> add noise to https://netdev.bots.linux.dev/flakes.html?
> 
> I initially considered returning skipped on timing failure. But that
> has the same issue.
> 
> So perhaps just return pass?
> 
> 
> Especially for flaky tests sometimes returning pass and sometimes
> returning expected to fa red/green
> dash such as 

Right, we only have pass / fail / skip. (I put the "warn" result in for
tests migrated from patchwork so ignore its existence for tests.)

We already treat XFAIL in KTAP as "pass". TCP-AO's key-management_ipv6
test for example already reports XFAIL:

# ok 15 # XFAIL listen() after current/rnext keys set: the socket has current/rnext keys: 100:200

Skips look somewhat similar in KTAP, "ok $number # SKIP" but we fish
those out specifically to catch skips. Any other "ok .... # comment"
KTAP result is treated as a "pass" right now.
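
The classification rule described above can be sketched as follows. This is not the actual netdev CI parser, only an illustration of the stated rule; the function name classify_ktap_line and the "other" bucket are made up here.

```shell
#!/bin/bash
# classify_ktap_line LINE
# Prints one of: pass, skip, fail, other.
#   "not ok ..."        -> fail
#   "ok N # SKIP ..."   -> skip (fished out specifically)
#   any other "ok ..."  -> pass (this includes "ok N # XFAIL ...")
classify_ktap_line() {
	# Drop one leading "# " so nested output like "# ok 15 ..." also matches.
	local line=${1#"# "}

	case "$line" in
	"not ok "*)		echo fail ;;
	"ok "*"# SKIP"*)	echo skip ;;
	"ok "*)			echo pass ;;
	*)			echo other ;;
	esac
}
```

Fed the TCP-AO line quoted above, the sketch prints pass, which matches the behaviour Jakub describes for XFAIL.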

Willem de Bruijn Jan. 31, 2024, 9:20 p.m. UTC | #9

Jakub Kicinski wrote:
> On Wed, 31 Jan 2024 15:27:34 -0500 Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > +1 I also think we should run and ignore failure. I was wondering if we
> > > can swap FAIL for XFAIL in those cases:
> > > 
> > > tools/testing/selftests/kselftest.h
> > > #define KSFT_XFAIL 2
> > > 
> > > Documentation/dev-tools/ktap.rst
> > > - "XFAIL", which indicates that a test is expected to fail. This
> > >   is similar to "TODO", above, and is used by some kselftest tests.
> > > 
> > > IDK if that's a stretch or not. Or we can just return PASS with 
> > > a comment?  
> > 
> > Flaky tests will then report both pass and expected fail. That might
> > add noise to https://netdev.bots.linux.dev/flakes.html?
> > 
> > I initially considered returning skipped on timing failure. But that
> > has the same issue.
> > 
> > So perhaps just return pass?
> > 
> > 
> > Especially for flaky tests sometimes returning pass and sometimes
> > returning expected to fa red/green
> > dash such as 
> 
> Right, we only have pass / fail / skip. (I put the "warn" result in for
> tests migrated from patchwork so ignore its existence for tests.)
> 
> We already treat XFAIL in KTAP as "pass". TCP-AO's key-management_ipv6
> test for example already reports XFAIL:

Ok perfect. Then I'll do the same.
 
> # ok 15 # XFAIL listen() after current/rnext keys set: the socket has current/rnext keys: 100:200
> 
> Skips look somewhat similar in KTAP, "ok $number # SKIP" but we fish
> those out specifically to catch skips. Any other "ok .... # comment"
> KTAP result is treated as a "pass" right now.
