[v2,0/3] selftests/net: A couple of typos fixes in key-management/rst tests

Message ID	20240130-tcp-ao-test-key-mgmt-v2-0-d190430a6c60@arista.com (mailing list archive)
Headers	show Received: from mail-wr1-f43.google.com (mail-wr1-f43.google.com [209.85.221.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 69C35374FB for <netdev@vger.kernel.org>; Tue, 30 Jan 2024 03:52:01 +0000 (UTC) From: Dmitry Safonov <dima@arista.com> To: "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Shuah Khan <shuah@kernel.org>, Dmitry Safonov <0x7f454c46@gmail.com> Cc: Dmitry Safonov <dima@arista.com>, Mohammad Nassiri <mnassiri@ciena.com>, Simon Horman <horms@kernel.org>, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 0/3] selftests/net: A couple of typos fixes in key-management/rst tests Date: Tue, 30 Jan 2024 03:51:51 +0000 Message-ID: <20240130-tcp-ao-test-key-mgmt-v2-0-d190430a6c60@arista.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
Series	selftests/net: A couple of typos fixes in key-management/rst tests \| expand [v2,0/3] selftests/net: A couple of typos fixes in key-management/rst tests [v2,1/3] selftests/net: Argument value mismatch when calling verify_counters() [v2,2/3] selftests/net: Rectify key counters checks [v2,3/3] selftests/net: Repair RST passive reset selftest

Dmitry Safonov Jan. 30, 2024, 3:51 a.m. UTC

Changes in v2:
- Dropped "selftests/net: Clean-up double assignment", going to send it
  to net-next with other changes (Simon)
- Added a patch to rectify RST selftests.
- Link to v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com

Two typo fixes, noticed by Mohammad's review.
And a fix for an issue that got uncovered.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
Dmitry Safonov (2):
      selftests/net: Rectify key counters checks
      selftests/net: Repair RST passive reset selftest

Mohammad Nassiri (1):
      selftests/net: Argument value mismatch when calling verify_counters()

 .../testing/selftests/net/tcp_ao/key-management.c  |  46 ++++---
 tools/testing/selftests/net/tcp_ao/lib/sock.c      |  12 +-
 tools/testing/selftests/net/tcp_ao/rst.c           | 138 ++++++++++++++-------
 3 files changed, 124 insertions(+), 72 deletions(-)
---
base-commit: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7
change-id: 20240118-tcp-ao-test-key-mgmt-bb51a5fe15a2

Best regards,

Jakub Kicinski Feb. 1, 2024, 12:36 a.m. UTC | #1

On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote:
> Two typo fixes, noticed by Mohammad's review.
> And a fix for an issue that got uncovered.

I can confirm that all tests pass now :)
Thank you!

patchwork-bot+netdevbpf@kernel.org Feb. 1, 2024, 12:40 a.m. UTC | #2

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 30 Jan 2024 03:51:51 +0000 you wrote:
> Changes in v2:
> - Dropped "selftests/net: Clean-up double assignment", going to send it
>   to net-next with other changes (Simon)
> - Added a patch to rectify RST selftests.
> - Link to v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com
> 
> Two typo fixes, noticed by Mohammad's review.
> And a fix for an issue that got uncovered.
> 
> [...]

Here is the summary with links:
  - [v2,1/3] selftests/net: Argument value mismatch when calling verify_counters()
    https://git.kernel.org/netdev/net/c/d8f5df1fcea5
  - [v2,2/3] selftests/net: Rectify key counters checks
    https://git.kernel.org/netdev/net/c/384aa16d3776
  - [v2,3/3] selftests/net: Repair RST passive reset selftest
    https://git.kernel.org/netdev/net/c/6caf3adcc877

You are awesome, thank you!

Dmitry Safonov Feb. 1, 2024, 12:50 a.m. UTC | #3

On 2/1/24 00:36, Jakub Kicinski wrote:
> On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote:
>> Two typo fixes, noticed by Mohammad's review.
>> And a fix for an issue that got uncovered.
> 
> I can confirm that all tests pass now :)
> Thank you!

Thanks Jakub!

Please, let me know if there will be other issues with tcp-ao tests :)

Going to work on tracepoints and some other TCP-AO stuff for net-next.

Thanks,
             Dmitry

Jakub Kicinski Feb. 1, 2024, 9:21 p.m. UTC | #4

On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
> Please, let me know if there will be other issues with tcp-ao tests :)
> 
> Going to work on tracepoints and some other TCP-AO stuff for net-next.

Since you're being nice and helpful I figured I'll try testing TCP-AO
with debug options enabled :) (kernel/configs/debug.config and
kernel/configs/x86_debug.config included), that slows things down 
and causes a bit of flakiness in unsigned-md5-* tests:

https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao

This has links to outputs:
https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0

If it's a timing thing - FWIW we started exporting
KSFT_MACHINE_SLOW=yes on the slow runners.

Dmitry Safonov Feb. 1, 2024, 10:25 p.m. UTC | #5

Hi Jakub,

On 2/1/24 21:21, Jakub Kicinski wrote:
> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>> Please, let me know if there will be other issues with tcp-ao tests :)
>>
>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
> 
> Since you're being nice and helpful I figured I'll try testing TCP-AO
> with debug options enabled :) (kernel/configs/debug.config and
> kernel/configs/x86_debug.config included),

Haha :)

> that slows things down 
> and causes a bit of flakiness in unsigned-md5-* tests:
> 
> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
> 
> This has links to outputs:
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
> 
> If it's a timing thing - FWIW we started exporting
> KSFT_MACHINE_SLOW=yes on the slow runners.

I think, I know what happens here:

# ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
=> 6
# ok 9 AO server (AO_REQUIRED): unsigned client
# ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
increased 1 => 2
# not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
netns_ao_good was not expected to increase 7 => 8

for each of tests the server listens at a new port, but re-uses the same
namespaces+veth. If the node/machine is quite slow, I guess a segment
might have been retransmitted and the test that initiated it had already
finished.
And as result, the per-namespace counters are incremented, which makes
the test fail (IOW, the test expects all segments in ns being dropped).

So, I should do one of the options:

1. relax per-namespace checks (the per-socket and per-key counters are
   checked)
2. unshare(net) + veth setup for each test
3. split the selftest on smaller ones (as they create new net-ns in
   initialization)

I'd probably prefer (2), albeit it slows down that slow machine even
more, but I don't think creating 2 net-ns + veth pair per each test
would add a lot more overhead even on some rpi board. But let's see,
maybe I'll just go with (1) as that's really easy.

I'll cook a patch this week.

Thanks,
             Dmitry

Dmitry Safonov Feb. 1, 2024, 11:37 p.m. UTC | #6

On 2/1/24 22:25, Dmitry Safonov wrote:
> Hi Jakub,
> 
> On 2/1/24 21:21, Jakub Kicinski wrote:
>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>
>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>
>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>> with debug options enabled :) (kernel/configs/debug.config and
>> kernel/configs/x86_debug.config included),
> 
> Haha :)
> 
>> that slows things down 
>> and causes a bit of flakiness in unsigned-md5-* tests:
>>
>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>
>> This has links to outputs:
>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>
>> If it's a timing thing - FWIW we started exporting
>> KSFT_MACHINE_SLOW=yes on the slow runners.
> 
> I think, I know what happens here:
> 
> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
> => 6
> # ok 9 AO server (AO_REQUIRED): unsigned client
> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
> increased 1 => 2
> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
> netns_ao_good was not expected to increase 7 => 8
> 
> for each of tests the server listens at a new port, but re-uses the same
> namespaces+veth. If the node/machine is quite slow, I guess a segment
> might have been retransmitted and the test that initiated it had already
> finished.
> And as result, the per-namespace counters are incremented, which makes
> the test fail (IOW, the test expects all segments in ns being dropped).
> 
> So, I should do one of the options:
> 
> 1. relax per-namespace checks (the per-socket and per-key counters are
>    checked)
> 2. unshare(net) + veth setup for each test
> 3. split the selftest on smaller ones (as they create new net-ns in
>    initialization)

Actually, I think there may be an easier fix:

4. Make sure that client close()s TCP-AO first, making it twsk.
   And also make sure that net-ns counters read post server's close().

Will do this, let's see if this fixes the flakiness on the netdev bot :)

> I'd probably prefer (2), albeit it slows down that slow machine even
> more, but I don't think creating 2 net-ns + veth pair per each test
> would add a lot more overhead even on some rpi board. But let's see,
> maybe I'll just go with (1) as that's really easy.
> 
> I'll cook a patch this week.

Thanks,
            Dmitry

Dmitry Safonov Feb. 2, 2024, 2:30 a.m. UTC | #7

On 2/1/24 23:37, Dmitry Safonov wrote:
> On 2/1/24 22:25, Dmitry Safonov wrote:
>> Hi Jakub,
>>
>> On 2/1/24 21:21, Jakub Kicinski wrote:
>>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>>
>>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>>
>>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>>> with debug options enabled :) (kernel/configs/debug.config and
>>> kernel/configs/x86_debug.config included),
>>
>> Haha :)
>>
>>> that slows things down 
>>> and causes a bit of flakiness in unsigned-md5-* tests:
>>>
>>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>>
>>> This has links to outputs:
>>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>>
>>> If it's a timing thing - FWIW we started exporting
>>> KSFT_MACHINE_SLOW=yes on the slow runners.
>>
>> I think, I know what happens here:
>>
>> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
>> => 6
>> # ok 9 AO server (AO_REQUIRED): unsigned client
>> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
>> increased 1 => 2
>> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
>> netns_ao_good was not expected to increase 7 => 8
>>
>> for each of tests the server listens at a new port, but re-uses the same
>> namespaces+veth. If the node/machine is quite slow, I guess a segment
>> might have been retransmitted and the test that initiated it had already
>> finished.
>> And as result, the per-namespace counters are incremented, which makes
>> the test fail (IOW, the test expects all segments in ns being dropped).
>>
>> So, I should do one of the options:
>>
>> 1. relax per-namespace checks (the per-socket and per-key counters are
>>    checked)
>> 2. unshare(net) + veth setup for each test
>> 3. split the selftest on smaller ones (as they create new net-ns in
>>    initialization)
> 
> Actually, I think there may be an easier fix:
> 
> 4. Make sure that client close()s TCP-AO first, making it twsk.
>    And also make sure that net-ns counters read post server's close().
> 
> Will do this, let's see if this fixes the flakiness on the netdev bot :)

FWIW, I ended up with this:
https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/

I reproduced the issue once, running unsigned-md5* in a loop, while in
another terminal building linux-next with all cores.
With the patch above, it survived 77 iterations of both ipv4/ipv6 tests
so far. So, there is a chance it fixes the issue :)

Thanks,
           Dmitry

Jakub Kicinski Feb. 2, 2024, 3:13 a.m. UTC | #8

On Fri, 2 Feb 2024 02:30:52 +0000 Dmitry Safonov wrote:
> > Actually, I think there may be an easier fix:
> > 
> > 4. Make sure that client close()s TCP-AO first, making it twsk.
> >    And also make sure that net-ns counters read post server's close().
> > 
> > Will do this, let's see if this fixes the flakiness on the netdev bot :)  
> 
> FWIW, I ended up with this:
> https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/
> 
> I reproduced the issue once, running unsigned-md5* in a loop, while in
> another terminal building linux-next with all cores.
> With the patch above, it survived 77 iterations of both ipv4/ipv6 tests
> so far. So, there is a chance it fixes the issue :)

That was quick! Fingers crossed :)

[v2,0/3] selftests/net: A couple of typos fixes in key-management/rst tests

Message

Comments