diff mbox series

[net,10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Message ID 20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org (mailing list archive)
State Accepted
Commit b4b51d36bbaa3ddb93b3e1ca3a1ef0aa629d6521
Delegated to: Netdev Maintainers
Headers show
Series mptcp: more misc. fixes for v6.8 | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success net selftest script(s) already in Makefile
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: line length of 96 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Matthieu Baerts (NGI0) Feb. 23, 2024, 4:14 p.m. UTC
From: Paolo Abeni <pabeni@redhat.com>

The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.

Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.

Be careful to cope with very slow environments.

Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Comments

Simon Horman Feb. 26, 2024, 5:58 p.m. UTC | #1
On Fri, Feb 23, 2024 at 05:14:20PM +0100, Matthieu Baerts (NGI0) wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> 
> The mptcp diag interface already experienced a few locking bugs
> that lockdep and appropriate coverage have detected in advance.
> 
> Let's add a test-case triggering the relevant code path, to prevent
> similar issues in the future.
> 
> Be careful to cope with very slow environments.
> 
> Note that we don't need an explicit timeout on the mptcp_connect
> subprocess to cope with eventual bug/hang-up as the final cleanup
> terminating the child processes will take care of that.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Reviewed-by: Simon Horman <horms@kernel.org>
Jakub Kicinski March 1, 2024, 2:37 p.m. UTC | #2
On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> 
> The mptcp diag interface already experienced a few locking bugs
> that lockdep and appropriate coverage have detected in advance.
> 
> Let's add a test-case triggering the relevant code path, to prevent
> similar issues in the future.
> 
> Be careful to cope with very slow environments.
> 
> Note that we don't need an explicit timeout on the mptcp_connect
> subprocess to cope with eventual bug/hang-up as the final cleanup
> terminating the child processes will take care of that.

Hi!

There's a failure in CI under debug after merging net and net-next 
in diag.sh. Maybe because of the patch which lowered timeout?
https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/
Matthieu Baerts (NGI0) March 1, 2024, 3:10 p.m. UTC | #3
Hi Jakub,

On 01/03/2024 15:37, Jakub Kicinski wrote:
> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
>> From: Paolo Abeni <pabeni@redhat.com>
>>
>> The mptcp diag interface already experienced a few locking bugs
>> that lockdep and appropriate coverage have detected in advance.
>>
>> Let's add a test-case triggering the relevant code path, to prevent
>> similar issues in the future.
>>
>> Be careful to cope with very slow environments.
>>
>> Note that we don't need an explicit timeout on the mptcp_connect
>> subprocess to cope with eventual bug/hang-up as the final cleanup
>> terminating the child processes will take care of that.
> 
> Hi!
> 
> There's a failure in CI under debug after merging net and net-next 
> in diag.sh. Maybe because of the patch which lowered timeout?
> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/

Thank you for this message!

I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
do on top of the debug kconfig, but I see I can reproduce it on slower
environments. Indeed, it looks like it can be caused by that
modification. I will send a fix ASAP!

Cheers,
Matt
Matthieu Baerts (NGI0) March 1, 2024, 6:24 p.m. UTC | #4
Hi Jakub,

On 01/03/2024 16:10, Matthieu Baerts wrote:
> On 01/03/2024 15:37, Jakub Kicinski wrote:
>> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
>>> From: Paolo Abeni <pabeni@redhat.com>
>>>
>>> The mptcp diag interface already experienced a few locking bugs
>>> that lockdep and appropriate coverage have detected in advance.
>>>
>>> Let's add a test-case triggering the relevant code path, to prevent
>>> similar issues in the future.
>>>
>>> Be careful to cope with very slow environments.
>>>
>>> Note that we don't need an explicit timeout on the mptcp_connect
>>> subprocess to cope with eventual bug/hang-up as the final cleanup
>>> terminating the child processes will take care of that.
>>
>> Hi!
>>
>> There's a failure in CI under debug after merging net and net-next 
>> in diag.sh. Maybe because of the patch which lowered timeout?
>> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/
> 
> Thank you for this message!
> 
> I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
> do on top of the debug kconfig, but I see I can reproduce it on slower
> environments. Indeed, it looks like it can be caused by that
> modification. I will send a fix ASAP!

The following patch fixes the issue on my side (when using 'renice 20'
and stress-ng in parallel :) ):

https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/

I queued it for -net, but the issue should only be visible in net-next
due to the reduction of the timeout, as you suggested. This patch can
also be applied in net-next if preferred, it will not cause any
conflicts between net and net-next.

Cheers,
Matt
Jakub Kicinski March 1, 2024, 6:55 p.m. UTC | #5
On Fri, 1 Mar 2024 19:24:53 +0100 Matthieu Baerts wrote:
> > Thank you for this message!
> > 
> > I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
> > do on top of the debug kconfig, but I see I can reproduce it on slower
> > environments. Indeed, it looks like it can be caused by that
> > modification. I will send a fix ASAP!  
> 
> The following patch fixes the issue on my side (when using 'renice 20'
> and stress-ng in parallel :) ):
> 
> https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/

Nice! Note only a fix but also lowers runtime, if I'm reading right!

> I queued it for -net, but the issue should only be visible in net-next
> due to the reduction of the timeout, as you suggested. This patch can
> also be applied in net-next if preferred, it will not cause any
> conflicts between net and net-next.

No strong preference, net sounds good.

Thanks for a quick turnaround!
diff mbox series

Patch

diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 0a58ebb8b04c..f300f4e1eb59 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -20,7 +20,7 @@  flush_pids()
 
 	ip netns pids "${ns}" | xargs --no-run-if-empty kill -SIGUSR1 &>/dev/null
 
-	for _ in $(seq 10); do
+	for _ in $(seq $((timeout_poll * 10))); do
 		[ -z "$(ip netns pids "${ns}")" ] && break
 		sleep 0.1
 	done
@@ -91,6 +91,15 @@  chk_msk_nr()
 	__chk_msk_nr "grep -c token:" "$@"
 }
 
+chk_listener_nr()
+{
+	local expected=$1
+	local msg="$2"
+
+	__chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
+	__chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
+}
+
 wait_msk_nr()
 {
 	local condition="grep -c token:"
@@ -289,5 +298,24 @@  flush_pids
 chk_msk_inuse 0 "many->0"
 chk_msk_cestab 0 "many->0"
 
+chk_listener_nr 0 "no listener sockets"
+NR_SERVERS=100
+for I in $(seq 1 $NR_SERVERS); do
+	ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \
+		-t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 &
+done
+
+for I in $(seq 1 $NR_SERVERS); do
+	mptcp_lib_wait_local_port_listen $ns $((I + 20001))
+done
+
+chk_listener_nr $NR_SERVERS "many listener sockets"
+
+# graceful termination
+for I in $(seq 1 $NR_SERVERS); do
+	echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 >/dev/null 2>&1 &
+done
+flush_pids
+
 mptcp_lib_result_print_all_tap
 exit $ret