
[net-next,v3,7/7] selftests: net: fdb_notify: Add a test for FDB notifications

Message ID baf2abd6af2e88f8874d14c97da1554b7e7a710e.1731342342.git.petrm@nvidia.com
State Superseded
Delegated to: Netdev Maintainers
Series net: ndo_fdb_add/del: Have drivers report whether they notified

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next, async
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 3 this patch: 3
netdev/build_tools              success  Errors and warnings before: 0 (+0) this patch: 0 (+0)
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 4 this patch: 4
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  net selftest script(s) already in Makefile
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 8 this patch: 8
netdev/checkpatch               warning  WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-11-11--21-00 (tests: 787)

Commit Message

Petr Machata Nov. 11, 2024, 5:09 p.m. UTC
Check that only one notification is produced for various FDB edit
operations.

Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
action plus corresponding defer is bound to come up often, and a dedicated
vocabulary to capture it will be handy. tunnel_create() and vlan_create()
from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
so I tried to go in the opposite direction with these ones, and wrapped
only the bare minimum to schedule a corresponding cleanup.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
---

Notes:
CC: Shuah Khan <shuah@kernel.org>
CC: Benjamin Poirier <bpoirier@nvidia.com>
CC: Hangbin Liu <liuhangbin@gmail.com>
CC: linux-kselftest@vger.kernel.org
CC: Jiri Pirko <jiri@resnulli.us>

 tools/testing/selftests/net/Makefile      |  2 +-
 tools/testing/selftests/net/fdb_notify.sh | 95 +++++++++++++++++++++++
 tools/testing/selftests/net/lib.sh        | 17 ++++
 3 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/fdb_notify.sh
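
The action-plus-defer pattern described in the commit message can be demonstrated without root privileges or a network namespace. The sketch below is a simplified stand-in, not the selftests' actual defer machinery: defer and run_deferred are hypothetical minimal versions, and demo_ip_link_add / demo_ip_link_master mirror the new lib.sh helpers but only echo the ip commands instead of running them. What it illustrates is the ordering: because cleanups run last-in, first-out, the nomaster undo executes before the devices it refers to are deleted.

```shell
#!/bin/bash
# Hypothetical stand-in for the selftests' defer machinery: a LIFO stack
# of cleanup commands. This is NOT the real lib.sh implementation.
DEFERRED=()

defer()
{
	# Store the command as one string and replay it with eval later.
	# Good enough for the simple arguments in this demo; a real
	# implementation would need to preserve argument boundaries.
	DEFERRED+=("$*")
}

run_deferred()
{
	local i

	# Run cleanups newest-first, like popping a stack.
	for ((i = ${#DEFERRED[@]} - 1; i >= 0; i--)); do
		eval "${DEFERRED[i]}"
	done
	DEFERRED=()
}

# Demo versions of the helpers: echo instead of calling ip(8),
# so no privileges are needed.
demo_ip_link_add()
{
	local name=$1; shift

	echo "ip link add name $name $*"
	defer echo "ip link del dev $name"
}

demo_ip_link_master()
{
	local member=$1; shift
	local master=$1; shift

	echo "ip link set dev $member master $master"
	defer echo "ip link set dev $member nomaster"
}

demo_ip_link_add br up type bridge vlan_filtering 1
demo_ip_link_add vx up type vxlan id 2000 dstport 4789
demo_ip_link_master vx br
run_deferred
```

Running it prints the three setup commands, then the cleanups in reverse order: "ip link set dev vx nomaster", "ip link del dev vx", and finally "ip link del dev br".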

Comments

Jakub Kicinski Nov. 12, 2024, 10:22 p.m. UTC | #1
On Mon, 11 Nov 2024 18:09:01 +0100 Petr Machata wrote:
> Check that only one notification is produced for various FDB edit
> operations.
> 
> Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
> action plus corresponding defer is bound to come up often, and a dedicated
> vocabulary to capture it will be handy. tunnel_create() and vlan_create()
> from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
> so I tried to go in the opposite direction with these ones, and wrapped
> only the bare minimum to schedule a corresponding cleanup.

Looks like it fails about half of the time :(

https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=fdb-notify&br-cnt=200

Petr Machata Nov. 13, 2024, 11:46 a.m. UTC | #2
Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 11 Nov 2024 18:09:01 +0100 Petr Machata wrote:
>> Check that only one notification is produced for various FDB edit
>> operations.
>> 
>> Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
>> action plus corresponding defer is bound to come up often, and a dedicated
>> vocabulary to capture it will be handy. tunnel_create() and vlan_create()
>> from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
>> so I tried to go in the opposite direction with these ones, and wrapped
>> only the bare minimum to schedule a corresponding cleanup.
>
> Looks like it fails about half of the time :(
>
> https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=fdb-notify&br-cnt=200

OK, I can't reproduce this. I tried in a VM and on actual HW, with debug
and non-debug builds, no luck. But I see basically two failures:

- A "0 seen, 1 expected", which... I don't know, maybe it could just be
  a misplaced sleep. I don't see how, but it's a deterministic
  scenario, there shouldn't be anything racy here, either it emits or it
  doesn't, so some buffering issue is the only thing I can think of.

- Deadlocks. E.g. this, which looks like it deadlocked and timed out
  ("bad unlock balance detected" followed by "blocked for more than 122
  seconds" et al.):

    https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/846621/18-fdb-notify-sh/

  Like... how could this patchset even theoretically cause a deadlock?
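
Since the failures show up in CI but not locally, one way to estimate how flaky a test is on a given machine is to rerun it many times and tally the failures. A minimal sketch, using a hypothetical count_failures helper that is not part of lib.sh:

```shell
#!/bin/bash
# Hypothetical helper (not part of the selftests' lib.sh): run a command
# repeatedly and print "failures/runs".
count_failures()
{
	local runs=$1; shift
	local i fails=0

	for ((i = 0; i < runs; i++)); do
		# Discard output; only the exit status matters here.
		if ! "$@" >/dev/null 2>&1; then
			fails=$((fails + 1))
		fi
	done
	echo "$fails/$runs"
}
```

For example, count_failures 50 ./fdb_notify.sh, run as root from tools/testing/selftests/net. Any nonzero exit status, including the kselftest skip code, is counted as a failure.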
Petr Machata Nov. 13, 2024, 3:11 p.m. UTC | #3
Petr Machata <petrm@nvidia.com> writes:

> Jakub Kicinski <kuba@kernel.org> writes:
>
>> On Mon, 11 Nov 2024 18:09:01 +0100 Petr Machata wrote:
>>> Check that only one notification is produced for various FDB edit
>>> operations.
>>> 
>>> Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
>>> action plus corresponding defer is bound to come up often, and a dedicated
>>> vocabulary to capture it will be handy. tunnel_create() and vlan_create()
>>> from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
>>> so I tried to go in the opposite direction with these ones, and wrapped
>>> only the bare minimum to schedule a corresponding cleanup.
>>
>> Looks like it fails about half of the time :(
>>
>> https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=fdb-notify&br-cnt=200
>
> OK, I can't reproduce this. Trying in VM, on an actual HW, debug, no
> debug, no luck. But I see basically two failures:
>
> - A "0 seen, 1 expected", which... I don't know, maybe it could just be
>   a misplaced sleep. I don't see how, but it's a deterministic
>   scenario, there shouldn't be anything racy here, either it emits or it
>   doesn't, so some buffering issue is the only thing I can think of.

I think this really could be just a "bridge monitor" taking a bit more
time to start every now and then. Can I have you test with this extra
chunk, or should I just resend with that change and hope for the best?

diff --git a/tools/testing/selftests/net/fdb_notify.sh b/tools/testing/selftests/net/fdb_notify.sh
index a98047361988..a8e04f08831c 100755
--- a/tools/testing/selftests/net/fdb_notify.sh
+++ b/tools/testing/selftests/net/fdb_notify.sh
@@ -26,6 +26,7 @@ do_test_dup()
 		bridge monitor fdb &> "$tmpf" &
 		defer kill_process $!
 
+		sleep 0.5
 		bridge fdb "$op" 00:11:22:33:44:55 vlan 1 "$@"
 		sleep 0.2
 	defer_scope_pop
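
The added sleep 0.5 gives bridge monitor time to subscribe before the FDB operation runs, and the existing sleep 0.2 gives the notification time to reach the capture file. Both are fixed guesses. A hypothetical alternative for the trailing wait is to poll the capture file until the expected number of matching lines shows up or a timeout expires. This only addresses the trailing wait: an event emitted before the monitor has subscribed is lost regardless, and stopping at the first match could miss a late duplicate, so a check for exactly one notification would still need a short grace period afterwards. The name wait_for_count is illustrative, not an existing lib.sh function.

```shell
#!/bin/bash
# Hypothetical helper (not in lib.sh): poll FILE until PATTERN matches at
# least EXPECTED lines or TIMEOUT_MS elapses. Prints the last count seen;
# returns 0 on success, 1 on timeout.
wait_for_count()
{
	local file=$1 pattern=$2 expected=$3 timeout_ms=$4
	local waited=0 count

	while :; do
		# grep -c prints 0 (and exits 1) when nothing matches; a
		# missing file yields no output, which is treated as 0.
		count=$(grep -c -e "$pattern" "$file" 2>/dev/null)
		count=${count:-0}

		if ((count >= expected)); then
			echo "$count"
			return 0
		fi
		if ((waited >= timeout_ms)); then
			echo "$count"
			return 1
		fi
		sleep 0.1
		waited=$((waited + 100))
	done
}
```

In do_test_dup() this might stand in for the trailing sleep as something like wait_for_count "$tmpf" 00:11:22:33:44:55 1 2000, still followed by a brief pause before the monitor is killed so that a duplicate has a chance to show up.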

> - Deadlocks. E.g. this, which looks like it deadlocked and timed out

Eh, these are ancient. Never mind.

Jakub Kicinski Nov. 14, 2024, 1:08 a.m. UTC | #4
On Wed, 13 Nov 2024 16:11:03 +0100 Petr Machata wrote:
> > - A "0 seen, 1 expected", which... I don't know, maybe it could just be
> >   a misplaced sleep. I don't see how, but it's a deterministic
> >   scenario, there shouldn't be anything racy here, either it emits or it
> >   doesn't, so some buffering issue is the only thing I can think of.
> 
> I think this really could be just a "bridge monitor" taking a bit more
> time to start every now and then. Can I have you test with this extra
> chunk, or should I just resend with that change and hope for the best?

Let's give it a go. If it doesn't fix it, we can try to do sneaky local
changes in the CI, without more resends.

Patch

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 26a4883a65c9..ab0e8f30bfe7 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -92,7 +92,7 @@ TEST_PROGS += test_vxlan_mdb.sh
 TEST_PROGS += test_bridge_neigh_suppress.sh
 TEST_PROGS += test_vxlan_nolocalbypass.sh
 TEST_PROGS += test_bridge_backup_port.sh
-TEST_PROGS += fdb_flush.sh
+TEST_PROGS += fdb_flush.sh fdb_notify.sh
 TEST_PROGS += fq_band_pktlimit.sh
 TEST_PROGS += vlan_hw_filter.sh
 TEST_PROGS += bpf_offload.py
diff --git a/tools/testing/selftests/net/fdb_notify.sh b/tools/testing/selftests/net/fdb_notify.sh
new file mode 100755
index 000000000000..a98047361988
--- /dev/null
+++ b/tools/testing/selftests/net/fdb_notify.sh
@@ -0,0 +1,95 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source lib.sh
+
+ALL_TESTS="
+	test_dup_bridge
+	test_dup_vxlan_self
+	test_dup_vxlan_master
+	test_dup_macvlan_self
+	test_dup_macvlan_master
+"
+
+do_test_dup()
+{
+	local op=$1; shift
+	local what=$1; shift
+	local tmpf
+
+	RET=0
+
+	tmpf=$(mktemp)
+	defer rm "$tmpf"
+
+	defer_scope_push
+		bridge monitor fdb &> "$tmpf" &
+		defer kill_process $!
+
+		bridge fdb "$op" 00:11:22:33:44:55 vlan 1 "$@"
+		sleep 0.2
+	defer_scope_pop
+
+	local count=$(grep -c -e 00:11:22:33:44:55 $tmpf)
+	((count == 1))
+	check_err $? "Got $count notifications, expected 1"
+
+	log_test "$what $op: Duplicate notifications"
+}
+
+test_dup_bridge()
+{
+	ip_link_add br up type bridge vlan_filtering 1
+	do_test_dup add "bridge" dev br self
+	do_test_dup del "bridge" dev br self
+}
+
+test_dup_vxlan_self()
+{
+	ip_link_add br up type bridge vlan_filtering 1
+	ip_link_add vx up type vxlan id 2000 dstport 4789
+	ip_link_master vx br
+
+	do_test_dup add "vxlan" dev vx self dst 192.0.2.1
+	do_test_dup del "vxlan" dev vx self dst 192.0.2.1
+}
+
+test_dup_vxlan_master()
+{
+	ip_link_add br up type bridge vlan_filtering 1
+	ip_link_add vx up type vxlan id 2000 dstport 4789
+	ip_link_master vx br
+
+	do_test_dup add "vxlan master" dev vx master
+	do_test_dup del "vxlan master" dev vx master
+}
+
+test_dup_macvlan_self()
+{
+	ip_link_add dd up type dummy
+	ip_link_add mv up link dd type macvlan mode passthru
+
+	do_test_dup add "macvlan self" dev mv self
+	do_test_dup del "macvlan self" dev mv self
+}
+
+test_dup_macvlan_master()
+{
+	ip_link_add br up type bridge vlan_filtering 1
+	ip_link_add dd up type dummy
+	ip_link_add mv up link dd type macvlan mode passthru
+	ip_link_master mv br
+
+	do_test_dup add "macvlan master" dev mv self
+	do_test_dup del "macvlan master" dev mv self
+}
+
+cleanup()
+{
+	defer_scopes_cleanup
+}
+
+trap cleanup EXIT
+tests_run
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 24f63e45735d..8994fec1c38f 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -442,3 +442,20 @@ kill_process()
 	# Suppress noise from killing the process.
 	{ kill $pid && wait $pid; } 2>/dev/null
 }
+
+ip_link_add()
+{
+	local name=$1; shift
+
+	ip link add name "$name" "$@"
+	defer ip link del dev "$name"
+}
+
+ip_link_master()
+{
+	local member=$1; shift
+	local master=$1; shift
+
+	ip link set dev "$member" master "$master"
+	defer ip link set dev "$member" nomaster
+}