diff mbox series

[bpf-next] selftests/bpf: disable IPv6 for lwt_redirect test

Message ID 20240131053212.2247527-1-chantr4@gmail.com (mailing list archive)
State Accepted
Commit 2ef61296d2844c6a4211e07ab70ef2fb412b2c30
Delegated to: BPF
Headers show
Series [bpf-next] selftests/bpf: disable IPv6 for lwt_redirect test | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success Errors and warnings before: 1 this patch: 0
netdev/cc_maintainers success CCed 0 of 0 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Manu Bretelle Jan. 31, 2024, 5:32 a.m. UTC
After a recent change in the vmtest runner, this test started failing
sporadically.

Investigation showed that this test was subject to a race condition which
got exacerbated by the vm runner change. The symptom is that the logic
waiting for an ICMPv4 packet is naive and breaks if 5 or more
non-ICMPv4 packets make it to tap0 first.
When IPv6 is enabled, the kernel generates traffic such as ICMPv6
router solicitations...
On a system with good performance, the expected ICMPv4 packet would very
likely make it to the network interface promptly, but on a system with
poor performance, those "guarantees" no longer hold.

Given that the test is IPv4 only, this change disables IPv6 in the test
netns by setting `net.ipv6.conf.all.disable_ipv6` to 1.
This essentially leaves "ping" as the sole generator of traffic in the
network namespace.
If this test were to be made IPv6 compatible, the logic in
`wait_for_packet` would need to be modified.

In more details...

At a high level, the test does:
- create a new namespace
- in `setup_redirect_target`, set up the lo, tap0, and link_err interfaces
  as well as add 2 routes that attach the ingress/egress sections of
  `test_lwt_redirect.bpf.o` to the xmit path.
- in `send_and_capture_test_packets`, send an ICMP packet and read off
  the tap interface (using `wait_for_packet`) to check that an ICMP packet
  with the right size is read.

`wait_for_packet` will try to read `max_retry` (5) times from the tap0
fd looking for an ICMPv4 packet matching some criteria.

The problem is that when we set up the `tap0` interface, because IPv6 is
enabled by default, traffic such as router solicitations is sent through
tap0, as in:

  # tcpdump -r /tmp/lwt_redirect.pc
  reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet)
  04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32
  04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108
  04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
  04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16

If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in:

  2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec
  2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec
  2024-01-31T03:51:25.0344844Z open_netns:PASS:open /proc/self/ns/net 0 nsec
  2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec
  2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec
  2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec
  2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec
  2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec
  2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec
  2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec
  2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec
  2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec
  2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec
  2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec
  2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec
  2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec
  2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec
  2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec
  2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1
  2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails
  2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec

When running in a VM with potential resource constraints, the odds
increase that `ping` is not scheduled very soon after bringing `tap0` up,
and with that the chances that our ICMP packet gets pushed to position 6+
in the network trace.

To confirm this indeed solves the issue, I ran the test 100 times in a
row with:

  errors=0
  successes=0
  for i in `seq 1 100`
  do
    ./test_progs -t lwt_redirect/lwt_redirect_normal
    if [ $? -eq 0 ]; then
      successes=$((successes+1))
    else
      errors=$((errors+1))
    fi
  done
  echo "successes: $successes/errors: $errors"

While this test would previously fail at least a couple of times every
10 runs, here it ran 100 times with no error.

Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/lwt_redirect.c | 1 +
 1 file changed, 1 insertion(+)

Comments

patchwork-bot+netdevbpf@kernel.org Jan. 31, 2024, 5:20 p.m. UTC | #1
Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Tue, 30 Jan 2024 21:32:12 -0800 you wrote:
> After a recent change in the vmtest runner, this test started failing
> sporadically.
> 
> Investigation showed that this test was subject to race condition which
> got exacerbated after the vm runner change. The symptoms being that the
> logic that waited for an ICMPv4 packet is naive and will break if 5 or
> more non-ICMPv4 packets make it to tap0.
> When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6
> router solicitation...
> On a system with good performance, the expected ICMPv4 packet would very
> likely make it to the network interface promptly, but on a system with
> poor performance, those "guarantees" do not hold true anymore.
> 
> [...]

Here is the summary with links:
  - [bpf-next] selftests/bpf: disable IPv6 for lwt_redirect test
    https://git.kernel.org/bpf/bpf-next/c/2ef61296d284

You are awesome, thank you!
Alan Maguire Jan. 31, 2024, 5:49 p.m. UTC | #2
On 31/01/2024 05:32, Manu Bretelle wrote:
> After a recent change in the vmtest runner, this test started failing
> sporadically.
> 
> Investigation showed that this test was subject to race condition which
> got exacerbated after the vm runner change. The symptoms being that the
> logic that waited for an ICMPv4 packet is naive and will break if 5 or
> more non-ICMPv4 packets make it to tap0.
> When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6
> router solicitation...
> On a system with good performance, the expected ICMPv4 packet would very
> likely make it to the network interface promptly, but on a system with
> poor performance, those "guarantees" do not hold true anymore.
> 
> Given that the test is IPv4 only, this change disable IPv6 in the test
> netns by setting `net.ipv6.conf.all.disable_ipv6` to 1.
> This essentially leaves "ping" as the sole generator of traffic in the
> network namespace.
> If this test was to be made IPv6 compatible, the logic in
> `wait_for_packet` would need to be modified.
> 

Great to fix test flakiness like this; I was curious if you tried
modifying things from the bpf side; would something like this in
progs/test_lwt_redirect.c help?
(haven't been able to test because I can't reproduce the failure):

static int get_redirect_target(struct __sk_buff *skb)
{
        struct iphdr *iph = NULL;
        void *start = (void *)(long)skb->data;
        void *end = (void *)(long)skb->data_end;

+	if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
+		return -1;

I _think_ that would skip redirection and might solve the problem
from the bpf side. Might be worth testing, but not a big deal..

> [...]
> Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
> Signed-off-by: Manu Bretelle <chantr4@gmail.com>

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

> ---
>  tools/testing/selftests/bpf/prog_tests/lwt_redirect.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> index beeb3ac1c361..b5b9e74b1044 100644
> --- a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> +++ b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> @@ -203,6 +203,7 @@ static int setup_redirect_target(const char *target_dev, bool need_mac)
>  	if (!ASSERT_GE(target_index, 0, "if_nametoindex"))
>  		goto fail;
>  
> +	SYS(fail, "sysctl -w net.ipv6.conf.all.disable_ipv6=1");
>  	SYS(fail, "ip link add link_err type dummy");
>  	SYS(fail, "ip link set lo up");
>  	SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32");
Manu Bretelle Jan. 31, 2024, 6:50 p.m. UTC | #3
On Wed, Jan 31, 2024 at 05:49:42PM +0000, Alan Maguire wrote:
> On 31/01/2024 05:32, Manu Bretelle wrote:
> > After a recent change in the vmtest runner, this test started failing
> > sporadically.
> > 
> > Investigation showed that this test was subject to race condition which
> > got exacerbated after the vm runner change. The symptoms being that the
> > logic that waited for an ICMPv4 packet is naive and will break if 5 or
> > more non-ICMPv4 packets make it to tap0.
> > When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6
> > router solicitation...
> > On a system with good performance, the expected ICMPv4 packet would very
> > likely make it to the network interface promptly, but on a system with
> > poor performance, those "guarantees" do not hold true anymore.
> > 
> > Given that the test is IPv4 only, this change disable IPv6 in the test
> > netns by setting `net.ipv6.conf.all.disable_ipv6` to 1.
> > This essentially leaves "ping" as the sole generator of traffic in the
> > network namespace.
> > If this test was to be made IPv6 compatible, the logic in
> > `wait_for_packet` would need to be modified.
> > 
> 
> Great to fix test flakiness like this; I was curious if you tried
> modifying things from the bpf side; would something like this in
> progs/test_lwt_redirect.c help?
> (haven't been able to test because I can't reproduce the failure):
> 
> static int get_redirect_target(struct __sk_buff *skb)
> {
>         struct iphdr *iph = NULL;
>         void *start = (void *)(long)skb->data;
>         void *end = (void *)(long)skb->data_end;
> 
> +	if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6))
> +		return -1;
> 
> I _think_ that would skip redirection and might solve the problem
> from the bpf side. Might be worth testing, but not a big deal..
> 

No, I did not try this, and it should indeed work.

The repro env is quite a heavy lift and I do not have easy access to a
full toolchain to recompile programs and test this cross-platform within it.

The sysctl call was a straightforward code change with a high chance of
success and minimal back and forth with BPF CI to get s390x artifacts
(which is where the flakiness manifested), and given that the test runs
within its own ephemeral network namespace, I felt no guilt in using that
sledgehammer :)

> > [...]
Yan Zhai Feb. 1, 2024, 4:44 a.m. UTC | #4
Hi Manu,

On Tue, Jan 30, 2024 at 11:32 PM Manu Bretelle <chantr4@gmail.com> wrote:
>
> After a recent change in the vmtest runner, this test started failing
> sporadically.
>
> Investigation showed that this test was subject to race condition which
> got exacerbated after the vm runner change. The symptoms being that the
> logic that waited for an ICMPv4 packet is naive and will break if 5 or
> more non-ICMPv4 packets make it to tap0.
> When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6
> router solicitation...
> On a system with good performance, the expected ICMPv4 packet would very
> likely make it to the network interface promptly, but on a system with
> poor performance, those "guarantees" do not hold true anymore.
>
> Given that the test is IPv4 only, this change disable IPv6 in the test
> netns by setting `net.ipv6.conf.all.disable_ipv6` to 1.
> This essentially leaves "ping" as the sole generator of traffic in the
> network namespace.
> If this test was to be made IPv6 compatible, the logic in
> `wait_for_packet` would need to be modified.
>
> In more details...
>
> At a high level, the test does:
> - create a new namespace
> - in `setup_redirect_target` set up lo, tap0, and link_err interfaces as
>   well as add 2 routes that attaches ingress/egress sections of
>   `test_lwt_redirect.bpf.o` to the xmit path.
> - in `send_and_capture_test_packets` send an ICMP packet and read off
>   the tap interface (using `wait_for_packet`) to check that a ICMP packet
>   with the right size is read.
>
> `wait_for_packet` will try to read `max_retry` (5) times from the tap0
> fd looking for an ICMPv4 packet matching some criteria.
>
> The problem is that when we set up the `tap0` interface, because IPv6 is
> enabled by default, traffic such as Router solicitation is sent through
> tap0, as in:
>
>   # tcpdump -r /tmp/lwt_redirect.pc
>   reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet)
>   04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32
>   04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
>   04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108
>   04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
>   04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
>   04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
>   04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
>
> If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in:
>
>   2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec
>   2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec
>   2024-01-31T03:51:25.0344844Z open_netns:PASS:open /proc/self/ns/net 0 nsec
>   2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec
>   2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec
>   2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec
>   2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec
>   2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec
>   2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec
>   2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec
>   2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec
>   2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec
>   2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec
>   2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec
>   2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec
>   2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec
>   2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec
>   2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec
>   2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec
>   2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec
>   2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1
>   2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails
>   2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec
>
> When running in a VM with potential resource constraints, the odds
> increase that `ping` is not scheduled very soon after bringing `tap0`
> up, and with that the chances that our ICMP packet gets pushed to
> position 6+ in the network trace.
>
> To confirm this indeed solves the issue, I ran the test 100 times in a
> row with:
>
>   errors=0
>   successes=0
>   for i in `seq 1 100`
>   do
>     ./test_progs -t lwt_redirect/lwt_redirect_normal
>     if [ $? -eq 0 ]; then
>       successes=$((successes+1))
>     else
>       errors=$((errors+1))
>     fi
>   done
>   echo "successes: $successes/errors: $errors"
>
> While this test would previously fail at least a couple of times every
> 10 runs, here it ran 100 times with no error.
>
> Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
> Signed-off-by: Manu Bretelle <chantr4@gmail.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/lwt_redirect.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> index beeb3ac1c361..b5b9e74b1044 100644
> --- a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> +++ b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> @@ -203,6 +203,7 @@ static int setup_redirect_target(const char *target_dev, bool need_mac)
>         if (!ASSERT_GE(target_index, 0, "if_nametoindex"))
>                 goto fail;
>
> +       SYS(fail, "sysctl -w net.ipv6.conf.all.disable_ipv6=1");

Thanks for digging into this! I was totally unprepared for that many
router solicitations when I wrote the wait logic. For now, disabling v6
is totally fine to unblock similar scenarios. But on second thought, it
is probably still worthwhile to incorporate v6 later, since lwt hooks
mess with both v4/v6 routing. So I'll try to fix up the wait logic later
this week. An exact packet filter is probably best suited to make
icmpv6/arp/nd happy.
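
[Editor's note] A minimal sketch of the kind of "exact packet filter" suggested
above: instead of burning a retry on any frame read from the tap fd, only accept
frames that are IPv4 ICMP echo requests and silently skip everything else
(ICMPv6 ND/RS, MLD, ARP, ...). The header offsets follow the standard Ethernet
and IPv4 layouts; the function name and exact checks are illustrative, not the
selftest's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN     14      /* Ethernet header length */
#define ETH_P_IP     0x0800  /* IPv4 ethertype */
#define IPPROTO_ICMP 1
#define ICMP_ECHO    8       /* echo request */

/* Return true only for an IPv4 ICMP echo request frame. */
static bool is_icmp4_echo(const uint8_t *pkt, size_t len)
{
	uint16_t proto;
	size_t ihl;

	if (len < ETH_HLEN + 20 + 8)  /* eth + min IPv4 hdr + ICMP hdr */
		return false;

	proto = (uint16_t)pkt[12] << 8 | pkt[13];
	if (proto != ETH_P_IP)        /* drops IPv6, ARP, ... */
		return false;

	if ((pkt[ETH_HLEN] >> 4) != 4)         /* IP version field */
		return false;

	ihl = (size_t)(pkt[ETH_HLEN] & 0x0f) * 4;  /* IPv4 header length */
	if (ihl < 20 || len < ETH_HLEN + ihl + 8)
		return false;

	if (pkt[ETH_HLEN + 9] != IPPROTO_ICMP)     /* L4 protocol */
		return false;

	return pkt[ETH_HLEN + ihl] == ICMP_ECHO;   /* ICMP type */
}
```

With a filter like this, the read loop can keep polling until it sees a matching
packet or times out, rather than counting unrelated IPv6 autoconf traffic
against `max_retry`.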

best
Yan

>         SYS(fail, "ip link add link_err type dummy");
>         SYS(fail, "ip link set lo up");
>         SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32");
> --
> 2.39.3
>
Manu Bretelle Feb. 1, 2024, 6:42 p.m. UTC | #5
On Wed, Jan 31, 2024 at 10:44:07PM -0600, Yan Zhai wrote:
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> > index beeb3ac1c361..b5b9e74b1044 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
> > @@ -203,6 +203,7 @@ static int setup_redirect_target(const char *target_dev, bool need_mac)
> >         if (!ASSERT_GE(target_index, 0, "if_nametoindex"))
> >                 goto fail;
> >
> > +       SYS(fail, "sysctl -w net.ipv6.conf.all.disable_ipv6=1");
> 
> Thanks for digging into this! I was totally unprepared for that many
> router solicitations when I wrote the wait logic. For now, disabling v6
> is totally fine to unblock similar scenarios. But on second thought, it
> is probably still worthwhile to incorporate v6 later, since lwt hooks
> mess with both v4/v6 routing. So I'll try to fix up the wait logic later
> this week. An exact packet filter is probably best suited to make
> icmpv6/arp/nd happy.
> 

Greatly appreciated, Yan. I definitely agree that it is desirable to have
v6 support too.

In case that helps... [0] should help you repro locally on your favorite
architecture.


[0] https://chantra.github.io/bpfcitools/bpf-local-development.html

Manu

> best
> Yan
> 
> >         SYS(fail, "ip link add link_err type dummy");
> >         SYS(fail, "ip link set lo up");
> >         SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32");
> > --
> > 2.39.3
> >
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
index beeb3ac1c361..b5b9e74b1044 100644
--- a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c
@@ -203,6 +203,7 @@  static int setup_redirect_target(const char *target_dev, bool need_mac)
 	if (!ASSERT_GE(target_index, 0, "if_nametoindex"))
 		goto fail;
 
+	SYS(fail, "sysctl -w net.ipv6.conf.all.disable_ipv6=1");
 	SYS(fail, "ip link add link_err type dummy");
 	SYS(fail, "ip link set lo up");
 	SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32");