Message ID | 20230403215834.26675-1-zhuyifei@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | [bpf] selftests/bpf: Poll for receive in cg_storage_multi test | expand |
On Mon, Apr 3, 2023 at 2:59 PM YiFei Zhu <zhuyifei@google.com> wrote: > > In some cases the loopback latency might be large enough, causing > the assertion on invocations to be run before ingress prog getting > executed. The assertion would fail and the test would flake. > > This can be reliably reproduced by arbitrarily increaing the loopback > latency (thanks to [1]): > tc qdisc add dev lo root handle 1: htb default 12 > tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps > tc qdisc add dev lo parent 1:12 netem delay 100ms > > Fix this by polling on the receive end and waiting for up to a > second, instead of instantly returning to the assert. > > [1] https://gist.github.com/kstevens715/4598301 > > Reported-by: Martin KaFai Lau <martin.lau@linux.dev> > Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/ > Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress") > Signed-off-by: YiFei Zhu <zhuyifei@google.com> Thank you! Acked-by: Stanislav Fomichev <sdf@google.com> > --- > .../testing/selftests/bpf/prog_tests/cg_storage_multi.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c > index 621c57222191..3b0094a2a353 100644 > --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c > +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c > @@ -7,6 +7,7 @@ > #include <test_progs.h> > #include <cgroup_helpers.h> > #include <network_helpers.h> > +#include <poll.h> > > #include "progs/cg_storage_multi.h" > > @@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key) > > static bool connect_send(const char *cgroup_path) > { > - bool res = true; > int server_fd = -1, client_fd = -1; > + struct pollfd pollfd; > + bool res = true; > > if (join_cgroup(cgroup_path)) > goto out_clean; > @@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path) > if (send(client_fd, "message", strlen("message"), 0) < 0) > goto out_clean; > > + pollfd.fd = server_fd; > + pollfd.events = POLLIN; > + if (poll(&pollfd, 1, 1000) != 1) > + goto out_clean; > + > res = false; > > out_clean: > -- > 2.40.0.348.gf938b09366-goog >
On 4/3/23 2:58 PM, YiFei Zhu wrote: > In some cases the loopback latency might be large enough, causing > the assertion on invocations to be run before ingress prog getting > executed. The assertion would fail and the test would flake. > > This can be reliably reproduced by arbitrarily increaing the loopback > latency (thanks to [1]): > tc qdisc add dev lo root handle 1: htb default 12 > tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps > tc qdisc add dev lo parent 1:12 netem delay 100ms > > Fix this by polling on the receive end and waiting for up to a > second, instead of instantly returning to the assert. > > [1] https://gist.github.com/kstevens715/4598301 > > Reported-by: Martin KaFai Lau <martin.lau@linux.dev> > Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/ > Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress") > Signed-off-by: YiFei Zhu <zhuyifei@google.com> > --- > .../testing/selftests/bpf/prog_tests/cg_storage_multi.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c > index 621c57222191..3b0094a2a353 100644 > --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c > +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c > @@ -7,6 +7,7 @@ > #include <test_progs.h> > #include <cgroup_helpers.h> > #include <network_helpers.h> > +#include <poll.h> > > #include "progs/cg_storage_multi.h" > > @@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key) > > static bool connect_send(const char *cgroup_path) > { > - bool res = true; > int server_fd = -1, client_fd = -1; > + struct pollfd pollfd; > + bool res = true; > > if (join_cgroup(cgroup_path)) > goto out_clean; > @@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path) > if (send(client_fd, "message", strlen("message"), 0) < 0) > goto out_clean; > > + pollfd.fd = server_fd; > + pollfd.events = POLLIN; > + if (poll(&pollfd, 1, 1000) != 1) > + goto out_clean; Thanks for the fix. The slowness explanation makes sense. A nit. All start_server() has a 3s SO_RCVTIMEO by default. How about a read() here instead of a poll(). Easier to change the default read timeout for all tests if needed.
diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c index 621c57222191..3b0094a2a353 100644 --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c @@ -7,6 +7,7 @@ #include <test_progs.h> #include <cgroup_helpers.h> #include <network_helpers.h> +#include <poll.h> #include "progs/cg_storage_multi.h" @@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key) static bool connect_send(const char *cgroup_path) { - bool res = true; int server_fd = -1, client_fd = -1; + struct pollfd pollfd; + bool res = true; if (join_cgroup(cgroup_path)) goto out_clean; @@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path) if (send(client_fd, "message", strlen("message"), 0) < 0) goto out_clean; + pollfd.fd = server_fd; + pollfd.events = POLLIN; + if (poll(&pollfd, 1, 1000) != 1) + goto out_clean; + res = false; out_clean:
In some cases the loopback latency might be large enough, causing the assertion on invocations to be run before ingress prog getting executed. The assertion would fail and the test would flake. This can be reliably reproduced by arbitrarily increaing the loopback latency (thanks to [1]): tc qdisc add dev lo root handle 1: htb default 12 tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps tc qdisc add dev lo parent 1:12 netem delay 100ms Fix this by polling on the receive end and waiting for up to a second, instead of instantly returning to the assert. [1] https://gist.github.com/kstevens715/4598301 Reported-by: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/ Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress") Signed-off-by: YiFei Zhu <zhuyifei@google.com> --- .../testing/selftests/bpf/prog_tests/cg_storage_multi.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)