Message ID | 20240701225349.3395580-2-zijianzhang@bytedance.com (mailing list archive) |
---|---|
State | Accepted |
Commit | af2b7e5b741aaae9ffbba2c660def434e07aa241 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | fix OOM and order check in msg_zerocopy selftest | expand |
zijianzhang@ wrote: > From: Zijian Zhang <zijianzhang@bytedance.com> > > In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg > on a socket with MSG_ZEROCOPY flag, and it will recv the notifications > until the socket is not writable. Typically, it will start the receiving > process after around 30+ sendmsgs. However, as the introduction of commit > dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is > always writable and does not get any chance to run recv notifications. > The selftest always exits with OUT_OF_MEMORY because the memory used by > opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a > different value to trigger OOM on older kernels too. > > Thus, we introduce "cfg_notification_limit" to force sender to receive > notifications after some number of sendmsgs. > > Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") > Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> > Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com> Reviewed-by: Willem de Bruijn <willemb@google.com>
On Mon, 1 Jul 2024 22:53:48 +0000 zijianzhang@bytedance.com wrote: > In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg > on a socket with MSG_ZEROCOPY flag, and it will recv the notifications > until the socket is not writable. Typically, it will start the receiving > process after around 30+ sendmsgs. However, as the introduction of commit > dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is > always writable and does not get any chance to run recv notifications. > The selftest always exits with OUT_OF_MEMORY because the memory used by > opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a > different value to trigger OOM on older kernels too. This test doesn't fail in netdev CI. Is the problem fix in net-next somehow? Or the "always exits with OUT_OF_MEMORY" is an exaggerations? (TBH I'm not even sure what it means to "exit with OUT_OF_MEMORY" in this context.) TAP version 13 1..1 # timeout set to 3600 # selftests: net: msg_zerocopy.sh # ipv4 tcp -t 1 # tx=164425 (10260 MB) txc=0 zc=n # rx=59526 (10260 MB) # ipv4 tcp -z -t 1 # tx=111332 (6947 MB) txc=111332 zc=n # rx=55245 (6947 MB) # ok # ipv6 tcp -t 1 # tx=168140 (10492 MB) txc=0 zc=n # rx=64633 (10492 MB) # ipv6 tcp -z -t 1 # tx=108388 (6763 MB) txc=108388 zc=n # rx=54146 (6763 MB) # ok # ipv4 udp -t 1 # tx=173970 (10856 MB) txc=0 zc=n # rx=173936 (10854 MB) # ipv4 udp -z -t 1 # tx=117728 (7346 MB) txc=117728 zc=n # rx=117703 (7345 MB) # ok # ipv6 udp -t 1 # tx=225502 (14072 MB) txc=0 zc=n # rx=225502 (14072 MB) # ipv6 udp -z -t 1 # tx=111215 (6940 MB) txc=111215 zc=n # rx=111212 (6940 MB) # ok # OK. All tests passed ok 1 selftests: net: msg_zerocopy.sh
On 7/3/24 6:50 PM, Jakub Kicinski wrote: > On Mon, 1 Jul 2024 22:53:48 +0000 zijianzhang@bytedance.com wrote: >> In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg >> on a socket with MSG_ZEROCOPY flag, and it will recv the notifications >> until the socket is not writable. Typically, it will start the receiving >> process after around 30+ sendmsgs. However, as the introduction of commit >> dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is >> always writable and does not get any chance to run recv notifications. >> The selftest always exits with OUT_OF_MEMORY because the memory used by >> opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a >> different value to trigger OOM on older kernels too. > > This test doesn't fail in netdev CI. Is the problem fix in net-next > somehow? Or the "always exits with OUT_OF_MEMORY" is an exaggerations? > (TBH I'm not even sure what it means to "exit with OUT_OF_MEMORY" in > this context.) > The reason why this test doesn't fail in CI: According to the test output, # ipv4 tcp -z -t 1 # tx=111332 (6947 MB) txc=111332 zc=n zerocopy is false here. This is because of some limitation of zerocopy in localhost. Specifically, the subsection "Notification Latency" in the sendmsg zerocopy the paper. In order to make "zc=y", we may need to update skb_orphan_frags_rx to the same as skb_orphan_frags, recompile the kernel, and run the test. By OUT_OF_MEMORY I mean: Each calling of sendmsg with zerocopy will allocate an skb with sock_omalloc. If users never recv the notifications but keep calling sendmsg with zerocopy. The send system call will finally return with -ENOMEM. I hope this clarifies your confusion :) > TAP version 13 > 1..1 > # timeout set to 3600 > # selftests: net: msg_zerocopy.sh > # ipv4 tcp -t 1 > # tx=164425 (10260 MB) txc=0 zc=n > # rx=59526 (10260 MB) > # ipv4 tcp -z -t 1 > # tx=111332 (6947 MB) txc=111332 zc=n > # rx=55245 (6947 MB) > # ok > # ipv6 tcp -t 1 > # tx=168140 (10492 MB) txc=0 zc=n > # rx=64633 (10492 MB) > # ipv6 tcp -z -t 1 > # tx=108388 (6763 MB) txc=108388 zc=n > # rx=54146 (6763 MB) > # ok > # ipv4 udp -t 1 > # tx=173970 (10856 MB) txc=0 zc=n > # rx=173936 (10854 MB) > # ipv4 udp -z -t 1 > # tx=117728 (7346 MB) txc=117728 zc=n > # rx=117703 (7345 MB) > # ok > # ipv6 udp -t 1 > # tx=225502 (14072 MB) txc=0 zc=n > # rx=225502 (14072 MB) > # ipv6 udp -z -t 1 > # tx=111215 (6940 MB) txc=111215 zc=n > # rx=111212 (6940 MB) > # ok > # OK. All tests passed > ok 1 selftests: net: msg_zerocopy.sh
On Wed, 3 Jul 2024 19:32:33 -0700 Zijian Zhang wrote: > > This test doesn't fail in netdev CI. Is the problem fix in net-next > > somehow? Or the "always exits with OUT_OF_MEMORY" is an exaggerations? > > (TBH I'm not even sure what it means to "exit with OUT_OF_MEMORY" in > > this context.) > > > The reason why this test doesn't fail in CI: > > According to the test output, > # ipv4 tcp -z -t 1 > # tx=111332 (6947 MB) txc=111332 zc=n > zerocopy is false here. > > This is because of some limitation of zerocopy in localhost. > Specifically, the subsection "Notification Latency" in the sendmsg > zerocopy the paper. > > In order to make "zc=y", we may need to update skb_orphan_frags_rx to > the same as skb_orphan_frags, recompile the kernel, and run the test. > > By OUT_OF_MEMORY I mean: > > Each calling of sendmsg with zerocopy will allocate an skb with > sock_omalloc. If users never recv the notifications but keep calling > sendmsg with zerocopy. The send system call will finally return with > -ENOMEM. > > I hope this clarifies your confusion :) It does, thanks!
diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c index bdc03a2097e8..926556febc83 100644 --- a/tools/testing/selftests/net/msg_zerocopy.c +++ b/tools/testing/selftests/net/msg_zerocopy.c @@ -85,6 +85,7 @@ static bool cfg_rx; static int cfg_runtime_ms = 4200; static int cfg_verbose; static int cfg_waittime_ms = 500; +static int cfg_notification_limit = 32; static bool cfg_zerocopy; static socklen_t cfg_alen; @@ -95,6 +96,7 @@ static char payload[IP_MAXPACKET]; static long packets, bytes, completions, expected_completions; static int zerocopied = -1; static uint32_t next_completion; +static uint32_t sends_since_notify; static unsigned long gettimeofday_ms(void) { @@ -208,6 +210,7 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain) error(1, errno, "send"); if (cfg_verbose && ret != len) fprintf(stderr, "send: ret=%u != %u\n", ret, len); + sends_since_notify++; if (len) { packets++; @@ -460,6 +463,7 @@ static bool do_recv_completion(int fd, int domain) static void do_recv_completions(int fd, int domain) { while (do_recv_completion(fd, domain)) {} + sends_since_notify = 0; } /* Wait for all remaining completions on the errqueue */ @@ -549,6 +553,9 @@ static void do_tx(int domain, int type, int protocol) else do_sendmsg(fd, &msg, cfg_zerocopy, domain); + if (cfg_zerocopy && sends_since_notify >= cfg_notification_limit) + do_recv_completions(fd, domain); + while (!do_poll(fd, POLLOUT)) { if (cfg_zerocopy) do_recv_completions(fd, domain); @@ -708,7 +715,7 @@ static void parse_opts(int argc, char **argv) cfg_payload_len = max_payload_len; - while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vz")) != -1) { + while ((c = getopt(argc, argv, "46c:C:D:i:l:mp:rs:S:t:vz")) != -1) { switch (c) { case '4': if (cfg_family != PF_UNSPEC) @@ -736,6 +743,9 @@ static void parse_opts(int argc, char **argv) if (cfg_ifindex == 0) error(1, errno, "invalid iface: %s", optarg); break; + case 'l': + cfg_notification_limit = strtoul(optarg, NULL, 0); + break; case 'm': cfg_cork_mixed = true; break;