Message ID | 20250329180541.34968-1-kuniyu@amazon.com (mailing list archive) |
---|---|
Headers | show |
Series | udp: Fix two integer overflows when sk->sk_rcvbuf is close to INT_MAX. | expand |
On Sat, 29 Mar 2025 11:05:10 -0700 Kuniyuki Iwashima wrote: > v4: > * Patch 4 > * Wait RCU for at most 30 sec The new test still doesn't pass TAP version 13 1..1 # timeout set to 3600 # selftests: net: so_rcvbuf # 0.00 [+0.00] TAP version 13 # 0.00 [+0.00] 1..2 # 0.00 [+0.00] # Starting 2 tests from 2 test cases. # 0.00 [+0.00] # RUN so_rcvbuf.udp_ipv4.rmem_max ... # 0.00 [+0.00] # so_rcvbuf.c:151:rmem_max:Expected pages (35) == 0 (0) # 0.00 [+0.00] # rmem_max: Test terminated by assertion # 0.00 [+0.00] # FAIL so_rcvbuf.udp_ipv4.rmem_max # 0.00 [+0.00] not ok 1 so_rcvbuf.udp_ipv4.rmem_max # 0.01 [+0.00] # RUN so_rcvbuf.udp_ipv6.rmem_max ... # 0.01 [+0.00] # so_rcvbuf.c:151:rmem_max:Expected pages (35) == 0 (0) # 0.01 [+0.00] # rmem_max: Test terminated by assertion # 0.01 [+0.00] # FAIL so_rcvbuf.udp_ipv6.rmem_max # 0.01 [+0.00] not ok 2 so_rcvbuf.udp_ipv6.rmem_max # 0.01 [+0.00] # FAILED: 0 / 2 tests passed. # 0.01 [+0.00] # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0 not ok 1 selftests: net: so_rcvbuf # exit=1 Plus we also see failures in udpgso.sh
From: Jakub Kicinski <kuba@kernel.org> Date: Mon, 31 Mar 2025 06:51:09 -0700 > On Sat, 29 Mar 2025 11:05:10 -0700 Kuniyuki Iwashima wrote: > > v4: > > * Patch 4 > > * Wait RCU for at most 30 sec > > The new test still doesn't pass > > TAP version 13 > 1..1 > # timeout set to 3600 > # selftests: net: so_rcvbuf > # 0.00 [+0.00] TAP version 13 > # 0.00 [+0.00] 1..2 > # 0.00 [+0.00] # Starting 2 tests from 2 test cases. > # 0.00 [+0.00] # RUN so_rcvbuf.udp_ipv4.rmem_max ... > # 0.00 [+0.00] # so_rcvbuf.c:151:rmem_max:Expected pages (35) == 0 (0) > # 0.00 [+0.00] # rmem_max: Test terminated by assertion > # 0.00 [+0.00] # FAIL so_rcvbuf.udp_ipv4.rmem_max > # 0.00 [+0.00] not ok 1 so_rcvbuf.udp_ipv4.rmem_max > # 0.01 [+0.00] # RUN so_rcvbuf.udp_ipv6.rmem_max ... > # 0.01 [+0.00] # so_rcvbuf.c:151:rmem_max:Expected pages (35) == 0 (0) > # 0.01 [+0.00] # rmem_max: Test terminated by assertion > # 0.01 [+0.00] # FAIL so_rcvbuf.udp_ipv6.rmem_max > # 0.01 [+0.00] not ok 2 so_rcvbuf.udp_ipv6.rmem_max > # 0.01 [+0.00] # FAILED: 0 / 2 tests passed. > # 0.01 [+0.00] # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0 > not ok 1 selftests: net: so_rcvbuf # exit=1 > > > Plus we also see failures in udpgso.sh I forgot to update `size` when skb_condense() is called. Without the change I saw the new test stuck at 1 page after udpgso.sh, but with the change both passed. Will post v5. Thanks!
On Mon, 31 Mar 2025 11:21:34 -0700 Kuniyuki Iwashima wrote: > > Plus we also see failures in udpgso.sh > > I forgot to update `size` when skb_condense() is called. > > Without the change I saw the new test stuck at 1 page after > udpgso.sh, but with the change both passed. > > Will post v5. Please do test locally if you can.
From: Jakub Kicinski <kuba@kernel.org> Date: Mon, 31 Mar 2025 11:36:03 -0700 > On Mon, 31 Mar 2025 11:21:34 -0700 Kuniyuki Iwashima wrote: > > > Plus we also see failures in udpgso.sh > > > > I forgot to update `size` when skb_condense() is called. > > > > Without the change I saw the new test stuck at 1 page after > > udpgso.sh, but with the change both passed. > > > > Will post v5. > > Please do test locally if you can. Sure, will try the same tests with CI.
From: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Mon, 31 Mar 2025 11:54:53 -0700 > From: Jakub Kicinski <kuba@kernel.org> > Date: Mon, 31 Mar 2025 11:36:03 -0700 > > On Mon, 31 Mar 2025 11:21:34 -0700 Kuniyuki Iwashima wrote: > > > > Plus we also see failures in udpgso.sh > > > > > > I forgot to update `size` when skb_condense() is called. > > > > > > Without the change I saw the new test stuck at 1 page after > > > udpgso.sh, but with the change both passed. > > > > > > Will post v5. > > > > Please do test locally if you can. > > Sure, will try the same tests with CI. Is there a way to tell NIPA to run a test in a dedicated VM ? I see some tests succeed when executed solely but fail when executed with make -C tools/testing/selftests/ TARGETS=net run_tests When combined with other tests, assuming that the global UDP usage will soon drop to 0 is not always easy... so it's defeating the purpose but I'd drop the test in v5 not to make CI unhappy.
On Mon, 31 Mar 2025 13:31:47 -0700 Kuniyuki Iwashima wrote: > > > Please do test locally if you can. > > > > Sure, will try the same tests with CI. > > Is there a way to tell NIPA to run a test in a dedicated VM ? > > I see some tests succeed when executed solely but fail when > executed with > > make -C tools/testing/selftests/ TARGETS=net run_tests > > When combined with other tests, assuming that the global UDP usage > will soon drop to 0 is not always easy... so it's defeating the > purpose but I'd drop the test in v5 not to make CI unhappy. Can we account for some level of system noise? Or try to dump all the sockets and count the "accounted for" in-use memory? We can do various things in NIPA, but I'm not sure if it's okay for tests inside net/ should require a completely idle system. If we want a completely idle system maybe user-mode Linux + kunit is a better direction? Willem, WDYT?
Jakub Kicinski wrote: > On Mon, 31 Mar 2025 13:31:47 -0700 Kuniyuki Iwashima wrote: > > > > Please do test locally if you can. > > > > > > Sure, will try the same tests with CI. > > > > Is there a way to tell NIPA to run a test in a dedicated VM ? > > > > I see some tests succeed when executed solely but fail when > > executed with > > > > make -C tools/testing/selftests/ TARGETS=net run_tests > > > > When combined with other tests, assuming that the global UDP usage > > will soon drop to 0 is not always easy... so it's defeating the > > purpose but I'd drop the test in v5 not to make CI unhappy. > > Can we account for some level of system noise? Or try to dump all > the sockets and count the "accounted for" in-use memory? > > We can do various things in NIPA, but I'm not sure if it's okay > for tests inside net/ should require a completely idle system. > If we want a completely idle system maybe user-mode Linux + kunit > is a better direction? > > Willem, WDYT? The number of tests depending on global variables like proto_memory_allocated is thankfully low. kselftest/runner.sh runs RUN_IN_NETNS tests in parallel. That sounds the case here. Perhaps we can add a test option to force running without concurrent other tests? Otherwise, the specific test drops usage from MAX to 0. And verifies to reach MAX before exiting its loop. Other concurrent tests are unlikely to spike very high. It might just be sufficient to relax that ASSERT to something a bit higher than 0, but a far cry from INT_MAX, to mean "clearly no longer stressed".
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Date: Mon, 31 Mar 2025 21:15:24 -0400 > Jakub Kicinski wrote: > > On Mon, 31 Mar 2025 13:31:47 -0700 Kuniyuki Iwashima wrote: > > > > > Please do test locally if you can. > > > > > > > > Sure, will try the same tests with CI. > > > > > > Is there a way to tell NIPA to run a test in a dedicated VM ? > > > > > > I see some tests succeed when executed solely but fail when > > > executed with > > > > > > make -C tools/testing/selftests/ TARGETS=net run_tests > > > > > > When combined with other tests, assuming that the global UDP usage > > > will soon drop to 0 is not always easy... so it's defeating the > > > purpose but I'd drop the test in v5 not to make CI unhappy. > > > > Can we account for some level of system noise? Or try to dump all > > the sockets and count the "accounted for" in-use memory? > > > > We can do various things in NIPA, but I'm not sure if it's okay > > for tests inside net/ should require a completely idle system. > > If we want a completely idle system maybe user-mode Linux + kunit > > is a better direction? > > > > Willem, WDYT? > > The number of tests depending on global variables like > proto_memory_allocated is thankfully low. > > kselftest/runner.sh runs RUN_IN_NETNS tests in parallel. That sounds > the case here. Perhaps we can add a test option to force running > without concurrent other tests? > > Otherwise, the specific test drops usage from MAX to 0. And verifies > to reach MAX before exiting its loop. Yes, and we need to query rmem_alloc via netlink. Also, I haven't investigated, but I saw a weird timeout, when the usage stuck at 523914, which is smaller than INT_MAX >> PAGE_SHIFT. Probably the test needs to check sockets' rmem_alloc to be accurate. I'll drop the test for now and follow up in net-next. Thanks!