Message ID | cover.1651071843.git.asml.silence@gmail.com (mailing list archive) |
---|---|
Series | UDP/IPv6 refactoring |
On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result
> looks cleaner than it was before, and the series also removes a bunch of
> instructions and other overhead from the hot path, positively affecting
> performance.
>
> It was part of a larger series; there were some perf numbers for it, see
> https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/
>
> Pavel Begunkov (11):
>   ipv6: optimise ipcm6 cookie init
>   udp/ipv6: refactor udpv6_sendmsg udplite checks
>   udp/ipv6: move pending section of udpv6_sendmsg
>   udp/ipv6: prioritise the ip6 path over ip4 checks
>   udp/ipv6: optimise udpv6_sendmsg() daddr checks
>   udp/ipv6: optimise out daddr reassignment
>   udp/ipv6: clean up udpv6_sendmsg's saddr init
>   ipv6: partially inline fl6_update_dst()
>   ipv6: refactor opts push in __ip6_make_skb()
>   ipv6: improve opt-less __ip6_make_skb()
>   ipv6: clean up ip6_setup_cork
>
>  include/net/ipv6.h    |  24 +++----
>  net/ipv6/datagram.c   |   4 +-
>  net/ipv6/exthdrs.c    |  15 ++--
>  net/ipv6/ip6_output.c |  53 +++++++-------
>  net/ipv6/raw.c        |   8 +--
>  net/ipv6/udp.c        | 158 ++++++++++++++++++++----------------------
>  net/l2tp/l2tp_ip6.c   |   8 +--
>  7 files changed, 122 insertions(+), 148 deletions(-)

Just a general comment here: IMHO the above diffstat is quite
significant, and some patches look completely non-trivial to me.

I think we need a quite significant performance gain to justify the
above; could you please share your performance data, including the
testing scenario?

Thanks!

Paolo
On 4/28/22 15:04, Paolo Abeni wrote:
> On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
>> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result
>> looks cleaner than it was before, and the series also removes a bunch of
>> instructions and other overhead from the hot path, positively affecting
>> performance.
>>
>> It was part of a larger series; there were some perf numbers for it, see
>> https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/
>>
>> Pavel Begunkov (11):
>>   ipv6: optimise ipcm6 cookie init
>>   udp/ipv6: refactor udpv6_sendmsg udplite checks
>>   udp/ipv6: move pending section of udpv6_sendmsg
>>   udp/ipv6: prioritise the ip6 path over ip4 checks
>>   udp/ipv6: optimise udpv6_sendmsg() daddr checks
>>   udp/ipv6: optimise out daddr reassignment
>>   udp/ipv6: clean up udpv6_sendmsg's saddr init
>>   ipv6: partially inline fl6_update_dst()
>>   ipv6: refactor opts push in __ip6_make_skb()
>>   ipv6: improve opt-less __ip6_make_skb()
>>   ipv6: clean up ip6_setup_cork
>>
>>  include/net/ipv6.h    |  24 +++----
>>  net/ipv6/datagram.c   |   4 +-
>>  net/ipv6/exthdrs.c    |  15 ++--
>>  net/ipv6/ip6_output.c |  53 +++++++-------
>>  net/ipv6/raw.c        |   8 +--
>>  net/ipv6/udp.c        | 158 ++++++++++++++++++++----------------------
>>  net/l2tp/l2tp_ip6.c   |   8 +--
>>  7 files changed, 122 insertions(+), 148 deletions(-)
>
> Just a general comment here: IMHO the above diffstat is quite
> significant, and some patches look completely non-trivial to me.
>
> I think we need a quite significant performance gain to justify the
> above; could you please share your performance data, including the
> testing scenario?

As mentioned, I benchmarked it with a UDP/IPv6 max-throughput kind of
test, and only as part of a larger series [1]. It was "2090K vs 2229K
tx/s, +6.6%". Taking into account the +3% from the split-out sock_wfree
optimisations, half if not most of the rest should be attributed to this
series, so, a bit hand-wavingly, +1-3%. Can spend some extra time
retesting this particular series if strongly required...

I was using [2], which is basically an io_uring copy of the send paths
of selftests/net/msg_zerocopy. It should be visible with other tools as
well; this one just alleviates context-switch and similar overhead by
using io_uring.

./send-zc -6 udp -D <address> -t <time> -s16 -z0

It sends a number of 16-byte UDP/IPv6 (non-zerocopy) send requests over
io_uring, then waits for them and repeats. It was 8 requests (the
default) per iteration (i.e. per syscall).

I was using a dummy netdev, so there is no actual receiver, but it
correlates well with my server setup with mlx cards, which just takes
more effort for me to test. And all with mitigations=off.

There might be some fatter targets to optimise, but udpv6_sendmsg() and
the functions around it take a good chunk of cycles as well, though
without particular hotspots. If we'd want better justification than
1-3%, we'd need to add more work on top, adding even more to the
diffstat... a vicious cycle.

[1] https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/
[2] https://github.com/isilence/liburing/blob/zc_v3/test/send-zc.c
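For readers unfamiliar with the tool, below is a minimal sketch of the kind of
batched io_uring send loop described above, assuming liburing. The destination
port, loopback address, and fixed iteration count are illustrative stand-ins
for the -D and -t options; this is not send-zc's actual code (see [2] for
that).

/* Minimal illustrative sketch, assuming liburing: submit BATCH plain
 * (non-zerocopy, -z0) 16-byte UDP/IPv6 sends per syscall, then reap
 * them. Not send-zc itself -- see [2] for the real tool. */
#include <liburing.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define BATCH   8   /* requests per iteration, send-zc's default */
#define PAYLOAD 16  /* matches -s16 */

int main(void)
{
	/* hypothetical destination standing in for -D <address> */
	struct sockaddr_in6 dst = {
		.sin6_family = AF_INET6,
		.sin6_port   = htons(9999),
		.sin6_addr   = IN6ADDR_LOOPBACK_INIT,
	};
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	char buf[PAYLOAD] = {0};
	int fd, i, iter;

	fd = socket(AF_INET6, SOCK_DGRAM, 0);
	if (fd < 0 || connect(fd, (void *)&dst, sizeof(dst)))
		return 1;
	if (io_uring_queue_init(BATCH, &ring, 0))
		return 1;

	/* fixed iteration count standing in for the -t time budget */
	for (iter = 0; iter < 100000; iter++) {
		/* queue a batch of non-zerocopy sends */
		for (i = 0; i < BATCH; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			io_uring_prep_send(sqe, fd, buf, PAYLOAD, 0);
		}
		/* all BATCH requests go in via a single syscall */
		io_uring_submit(&ring);

		/* wait for the whole batch before the next iteration */
		for (i = 0; i < BATCH; i++) {
			if (io_uring_wait_cqe(&ring, &cqe))
				return 1;
			io_uring_cqe_seen(&ring, cqe);
		}
	}
	io_uring_queue_exit(&ring);
	return 0;
}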