Message ID | 20230125074901.2737-1-magnus.karlsson@gmail.com (mailing list archive) |
---|---|
Series | net: xdp: execute xdp_do_flush() before napi_complete_done() |
On Wed, Jan 25, 2023 at 08:48:56AM +0100, Magnus Karlsson wrote:
> Make sure that xdp_do_flush() is always executed before
> napi_complete_done(). This is important for two reasons. First, a
> redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
> napi context X on CPU Y will be followed by a xdp_do_flush() from the
> same napi context and CPU. This is not guaranteed if the
> napi_complete_done() is executed before xdp_do_flush(), as it tells
> the napi logic that it is fine to schedule napi context X on another
> CPU. Details from a production system triggering this bug using the
> veth driver can be found in [1].
>
> The second reason is that the XDP_REDIRECT logic in itself relies on
> being inside a single NAPI instance through to the xdp_do_flush() call
> for RCU protection of all in-kernel data structures. Details can be
> found in [2].
>
> The drivers have only been compile-tested since I do not own any of
> the HW below. So if you are a maintainer, it would be great if you
> could take a quick look to make sure I did not mess something up.
>
> Note that these were the drivers I found that violated the ordering by
> running a simple script and manually checking the ones that came up as
> potential offenders. But the script was not perfect in any way. There
> might still be offenders out there, since the script can generate
> false negatives.

BTW all this series is stable material, right?

> v1 -> v2:
> * Added acks [Toke, Steen]
> * Corrected two spelling errors [Toke]
>
> [1] https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
> [2] https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
>
> Thanks: Magnus
>
> Magnus Karlsson (5):
>   qede: execute xdp_do_flush() before napi_complete_done()
>   lan966x: execute xdp_do_flush() before napi_complete_done()
>   virtio-net: execute xdp_do_flush() before napi_complete_done()
>   dpaa_eth: execute xdp_do_flush() before napi_complete_done()
>   dpaa2-eth: execute xdp_do_flush() before napi_complete_done()
>
>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c        | 6 +++---
>  drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c      | 9 ++++++---
>  drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c | 6 +++---
>  drivers/net/ethernet/qlogic/qede/qede_fp.c            | 7 ++++---
>  drivers/net/virtio_net.c                              | 6 +++---
>  5 files changed, 19 insertions(+), 15 deletions(-)
>
>
> base-commit: 2a48216cff7a2e3964fbed16f84d33f68b3e5e42
> --
> 2.34.1

Hello:

This series was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 25 Jan 2023 08:48:56 +0100 you wrote:
> Make sure that xdp_do_flush() is always executed before
> napi_complete_done(). This is important for two reasons. First, a
> redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
> napi context X on CPU Y will be followed by a xdp_do_flush() from the
> same napi context and CPU. This is not guaranteed if the
> napi_complete_done() is executed before xdp_do_flush(), as it tells
> the napi logic that it is fine to schedule napi context X on another
> CPU. Details from a production system triggering this bug using the
> veth driver can be found in [1].
>
> [...]

Here is the summary with links:
  - [net,v2,1/5] qede: execute xdp_do_flush() before napi_complete_done()
    https://git.kernel.org/netdev/net/c/2ccce20d51fa
  - [net,v2,2/5] lan966x: execute xdp_do_flush() before napi_complete_done()
    https://git.kernel.org/netdev/net/c/12b5717990c8
  - [net,v2,3/5] virtio-net: execute xdp_do_flush() before napi_complete_done()
    https://git.kernel.org/netdev/net/c/ad7e615f646c
  - [net,v2,4/5] dpaa_eth: execute xdp_do_flush() before napi_complete_done()
    https://git.kernel.org/netdev/net/c/b534013798b7
  - [net,v2,5/5] dpaa2-eth: execute xdp_do_flush() before napi_complete_done()
    https://git.kernel.org/netdev/net/c/a3191c4d86c5

You are awesome, thank you!
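
The ordering rule from the cover letter can be illustrated with a small userspace model. This is only a sketch and not kernel code: every `mock_*` name, the `struct mock_napi` fields, and the `flush_first` switch are hypothetical stand-ins invented for illustration, and none of it is taken from the patched drivers. The model assumes that completing NAPI releases ownership of the context, so a flush that still has pending redirects after that point is counted as a violation.

```c
/* Userspace model of the ordering rule; NOT kernel code.
 * All mock_* names and struct fields are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct mock_napi {
    bool scheduled;        /* true while this CPU owns the NAPI context */
    int pending_redirects; /* frames queued by redirect, not yet flushed */
    int violations;        /* flushes that ran after ownership was released */
};

/* Stand-in for xdp_do_redirect(): queue one frame for a later flush. */
static void mock_xdp_do_redirect(struct mock_napi *n)
{
    n->pending_redirects++;
}

/* Stand-in for xdp_do_flush(): drain the queue, and record a violation
 * if the context was already released while frames were still pending. */
static void mock_xdp_do_flush(struct mock_napi *n)
{
    if (!n->scheduled && n->pending_redirects > 0)
        n->violations++;
    n->pending_redirects = 0;
}

/* Stand-in for napi_complete_done(): after this, the context may be
 * scheduled on another CPU. */
static void mock_napi_complete_done(struct mock_napi *n)
{
    n->scheduled = false;
}

/* Model of a driver poll routine. flush_first selects the ordering the
 * series asks for (true) or the old, buggy ordering (false). */
static int mock_poll(struct mock_napi *n, int frames, int budget,
                     bool flush_first)
{
    int work_done = 0;

    assert(budget > 0);
    n->scheduled = true;

    while (work_done < frames && work_done < budget) {
        mock_xdp_do_redirect(n);
        work_done++;
    }

    if (flush_first)
        mock_xdp_do_flush(n);

    /* As in a typical poll routine, complete only when the budget was
     * not exhausted. */
    if (work_done < budget)
        mock_napi_complete_done(n);

    if (!flush_first)
        mock_xdp_do_flush(n);

    return work_done;
}
```

Calling `mock_poll(&n, 3, 64, true)` on a zero-initialized `struct mock_napi` leaves `n.violations` at 0, while the same call with `flush_first` set to `false` records one violation, which mirrors the window the series closes. When the budget is exhausted the model never completes NAPI, so either ordering is harmless in that case.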