Message ID | 161917591996.102337.9559803697014955421.stgit@toke.dk (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Series | Clean up and document RCU-based object protection for XDP_REDIRECT |
On Fri, Apr 23, 2021 at 01:05:20PM +0200, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> The i40e driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> by the XDP program invocation is longer, all the way through to the call to
> xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
> turns out to be harmless because it all happens in a single NAPI poll
> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
> misleading.

Okay, but what about the lifetime of the xdp_prog itself? Can xdp_prog
change within a single NAPI poll? After reading previous discussions I
would say it can't, right?

There are drivers that have a big RCU critical section in NAPI poll, but it
seems that some read the xdp_prog a single time whereas others read it per
processed frame.

If we are sure that xdp_prog can't change on-the-fly, then the first low
hanging fruit, at least for the Intel drivers, is to skip the test against
NULL and read it only once at the beginning of the NAPI poll. There might
also be other micro-optimizations specific to each driver that could be
done based on that (for the drivers that currently read the xdp_prog per
each frame).

Or am I nuts?

>
> Rather than extend the scope of the rcu_read_lock(), just get rid of it
> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
> types that take bh execution into account, lockdep even understands this to
> be safe, so there's really no reason to keep it around.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c |    2 --
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  |    6 +-----
>  2 files changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index fc20afc23bfa..3f4c947a5185 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2303,7 +2303,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>  	struct bpf_prog *xdp_prog;
>  	u32 act;
>
> -	rcu_read_lock();
>  	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>
>  	if (!xdp_prog)
> @@ -2334,7 +2333,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>  		break;
>  	}
>  xdp_out:
> -	rcu_read_unlock();
>  	return ERR_PTR(-result);
>  }
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index d89c22347d9d..93b349f11d3b 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -153,7 +153,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  	struct bpf_prog *xdp_prog;
>  	u32 act;
>
> -	rcu_read_lock();
>  	/* NB! xdp_prog will always be !NULL, due to the fact that
>  	 * this path is enabled by setting an XDP program.
>  	 */
> @@ -162,9 +161,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>
>  	if (likely(act == XDP_REDIRECT)) {
>  		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> -		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
> -		rcu_read_unlock();
> -		return result;
> +		return !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
>  	}
>
>  	switch (act) {
> @@ -184,7 +181,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  		result = I40E_XDP_CONSUMED;
>  		break;
>  	}
> -	rcu_read_unlock();
>  	return result;
>  }
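A minimal sketch of the read-once idea raised in the mail above, assuming xdp_prog
cannot change within a single NAPI poll. This is not the current i40e code: the extra
xdp_prog parameter to i40e_run_xdp() and the stripped-down i40e_clean_rx_irq() loop
are hypothetical, shown only to illustrate hoisting the READ_ONCE() and the NULL test
out of the per-frame path.

/* Hypothetical reshuffle, not the driver as it is today: the prog pointer is
 * read once per NAPI poll and passed down, instead of being re-read and
 * NULL-tested for every frame inside i40e_run_xdp().
 */
static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
				    struct xdp_buff *xdp,
				    struct bpf_prog *xdp_prog);

static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
{
	/* One READ_ONCE() and one NULL check for the whole poll, instead of
	 * repeating both inside i40e_run_xdp() for every frame.
	 */
	struct bpf_prog *xdp_prog = READ_ONCE(rx_ring->xdp_prog);
	unsigned int total_rx_packets = 0;
	struct xdp_buff xdp;

	while (total_rx_packets < (unsigned int)budget) {
		/* ... set up xdp from the next Rx descriptor ... */

		if (xdp_prog) {
			/* Hypothetical extra parameter: the per-frame path
			 * just uses the prog the caller already validated.
			 */
			struct sk_buff *skb = i40e_run_xdp(rx_ring, &xdp, xdp_prog);

			/* ... handle the verdict / skb as before ... */
		}

		total_rx_packets++;
	}

	return (int)total_rx_packets;
}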
Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:

> On Fri, Apr 23, 2021 at 01:05:20PM +0200, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> The i40e driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
>> program invocations. However, the actual lifetime of the objects referred
>> by the XDP program invocation is longer, all the way through to the call to
>> xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
>> turns out to be harmless because it all happens in a single NAPI poll
>> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
>> misleading.
>
> Okay, but what about the lifetime of the xdp_prog itself? Can xdp_prog
> change within a single NAPI poll? After reading previous discussions I
> would say it can't, right?

Well, bpf_prog objects are also RCU-protected, so the program is at least
guaranteed to stay alive until the end of the NAPI poll. But I don't think
there's anything preventing the program from being changed in the middle of
a NAPI poll.

> There are drivers that have a big RCU critical section in NAPI poll, but it
> seems that some read the xdp_prog a single time whereas others read it per
> processed frame.
>
> If we are sure that xdp_prog can't change on-the-fly, then the first low
> hanging fruit, at least for the Intel drivers, is to skip the test against
> NULL and read it only once at the beginning of the NAPI poll. There might
> also be other micro-optimizations specific to each driver that could be
> done based on that (for the drivers that currently read the xdp_prog per
> each frame).

I think the main problem this could cause is that the dispatcher code could
have replaced the program in the dispatcher trampoline while the driver was
still using it, which would hurt performance.

However, ultimately this is under the control of the driver, since the
program install is a driver op. For instance, i40e_xdp_setup() does a
conditional synchronize_rcu() after removing a program; making this
unconditional (and maybe moving it after the writes to the rx_ring prog
pointers?) would ensure that the NAPI cycle had ended before the bpf_op()
call in dev_xdp_install(), which would delay the trampoline replace.

I guess there could then be a window where the new program is being used
but has not been installed into the trampoline yet, so maybe delaying that
replace is not actually terribly important? Adding Björn, maybe he has a
better idea.

> Or am I nuts?

No, I don't think so :) I guess it remains to be seen whether there's a
real performance benefit, but at least I don't think there would be any
safety or correctness issues with attempting this.

-Toke
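A simplified sketch of the install-side idea from the reply above: publish the new
program to the rings first, then wait unconditionally for in-flight NAPI polls before
the old program can be dropped or the trampoline swapped. This is not the real
i40e_xdp_setup() body; error handling, the ring reset/rebuild and the AF_XDP wakeup
path are omitted, and while the field names (vsi->xdp_prog, vsi->rx_rings[i]->xdp_prog,
vsi->num_queue_pairs) come from the driver, the surrounding logic is assumed.

/* Simplified sketch, not the actual i40e_xdp_setup(): reset/rebuild, AF_XDP
 * handling and error paths are left out.
 */
static int i40e_xdp_setup_sketch(struct i40e_vsi *vsi, struct bpf_prog *prog)
{
	struct bpf_prog *old_prog;
	int i;

	/* Publish the new program (or NULL) to every Rx ring first ... */
	old_prog = xchg(&vsi->xdp_prog, prog);
	for (i = 0; i < vsi->num_queue_pairs; i++)
		WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);

	/* ... then wait unconditionally, not just when a program is being
	 * removed. NAPI polls run with bottom halves disabled, so once
	 * synchronize_rcu() returns, no poll can still be using old_prog,
	 * and the later trampoline replace triggered via dev_xdp_install()
	 * cannot race with a poll that is still running it.
	 */
	synchronize_rcu();

	if (old_prog)
		bpf_prog_put(old_prog);

	return 0;
}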
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index fc20afc23bfa..3f4c947a5185 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2303,7 +2303,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 
 	if (!xdp_prog)
@@ -2334,7 +2333,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 		break;
 	}
 xdp_out:
-	rcu_read_unlock();
 	return ERR_PTR(-result);
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index d89c22347d9d..93b349f11d3b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -153,7 +153,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	/* NB! xdp_prog will always be !NULL, due to the fact that
 	 * this path is enabled by setting an XDP program.
 	 */
@@ -162,9 +161,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
-		rcu_read_unlock();
-		return result;
+		return !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
 	}
 
 	switch (act) {
@@ -184,7 +181,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 		result = I40E_XDP_CONSUMED;
 		break;
 	}
-	rcu_read_unlock();
 	return result;
 }