diff mbox series

[RFC,bpf-next,4/4] i40e: remove rcu_read_lock() around XDP program invocation

Message ID 161917591996.102337.9559803697014955421.stgit@toke.dk (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series Clean up and document RCU-based object protection for XDP_REDIRECT | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 12 maintainers not CCed: anthony.l.nguyen@intel.com yhs@fb.com kpsingh@kernel.org daniel@iogearbox.net andrii@kernel.org ast@kernel.org john.fastabend@gmail.com intel-wired-lan@lists.osuosl.org songliubraving@fb.com davem@davemloft.net jesse.brandeburg@intel.com kuba@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 38 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Toke Høiland-Jørgensen April 23, 2021, 11:05 a.m. UTC
From: Toke Høiland-Jørgensen <toke@redhat.com>

The i40e driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |    2 --
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |    6 +-----
 2 files changed, 1 insertion(+), 7 deletions(-)

Comments

Fijalkowski, Maciej April 23, 2021, 1:57 p.m. UTC | #1
On Fri, Apr 23, 2021 at 01:05:20PM +0200, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> The i40e driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> by the XDP program invocation is longer, all the way through to the call to
> xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
> turns out to be harmless because it all happens in a single NAPI poll
> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
> misleading.

Okay, but what about the lifetime of the xdp_prog itself? Can xdp_prog
change within a single NAPI poll? After reading previous discussions I
would say it can't, right?

There are drivers that have a big RCU critical section in NAPI poll, but it
seems that some read a xdp_prog a single time whereas others read it per
processed frame.

If we are sure that xdp_prog can't change on-the-fly then first low
hanging fruit, at least for the Intel drivers, is to skip a test against
NULL and read it only once at the beginning of NAPI poll. There might be
also other micro-optimizations specific to each drivers that could be done
based on that (that of course read the xdp_prog per each frame).

Or am I nuts?

> 
> Rather than extend the scope of the rcu_read_lock(), just get rid of it
> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
> types that take bh execution into account, lockdep even understands this to
> be safe, so there's really no reason to keep it around.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c |    2 --
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  |    6 +-----
>  2 files changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index fc20afc23bfa..3f4c947a5185 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2303,7 +2303,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>  	struct bpf_prog *xdp_prog;
>  	u32 act;
>  
> -	rcu_read_lock();
>  	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>  
>  	if (!xdp_prog)
> @@ -2334,7 +2333,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>  		break;
>  	}
>  xdp_out:
> -	rcu_read_unlock();
>  	return ERR_PTR(-result);
>  }
>  
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index d89c22347d9d..93b349f11d3b 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -153,7 +153,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  	struct bpf_prog *xdp_prog;
>  	u32 act;
>  
> -	rcu_read_lock();
>  	/* NB! xdp_prog will always be !NULL, due to the fact that
>  	 * this path is enabled by setting an XDP program.
>  	 */
> @@ -162,9 +161,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  
>  	if (likely(act == XDP_REDIRECT)) {
>  		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> -		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
> -		rcu_read_unlock();
> -		return result;
> +		return !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
>  	}
>  
>  	switch (act) {
> @@ -184,7 +181,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  		result = I40E_XDP_CONSUMED;
>  		break;
>  	}
> -	rcu_read_unlock();
>  	return result;
>  }
>  
>
Toke Høiland-Jørgensen April 23, 2021, 8:33 p.m. UTC | #2
Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:

> On Fri, Apr 23, 2021 at 01:05:20PM +0200, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> The i40e driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
>> program invocations. However, the actual lifetime of the objects referred
>> by the XDP program invocation is longer, all the way through to the call to
>> xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
>> turns out to be harmless because it all happens in a single NAPI poll
>> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
>> misleading.
>
> Okay, but what about the lifetime of the xdp_prog itself? Can xdp_prog
> change within a single NAPI poll? After reading previous discussions I
> would say it can't, right?

Well, bpf_prog objects are also RCU-protected so it's at least
guaranteed to stay alive until the end of the NAPI poll. But I don't
think there's anything preventing the program from being changed in the
middle of a NAPI poll.

> There are drivers that have a big RCU critical section in NAPI poll, but it
> seems that some read a xdp_prog a single time whereas others read it per
> processed frame.
>
> If we are sure that xdp_prog can't change on-the-fly then first low
> hanging fruit, at least for the Intel drivers, is to skip a test against
> NULL and read it only once at the beginning of NAPI poll. There might be
> also other micro-optimizations specific to each drivers that could be done
> based on that (that of course read the xdp_prog per each frame).

I think the main problem this could cause is that the dispatcher code
could have replaced the program in the dispatcher trampoline while the
driver was still using it, which would hurt performance. However,
ultimately this is under the control of the driver, since the program
install is a driver op. For instance, i40e_xdp_setup() does a
conditional synchronize_rcu() after removing a program; making this
unconditional (and maybe moving it after the writes to the rx_ring prog
pointers?) would ensure that the NAPI cycle had ended before the
bpf_op() call in dev_xdp_install(), which would delay the trampoline
replace.

I guess there could then be a window where the new program is being used
but has not been installed into the trampoline yet, then, so maybe
delaying that replace is not actually terribly important? Adding Björn,
maybe he has a better idea.

> Or am I nuts?

No I don't think so :)

I guess it remains to be seen whether there's a real performance
benefit, but at least I don't think there would be any safety or
correctness issues with attempting this.

-Toke
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index fc20afc23bfa..3f4c947a5185 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2303,7 +2303,6 @@  static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 
 	if (!xdp_prog)
@@ -2334,7 +2333,6 @@  static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 		break;
 	}
 xdp_out:
-	rcu_read_unlock();
 	return ERR_PTR(-result);
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index d89c22347d9d..93b349f11d3b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -153,7 +153,6 @@  static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	/* NB! xdp_prog will always be !NULL, due to the fact that
 	 * this path is enabled by setting an XDP program.
 	 */
@@ -162,9 +161,7 @@  static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
-		rcu_read_unlock();
-		return result;
+		return !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
 	}
 
 	switch (act) {
@@ -184,7 +181,6 @@  static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 		result = I40E_XDP_CONSUMED;
 		break;
 	}
-	rcu_read_unlock();
 	return result;
 }