diff mbox series

[net,4/8] i40e: Fix XDP program unloading while removing the driver

Message ID 20240528-net-2024-05-28-intel-net-fixes-v1-4-dc8593d2bbc6@intel.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Intel Wired LAN Driver Updates 2024-05-28 (e1000e, i40e, ice) | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 45 this patch: 45
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 3 blamed authors not CCed: sylwesterx.dziedziuch@intel.com slawomirx.laba@intel.com anthony.l.nguyen@intel.com; 12 maintainers not CCed: hawk@kernel.org pabeni@redhat.com edumazet@google.com john.fastabend@gmail.com slawomirx.laba@intel.com intel-wired-lan@lists.osuosl.org ast@kernel.org daniel@iogearbox.net anthony.l.nguyen@intel.com bpf@vger.kernel.org sylwesterx.dziedziuch@intel.com jesse.brandeburg@intel.com
netdev/build_clang fail Errors and warnings before: 36 this patch: 36
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn fail Errors and warnings before: 45 this patch: 45
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 34 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 211 this patch: 211
netdev/source_inline success Was 0 now: 0

Commit Message

Jacob Keller May 28, 2024, 10:06 p.m. UTC
From: Michal Kubiak <michal.kubiak@intel.com>

The commit 6533e558c650 ("i40e: Fix reset path while removing
the driver") introduced a new PF state "__I40E_IN_REMOVE" to block
modifying the XDP program while the driver is being removed.
Unfortunately, such a change is useful only if the ".ndo_bpf()"
callback was called out of the rmmod context because unloading the
existing XDP program is also a part of driver removing procedure.
In other words, from the rmmod context the driver is expected to
unload the XDP program without reporting any errors. Otherwise,
the kernel warning with callstack is printed out to dmesg.

Example failing scenario:
 1. Load the i40e driver.
 2. Load the XDP program.
 3. Unload the i40e driver (using "rmmod" command).

Fix this by improving checks in ".ndo_bpf()" to determine if that
callback was called from the removing context and if the kernel
wants to unload the XDP program. Allow for unloading the XDP program
in such a case.

Fixes: 6533e558c650 ("i40e: Fix reset path while removing the driver")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Comments

Jakub Kicinski May 30, 2024, 1:54 a.m. UTC | #1
On Tue, 28 May 2024 15:06:07 -0700 Jacob Keller wrote:
> +	/* Called from netdev unregister context. Unload the XDP program. */
> +	if (vsi->netdev->reg_state == NETREG_UNREGISTERING) {
> +		xdp_features_clear_redirect_target(vsi->netdev);
> +		old_prog = xchg(&vsi->xdp_prog, NULL);
> +		if (old_prog)
> +			bpf_prog_put(old_prog);
> +
> +		return 0;
> +	}

This is not great. The netdevice is closed at this stage, why is the xdp
setup try to do work if the device is closed even when not
unregistering?
Jacob Keller May 30, 2024, 4:38 p.m. UTC | #2
On 5/29/2024 6:54 PM, Jakub Kicinski wrote:
> On Tue, 28 May 2024 15:06:07 -0700 Jacob Keller wrote:
>> +	/* Called from netdev unregister context. Unload the XDP program. */
>> +	if (vsi->netdev->reg_state == NETREG_UNREGISTERING) {
>> +		xdp_features_clear_redirect_target(vsi->netdev);
>> +		old_prog = xchg(&vsi->xdp_prog, NULL);
>> +		if (old_prog)
>> +			bpf_prog_put(old_prog);
>> +
>> +		return 0;
>> +	}
> 
> This is not great. The netdevice is closed at this stage, why is the xdp
> setup try to do work if the device is closed even when not
> unregistering?

The comment makes this seem like its happening during unregistration. It
looks like i40e_xdp_setup() is only called from i40e_xdp(), which is if
xdp->command is XDP_SETUP_PROG

From the looks of things, ndo_bpf is called both for setup and teardown?

>    7 >-------/* Set or clear a bpf program used in the earliest stages of packet
>    6 >------- * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
>    5 >------- * is responsible for calling bpf_prog_put on any old progs that are
>    4 >------- * stored. In case of error, the callee need not release the new prog
>    3 >------- * reference, but on success it takes ownership and must bpf_prog_put
>    2 >------- * when it is no longer used.
>    1 >------- */

Indeed, dev_xdp_uninstall calls dev_xdp_install in a loop to remove
programs.

As far as I can tell, it looks like the .ndo_bpf call is made with a
program set to NULL during uninstall:

>    30 static void dev_xdp_uninstall(struct net_device *dev)
>    29 {
>    28 >-------struct bpf_xdp_link *link;
>    27 >-------struct bpf_prog *prog;
>    26 >-------enum bpf_xdp_mode mode;
>    25 >-------bpf_op_t bpf_op;
>    24
>    23 >-------ASSERT_RTNL();
>    22
>    21 >-------for (mode = XDP_MODE_SKB; mode < __MAX_XDP_MODE; mode++) {
>    20 >------->-------prog = dev_xdp_prog(dev, mode);
>    19 >------->-------if (!prog)
>    18 >------->------->-------continue;
>    17
>    16 >------->-------bpf_op = dev_xdp_bpf_op(dev, mode);
>    15 >------->-------if (!bpf_op)
>    14 >------->------->-------continue;
>    13
>    12 >------->-------WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL));
>    11

Here, dev_xdp_install is called with a prog of NULL

>    10 >------->-------/* auto-detach link from net device */
>     9 >------->-------link = dev_xdp_link(dev, mode);
>     8 >------->-------if (link)
>     7 >------->------->-------link->dev = NULL;
>     6 >------->-------else
>     5 >------->------->-------bpf_prog_put(prog);
>     4
>     3 >------->-------dev_xdp_set_link(dev, mode, NULL);
>     2 >-------}
>     1 }

I think the semantics are confusing here.

Basically, the issue is this function needs to remove the XDP properly
when called by the netdev unregister flow but has a check against adding
a new program if its called during remove...

I think this is confusing and could be improved by refactoring how the
i40e flow works. If the passed-in prog is NULL, its a request to remove.
If its otherwise, its a request to add a new program (possibly replacing
an existing one?).

I think we ought to just be checking NULL and not needing to bother with
the netdev_unregister state at all here?

Hopefully Michal can chime in with a better understanding.
Michal Kubiak June 5, 2024, 3 p.m. UTC | #3
On Thu, May 30, 2024 at 09:38:04AM -0700, Jacob Keller wrote:
> 
> 
> On 5/29/2024 6:54 PM, Jakub Kicinski wrote:
> > On Tue, 28 May 2024 15:06:07 -0700 Jacob Keller wrote:
> >> +	/* Called from netdev unregister context. Unload the XDP program. */
> >> +	if (vsi->netdev->reg_state == NETREG_UNREGISTERING) {
> >> +		xdp_features_clear_redirect_target(vsi->netdev);
> >> +		old_prog = xchg(&vsi->xdp_prog, NULL);
> >> +		if (old_prog)
> >> +			bpf_prog_put(old_prog);
> >> +
> >> +		return 0;
> >> +	}
> > 
> > This is not great. The netdevice is closed at this stage, why is the xdp
> > setup try to do work if the device is closed even when not
> > unregistering?
> 
> The comment makes this seem like its happening during unregistration. It
> looks like i40e_xdp_setup() is only called from i40e_xdp(), which is if
> xdp->command is XDP_SETUP_PROG
> 
> From the looks of things, ndo_bpf is called both for setup and teardown?

Exactly, ndo_bpf with the command XDP_SETUP_PROG can be called for both
loading and unloading the XDP program. When the XDP program is unloaded,
the callback is simply called with NULL as the pointer to the program.
Also, unloading the XDP program can be initiated not only from the user
space directly, but also from the kernel core.

In this specific case we are handling that last case. Calling ndo_bpf
when reg_state == NETREG_UNREGISTERING is the case when unloading the
XDP program is an immanent part of the netdevice unregistering process.

In such a case we have to unconditionally decrease the reference counter for
the XDP program using bpf_prog_put() call and exit with no error to
assure the consistency between BPF core code and our driver.

> 
> >    7 >-------/* Set or clear a bpf program used in the earliest stages of packet
> >    6 >------- * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
> >    5 >------- * is responsible for calling bpf_prog_put on any old progs that are
> >    4 >------- * stored. In case of error, the callee need not release the new prog
> >    3 >------- * reference, but on success it takes ownership and must bpf_prog_put
> >    2 >------- * when it is no longer used.
> >    1 >------- */
> 
> Indeed, dev_xdp_uninstall calls dev_xdp_install in a loop to remove
> programs.
> 
> As far as I can tell, it looks like the .ndo_bpf call is made with a
> program set to NULL during uninstall:
> 
> >    30 static void dev_xdp_uninstall(struct net_device *dev)
> >    29 {
> >    28 >-------struct bpf_xdp_link *link;
> >    27 >-------struct bpf_prog *prog;
> >    26 >-------enum bpf_xdp_mode mode;
> >    25 >-------bpf_op_t bpf_op;
> >    24
> >    23 >-------ASSERT_RTNL();
> >    22
> >    21 >-------for (mode = XDP_MODE_SKB; mode < __MAX_XDP_MODE; mode++) {
> >    20 >------->-------prog = dev_xdp_prog(dev, mode);
> >    19 >------->-------if (!prog)
> >    18 >------->------->-------continue;
> >    17
> >    16 >------->-------bpf_op = dev_xdp_bpf_op(dev, mode);
> >    15 >------->-------if (!bpf_op)
> >    14 >------->------->-------continue;
> >    13
> >    12 >------->-------WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL));
> >    11
> 
> Here, dev_xdp_install is called with a prog of NULL
> 
> >    10 >------->-------/* auto-detach link from net device */
> >     9 >------->-------link = dev_xdp_link(dev, mode);
> >     8 >------->-------if (link)
> >     7 >------->------->-------link->dev = NULL;
> >     6 >------->-------else
> >     5 >------->------->-------bpf_prog_put(prog);
> >     4
> >     3 >------->-------dev_xdp_set_link(dev, mode, NULL);
> >     2 >-------}
> >     1 }
> 

I confirm. The current design of netdevice unregistering algorithm
includes checking (in a loop) if the netdevice has any XDP program
attached and forces unloading that program because it won't be used
anymore on that device.

> I think the semantics are confusing here.
> 
> Basically, the issue is this function needs to remove the XDP properly
> when called by the netdev unregister flow but has a check against adding
> a new program if its called during remove...

The check for __I40E_IN_REMOVE has been introduced by the commit
6533e558c650 ("i40e: Fix reset path while removing the driver").
Similar checks have been added in other callbacks. I believe the
intention was to fix some synchronization issues by exiting from callbacks
or reset immediately if the driver is being removed.
Unfortunately, although it could work for other callbacks, we cannot do that
in ndo_bpf because we need to leave kernel structures and ref counters
consistent.
I decided to keep the check for IN_REMOVE because I believe it covers
the case when NETREG_UNREGISTERING flag is not set yet but we started to
destroy our internal data structures.

> 
> I think this is confusing and could be improved by refactoring how the
> i40e flow works. If the passed-in prog is NULL, its a request to remove.
> If its otherwise, its a request to add a new program (possibly replacing
> an existing one?).
> 
> I think we ought to just be checking NULL and not needing to bother with
> the netdev_unregister state at all here?

I am afraid checking for NULL won't be enough here.
Normally, when ndo_bpf is called from the user space application, that
callback can be called with NULL too (when the user just wants to unload
the XDP program). In such a case, apart from calling bpf_prog_put(), we
have to rebuild our internal data structures (queues, pointers, counters
etc.) to restore the i40e driver working back in normal mode (with no
XDP program).
My intention of adding the check for NETREG_REGISTERING was to implement
a quick handler for unloading the XDP program from the netdev
unregistering context only, when our internal data structures are
already destroyed but we need to leave kernel's ref counters in a
consistent state.

> 
> Hopefully Michal can chime in with a better understanding.

Thanks,
Michal
Jakub Kicinski June 5, 2024, 7:29 p.m. UTC | #4
On Wed, 5 Jun 2024 17:00:02 +0200 Michal Kubiak wrote:
> I am afraid checking for NULL won't be enough here.
> Normally, when ndo_bpf is called from the user space application, that
> callback can be called with NULL too (when the user just wants to unload
> the XDP program). In such a case, apart from calling bpf_prog_put(), we
> have to rebuild our internal data structures (queues, pointers, counters
> etc.) to restore the i40e driver working back in normal mode (with no
> XDP program).

Apologizes for asking a question which can be answered by studying 
the code longer, but why do you need to rebuild internal data
structures for a device which is *down*. Unregistering or not.
Michal Kubiak June 6, 2024, 10:02 a.m. UTC | #5
On Wed, Jun 05, 2024 at 12:29:57PM -0700, Jakub Kicinski wrote:
> On Wed, 5 Jun 2024 17:00:02 +0200 Michal Kubiak wrote:
> > I am afraid checking for NULL won't be enough here.
> > Normally, when ndo_bpf is called from the user space application, that
> > callback can be called with NULL too (when the user just wants to unload
> > the XDP program). In such a case, apart from calling bpf_prog_put(), we
> > have to rebuild our internal data structures (queues, pointers, counters
> > etc.) to restore the i40e driver working back in normal mode (with no
> > XDP program).
> 
> Apologizes for asking a question which can be answered by studying 
> the code longer, but why do you need to rebuild internal data
> structures for a device which is *down*. Unregistering or not.

Excuse me, but I don't understand why we should assume that a device is
*down* when that callback is being called?
Maybe I didn't make it clear, but the ndo_bpf can be called every time
when the userspace application wants to load or unload the XDP program.
It can happen when a device is *up* and also when the link is *up*.

Michal
Jakub Kicinski June 6, 2024, 1:43 p.m. UTC | #6
On Thu, 6 Jun 2024 12:02:27 +0200 Michal Kubiak wrote:
> > Apologizes for asking a question which can be answered by studying 
> > the code longer, but why do you need to rebuild internal data
> > structures for a device which is *down*. Unregistering or not.  
> 
> Excuse me, but I don't understand why we should assume that a device is
> *down* when that callback is being called?
> Maybe I didn't make it clear, but the ndo_bpf can be called every time
> when the userspace application wants to load or unload the XDP program.
> It can happen when a device is *up* and also when the link is *up*.

The patch was adding a special case for NETREG_UNREGISTERING,
at that point the device will be closed. Calling ndo_close is one
of the first things core does during unregistering.
Simplifying the handling for when the device is closed would be
better.
Michal Kubiak June 11, 2024, 6:52 p.m. UTC | #7
On Thu, Jun 06, 2024 at 06:43:28AM -0700, Jakub Kicinski wrote:
> On Thu, 6 Jun 2024 12:02:27 +0200 Michal Kubiak wrote:
> > > Apologizes for asking a question which can be answered by studying 
> > > the code longer, but why do you need to rebuild internal data
> > > structures for a device which is *down*. Unregistering or not.  
> > 
> > Excuse me, but I don't understand why we should assume that a device is
> > *down* when that callback is being called?
> > Maybe I didn't make it clear, but the ndo_bpf can be called every time
> > when the userspace application wants to load or unload the XDP program.
> > It can happen when a device is *up* and also when the link is *up*.
> 
> The patch was adding a special case for NETREG_UNREGISTERING,
> at that point the device will be closed. Calling ndo_close is one
> of the first things core does during unregistering.
> Simplifying the handling for when the device is closed would be
> better.

I think I'm getting your point but moving the code for
NETREG_UNREGISTERING to ndo_stop wouldn't be enough and it seems to be
against the current design of 'unregister_netdevice_many_notify()'.

In 'unregister_netdevice_many_notify()' we have the call to
'dev_close_many()' (which calls ndo_stop on i40e driver) and then
'dev_xdp_uninstall()' is called (where there is the call to ndo_bpf).

'dev_xdp_uninstall()' seems to be the right function where all
activities related to XDP program unloading during unregistering are
expected from the driver.

Anyway, I analyzed that code one more time and I agree that the special
case for NETREG_UNREGISTERING makes the code more complex and I can
implement it in a simpler way.
The root cause of the problem I'm trying to fix is that __I40E_IN_REMOVE
flag is handled in a wrong way.

I will post the v2 then.


Thanks,
Michal
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 284c3fad5a6e..2f478edb9f9f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -13293,6 +13293,20 @@  static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
 	bool need_reset;
 	int i;
 
+	/* Called from netdev unregister context. Unload the XDP program. */
+	if (vsi->netdev->reg_state == NETREG_UNREGISTERING) {
+		xdp_features_clear_redirect_target(vsi->netdev);
+		old_prog = xchg(&vsi->xdp_prog, NULL);
+		if (old_prog)
+			bpf_prog_put(old_prog);
+
+		return 0;
+	}
+
+	/* VSI shall be deleted in a moment, just return EINVAL */
+	if (test_bit(__I40E_IN_REMOVE, pf->state))
+		return -EINVAL;
+
 	/* Don't allow frames that span over multiple buffers */
 	if (vsi->netdev->mtu > frame_size - I40E_PACKET_HDR_PAD) {
 		NL_SET_ERR_MSG_MOD(extack, "MTU too large for linear frames and XDP prog does not support frags");
@@ -13301,14 +13315,9 @@  static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
 
 	/* When turning XDP on->off/off->on we reset and rebuild the rings. */
 	need_reset = (i40e_enabled_xdp_vsi(vsi) != !!prog);
-
 	if (need_reset)
 		i40e_prep_for_reset(pf);
 
-	/* VSI shall be deleted in a moment, just return EINVAL */
-	if (test_bit(__I40E_IN_REMOVE, pf->state))
-		return -EINVAL;
-
 	old_prog = xchg(&vsi->xdp_prog, prog);
 
 	if (need_reset) {