diff mbox series

[RFC,v2,net-next,16/16] net: bridge: switchdev: let drivers inform which bridge ports are offloaded

Message ID 20210318231829.3892920-17-olteanv@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Better support for sandwiched LAGs with bridge and DSA | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count fail Series longer than 15 patches
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 11 maintainers not CCed: brouer@redhat.com claudiu.manoil@nxp.com lorenzo@kernel.org jiri@nvidia.com m-karicheri2@ti.com ast@kernel.org idosch@nvidia.com yanaijie@huawei.com bridge@lists.linux-foundation.org peter.ujfalusi@ti.com jesse.brandeburg@intel.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 61 this patch: 69
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 353 lines checked
netdev/build_allmodconfig_warn fail Errors and warnings before: 59 this patch: 69
netdev/header_inline success Link

Commit Message

Vladimir Oltean March 18, 2021, 11:18 p.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

On reception of an skb, the bridge checks if it was marked as 'already
forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
is, it puts a mark of its own on that skb, with the switchdev mark of
the ingress port. Then during forwarding, it enforces that the egress
port must have a different switchdev mark than the ingress one (this is
done in nbp_switchdev_allowed_egress).

Non-switchdev drivers don't report any physical switch id (neither
through devlink nor .ndo_get_port_parent_id), therefore the bridge
assigns them a switchdev mark of 0, and packets coming from them will
always have skb->offload_fwd_mark = 0. So there aren't any restrictions.

Problems appear due to the fact that DSA would like to perform software
fallback for bonding and team interfaces that the physical switch cannot
offload.

         +-- br0 -+
        /   / |    \
       /   /  |     \
      /   /   |      \
     /   /    |       \
    /   /     |        \
   /    |     |       bond0
  /     |     |      /    \
 swp0  swp1  swp2  swp3  swp4

There, it is desirable that the presence of swp3 and swp4 under a
non-offloaded LAG does not preclude us from doing hardware bridging
beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
enough that software bridging between {swp0,swp1,swp2} and bond0 is not
impractical.

But this creates an impossible paradox given the current way in which
port switchdev marks are assigned. When the driver receives a packet
from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
something.

- If we set it to 0, then the bridge will forward it towards swp1, swp2
  and bond0. But the switch has already forwarded it towards swp1 and
  swp2 (not to bond0, remember, that isn't offloaded, so as far as the
  switch is concerned, ports swp3 and swp4 are not looking up the FDB,
  and the entire bond0 is a destination that is strictly behind the
  CPU). But we don't want duplicated traffic towards swp1 and swp2, so
  it's not ok to set skb->offload_fwd_mark = 0.

- If we set it to 1, then the bridge will not forward the skb towards
  the ports with the same switchdev mark, i.e. not to swp1, swp2 and
  bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
  have forwarded the skb there.

So the real issue is that bond0 will be assigned the same switchdev mark
as {swp0,swp1,swp2}, because the function that assigns switchdev marks
to bridge ports, nbp_switchdev_mark_set, recurses through bond0's lower
interfaces until it finds something that implements devlink.

A solution is to give the bridge explicit hints as to what switchdev
mark it should use for each port.

Currently, the bridging offload is very 'silent': a driver registers a
netdevice notifier, which is put on the netns's notifier chain, and
which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
bridge, and the lower is an interface it knows about (one registered by
this driver, normally). Then, from within that notifier, it does a bunch
of stuff behind the bridge's back, without the bridge necessarily
knowing that there's somebody offloading that port. It looks like this:

     ip link set swp0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v
        call_netdevice_notifiers
                  |
                  v
       dsa_slave_netdevice_event
                  |
                  v
        oh, hey! it's for me!
                  |
                  v
           .port_bridge_join

What we do to solve the conundrum is to be less silent, and emit a
notification back. Something like this:

     ip link set swp0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v                    bridge: Aye! I'll use this
        call_netdevice_notifiers           ^  ppid as the
                  |                        |  switchdev mark for
                  v                        |  this port, and zero
       dsa_slave_netdevice_event           |  if I got nothing.
                  |                        |
                  v                        |
        oh, hey! it's for me!              |
                  |                        |
                  v                        |
           .port_bridge_join               |
                  |                        |
                  +------------------------+
             switchdev_bridge_port_offload(swp0)

Then stacked interfaces (like bond0 on top of swp3/swp4) would be
treated differently in DSA, depending on whether we can or cannot
offload them.

The offload case:

    ip link set bond0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v                    bridge: Aye! I'll use this
        call_netdevice_notifiers           ^  ppid as the
                  |                        |  switchdev mark for
                  v                        |        bond0.
       dsa_slave_netdevice_event           | Coincidentally (or not),
                  |                        | bond0 and swp0, swp1, swp2
                  v                        | all have the same switchdev
        hmm, it's not quite for me,        | mark now, since the ASIC
         but my driver has already         | is able to forward towards
           called .port_lag_join           | all these ports in hw.
          for it, because I have           |
      a port with dp->lag_dev == bond0.    |
                  |                        |
                  v                        |
           .port_bridge_join               |
           for swp3 and swp4               |
                  |                        |
                  +------------------------+
            switchdev_bridge_port_offload(bond0)

And the non-offload case:

    ip link set bond0 master br0
                  |
                  v
   bridge calls netdev_master_upper_dev_link
                  |
                  v                    bridge waiting:
        call_netdevice_notifiers           ^  huh, switchdev_bridge_port_offload
                  |                        |  wasn't called, okay, I'll use a
                  v                        |  switchdev mark of zero for this one.
       dsa_slave_netdevice_event           :  Then packets received on swp0 will
                  |                        :  not be forwarded towards swp1, but
                  v                        :  they will towards bond0.
         it's not for me, but
       bond0 is an upper of swp3
      and swp4, but their dp->lag_dev
       is NULL because they couldn't
            offload it.

Basically we can draw the conclusion that the lowers of a bridge port
can come and go, so depending on the configuration of lowers for a
bridge port, it can dynamically toggle between offloaded and unoffloaded.
Therefore, we need an equivalent switchdev_bridge_port_unoffload too.

This patch changes the way any switchdev driver interacts with the
bridge. From now on, everybody needs to call switchdev_bridge_port_offload,
otherwise the bridge will treat the port as non-offloaded and allow
software flooding to other ports from the same ASIC.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  4 +-
 .../marvell/prestera/prestera_switchdev.c     |  7 ++
 .../mellanox/mlxsw/spectrum_switchdev.c       |  4 +-
 drivers/net/ethernet/mscc/ocelot_net.c        |  4 +-
 drivers/net/ethernet/rocker/rocker_ofdpa.c    |  8 +-
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  7 +-
 drivers/net/ethernet/ti/cpsw_new.c            |  6 +-
 include/linux/if_bridge.h                     | 16 ++++
 net/bridge/br_if.c                            | 11 +--
 net/bridge/br_private.h                       |  8 +-
 net/bridge/br_switchdev.c                     | 94 ++++++++++++++++---
 11 files changed, 138 insertions(+), 31 deletions(-)

Comments

Tobias Waldekranz March 22, 2021, 4:30 p.m. UTC | #1
On Fri, Mar 19, 2021 at 01:18, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> On reception of an skb, the bridge checks if it was marked as 'already
> forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
> is, it puts a mark of its own on that skb, with the switchdev mark of
> the ingress port. Then during forwarding, it enforces that the egress
> port must have a different switchdev mark than the ingress one (this is
> done in nbp_switchdev_allowed_egress).
>
> Non-switchdev drivers don't report any physical switch id (neither
> through devlink nor .ndo_get_port_parent_id), therefore the bridge
> assigns them a switchdev mark of 0, and packets coming from them will
> always have skb->offload_fwd_mark = 0. So there aren't any restrictions.
>
> Problems appear due to the fact that DSA would like to perform software
> fallback for bonding and team interfaces that the physical switch cannot
> offload.
>
>          +-- br0 -+
>         /   / |    \
>        /   /  |     \
>       /   /   |      \
>      /   /    |       \
>     /   /     |        \
>    /    |     |       bond0
>   /     |     |      /    \
>  swp0  swp1  swp2  swp3  swp4
>
> There, it is desirable that the presence of swp3 and swp4 under a
> non-offloaded LAG does not preclude us from doing hardware bridging
> beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
> enough that software bridging between {swp0,swp1,swp2} and bond0 is not
> impractical.
>
> But this creates an impossible paradox given the current way in which
> port switchdev marks are assigned. When the driver receives a packet
> from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
> something.
>
> - If we set it to 0, then the bridge will forward it towards swp1, swp2
>   and bond0. But the switch has already forwarded it towards swp1 and
>   swp2 (not to bond0, remember, that isn't offloaded, so as far as the
>   switch is concerned, ports swp3 and swp4 are not looking up the FDB,
>   and the entire bond0 is a destination that is strictly behind the
>   CPU). But we don't want duplicated traffic towards swp1 and swp2, so
>   it's not ok to set skb->offload_fwd_mark = 0.
>
> - If we set it to 1, then the bridge will not forward the skb towards
>   the ports with the same switchdev mark, i.e. not to swp1, swp2 and
>   bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
>   have forwarded the skb there.
>
> So the real issue is that bond0 will be assigned the same switchdev mark
> as {swp0,swp1,swp2}, because the function that assigns switchdev marks
> to bridge ports, nbp_switchdev_mark_set, recurses through bond0's lower
> interfaces until it finds something that implements devlink.
>
> A solution is to give the bridge explicit hints as to what switchdev
> mark it should use for each port.
>
> Currently, the bridging offload is very 'silent': a driver registers a
> netdevice notifier, which is put on the netns's notifier chain, and
> which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
> bridge, and the lower is an interface it knows about (one registered by
> this driver, normally). Then, from within that notifier, it does a bunch
> of stuff behind the bridge's back, without the bridge necessarily
> knowing that there's somebody offloading that port. It looks like this:
>
>      ip link set swp0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v
>         call_netdevice_notifiers
>                   |
>                   v
>        dsa_slave_netdevice_event
>                   |
>                   v
>         oh, hey! it's for me!
>                   |
>                   v
>            .port_bridge_join
>
> What we do to solve the conundrum is to be less silent, and emit a
> notification back. Something like this:
>
>      ip link set swp0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v                    bridge: Aye! I'll use this
>         call_netdevice_notifiers           ^  ppid as the
>                   |                        |  switchdev mark for
>                   v                        |  this port, and zero
>        dsa_slave_netdevice_event           |  if I got nothing.
>                   |                        |
>                   v                        |
>         oh, hey! it's for me!              |
>                   |                        |
>                   v                        |
>            .port_bridge_join               |
>                   |                        |
>                   +------------------------+
>              switchdev_bridge_port_offload(swp0)
>
> Then stacked interfaces (like bond0 on top of swp3/swp4) would be
> treated differently in DSA, depending on whether we can or cannot
> offload them.
>
> The offload case:
>
>     ip link set bond0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v                    bridge: Aye! I'll use this
>         call_netdevice_notifiers           ^  ppid as the
>                   |                        |  switchdev mark for
>                   v                        |        bond0.
>        dsa_slave_netdevice_event           | Coincidentally (or not),
>                   |                        | bond0 and swp0, swp1, swp2
>                   v                        | all have the same switchdev
>         hmm, it's not quite for me,        | mark now, since the ASIC
>          but my driver has already         | is able to forward towards
>            called .port_lag_join           | all these ports in hw.
>           for it, because I have           |
>       a port with dp->lag_dev == bond0.    |
>                   |                        |
>                   v                        |
>            .port_bridge_join               |
>            for swp3 and swp4               |
>                   |                        |
>                   +------------------------+
>             switchdev_bridge_port_offload(bond0)
>
> And the non-offload case:
>
>     ip link set bond0 master br0
>                   |
>                   v
>    bridge calls netdev_master_upper_dev_link
>                   |
>                   v                    bridge waiting:
>         call_netdevice_notifiers           ^  huh, switchdev_bridge_port_offload
>                   |                        |  wasn't called, okay, I'll use a
>                   v                        |  switchdev mark of zero for this one.
>        dsa_slave_netdevice_event           :  Then packets received on swp0 will
>                   |                        :  not be forwarded towards swp1, but
>                   v                        :  they will towards bond0.
>          it's not for me, but
>        bond0 is an upper of swp3
>       and swp4, but their dp->lag_dev
>        is NULL because they couldn't
>             offload it.
>
> Basically we can draw the conclusion that the lowers of a bridge port
> can come and go, so depending on the configuration of lowers for a
> bridge port, it can dynamically toggle between offloaded and unoffloaded.
> Therefore, we need an equivalent switchdev_bridge_port_unoffload too.
>
> This patch changes the way any switchdev driver interacts with the
> bridge. From now on, everybody needs to call switchdev_bridge_port_offload,
> otherwise the bridge will treat the port as non-offloaded and allow
> software flooding to other ports from the same ASIC.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  4 +-
>  .../marvell/prestera/prestera_switchdev.c     |  7 ++
>  .../mellanox/mlxsw/spectrum_switchdev.c       |  4 +-
>  drivers/net/ethernet/mscc/ocelot_net.c        |  4 +-
>  drivers/net/ethernet/rocker/rocker_ofdpa.c    |  8 +-
>  drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  7 +-
>  drivers/net/ethernet/ti/cpsw_new.c            |  6 +-

Why is not net/dsa included in this change?

>  include/linux/if_bridge.h                     | 16 ++++
>  net/bridge/br_if.c                            | 11 +--
>  net/bridge/br_private.h                       |  8 +-
>  net/bridge/br_switchdev.c                     | 94 ++++++++++++++++---
>  11 files changed, 138 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> index 2fd05dd18d46..f20556178e33 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> @@ -1518,7 +1518,7 @@ static int dpaa2_switch_port_bridge_join(struct net_device *netdev,
>  	if (err)
>  		goto err_egress_flood;
>  
> -	return 0;
> +	return switchdev_bridge_port_offload(netdev, NULL);
>  
>  err_egress_flood:
>  	dpaa2_switch_port_set_fdb(port_priv, NULL);
> @@ -1552,6 +1552,8 @@ static int dpaa2_switch_port_bridge_leave(struct net_device *netdev)
>  	struct ethsw_core *ethsw = port_priv->ethsw_data;
>  	int err;
>  
> +	switchdev_bridge_port_unoffload(netdev);
> +
>  	/* First of all, fast age any learn FDB addresses on this switch port */
>  	dpaa2_switch_port_fast_age(port_priv);
>  
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
> index 49e052273f30..0b0d5db7b85b 100644
> --- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
> @@ -443,6 +443,10 @@ static int prestera_port_bridge_join(struct prestera_port *port,
>  		goto err_brport_create;
>  	}
>  
> +	err = switchdev_bridge_port_offload(port->dev, NULL);
> +	if (err)
> +		goto err_brport_offload;
> +
>  	if (bridge->vlan_enabled)
>  		return 0;
>  
> @@ -453,6 +457,7 @@ static int prestera_port_bridge_join(struct prestera_port *port,
>  	return 0;
>  
>  err_port_join:
> +err_brport_offload:
>  	prestera_bridge_port_put(br_port);
>  err_brport_create:
>  	prestera_bridge_put(bridge);
> @@ -520,6 +525,8 @@ static void prestera_port_bridge_leave(struct prestera_port *port,
>  	if (!br_port)
>  		return;
>  
> +	switchdev_bridge_port_unoffload(port->dev);
> +
>  	bridge = br_port->bridge;
>  
>  	if (bridge->vlan_enabled)
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> index 23b7e8d6386b..7fa0b3653819 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> @@ -2326,7 +2326,7 @@ int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
>  	if (err)
>  		goto err_port_join;
>  
> -	return 0;
> +	return switchdev_bridge_port_offload(brport_dev, extack);
>  
>  err_port_join:
>  	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
> @@ -2348,6 +2348,8 @@ void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
>  	if (!bridge_port)
>  		return;
>  
> +	switchdev_bridge_port_unoffload(brport_dev);
> +
>  	bridge_device->ops->port_leave(bridge_device, bridge_port,
>  				       mlxsw_sp_port);
>  	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> index d38ffc7cf5f0..b917d9dd8a6a 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -1213,7 +1213,7 @@ static int ocelot_netdevice_bridge_join(struct net_device *dev,
>  	if (err)
>  		goto err_switchdev_sync;
>  
> -	return 0;
> +	return switchdev_bridge_port_offload(brport_dev, extack);
>  
>  err_switchdev_sync:
>  	ocelot_port_bridge_leave(ocelot, port, bridge);
> @@ -1234,6 +1234,8 @@ static int ocelot_netdevice_bridge_leave(struct net_device *dev,
>  	if (err)
>  		return err;
>  
> +	switchdev_bridge_port_unoffload(brport_dev);
> +
>  	ocelot_port_bridge_leave(ocelot, port, bridge);
>  
>  	return 0;
> diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
> index 967a634ee9ac..9b6d7cac112b 100644
> --- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
> +++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
> @@ -2592,13 +2592,19 @@ static int ofdpa_port_bridge_join(struct ofdpa_port *ofdpa_port,
>  
>  	ofdpa_port->bridge_dev = bridge;
>  
> -	return ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
> +	err = ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
> +	if (err)
> +		return err;
> +
> +	return switchdev_bridge_port_offload(ofdpa_port->dev, NULL);
>  }
>  
>  static int ofdpa_port_bridge_leave(struct ofdpa_port *ofdpa_port)
>  {
>  	int err;
>  
> +	switchdev_bridge_port_unoffload(ofdpa_port->dev);
> +
>  	err = ofdpa_port_vlan_del(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
>  	if (err)
>  		return err;
> diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> index 638d7b03be4b..fe2e38971acc 100644
> --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> @@ -7,6 +7,7 @@
>  
>  #include <linux/clk.h>
>  #include <linux/etherdevice.h>
> +#include <linux/if_bridge.h>
>  #include <linux/if_vlan.h>
>  #include <linux/interrupt.h>
>  #include <linux/kernel.h>
> @@ -2082,6 +2083,7 @@ static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
>  {
>  	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>  	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
> +	int err;
>  
>  	if (!common->br_members) {
>  		common->hw_bridge_dev = br_ndev;
> @@ -2097,7 +2099,8 @@ static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
>  
>  	am65_cpsw_port_offload_fwd_mark_update(common);
>  
> -	return NOTIFY_DONE;
> +	err = switchdev_bridge_port_offload(ndev, NULL);
> +	return notifier_to_errno(err);
>  }
>  
>  static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
> @@ -2105,6 +2108,8 @@ static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
>  	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>  	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
>  
> +	switchdev_bridge_port_unoffload(ndev);
> +
>  	common->br_members &= ~BIT(priv->port->port_id);
>  
>  	am65_cpsw_port_offload_fwd_mark_update(common);
> diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
> index 58a64313ac00..6347532fb39d 100644
> --- a/drivers/net/ethernet/ti/cpsw_new.c
> +++ b/drivers/net/ethernet/ti/cpsw_new.c
> @@ -1508,6 +1508,7 @@ static int cpsw_netdevice_port_link(struct net_device *ndev,
>  {
>  	struct cpsw_priv *priv = netdev_priv(ndev);
>  	struct cpsw_common *cpsw = priv->cpsw;
> +	int err;
>  
>  	if (!cpsw->br_members) {
>  		cpsw->hw_bridge_dev = br_ndev;
> @@ -1523,7 +1524,8 @@ static int cpsw_netdevice_port_link(struct net_device *ndev,
>  
>  	cpsw_port_offload_fwd_mark_update(cpsw);
>  
> -	return NOTIFY_DONE;
> +	err = switchdev_bridge_port_offload(ndev, NULL);
> +	return notifier_to_errno(err);
>  }
>  
>  static void cpsw_netdevice_port_unlink(struct net_device *ndev)
> @@ -1531,6 +1533,8 @@ static void cpsw_netdevice_port_unlink(struct net_device *ndev)
>  	struct cpsw_priv *priv = netdev_priv(ndev);
>  	struct cpsw_common *cpsw = priv->cpsw;
>  
> +	switchdev_bridge_port_unoffload(ndev);
> +
>  	cpsw->br_members &= ~BIT(priv->emac_port);
>  
>  	cpsw_port_offload_fwd_mark_update(cpsw);
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index ea176c508c0d..4fbee6d5fc16 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -196,4 +196,20 @@ static inline int br_fdb_replay(struct net_device *br_dev,
>  }
>  #endif
>  
> +#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_NET_SWITCHDEV)
> +int switchdev_bridge_port_offload(struct net_device *dev,
> +				  struct netlink_ext_ack *extack);
> +int switchdev_bridge_port_unoffload(struct net_device *dev);
> +#else
> +int switchdev_bridge_port_offload(struct net_device *dev,
> +				  struct netlink_ext_ack *extack)
> +{
> +	return 0;
> +}
> +
> +int switchdev_bridge_port_unoffload(struct net_device *dev)
> +{
> +}
> +#endif
> +
>  #endif
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index f7d2f472ae24..930a09f27e0d 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -643,10 +643,6 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  	if (err)
>  		goto err5;
>  
> -	err = nbp_switchdev_mark_set(p);
> -	if (err)
> -		goto err6;
> -
>  	dev_disable_lro(dev);
>  
>  	list_add_rcu(&p->list, &br->port_list);
> @@ -671,13 +667,13 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  		 */
>  		err = dev_pre_changeaddr_notify(br->dev, dev->dev_addr, extack);
>  		if (err)
> -			goto err7;
> +			goto err6;
>  	}
>  
>  	err = nbp_vlan_init(p, extack);
>  	if (err) {
>  		netdev_err(dev, "failed to initialize vlan filtering on this port\n");
> -		goto err7;
> +		goto err6;
>  	}
>  
>  	spin_lock_bh(&br->lock);
> @@ -700,11 +696,10 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  
>  	return 0;
>  
> -err7:
> +err6:
>  	list_del_rcu(&p->list);
>  	br_fdb_delete_by_port(br, p, 0, 1);
>  	nbp_update_port_count(br);
> -err6:
>  	netdev_upper_dev_unlink(dev, br->dev);
>  err5:
>  	dev->priv_flags &= ~IFF_BRIDGE_PORT;
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index d7d167e10b70..1982b5887d0f 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -326,8 +326,10 @@ struct net_bridge_port {
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>  	struct netpoll			*np;
>  #endif
> +	int				offload_count;

Should this be conditional on CONFIG_NET_SWITCHDEV?

>  #ifdef CONFIG_NET_SWITCHDEV
>  	int				offload_fwd_mark;
> +	struct netdev_phys_item_id	ppid;
>  #endif
>  	u16				group_fwd_mask;
>  	u16				backup_redirected_cnt;
> @@ -1572,7 +1574,6 @@ static inline void br_sysfs_delbr(struct net_device *dev) { return; }
>  
>  /* br_switchdev.c */
>  #ifdef CONFIG_NET_SWITCHDEV
> -int nbp_switchdev_mark_set(struct net_bridge_port *p);
>  void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
>  			      struct sk_buff *skb);
>  bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> @@ -1592,11 +1593,6 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
>  	skb->offload_fwd_mark = 0;
>  }
>  #else
> -static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
> -{
> -	return 0;
> -}
> -
>  static inline void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
>  					    struct sk_buff *skb)
>  {
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index b89503832fcc..4cf7902f056c 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -8,37 +8,109 @@
>  
>  #include "br_private.h"
>  
> -static int br_switchdev_mark_get(struct net_bridge *br, struct net_device *dev)
> +static int br_switchdev_mark_get(struct net_bridge *br,
> +				 struct net_bridge_port *new_nbp)
>  {
>  	struct net_bridge_port *p;
>  
>  	/* dev is yet to be added to the port list. */
>  	list_for_each_entry(p, &br->port_list, list) {
> -		if (netdev_port_same_parent_id(dev, p->dev))
> +		if (!p->offload_count)
> +			continue;
> +
> +		if (netdev_phys_item_id_same(&p->ppid, &new_nbp->ppid))
>  			return p->offload_fwd_mark;
>  	}
>  
>  	return ++br->offload_fwd_mark;
>  }
>  
> -int nbp_switchdev_mark_set(struct net_bridge_port *p)
> +static int nbp_switchdev_mark_set(struct net_bridge_port *p,
> +				  struct netdev_phys_item_id ppid,
> +				  struct netlink_ext_ack *extack)
> +{
> +	if (p->offload_count) {
> +		/* Prevent unsupported configurations such as a bridge port
> +		 * which is a bonding interface, and the member ports are from
> +		 * different hardware switches.
> +		 */
> +		if (!netdev_phys_item_id_same(&p->ppid, &ppid)) {
> +			NL_SET_ERR_MSG_MOD(extack,
> +					   "Same bridge port cannot be offloaded by two physical switches");
> +			return -EBUSY;
> +		}
> +		/* Be tolerant with drivers that call SWITCHDEV_BRPORT_OFFLOADED
> +		 * more than once for the same bridge port, such as when the
> +		 * bridge port is an offloaded bonding/team interface.
> +		 */
> +		p->offload_count++;
> +		return 0;
> +	}
> +
> +	p->ppid = ppid;
> +	p->offload_count = 1;
> +	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p);
> +
> +	return 0;
> +}
> +
> +static void nbp_switchdev_mark_clear(struct net_bridge_port *p,
> +				     struct netdev_phys_item_id ppid)
> +{
> +	if (WARN_ON(!netdev_phys_item_id_same(&p->ppid, &ppid)))
> +		return;
> +	if (WARN_ON(!p->offload_count))
> +		return;
> +
> +	p->offload_count--;
> +	if (p->offload_count)
> +		return;
> +
> +	p->offload_fwd_mark = 0;
> +}
> +
> +/* Let the bridge know that this port is offloaded, so that it can use
> + * the port parent id obtained by recursion to determine the bridge
> + * port's switchdev mark.
> + */
> +int switchdev_bridge_port_offload(struct net_device *dev,
> +				  struct netlink_ext_ack *extack)
>  {
> -	struct netdev_phys_item_id ppid = { };
> +	struct netdev_phys_item_id ppid;
> +	struct net_bridge_port *p;
>  	int err;
>  
> -	ASSERT_RTNL();
> +	p = br_port_get_rtnl(dev);
> +	if (!p)
> +		return -ENODEV;
>  
> -	err = dev_get_port_parent_id(p->dev, &ppid, true);
> -	if (err) {
> -		if (err == -EOPNOTSUPP)
> -			return 0;
> +	err = dev_get_port_parent_id(dev, &ppid, true);
> +	if (err)
> +		return err;
> +
> +	return nbp_switchdev_mark_set(p, ppid, extack);
> +}
> +EXPORT_SYMBOL_GPL(switchdev_bridge_port_offload);
> +
> +int switchdev_bridge_port_unoffload(struct net_device *dev)
> +{
> +	struct netdev_phys_item_id ppid;
> +	struct net_bridge_port *p;
> +	int err;
> +

Should we ASSERT_RTNL here as well?

> +	p = br_port_get_rtnl(dev);
> +	if (!p)
> +		return -ENODEV;
> +
> +	err = dev_get_port_parent_id(dev, &ppid, true);
> +	if (err)
>  		return err;
> -	}
>  
> -	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev);
> +	nbp_switchdev_mark_clear(p, ppid);
>  
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(switchdev_bridge_port_unoffload);
>  
>  void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
>  			      struct sk_buff *skb)
> -- 
> 2.25.1
Vladimir Oltean March 22, 2021, 5:19 p.m. UTC | #2
On Mon, Mar 22, 2021 at 05:30:52PM +0100, Tobias Waldekranz wrote:
> > ---
> >  .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  4 +-
> >  .../marvell/prestera/prestera_switchdev.c     |  7 ++
> >  .../mellanox/mlxsw/spectrum_switchdev.c       |  4 +-
> >  drivers/net/ethernet/mscc/ocelot_net.c        |  4 +-
> >  drivers/net/ethernet/rocker/rocker_ofdpa.c    |  8 +-
> >  drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  7 +-
> >  drivers/net/ethernet/ti/cpsw_new.c            |  6 +-
> 
> Why is not net/dsa included in this change?

I don't know, must have went shopping somewhere?
I'll make sure DSA is included in this change when I resend.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
index 2fd05dd18d46..f20556178e33 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
@@ -1518,7 +1518,7 @@  static int dpaa2_switch_port_bridge_join(struct net_device *netdev,
 	if (err)
 		goto err_egress_flood;
 
-	return 0;
+	return switchdev_bridge_port_offload(netdev, NULL);
 
 err_egress_flood:
 	dpaa2_switch_port_set_fdb(port_priv, NULL);
@@ -1552,6 +1552,8 @@  static int dpaa2_switch_port_bridge_leave(struct net_device *netdev)
 	struct ethsw_core *ethsw = port_priv->ethsw_data;
 	int err;
 
+	switchdev_bridge_port_unoffload(netdev);
+
 	/* First of all, fast age any learn FDB addresses on this switch port */
 	dpaa2_switch_port_fast_age(port_priv);
 
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
index 49e052273f30..0b0d5db7b85b 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
@@ -443,6 +443,10 @@  static int prestera_port_bridge_join(struct prestera_port *port,
 		goto err_brport_create;
 	}
 
+	err = switchdev_bridge_port_offload(port->dev, NULL);
+	if (err)
+		goto err_brport_offload;
+
 	if (bridge->vlan_enabled)
 		return 0;
 
@@ -453,6 +457,7 @@  static int prestera_port_bridge_join(struct prestera_port *port,
 	return 0;
 
 err_port_join:
+err_brport_offload:
 	prestera_bridge_port_put(br_port);
 err_brport_create:
 	prestera_bridge_put(bridge);
@@ -520,6 +525,8 @@  static void prestera_port_bridge_leave(struct prestera_port *port,
 	if (!br_port)
 		return;
 
+	switchdev_bridge_port_unoffload(port->dev);
+
 	bridge = br_port->bridge;
 
 	if (bridge->vlan_enabled)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 23b7e8d6386b..7fa0b3653819 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2326,7 +2326,7 @@  int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
 	if (err)
 		goto err_port_join;
 
-	return 0;
+	return switchdev_bridge_port_offload(brport_dev, extack);
 
 err_port_join:
 	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
@@ -2348,6 +2348,8 @@  void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
 	if (!bridge_port)
 		return;
 
+	switchdev_bridge_port_unoffload(brport_dev);
+
 	bridge_device->ops->port_leave(bridge_device, bridge_port,
 				       mlxsw_sp_port);
 	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index d38ffc7cf5f0..b917d9dd8a6a 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1213,7 +1213,7 @@  static int ocelot_netdevice_bridge_join(struct net_device *dev,
 	if (err)
 		goto err_switchdev_sync;
 
-	return 0;
+	return switchdev_bridge_port_offload(brport_dev, extack);
 
 err_switchdev_sync:
 	ocelot_port_bridge_leave(ocelot, port, bridge);
@@ -1234,6 +1234,8 @@  static int ocelot_netdevice_bridge_leave(struct net_device *dev,
 	if (err)
 		return err;
 
+	switchdev_bridge_port_unoffload(brport_dev);
+
 	ocelot_port_bridge_leave(ocelot, port, bridge);
 
 	return 0;
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 967a634ee9ac..9b6d7cac112b 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2592,13 +2592,19 @@  static int ofdpa_port_bridge_join(struct ofdpa_port *ofdpa_port,
 
 	ofdpa_port->bridge_dev = bridge;
 
-	return ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
+	err = ofdpa_port_vlan_add(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
+	if (err)
+		return err;
+
+	return switchdev_bridge_port_offload(ofdpa_port->dev, NULL);
 }
 
 static int ofdpa_port_bridge_leave(struct ofdpa_port *ofdpa_port)
 {
 	int err;
 
+	switchdev_bridge_port_unoffload(ofdpa_port->dev);
+
 	err = ofdpa_port_vlan_del(ofdpa_port, OFDPA_UNTAGGED_VID, 0);
 	if (err)
 		return err;
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 638d7b03be4b..fe2e38971acc 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -7,6 +7,7 @@ 
 
 #include <linux/clk.h>
 #include <linux/etherdevice.h>
+#include <linux/if_bridge.h>
 #include <linux/if_vlan.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
@@ -2082,6 +2083,7 @@  static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
 {
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
 	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
+	int err;
 
 	if (!common->br_members) {
 		common->hw_bridge_dev = br_ndev;
@@ -2097,7 +2099,8 @@  static int am65_cpsw_netdevice_port_link(struct net_device *ndev, struct net_dev
 
 	am65_cpsw_port_offload_fwd_mark_update(common);
 
-	return NOTIFY_DONE;
+	err = switchdev_bridge_port_offload(ndev, NULL);
+	return notifier_to_errno(err);
 }
 
 static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
@@ -2105,6 +2108,8 @@  static void am65_cpsw_netdevice_port_unlink(struct net_device *ndev)
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
 	struct am65_cpsw_ndev_priv *priv = am65_ndev_to_priv(ndev);
 
+	switchdev_bridge_port_unoffload(ndev);
+
 	common->br_members &= ~BIT(priv->port->port_id);
 
 	am65_cpsw_port_offload_fwd_mark_update(common);
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 58a64313ac00..6347532fb39d 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1508,6 +1508,7 @@  static int cpsw_netdevice_port_link(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
+	int err;
 
 	if (!cpsw->br_members) {
 		cpsw->hw_bridge_dev = br_ndev;
@@ -1523,7 +1524,8 @@  static int cpsw_netdevice_port_link(struct net_device *ndev,
 
 	cpsw_port_offload_fwd_mark_update(cpsw);
 
-	return NOTIFY_DONE;
+	err = switchdev_bridge_port_offload(ndev, NULL);
+	return notifier_to_errno(err);
 }
 
 static void cpsw_netdevice_port_unlink(struct net_device *ndev)
@@ -1531,6 +1533,8 @@  static void cpsw_netdevice_port_unlink(struct net_device *ndev)
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
 
+	switchdev_bridge_port_unoffload(ndev);
+
 	cpsw->br_members &= ~BIT(priv->emac_port);
 
 	cpsw_port_offload_fwd_mark_update(cpsw);
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index ea176c508c0d..4fbee6d5fc16 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -196,4 +196,20 @@  static inline int br_fdb_replay(struct net_device *br_dev,
 }
 #endif
 
+#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_NET_SWITCHDEV)
+int switchdev_bridge_port_offload(struct net_device *dev,
+				  struct netlink_ext_ack *extack);
+int switchdev_bridge_port_unoffload(struct net_device *dev);
+#else
+int switchdev_bridge_port_offload(struct net_device *dev,
+				  struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+int switchdev_bridge_port_unoffload(struct net_device *dev)
+{
+}
+#endif
+
 #endif
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index f7d2f472ae24..930a09f27e0d 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -643,10 +643,6 @@  int br_add_if(struct net_bridge *br, struct net_device *dev,
 	if (err)
 		goto err5;
 
-	err = nbp_switchdev_mark_set(p);
-	if (err)
-		goto err6;
-
 	dev_disable_lro(dev);
 
 	list_add_rcu(&p->list, &br->port_list);
@@ -671,13 +667,13 @@  int br_add_if(struct net_bridge *br, struct net_device *dev,
 		 */
 		err = dev_pre_changeaddr_notify(br->dev, dev->dev_addr, extack);
 		if (err)
-			goto err7;
+			goto err6;
 	}
 
 	err = nbp_vlan_init(p, extack);
 	if (err) {
 		netdev_err(dev, "failed to initialize vlan filtering on this port\n");
-		goto err7;
+		goto err6;
 	}
 
 	spin_lock_bh(&br->lock);
@@ -700,11 +696,10 @@  int br_add_if(struct net_bridge *br, struct net_device *dev,
 
 	return 0;
 
-err7:
+err6:
 	list_del_rcu(&p->list);
 	br_fdb_delete_by_port(br, p, 0, 1);
 	nbp_update_port_count(br);
-err6:
 	netdev_upper_dev_unlink(dev, br->dev);
 err5:
 	dev->priv_flags &= ~IFF_BRIDGE_PORT;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index d7d167e10b70..1982b5887d0f 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -326,8 +326,10 @@  struct net_bridge_port {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	struct netpoll			*np;
 #endif
+	int				offload_count;
 #ifdef CONFIG_NET_SWITCHDEV
 	int				offload_fwd_mark;
+	struct netdev_phys_item_id	ppid;
 #endif
 	u16				group_fwd_mask;
 	u16				backup_redirected_cnt;
@@ -1572,7 +1574,6 @@  static inline void br_sysfs_delbr(struct net_device *dev) { return; }
 
 /* br_switchdev.c */
 #ifdef CONFIG_NET_SWITCHDEV
-int nbp_switchdev_mark_set(struct net_bridge_port *p);
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb);
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
@@ -1592,11 +1593,6 @@  static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
 	skb->offload_fwd_mark = 0;
 }
 #else
-static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
-{
-	return 0;
-}
-
 static inline void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 					    struct sk_buff *skb)
 {
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index b89503832fcc..4cf7902f056c 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -8,37 +8,109 @@ 
 
 #include "br_private.h"
 
-static int br_switchdev_mark_get(struct net_bridge *br, struct net_device *dev)
+static int br_switchdev_mark_get(struct net_bridge *br,
+				 struct net_bridge_port *new_nbp)
 {
 	struct net_bridge_port *p;
 
 	/* dev is yet to be added to the port list. */
 	list_for_each_entry(p, &br->port_list, list) {
-		if (netdev_port_same_parent_id(dev, p->dev))
+		if (!p->offload_count)
+			continue;
+
+		if (netdev_phys_item_id_same(&p->ppid, &new_nbp->ppid))
 			return p->offload_fwd_mark;
 	}
 
 	return ++br->offload_fwd_mark;
 }
 
-int nbp_switchdev_mark_set(struct net_bridge_port *p)
+static int nbp_switchdev_mark_set(struct net_bridge_port *p,
+				  struct netdev_phys_item_id ppid,
+				  struct netlink_ext_ack *extack)
+{
+	if (p->offload_count) {
+		/* Prevent unsupported configurations such as a bridge port
+		 * which is a bonding interface, and the member ports are from
+		 * different hardware switches.
+		 */
+		if (!netdev_phys_item_id_same(&p->ppid, &ppid)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Same bridge port cannot be offloaded by two physical switches");
+			return -EBUSY;
+		}
+		/* Be tolerant with drivers that call SWITCHDEV_BRPORT_OFFLOADED
+		 * more than once for the same bridge port, such as when the
+		 * bridge port is an offloaded bonding/team interface.
+		 */
+		p->offload_count++;
+		return 0;
+	}
+
+	p->ppid = ppid;
+	p->offload_count = 1;
+	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p);
+
+	return 0;
+}
+
+static void nbp_switchdev_mark_clear(struct net_bridge_port *p,
+				     struct netdev_phys_item_id ppid)
+{
+	if (WARN_ON(!netdev_phys_item_id_same(&p->ppid, &ppid)))
+		return;
+	if (WARN_ON(!p->offload_count))
+		return;
+
+	p->offload_count--;
+	if (p->offload_count)
+		return;
+
+	p->offload_fwd_mark = 0;
+}
+
+/* Let the bridge know that this port is offloaded, so that it can use
+ * the port parent id obtained by recursion to determine the bridge
+ * port's switchdev mark.
+ */
+int switchdev_bridge_port_offload(struct net_device *dev,
+				  struct netlink_ext_ack *extack)
 {
-	struct netdev_phys_item_id ppid = { };
+	struct netdev_phys_item_id ppid;
+	struct net_bridge_port *p;
 	int err;
 
-	ASSERT_RTNL();
+	p = br_port_get_rtnl(dev);
+	if (!p)
+		return -ENODEV;
 
-	err = dev_get_port_parent_id(p->dev, &ppid, true);
-	if (err) {
-		if (err == -EOPNOTSUPP)
-			return 0;
+	err = dev_get_port_parent_id(dev, &ppid, true);
+	if (err)
+		return err;
+
+	return nbp_switchdev_mark_set(p, ppid, extack);
+}
+EXPORT_SYMBOL_GPL(switchdev_bridge_port_offload);
+
+int switchdev_bridge_port_unoffload(struct net_device *dev)
+{
+	struct netdev_phys_item_id ppid;
+	struct net_bridge_port *p;
+	int err;
+
+	p = br_port_get_rtnl(dev);
+	if (!p)
+		return -ENODEV;
+
+	err = dev_get_port_parent_id(dev, &ppid, true);
+	if (err)
 		return err;
-	}
 
-	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev);
+	nbp_switchdev_mark_clear(p, ppid);
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(switchdev_bridge_port_unoffload);
 
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb)