diff mbox series

[RFC,v1,net-next] net: mlxsw_sp: Use switchdev_handle_port_obj_add_foreign() for vxlan

Message ID 20250208141518.191782-1-ericwouds@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,v1,net-next] net: mlxsw_sp: Use switchdev_handle_port_obj_add_foreign() for vxlan | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 40 this patch: 40
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Woudstra Feb. 8, 2025, 2:15 p.m. UTC
Sending as RFC as I do not own this hardware. This code is not tested.

Vladimir found this part of the spectrum switchdev, while looking at
another issue here:

https://lore.kernel.org/all/20250207220408.zipucrmm2yafj4wu@skbuf/

As vxlan seems a foreign port, wouldn't it be better to use
switchdev_handle_port_obj_add_foreign() ?

Note that port_obj_info->handled = true is set in
__switchdev_handle_port_obj_add() and can be removed.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  6 ++
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  2 +
 .../mellanox/mlxsw/spectrum_switchdev.c       | 92 ++++++++++---------
 3 files changed, 56 insertions(+), 44 deletions(-)

Comments

Ido Schimmel Feb. 10, 2025, 6:48 a.m. UTC | #1
On Sat, Feb 08, 2025 at 03:15:18PM +0100, Eric Woudstra wrote:
> Sending as RFC as I do not own this hardware. This code is not tested.
> 
> Vladimir found this part of the spectrum switchdev, while looking at
> another issue here:
> 
> https://lore.kernel.org/all/20250207220408.zipucrmm2yafj4wu@skbuf/
> 
> As vxlan seems a foreign port, wouldn't it be better to use
> switchdev_handle_port_obj_add_foreign() ?

Thanks for the patch, but the VXLAN port is not foreign to the other
switch ports. That is, forwarding between these ports and VXLAN happens
in hardware. And yes, switchdev_bridge_port_offload() does need to be
called for the VXLAN port so that it's assigned the same hardware domain
as the other ports.
Vladimir Oltean Feb. 10, 2025, 3:22 p.m. UTC | #2
On Mon, Feb 10, 2025 at 08:48:32AM +0200, Ido Schimmel wrote:
> On Sat, Feb 08, 2025 at 03:15:18PM +0100, Eric Woudstra wrote:
> > Sending as RFC as I do not own this hardware. This code is not tested.
> > 
> > Vladimir found this part of the spectrum switchdev, while looking at
> > another issue here:
> > 
> > https://lore.kernel.org/all/20250207220408.zipucrmm2yafj4wu@skbuf/
> > 
> > As vxlan seems a foreign port, wouldn't it be better to use
> > switchdev_handle_port_obj_add_foreign() ?
> 
> Thanks for the patch, but the VXLAN port is not foreign to the other
> switch ports. That is, forwarding between these ports and VXLAN happens
> in hardware. And yes, switchdev_bridge_port_offload() does need to be
> called for the VXLAN port so that it's assigned the same hardware domain
> as the other ports.

Thanks, this is useful. I'm not providing a patch yet because there are
still things I don't understand.

Have you seen any of the typical problems associated with the software
bridge thinking vxlan isn't part of the same hwdom as the ingress
physical port, and, say, flooding packets twice to vxlan, when the
switch had already forwarded a copy of the packet? In almost 4 years
since commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform
which bridge ports are offloaded"), I would have expected such issues
would have been noticed?

Do we require a Fixes: tag for this?

And then, switchdev_bridge_port_offload() has a brport_dev argument,
which would pretty clearly be passed as vxlan_dev by
mlxsw_sp_bridge_8021d_vxlan_join() and
mlxsw_sp_bridge_vlan_aware_vxlan_join(), but it also has one other
"struct net_device *dev" argument, on which br_switchdev_port_offload()
wants to call dev_get_port_parent_id(), to see what hwdom (what other
bridge ports) to associate it to.

Usually we use the mlxsw_sp_port->dev as the second argument, but which
port to use here? Any random port that's under the bridge, or is there a
specific one for the vxlan that should be used?
Ido Schimmel Feb. 11, 2025, 2:31 p.m. UTC | #3
On Mon, Feb 10, 2025 at 05:22:46PM +0200, Vladimir Oltean wrote:
> On Mon, Feb 10, 2025 at 08:48:32AM +0200, Ido Schimmel wrote:
> > On Sat, Feb 08, 2025 at 03:15:18PM +0100, Eric Woudstra wrote:
> > > Sending as RFC as I do not own this hardware. This code is not tested.
> > > 
> > > Vladimir found this part of the spectrum switchdev, while looking at
> > > another issue here:
> > > 
> > > https://lore.kernel.org/all/20250207220408.zipucrmm2yafj4wu@skbuf/
> > > 
> > > As vxlan seems a foreign port, wouldn't it be better to use
> > > switchdev_handle_port_obj_add_foreign() ?
> > 
> > Thanks for the patch, but the VXLAN port is not foreign to the other
> > switch ports. That is, forwarding between these ports and VXLAN happens
> > in hardware. And yes, switchdev_bridge_port_offload() does need to be
> > called for the VXLAN port so that it's assigned the same hardware domain
> > as the other ports.
> 
> Thanks, this is useful. I'm not providing a patch yet because there are
> still things I don't understand.
> 
> Have you seen any of the typical problems associated with the software
> bridge thinking vxlan isn't part of the same hwdom as the ingress
> physical port, and, say, flooding packets twice to vxlan, when the
> switch had already forwarded a copy of the packet? In almost 4 years
> since commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform
> which bridge ports are offloaded"), I would have expected such issues
> would have been noticed?

I'm aware of one report from QA that is on my list. They configured a
VXLAN tunnel that floods packets to two remote VTEPs:

00:00:00:00:00:00 dev vx100 dst 1.1.1.2
00:00:00:00:00:00 dev vx100 dst 1.1.1.3

The underlay routes used to forward the VXLAN encapsulated traffic are:

1.1.1.2 via 10.0.0.2 dev swp13
1.1.1.3 via 10.0.0.6 dev swp15

But they made sure not to configure 10.0.0.6 at the other end. What
happens is that traffic for 1.1.1.2 is correctly forwarded in hardware,
but traffic for 1.1.1.3 encounters a neighbour miss and trapped to
the CPU which then forwards it again to 1.1.1.2.

Putting the VXLAN device in the same hardware domain as the other switch
ports should solve this "double forwarding" scenario.

> Do we require a Fixes: tag for this?

It's not strictly a regression (never worked) and it's not that critical
IMO, so I prefer targeting net-next.

> And then, switchdev_bridge_port_offload() has a brport_dev argument,
> which would pretty clearly be passed as vxlan_dev by
> mlxsw_sp_bridge_8021d_vxlan_join() and
> mlxsw_sp_bridge_vlan_aware_vxlan_join(), but it also has one other
> "struct net_device *dev" argument, on which br_switchdev_port_offload()
> wants to call dev_get_port_parent_id(), to see what hwdom (what other
> bridge ports) to associate it to.

Right.

> Usually we use the mlxsw_sp_port->dev as the second argument, but which
> port to use here? Any random port that's under the bridge, or is there a
> specific one for the vxlan that should be used?

Any random port is fine as they all share the same parent ID.

BTW, I asked Amit (Cced) to look into this as there might be some issues
with ARP suppression that will make it a bit more complicated to patch.
Are you OK with that or do you want the authorship? We can put any tag
you choose. I am asking so that it won't appear like I am trying to
discredit you.

Thanks!
Vladimir Oltean Feb. 11, 2025, 3:23 p.m. UTC | #4
On Tue, Feb 11, 2025 at 04:31:43PM +0200, Ido Schimmel wrote:
> On Mon, Feb 10, 2025 at 05:22:46PM +0200, Vladimir Oltean wrote:
> > On Mon, Feb 10, 2025 at 08:48:32AM +0200, Ido Schimmel wrote:
> > > On Sat, Feb 08, 2025 at 03:15:18PM +0100, Eric Woudstra wrote:
> > > > Sending as RFC as I do not own this hardware. This code is not tested.
> > > > 
> > > > Vladimir found this part of the spectrum switchdev, while looking at
> > > > another issue here:
> > > > 
> > > > https://lore.kernel.org/all/20250207220408.zipucrmm2yafj4wu@skbuf/
> > > > 
> > > > As vxlan seems a foreign port, wouldn't it be better to use
> > > > switchdev_handle_port_obj_add_foreign() ?
> > > 
> > > Thanks for the patch, but the VXLAN port is not foreign to the other
> > > switch ports. That is, forwarding between these ports and VXLAN happens
> > > in hardware. And yes, switchdev_bridge_port_offload() does need to be
> > > called for the VXLAN port so that it's assigned the same hardware domain
> > > as the other ports.
> > 
> > Thanks, this is useful. I'm not providing a patch yet because there are
> > still things I don't understand.
> > 
> > Have you seen any of the typical problems associated with the software
> > bridge thinking vxlan isn't part of the same hwdom as the ingress
> > physical port, and, say, flooding packets twice to vxlan, when the
> > switch had already forwarded a copy of the packet? In almost 4 years
> > since commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform
> > which bridge ports are offloaded"), I would have expected such issues
> > would have been noticed?
> 
> I'm aware of one report from QA that is on my list. They configured a
> VXLAN tunnel that floods packets to two remote VTEPs:
> 
> 00:00:00:00:00:00 dev vx100 dst 1.1.1.2
> 00:00:00:00:00:00 dev vx100 dst 1.1.1.3
> 
> The underlay routes used to forward the VXLAN encapsulated traffic are:
> 
> 1.1.1.2 via 10.0.0.2 dev swp13
> 1.1.1.3 via 10.0.0.6 dev swp15
> 
> But they made sure not to configure 10.0.0.6 at the other end. What
> happens is that traffic for 1.1.1.2 is correctly forwarded in hardware,
> but traffic for 1.1.1.3 encounters a neighbour miss and trapped to
> the CPU which then forwards it again to 1.1.1.2.
> 
> Putting the VXLAN device in the same hardware domain as the other switch
> ports should solve this "double forwarding" scenario.

Let me see if I understand what you are saying based on the code (please
bear with me, VXLAN tunnels are way outside of my area).

So in this case, the packet hits a neighbor miss in the VXLAN underlay
and reaches the CPU through the
MLXSW_SP_TRAP_EXCEPTION(UNRESOLVED_NEIGH, L3_EXCEPTIONS) trap. That is
defined to call mlxsw_sp_rx_mark_listener(), which sets
skb->offload_fwd_mark = 1 (aka packet was forwarded in L2) but not
skb->offload_l3_fwd_mark (aka packet was not forwarded in L3).
This corresponds to expected reality: neighbor miss packets should not
re-enter the bridge forwarding path in the overlay towards the VXLAN.

Yet they still do, because although skb->offload_fwd_mark is correctly
set, it only skips software forwarding towards other physical (and/or
LAG) mlxsw bridge ports.

If my understanding is correct, then yes, I agree that making the hwdomain
of the vxlan tunnel coincide with that of the other bridge ports should
suppress this packet.

> > Do we require a Fixes: tag for this?
> 
> It's not strictly a regression (never worked) and it's not that critical
> IMO, so I prefer targeting net-next.

Yes, I agree. Before that commit, the bridge would look by itself at the
bridge port, recursively searching for lower interfaces until something
returned something positively in dev_get_port_parent_id(). Which was a
limiting model. If anything, solving the "double forwarding of vxlan exception
packets" issue would have required an alternative switchdev bridge port
offloading model anyway, like the explicit one that exists now, which is
more flexible.

> > And then, switchdev_bridge_port_offload() has a brport_dev argument,
> > which would pretty clearly be passed as vxlan_dev by
> > mlxsw_sp_bridge_8021d_vxlan_join() and
> > mlxsw_sp_bridge_vlan_aware_vxlan_join(), but it also has one other
> > "struct net_device *dev" argument, on which br_switchdev_port_offload()
> > wants to call dev_get_port_parent_id(), to see what hwdom (what other
> > bridge ports) to associate it to.
> 
> Right.
> 
> > Usually we use the mlxsw_sp_port->dev as the second argument, but which
> > port to use here? Any random port that's under the bridge, or is there a
> > specific one for the vxlan that should be used?
> 
> Any random port is fine as they all share the same parent ID.
> 
> BTW, I asked Amit (Cced) to look into this as there might be some issues
> with ARP suppression that will make it a bit more complicated to patch.
> Are you OK with that or do you want the authorship? We can put any tag
> you choose. I am asking so that it won't appear like I am trying to
> discredit you.

It is completely fine for Amit to take a look at this, I could really do
without yet another thing that is completely over my head :)

Just one indication from my side on how I would approach things:

Because spectrum bridge ports come and go, and the vxlan tunnel bridge
port may stay, its hwdom may change over time (depending on how many
cards there are plugged in the system). Also, it probably won't work to
combine spectrum bridge ports spanning multiple cards in the same
bridge. For those reasons, br_switchdev_port_offload() supports multiple
calls made to the same vxlan_dev, and behind the scenes, it will just
bump the net_bridge_port->offload_count.

Thinking a bit more, I think you want exactly that: a vxlan bridge port
needs to have an offload_count equal to the number of other spectrum
ports in the same bridge. This way, it only stops being offloaded when
there are no spectrum ports left (and the offload_count drops to 0).
But this needs handling from both directions: when a vxlan joins a
bridge with spectrum ports, and when a spectrum port joins a bridge with
vxlan tunnels. Plus every leave operation having to call
br_switchdev_port_unoffload(), for things to stay balanced.

Probably there are tons of other restrictions to be aware of, but I just
wanted to say this, because in the previous email we talked about "just
any random port" and it seems like it actually needs to be "all ports".
Ido Schimmel Feb. 12, 2025, 2:07 p.m. UTC | #5
On Tue, Feb 11, 2025 at 05:23:53PM +0200, Vladimir Oltean wrote:
> On Tue, Feb 11, 2025 at 04:31:43PM +0200, Ido Schimmel wrote:
> > On Mon, Feb 10, 2025 at 05:22:46PM +0200, Vladimir Oltean wrote:
> > > On Mon, Feb 10, 2025 at 08:48:32AM +0200, Ido Schimmel wrote:
> > > > On Sat, Feb 08, 2025 at 03:15:18PM +0100, Eric Woudstra wrote:
> > > > > Sending as RFC as I do not own this hardware. This code is not tested.
> > > > > 
> > > > > Vladimir found this part of the spectrum switchdev, while looking at
> > > > > another issue here:
> > > > > 
> > > > > https://lore.kernel.org/all/20250207220408.zipucrmm2yafj4wu@skbuf/
> > > > > 
> > > > > As vxlan seems a foreign port, wouldn't it be better to use
> > > > > switchdev_handle_port_obj_add_foreign() ?
> > > > 
> > > > Thanks for the patch, but the VXLAN port is not foreign to the other
> > > > switch ports. That is, forwarding between these ports and VXLAN happens
> > > > in hardware. And yes, switchdev_bridge_port_offload() does need to be
> > > > called for the VXLAN port so that it's assigned the same hardware domain
> > > > as the other ports.
> > > 
> > > Thanks, this is useful. I'm not providing a patch yet because there are
> > > still things I don't understand.
> > > 
> > > Have you seen any of the typical problems associated with the software
> > > bridge thinking vxlan isn't part of the same hwdom as the ingress
> > > physical port, and, say, flooding packets twice to vxlan, when the
> > > switch had already forwarded a copy of the packet? In almost 4 years
> > > since commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform
> > > which bridge ports are offloaded"), I would have expected such issues
> > > would have been noticed?
> > 
> > I'm aware of one report from QA that is on my list. They configured a
> > VXLAN tunnel that floods packets to two remote VTEPs:
> > 
> > 00:00:00:00:00:00 dev vx100 dst 1.1.1.2
> > 00:00:00:00:00:00 dev vx100 dst 1.1.1.3
> > 
> > The underlay routes used to forward the VXLAN encapsulated traffic are:
> > 
> > 1.1.1.2 via 10.0.0.2 dev swp13
> > 1.1.1.3 via 10.0.0.6 dev swp15
> > 
> > But they made sure not to configure 10.0.0.6 at the other end. What
> > happens is that traffic for 1.1.1.2 is correctly forwarded in hardware,
> > but traffic for 1.1.1.3 encounters a neighbour miss and trapped to
> > the CPU which then forwards it again to 1.1.1.2.
> > 
> > Putting the VXLAN device in the same hardware domain as the other switch
> > ports should solve this "double forwarding" scenario.
> 
> Let me see if I understand what you are saying based on the code (please
> bear with me, VXLAN tunnels are way outside of my area).
> 
> So in this case, the packet hits a neighbor miss in the VXLAN underlay
> and reaches the CPU through the
> MLXSW_SP_TRAP_EXCEPTION(UNRESOLVED_NEIGH, L3_EXCEPTIONS) trap. That is
> defined to call mlxsw_sp_rx_mark_listener(), which sets
> skb->offload_fwd_mark = 1 (aka packet was forwarded in L2) but not
> skb->offload_l3_fwd_mark (aka packet was not forwarded in L3).
> This corresponds to expected reality: neighbor miss packets should not
> re-enter the bridge forwarding path in the overlay towards the VXLAN.
> 
> Yet they still do, because although skb->offload_fwd_mark is correctly
> set, it only skips software forwarding towards other physical (and/or
> LAG) mlxsw bridge ports.
> 
> If my understanding is correct, then yes, I agree that making the hwdomain
> of the vxlan tunnel coincide with that of the other bridge ports should
> suppress this packet.

Yes, the above is correct.

> 
> > > Do we require a Fixes: tag for this?
> > 
> > It's not strictly a regression (never worked) and it's not that critical
> > IMO, so I prefer targeting net-next.
> 
> Yes, I agree. Before that commit, the bridge would look by itself at the
> bridge port, recursively searching for lower interfaces until something
> returned something positively in dev_get_port_parent_id(). Which was a
> limiting model. If anything, solving the "double forwarding of vxlan exception
> packets" issue would have required an alternative switchdev bridge port
> offloading model anyway, like the explicit one that exists now, which is
> more flexible.

Yes.

> 
> > > And then, switchdev_bridge_port_offload() has a brport_dev argument,
> > > which would pretty clearly be passed as vxlan_dev by
> > > mlxsw_sp_bridge_8021d_vxlan_join() and
> > > mlxsw_sp_bridge_vlan_aware_vxlan_join(), but it also has one other
> > > "struct net_device *dev" argument, on which br_switchdev_port_offload()
> > > wants to call dev_get_port_parent_id(), to see what hwdom (what other
> > > bridge ports) to associate it to.
> > 
> > Right.
> > 
> > > Usually we use the mlxsw_sp_port->dev as the second argument, but which
> > > port to use here? Any random port that's under the bridge, or is there a
> > > specific one for the vxlan that should be used?
> > 
> > Any random port is fine as they all share the same parent ID.
> > 
> > BTW, I asked Amit (Cced) to look into this as there might be some issues
> > with ARP suppression that will make it a bit more complicated to patch.
> > Are you OK with that or do you want the authorship? We can put any tag
> > you choose. I am asking so that it won't appear like I am trying to
> > discredit you.
> 
> It is completely fine for Amit to take a look at this, I could really do
> without yet another thing that is completely over my head :)

Great, thanks!

> 
> Just one indication from my side on how I would approach things:
> 
> Because spectrum bridge ports come and go, and the vxlan tunnel bridge
> port may stay, its hwdom may change over time (depending on how many
> cards there are plugged in the system). Also, it probably won't work to
> combine spectrum bridge ports spanning multiple cards in the same
> bridge. For those reasons, br_switchdev_port_offload() supports multiple
> calls made to the same vxlan_dev, and behind the scenes, it will just
> bump the net_bridge_port->offload_count.
> 
> Thinking a bit more, I think you want exactly that: a vxlan bridge port
> needs to have an offload_count equal to the number of other spectrum
> ports in the same bridge. This way, it only stops being offloaded when
> there are no spectrum ports left (and the offload_count drops to 0).
> But this needs handling from both directions: when a vxlan joins a
> bridge with spectrum ports, and when a spectrum port joins a bridge with
> vxlan tunnels. Plus every leave operation having to call
> br_switchdev_port_unoffload(), for things to stay balanced.
> 
> Probably there are tons of other restrictions to be aware of, but I just
> wanted to say this, because in the previous email we talked about "just
> any random port" and it seems like it actually needs to be "all ports".

I think that "any random port" is fine as the second argument of
switchdev_bridge_port_offload() is only used to derive the parent ID
which is the same for all mlxsw ports belonging to the same PCI card.
And yes, it won't work if you combine ports from different cards in the
same bridge, but it's not really a realistic configuration. Packets
entering the bridge via the first card would have to go to the CPU if
they need to egress ports belonging to the second card. I don't know why
anyone would want that. A more realistic way to support multi-ASIC
systems is to reload each devlink instance (i.e., ASIC / card) to a
different namespace via something like "devlink dev reload
pci/0000:01:00.0 netns ns1" or use PCI passthrough to assign each card
to a different VM.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index d714311fd884..f03b489f7b99 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4002,6 +4002,12 @@  bool mlxsw_sp_port_dev_check(const struct net_device *dev)
 	return dev->netdev_ops == &mlxsw_sp_port_netdev_ops;
 }
 
+bool mlxsw_sp_foreign_dev_check(const struct net_device *dev,
+				const struct net_device *foreign_dev)
+{
+	return netif_is_vxlan(foreign_dev);
+}
+
 static int mlxsw_sp_lower_dev_walk(struct net_device *lower_dev,
 				   struct netdev_nested_priv *priv)
 {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index b10f80fc651b..4264468c5a6f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -709,6 +709,8 @@  int mlxsw_sp_flow_counter_alloc(struct mlxsw_sp *mlxsw_sp,
 void mlxsw_sp_flow_counter_free(struct mlxsw_sp *mlxsw_sp,
 				unsigned int counter_index);
 bool mlxsw_sp_port_dev_check(const struct net_device *dev);
+bool mlxsw_sp_foreign_dev_check(const struct net_device *dev,
+				const struct net_device *foreign_dev);
 struct mlxsw_sp *mlxsw_sp_lower_get(struct net_device *dev);
 struct mlxsw_sp_port *mlxsw_sp_port_dev_lower_find(struct net_device *dev);
 struct mlxsw_sp_port *mlxsw_sp_port_dev_lower_find_rcu(struct net_device *dev);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 6397ff0dc951..0ab80cb22bc4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2252,6 +2252,14 @@  mlxsw_sp_port_mrouter_update_mdb(struct mlxsw_sp_port *mlxsw_sp_port,
 	}
 }
 
+static int
+mlxsw_sp_switchdev_handle_vxlan_obj_add(struct net_device *vxlan_dev,
+					const struct switchdev_obj *obj,
+					struct netlink_ext_ack *extack);
+static void
+mlxsw_sp_switchdev_handle_vxlan_obj_del(struct net_device *vxlan_dev,
+					const struct switchdev_obj *obj);
+
 static int mlxsw_sp_port_obj_add(struct net_device *dev, const void *ctx,
 				 const struct switchdev_obj *obj,
 				 struct netlink_ext_ack *extack)
@@ -2262,16 +2270,20 @@  static int mlxsw_sp_port_obj_add(struct net_device *dev, const void *ctx,
 
 	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
-
-		err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, extack);
-
-		/* The event is emitted before the changes are actually
-		 * applied to the bridge. Therefore schedule the respin
-		 * call for later, so that the respin logic sees the
-		 * updated bridge state.
-		 */
-		mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
+		if (netif_is_vxlan(dev)) {
+			err = mlxsw_sp_switchdev_handle_vxlan_obj_add(dev, obj, extack);
+		} else {
+			vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+
+			err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, extack);
+
+			/* The event is emitted before the changes are actually
+			 * applied to the bridge. Therefore schedule the respin
+			 * call for later, so that the respin logic sees the
+			 * updated bridge state.
+			 */
+			mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
+		}
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
 		err = mlxsw_sp_port_mdb_add(mlxsw_sp_port,
@@ -2401,8 +2413,12 @@  static int mlxsw_sp_port_obj_del(struct net_device *dev, const void *ctx,
 
 	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		err = mlxsw_sp_port_vlans_del(mlxsw_sp_port,
-					      SWITCHDEV_OBJ_PORT_VLAN(obj));
+		if (netif_is_vxlan(dev)) {
+			mlxsw_sp_switchdev_handle_vxlan_obj_del(dev, obj);
+		} else {
+			err = mlxsw_sp_port_vlans_del(mlxsw_sp_port,
+						      SWITCHDEV_OBJ_PORT_VLAN(obj));
+		}
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
 		err = mlxsw_sp_port_mdb_del(mlxsw_sp_port,
@@ -3931,19 +3947,17 @@  mlxsw_sp_switchdev_vxlan_vlan_del(struct mlxsw_sp *mlxsw_sp,
 
 static int
 mlxsw_sp_switchdev_vxlan_vlans_add(struct net_device *vxlan_dev,
-				   struct switchdev_notifier_port_obj_info *
-				   port_obj_info)
+				   const struct switchdev_obj *obj,
+				   struct netlink_ext_ack *extack)
 {
 	struct switchdev_obj_port_vlan *vlan =
-		SWITCHDEV_OBJ_PORT_VLAN(port_obj_info->obj);
+		SWITCHDEV_OBJ_PORT_VLAN(obj);
 	bool flag_untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
 	bool flag_pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
 	struct mlxsw_sp_bridge_device *bridge_device;
-	struct netlink_ext_ack *extack;
 	struct mlxsw_sp *mlxsw_sp;
 	struct net_device *br_dev;
 
-	extack = switchdev_notifier_info_to_extack(&port_obj_info->info);
 	br_dev = netdev_master_upper_dev_get(vxlan_dev);
 	if (!br_dev)
 		return 0;
@@ -3952,8 +3966,6 @@  mlxsw_sp_switchdev_vxlan_vlans_add(struct net_device *vxlan_dev,
 	if (!mlxsw_sp)
 		return 0;
 
-	port_obj_info->handled = true;
-
 	bridge_device = mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
 	if (!bridge_device)
 		return -EINVAL;
@@ -3969,11 +3981,10 @@  mlxsw_sp_switchdev_vxlan_vlans_add(struct net_device *vxlan_dev,
 
 static void
 mlxsw_sp_switchdev_vxlan_vlans_del(struct net_device *vxlan_dev,
-				   struct switchdev_notifier_port_obj_info *
-				   port_obj_info)
+				   const struct switchdev_obj *obj)
 {
 	struct switchdev_obj_port_vlan *vlan =
-		SWITCHDEV_OBJ_PORT_VLAN(port_obj_info->obj);
+		SWITCHDEV_OBJ_PORT_VLAN(obj);
 	struct mlxsw_sp_bridge_device *bridge_device;
 	struct mlxsw_sp *mlxsw_sp;
 	struct net_device *br_dev;
@@ -3986,8 +3997,6 @@  mlxsw_sp_switchdev_vxlan_vlans_del(struct net_device *vxlan_dev,
 	if (!mlxsw_sp)
 		return;
 
-	port_obj_info->handled = true;
-
 	bridge_device = mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
 	if (!bridge_device)
 		return;
@@ -4001,15 +4010,15 @@  mlxsw_sp_switchdev_vxlan_vlans_del(struct net_device *vxlan_dev,
 
 static int
 mlxsw_sp_switchdev_handle_vxlan_obj_add(struct net_device *vxlan_dev,
-					struct switchdev_notifier_port_obj_info *
-					port_obj_info)
+					const struct switchdev_obj *obj,
+					struct netlink_ext_ack *extack)
 {
 	int err = 0;
 
-	switch (port_obj_info->obj->id) {
+	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
 		err = mlxsw_sp_switchdev_vxlan_vlans_add(vxlan_dev,
-							 port_obj_info);
+							 obj, extack);
 		break;
 	default:
 		break;
@@ -4020,12 +4029,11 @@  mlxsw_sp_switchdev_handle_vxlan_obj_add(struct net_device *vxlan_dev,
 
 static void
 mlxsw_sp_switchdev_handle_vxlan_obj_del(struct net_device *vxlan_dev,
-					struct switchdev_notifier_port_obj_info *
-					port_obj_info)
+					const struct switchdev_obj *obj)
 {
-	switch (port_obj_info->obj->id) {
+	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		mlxsw_sp_switchdev_vxlan_vlans_del(vxlan_dev, port_obj_info);
+		mlxsw_sp_switchdev_vxlan_vlans_del(vxlan_dev, obj);
 		break;
 	default:
 		break;
@@ -4040,20 +4048,16 @@  static int mlxsw_sp_switchdev_blocking_event(struct notifier_block *unused,
 
 	switch (event) {
 	case SWITCHDEV_PORT_OBJ_ADD:
-		if (netif_is_vxlan(dev))
-			err = mlxsw_sp_switchdev_handle_vxlan_obj_add(dev, ptr);
-		else
-			err = switchdev_handle_port_obj_add(dev, ptr,
-							mlxsw_sp_port_dev_check,
-							mlxsw_sp_port_obj_add);
+		err = switchdev_handle_port_obj_add_foreign(dev, ptr,
+							    mlxsw_sp_port_dev_check,
+							    mlxsw_sp_foreign_dev_check,
+							    mlxsw_sp_port_obj_add);
 		return notifier_from_errno(err);
 	case SWITCHDEV_PORT_OBJ_DEL:
-		if (netif_is_vxlan(dev))
-			mlxsw_sp_switchdev_handle_vxlan_obj_del(dev, ptr);
-		else
-			err = switchdev_handle_port_obj_del(dev, ptr,
-							mlxsw_sp_port_dev_check,
-							mlxsw_sp_port_obj_del);
+		err = switchdev_handle_port_obj_del_foreign(dev, ptr,
+							    mlxsw_sp_port_dev_check,
+							    mlxsw_sp_foreign_dev_check,
+							    mlxsw_sp_port_obj_del);
 		return notifier_from_errno(err);
 	case SWITCHDEV_PORT_ATTR_SET:
 		err = switchdev_handle_port_attr_set(dev, ptr,