
[RFC,net-next,1/2] net: dsa: tag_mtk: skip address learning on transmit to standalone ports

Message ID: 20210728175327.1150120-2-dqfext@gmail.com
State: Superseded
Delegated to: Netdev Maintainers
Series: mt7530 software fallback bridging fix

Checks

Context                         Check    Description
netdev/cover_letter             success
netdev/fixes_present            success
netdev/patch_count              success
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success
netdev/cc_maintainers           success  CCed 13 of 13 maintainers
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 18 lines checked
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/header_inline            success

Commit Message

Qingfang Deng July 28, 2021, 5:53 p.m. UTC
Consider the following bridge configuration, where bond0 is not
offloaded:

         +-- br0 --+
        / /   |     \
       / /    |      \
      /  |    |     bond0
     /   |    |     /   \
   swp0 swp1 swp2 swp3 swp4
     .        .       .
     .        .       .
     A        B       C

Address learning is enabled on offloaded ports (swp0~2) and the CPU
port, so when client A sends a packet to C, the following will happen:

1. The switch learns that client A can be reached at swp0.
2. The switch probably already knows that client C can be reached at the
   CPU port, so it forwards the packet to the CPU.
3. The bridge core knows client C can be reached at bond0, so it
   forwards the packet back to the switch.
4. The switch learns that client A can be reached at the CPU port.
5. The switch forwards the packet to either swp3 or swp4, according to
   the packet's tag.

That makes client A's MAC address flap between swp0 and the CPU port. If
client B sends a packet to A, it is possible that the packet is
forwarded to the CPU. With offload_fwd_mark = 1, the bridge core won't
forward it back to the switch, resulting in packet loss.
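The five steps above, and the loss that follows, can be replayed with a toy model of the switch FDB. This is a minimal user-space sketch under stated assumptions, not mt7530 driver code: the port numbers (CPU port 6), the single-entry table and the helper names `learn_a()` and `replay_a_to_c()` are all hypothetical.

```c
#include <stdbool.h>

/* Port numbering used by this model only (hypothetical). */
enum { SWP0 = 0, CPU_PORT = 6, PORT_NONE = -1 };

/* Toy FDB: where the switch currently thinks client A lives. */
struct toy_fdb {
	int port_of_a;
};

/* Hardware SA learning for a frame from A entering on ingress_port.
 * sa_dis models the SA_DIS tag bit: when set, the entry is left alone. */
static void learn_a(struct toy_fdb *fdb, int ingress_port, bool sa_dis)
{
	if (!sa_dis)
		fdb->port_of_a = ingress_port;
}

/* Replay steps 1-5 for a frame A -> C and return the port that a later
 * frame B -> A would be forwarded to. */
static int replay_a_to_c(bool sa_dis_on_xmit)
{
	struct toy_fdb fdb = { .port_of_a = PORT_NONE };

	/* Step 1: A's frame enters on swp0, so A is learned there. */
	learn_a(&fdb, SWP0, false);
	/* Steps 2-3: the switch sends the frame to the CPU and the bridge
	 * sends it back to the switch for bond0 (not modelled further). */
	/* Step 4: the frame now enters the switch from the CPU port. */
	learn_a(&fdb, CPU_PORT, sa_dis_on_xmit);
	/* Step 5: egress on swp3 or swp4 does not touch A's entry. */
	return fdb.port_of_a;
}
```

With learning left on, `replay_a_to_c(false)` leaves A's entry on the CPU port, so a later B to A frame is sent to the host; with SA_DIS set on the return trip, `replay_a_to_c(true)` keeps A on swp0.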

To avoid that, skip address learning on the CPU port when the destination
port is standalone, which can be done by setting the SA_DIS bit of the
MTK tag, if bridge_dev of the destination port is not set.
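To make the new tag byte concrete, the expression from the patch can be lifted into a user-space helper. A sketch only: `mtk_tag_byte1()` is a hypothetical name, and its `bridged` argument stands in for `dp->bridge_dev != NULL`; the bit layout (one-hot port bit in bits 0 to 5, SA_DIS in bit 6) is taken from the macros in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTK_HDR_XMIT_SA_DIS_SHIFT	6	/* as defined in the patch */

/* Second byte of the MTK transmit tag (mtk_tag[1] in the patch):
 * a one-hot destination port bit, plus SA_DIS in bit 6 when the
 * destination port is standalone. */
static uint8_t mtk_tag_byte1(unsigned int port_index, bool bridged)
{
	unsigned int sa_dis = !bridged;	/* 1 for standalone ports */

	return (uint8_t)((1u << port_index) |
			 (sa_dis << MTK_HDR_XMIT_SA_DIS_SHIFT));
}
```

For example, port 3 gives 0x48 when standalone (SA_DIS set) and 0x08 when bridged.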

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
---
 net/dsa/tag_mtk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Vladimir Oltean July 28, 2021, 6:37 p.m. UTC | #1
On Thu, Jul 29, 2021 at 01:53:25AM +0800, DENG Qingfang wrote:
> Consider the following bridge configuration, where bond0 is not
> offloaded:
> 
>          +-- br0 --+
>         / /   |     \
>        / /    |      \
>       /  |    |     bond0
>      /   |    |     /   \
>    swp0 swp1 swp2 swp3 swp4
>      .        .       .
>      .        .       .
>      A        B       C
> 
> Address learning is enabled on offloaded ports (swp0~2) and the CPU
> port, so when client A sends a packet to C, the following will happen:
> 
> 1. The switch learns that client A can be reached at swp0.
> 2. The switch probably already knows that client C can be reached at the
>    CPU port, so it forwards the packet to the CPU.
> 3. The bridge core knows client C can be reached at bond0, so it
>    forwards the packet back to the switch.
> 4. The switch learns that client A can be reached at the CPU port.
> 5. The switch forwards the packet to either swp3 or swp4, according to
>    the packet's tag.
> 
> That makes client A's MAC address flap between swp0 and the CPU port. If
> client B sends a packet to A, it is possible that the packet is
> forwarded to the CPU. With offload_fwd_mark = 1, the bridge core won't
> forward it back to the switch, resulting in packet loss.
> 
> To avoid that, skip address learning on the CPU port when the destination
> port is standalone, which can be done by setting the SA_DIS bit of the
> MTK tag, if bridge_dev of the destination port is not set.
> 
> Signed-off-by: DENG Qingfang <dqfext@gmail.com>
> ---
>  net/dsa/tag_mtk.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
> index cc3ba864ad5b..8c361812e21b 100644
> --- a/net/dsa/tag_mtk.c
> +++ b/net/dsa/tag_mtk.c
> @@ -15,8 +15,7 @@
>  #define MTK_HDR_XMIT_TAGGED_TPID_8100	1
>  #define MTK_HDR_XMIT_TAGGED_TPID_88A8	2
>  #define MTK_HDR_RECV_SOURCE_PORT_MASK	GENMASK(2, 0)
> -#define MTK_HDR_XMIT_DP_BIT_MASK	GENMASK(5, 0)
> -#define MTK_HDR_XMIT_SA_DIS		BIT(6)
> +#define MTK_HDR_XMIT_SA_DIS_SHIFT	6
>  
>  static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
>  				    struct net_device *dev)
> @@ -50,7 +49,8 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
>  	 * whether that's a combined special tag with 802.1Q header.
>  	 */
>  	mtk_tag[0] = xmit_tpid;
> -	mtk_tag[1] = (1 << dp->index) & MTK_HDR_XMIT_DP_BIT_MASK;

Why stop AND-ing with MTK_HDR_XMIT_DP_BIT_MASK if you were doing that
before? If it's not needed (probably isn't), it would be nice to split
that up.

> +	mtk_tag[1] = BIT(dp->index) |
> +		     (!dp->bridge_dev << MTK_HDR_XMIT_SA_DIS_SHIFT);
>  
>  	/* Tag control information is kept for 802.1Q */
>  	if (xmit_tpid == MTK_HDR_XMIT_UNTAGGED) {
> -- 
> 2.25.1
> 
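The reviewer's question about the dropped mask can be checked mechanically: AND-ing with GENMASK(5, 0) only changes the result when the port index is 6 or more. The sketch below (hypothetical helper `mask_is_noop()`) assumes the tagger never transmits towards an index above 5; if it could, the old code would have produced an empty port mask and the new code would set bit 6, i.e. SA_DIS, so neither would be correct.

```c
#include <stdbool.h>

#define DP_BIT_MASK	0x3fu	/* GENMASK(5, 0), removed by the patch */

/* True when masking the one-hot port bit with GENMASK(5, 0) leaves it
 * unchanged, i.e. the mask is redundant for this port index. */
static bool mask_is_noop(unsigned int port_index)
{
	unsigned int bit = 1u << port_index;

	return (bit & DP_BIT_MASK) == bit;
}
```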

Otherwise this is as correct as can be without implementing TX
forwarding offload for the bridge (which you've explained why it doesn't
map 1:1 with what your hw can do). But just because a port is under a bridge
doesn't mean that the only packets it sends belong to that bridge. Think
AF_PACKET sockets, PTP etc. The bridge also has a no_linklocal_learn
option that maybe should be taken into consideration for drivers that
can do something meaningful about it. Anyway, food for thought.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
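One way to act on the reviewer's "food for thought" is to set SA_DIS not only for standalone ports but also for link-local destinations, in the spirit of the bridge's no_linklocal_learn option. This is a hypothetical policy sketch: `want_sa_dis()` and `dst_is_link_local()` are illustrative names, and the range check mirrors the kernel's is_link_local_ether_addr() (01:80:C2:00:00:0X). It does not identify other non-bridge traffic, such as arbitrary AF_PACKET frames.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 01:80:C2:00:00:00 through 01:80:C2:00:00:0F. */
static bool dst_is_link_local(const uint8_t da[6])
{
	static const uint8_t base[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	for (int i = 0; i < 5; i++)
		if (da[i] != base[i])
			return false;
	return (da[5] & 0xf0) == 0x00;
}

/* Hypothetical policy: skip SA learning on the CPU port for standalone
 * destinations (as in the patch) and for link-local frames. */
static bool want_sa_dis(bool bridged, const uint8_t da[6])
{
	return !bridged || dst_is_link_local(da);
}
```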
Vladimir Oltean July 30, 2021, 4:24 p.m. UTC | #2
On Wed, Jul 28, 2021 at 09:37:05PM +0300, Vladimir Oltean wrote:
> Otherwise this is as correct as can be without implementing TX
> forwarding offload for the bridge (which you've explained why it doesn't
> map 1:1 with what your hw can do). But just because a port is under a bridge
> doesn't mean that the only packets it sends belong to that bridge. Think
> AF_PACKET sockets, PTP etc. The bridge also has a no_linklocal_learn
> option that maybe should be taken into consideration for drivers that
> can do something meaningful about it. Anyway, food for thought.

Considering that you also have the option of setting
ds->assisted_learning_on_cpu_port = true and this will have fewer false
positives, what are the reasons why you did not choose that approach?
Qingfang Deng July 30, 2021, 5:32 p.m. UTC | #3
On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> Considering that you also have the option of setting
> ds->assisted_learning_on_cpu_port = true and this will have fewer false
> positives, what are the reasons why you did not choose that approach?

You're right. Hardware learning on CPU port does have some limitations.

I have been testing a multi CPU ports patch, and assisted learning has
to be used, because FDB entries should be installed like multicast
ones, which point to all CPU ports.
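Installing an FDB entry "like a multicast one" that points to all CPU ports amounts to OR-ing every CPU port bit into the entry's destination mask. A minimal sketch; the 7-port layout, the `is_cpu_port[]` table and `cpu_fdb_port_mask()` are assumptions for illustration, and which ports act as CPU ports on a given board is not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_PORTS	7	/* assumed port count for this sketch */

/* Destination mask for an FDB entry that must reach the host whichever
 * CPU port a user port is assigned to: like a multicast entry, it
 * includes every CPU port. */
static uint8_t cpu_fdb_port_mask(const bool is_cpu_port[TOY_NUM_PORTS])
{
	uint8_t mask = 0;

	for (int port = 0; port < TOY_NUM_PORTS; port++)
		if (is_cpu_port[port])
			mask |= (uint8_t)(1u << port);
	return mask;
}
```

For example, with ports 5 and 6 marked as CPU ports, the mask is 0x60.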
Vladimir Oltean July 30, 2021, 5:39 p.m. UTC | #4
On Sat, Jul 31, 2021 at 01:32:03AM +0800, DENG Qingfang wrote:
> On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> > Considering that you also have the option of setting
> > ds->assisted_learning_on_cpu_port = true and this will have fewer false
> > positives, what are the reasons why you did not choose that approach?
> 
> You're right. Hardware learning on CPU port does have some limitations.
> 
> I have been testing a multi CPU ports patch, and assisted learning has
> to be used, because FDB entries should be installed like multicast
> ones, which point to all CPU ports.

Ah, mt7530 is one of the switches which has multiple CPU ports, I had
forgotten that. In that case, static FDB entries are pretty much
the only way to go indeed.

I am going to send a patch series soon to convert sja1105 to assisted
learning too. It doesn't support multiple CPU ports, and it does have
hardware learning on the CPU port, but it can be arranged in cross-chip
topologies where each switch has its own CPU port, so from DSA's
perspective, it is as though we are dealing with a multi-CPU port switch
(the DSA tree does have multiple CPUs, in fact).  I have been
obsessively testing this configuration for the past few weeks and I
think the assisted learning functionality works fairly well by now.
Vladimir Oltean July 30, 2021, 5:41 p.m. UTC | #5
On Fri, Jul 30, 2021 at 08:39:02PM +0300, Vladimir Oltean wrote:
> On Sat, Jul 31, 2021 at 01:32:03AM +0800, DENG Qingfang wrote:
> > On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> > > Considering that you also have the option of setting
> > > ds->assisted_learning_on_cpu_port = true and this will have fewer false
> > > positives, what are the reasons why you did not choose that approach?
> > 
> > You're right. Hardware learning on CPU port does have some limitations.
> > 
> > I have been testing a multi CPU ports patch, and assisted learning has
> > to be used, because FDB entries should be installed like multicast
> > ones, which point to all CPU ports.
> 
> Ah, mt7530 is one of the switches which has multiple CPU ports, I had
> forgotten that. In that case, static FDB entries are pretty much
> the only way to go indeed.

I forget which ones are the modes in which the multi-CPU feature on
mt7530 is supposed to be used: static assignment of user ports to CPU
ports, or LAG between the CPU ports, or a mix of both?
Qingfang Deng July 30, 2021, 5:58 p.m. UTC | #6
On Fri, Jul 30, 2021 at 08:41:35PM +0300, Vladimir Oltean wrote:
> On Fri, Jul 30, 2021 at 08:39:02PM +0300, Vladimir Oltean wrote:
> > Ah, mt7530 is one of the switches which has multiple CPU ports, I had
> > forgotten that. In that case, static FDB entries are pretty much
> > the only way to go indeed.
> 
> I forget which ones are the modes in which the multi-CPU feature on
> mt7530 is supposed to be used: static assignment of user ports to CPU
> ports, or LAG between the CPU ports, or a mix of both?

MT7530 only supports static assignment, by changing the port matrix.

MT7531 also supports hardware LAG, but I don't think it's ideal because
its CPU ports have different speeds (one 1Gbps RGMII and the other 2.5Gbps
HSGMII).
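Static assignment through the port matrix can be pictured as giving each user port exactly one CPU port bit in its forwarding mask. A hypothetical sketch, not the mt7530 register layout: `user_port_matrix()`, the `cpu_of[]` mapping and the 7-port numbering are assumptions.

```c
#include <stdint.h>

#define TOY_PORTS	7	/* assumed port count for this sketch */

/* Forwarding matrix for one user port under static CPU port assignment:
 * the other members of its bridge, plus exactly one CPU port taken from
 * the cpu_of[] mapping (user port -> CPU port). */
static uint8_t user_port_matrix(int user_port, const int cpu_of[TOY_PORTS],
				uint8_t bridge_members)
{
	uint8_t matrix = bridge_members & (uint8_t)~(1u << user_port);

	return matrix | (uint8_t)(1u << cpu_of[user_port]);
}
```

For instance, if ports 0, 1 and 2 are bridged (members 0x07), port 0 is assigned CPU port 6 and port 2 is assigned CPU port 5, the matrices come out as 0x46 and 0x23.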
Qingfang Deng July 30, 2021, 7 p.m. UTC | #7
On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> Considering that you also have the option of setting
> ds->assisted_learning_on_cpu_port = true and this will have fewer false
> positives, what are the reasons why you did not choose that approach?

After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
(which it does not use now) unless I turn on VLAN filtering. Is that
normal?
Vladimir Oltean July 30, 2021, 7:07 p.m. UTC | #8
On Sat, Jul 31, 2021 at 03:00:20AM +0800, DENG Qingfang wrote:
> On Fri, Jul 30, 2021 at 07:24:03PM +0300, Vladimir Oltean wrote:
> > Considering that you also have the option of setting
> > ds->assisted_learning_on_cpu_port = true and this will have fewer false
> > positives, what are the reasons why you did not choose that approach?
> 
> After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
> (which it does not use now) unless I turn on VLAN filtering. Is that
> normal?

They are called with the VID from the learned packet.
If the bridge is VLAN-unaware, the MAC SA is learned with VID 0.
Generally, VID 0 is always used for VLAN-unaware bridging. You can
privately translate VID 0 to whatever VLAN ID you use in VLAN-unaware
mode.
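The private translation suggested above might look like the following in a driver's FDB path. A sketch under assumptions: `fdb_hw_vid()` and `TOY_UNAWARE_VID` are hypothetical, and 4094 is an arbitrary placeholder for whatever VID the driver reserves for VLAN-unaware mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder VID reserved for VLAN-unaware mode (assumption). */
#define TOY_UNAWARE_VID	4094

/* Translate the VID passed to .port_fdb_add/.port_fdb_del into the VID
 * the hardware FDB uses: in VLAN-unaware mode, entries arrive with VID 0
 * and are stored under the private VID instead. */
static uint16_t fdb_hw_vid(bool vlan_filtering, uint16_t vid)
{
	if (!vlan_filtering && vid == 0)
		return TOY_UNAWARE_VID;
	return vid;
}
```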
Qingfang Deng July 30, 2021, 7:25 p.m. UTC | #9
On Fri, Jul 30, 2021 at 10:07:06PM +0300, Vladimir Oltean wrote:
> > After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
> > (which it does not use now) unless I turn on VLAN filtering. Is that
> > normal?
> 
> They are called with the VID from the learned packet.
> If the bridge is VLAN-unaware, the MAC SA is learned with VID 0.
> Generally, VID 0 is always used for VLAN-unaware bridging. You can
> privately translate VID 0 to whatever VLAN ID you use in VLAN-unaware
> mode.

Now the issue is PVID is always set to the bridge's vlan_default_pvid,
regardless of VLAN awareness.
Vladimir Oltean July 30, 2021, 7:30 p.m. UTC | #10
On Sat, Jul 31, 2021 at 03:25:55AM +0800, DENG Qingfang wrote:
> On Fri, Jul 30, 2021 at 10:07:06PM +0300, Vladimir Oltean wrote:
> > > After enabling it, I noticed .port_fdb_{add,del} are called with VID=0
> > > (which it does not use now) unless I turn on VLAN filtering. Is that
> > > normal?
> > 
> > They are called with the VID from the learned packet.
> > If the bridge is VLAN-unaware, the MAC SA is learned with VID 0.
> > Generally, VID 0 is always used for VLAN-unaware bridging. You can
> > privately translate VID 0 to whatever VLAN ID you use in VLAN-unaware
> > mode.
> 
> Now the issue is PVID is always set to the bridge's vlan_default_pvid,
> regardless of VLAN awareness.

Then change that, sja1105 and ocelot/felix are good examples of how to
set a pvid in VLAN-unaware mode that is independent of what the bridge
asks for.
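Decoupling the hardware PVID from vlan_default_pvid can be sketched by remembering the bridge's requested PVID and applying it only while VLAN filtering is on. This is loosely modelled on the idea the reviewer attributes to sja1105 and ocelot/felix, not copied from either driver; `struct toy_port`, `toy_hw_pvid()` and the placeholder PVID 4094 are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_UNAWARE_PVID	4094	/* hypothetical private PVID */

struct toy_port {
	bool vlan_filtering;	/* bridge vlan_filtering state */
	uint16_t bridge_pvid;	/* PVID last requested by the bridge */
};

/* PVID to program into hardware: honour the bridge only when it is
 * VLAN-aware; otherwise use the private PVID, so vlan_default_pvid has
 * no effect in VLAN-unaware mode. */
static uint16_t toy_hw_pvid(const struct toy_port *p)
{
	return p->vlan_filtering ? p->bridge_pvid : TOY_UNAWARE_PVID;
}
```

Storing `bridge_pvid` separately lets a driver restore the bridge's choice when vlan_filtering is turned back on.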

Patch

diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index cc3ba864ad5b..8c361812e21b 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -15,8 +15,7 @@ 
 #define MTK_HDR_XMIT_TAGGED_TPID_8100	1
 #define MTK_HDR_XMIT_TAGGED_TPID_88A8	2
 #define MTK_HDR_RECV_SOURCE_PORT_MASK	GENMASK(2, 0)
-#define MTK_HDR_XMIT_DP_BIT_MASK	GENMASK(5, 0)
-#define MTK_HDR_XMIT_SA_DIS		BIT(6)
+#define MTK_HDR_XMIT_SA_DIS_SHIFT	6
 
 static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 				    struct net_device *dev)
@@ -50,7 +49,8 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 	 * whether that's a combined special tag with 802.1Q header.
 	 */
 	mtk_tag[0] = xmit_tpid;
-	mtk_tag[1] = (1 << dp->index) & MTK_HDR_XMIT_DP_BIT_MASK;
+	mtk_tag[1] = BIT(dp->index) |
+		     (!dp->bridge_dev << MTK_HDR_XMIT_SA_DIS_SHIFT);
 
 	/* Tag control information is kept for 802.1Q */
 	if (xmit_tpid == MTK_HDR_XMIT_UNTAGGED) {