Message ID | 20210807120726.1063225-3-dqfext@gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | qca8k bridge flags offload | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 8 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote: > Enable assisted learning on CPU port to fix roaming issues. 'roaming issues' implies to me it suffered from blindness to MAC addresses learned on foreign interfaces, which appears to not be true since your previous patch removes hardware learning on the CPU port (=> hardware learning on the CPU port was supported, so there were no roaming issues) > > Although hardware learning is available, it won't work well with > software bridging fallback or multiple CPU ports. This part is potentially true however, but it would need proof. I am not familiar enough with the qca8k driver to say for sure that it suffers from the typical problem with bridging with software LAG uppers (no FDB isolation for standalone ports => attempt to shortcircuit the forwarding through the CPU port and go directly towards the bridged port, which would result in drops), and I also can't say anything about its support for multiple CPU ports.
On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote: > On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote: > > Enable assisted learning on CPU port to fix roaming issues. > > 'roaming issues' implies to me it suffered from blindness to MAC > addresses learned on foreign interfaces, which appears to not be true > since your previous patch removes hardware learning on the CPU port > (=> hardware learning on the CPU port was supported, so there were no > roaming issues) The datasheet says learning is enabled by default, but if that's true, the driver won't have to enable it manually. Others have reported roaming issues as well: https://github.com/Ansuel/openwrt/pull/3 As I don't have the hardware to test, I don't know what the default value really is, so I just disable learning to make sure. > > > > > Although hardware learning is available, it won't work well with > > software bridging fallback or multiple CPU ports. > > This part is potentially true however, but it would need proof. I am not > familiar enough with the qca8k driver to say for sure that it suffers > from the typical problem with bridging with software LAG uppers (no FDB > isolation for standalone ports => attempt to shortcircuit the forwarding > through the CPU port and go directly towards the bridged port, which > would result in drops), and I also can't say anything about its support > for multiple CPU ports. QCA8337 supports disabling learning and FDB lookup on a per-VLAN basis, so we could assign all standalone ports to a reserved VLAN (ID 0 or 4095) with learning and FDB lookup disabled. Ansuel has a patch set for multiple CPU ports.
On Mon, Aug 09, 2021 at 12:05:03AM +0800, DENG Qingfang wrote: > On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote: > > On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote: > > > Enable assisted learning on CPU port to fix roaming issues. > > > > 'roaming issues' implies to me it suffered from blindness to MAC > > addresses learned on foreign interfaces, which appears to not be true > > since your previous patch removes hardware learning on the CPU port > > (=> hardware learning on the CPU port was supported, so there were no > > roaming issues) > > The datasheet says learning is enabled by default, but if that's true, > the driver won't have to enable it manually. > > Others have reported roaming issues as well: > https://github.com/Ansuel/openwrt/pull/3 > > As I don't have the hardware to test, I don't know what the default > value really is, so I just disable learning to make sure. That link doesn't really say more than "roaming issues" either, so I am still not clear on what is being fixed here exactly. Note that I can still think of 'roaming'-related issues with VLAN-aware bridges and foreign interfaces and hardware learning on the CPU port, but I don't want to speculate too much and just want to hear what is the issue that is being fixed. > > > Although hardware learning is available, it won't work well with > > > software bridging fallback or multiple CPU ports. > > > > This part is potentially true however, but it would need proof. I am not > > familiar enough with the qca8k driver to say for sure that it suffers > > from the typical problem with bridging with software LAG uppers (no FDB > > isolation for standalone ports => attempt to shortcircuit the forwarding > > through the CPU port and go directly towards the bridged port, which > > would result in drops), and I also can't say anything about its support > > for multiple CPU ports. > > QCA8337 supports disabling learning and FDB lookup on a per-VLAN basis, > so we could assign all standalone ports to a reserved VLAN (ID 0 or 4095) > with learning and FDB lookup disabled. And to follow along that idea, if you also change the tagger to send all packets towards a standalone port using that reserved VLAN, then even if hardware learning is enabled on the CPU port, it will be inconsequential as long as IVL is used, because no FDB lookup will match the VLAN in which those addresses were learned. My point is, if you come with something functional to the table, present the whole story. If more changes still need to be made until it works with software bridging fallback, say that too. Otherwise, I think that the general idea that "hardware learning on the CPU port won't work well with software bridging fallback" is not strictly true, and that this patch has a weak overall justification. > > Ansuel has a patch set for multiple CPU ports.
On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote: > On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote: >> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote: >>> Enable assisted learning on CPU port to fix roaming issues. >> >> 'roaming issues' implies to me it suffered from blindness to MAC >> addresses learned on foreign interfaces, which appears to not be true >> since your previous patch removes hardware learning on the CPU port >> (=> hardware learning on the CPU port was supported, so there were no >> roaming issues) The issue is with a wifi AP bridged into dsa and previously learned addresses. Test setup: We have to wifi APs a and b(with qca8k). Client is on AP a. The qca8k switch in AP b sees also the broadcast traffic from the client and takes the address into its fdb. Now the client roams to AP b. The client starts DHCP but does not get an IP. With tcpdump, I see the packets going through the switch (ap->cpu port->ethernet port) and they arrive at the DHCP server. It responds, the response packet reaches the ethernet port of the qca8k, and is not forwarded. After about 3 minutes the fdb entry in the qca8k on AP b is "cleaned up" and the client can immediately get its IP from the DHCP server. I hope this helps understanding the background.
On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote: > On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote: > > On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote: > >> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote: > >>> Enable assisted learning on CPU port to fix roaming issues. > >> > >> 'roaming issues' implies to me it suffered from blindness to MAC > >> addresses learned on foreign interfaces, which appears to not be true > >> since your previous patch removes hardware learning on the CPU port > >> (=> hardware learning on the CPU port was supported, so there were no > >> roaming issues) > > The issue is with a wifi AP bridged into dsa and previously learned > addresses. > > Test setup: > We have to wifi APs a and b(with qca8k). Client is on AP a. > > The qca8k switch in AP b sees also the broadcast traffic from the client > and takes the address into its fdb. > > Now the client roams to AP b. > The client starts DHCP but does not get an IP. With tcpdump, I see the > packets going through the switch (ap->cpu port->ethernet port) and they > arrive at the DHCP server. It responds, the response packet reaches the > ethernet port of the qca8k, and is not forwarded. > > After about 3 minutes the fdb entry in the qca8k on AP b is > "cleaned up" and the client can immediately get its IP from the DHCP server. > > I hope this helps understanding the background. How does this differ from what is described in commit d5f19486cee7 ("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors")?
Am 10.08.21 um 19:53 schrieb Vladimir Oltean: > On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote: >> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote: >>> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote: >>>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote: >>>>> Enable assisted learning on CPU port to fix roaming issues. >>>> >>>> 'roaming issues' implies to me it suffered from blindness to MAC >>>> addresses learned on foreign interfaces, which appears to not be true >>>> since your previous patch removes hardware learning on the CPU port >>>> (=> hardware learning on the CPU port was supported, so there were no >>>> roaming issues) >> >> The issue is with a wifi AP bridged into dsa and previously learned >> addresses. >> >> Test setup: >> We have to wifi APs a and b(with qca8k). Client is on AP a. >> >> The qca8k switch in AP b sees also the broadcast traffic from the client >> and takes the address into its fdb. >> >> Now the client roams to AP b. >> The client starts DHCP but does not get an IP. With tcpdump, I see the >> packets going through the switch (ap->cpu port->ethernet port) and they >> arrive at the DHCP server. It responds, the response packet reaches the >> ethernet port of the qca8k, and is not forwarded. >> >> After about 3 minutes the fdb entry in the qca8k on AP b is >> "cleaned up" and the client can immediately get its IP from the DHCP server. >> >> I hope this helps understanding the background. > > How does this differ from what is described in commit d5f19486cee7 > ("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign > bridge neighbors")? > I lost a bit, It is a bit different. I've been also working a bit on the ipq807x device with such a switch on OpenWRT. There is a backport of that commit. To fix the problems described by d5f19486cee7, I enabled assisted_learning on qca8k. And it solves the problem. But initially, the device was unreachable until I created traffic from the device to a client (cpu port->ethernet). At first, I did not notice this because a wifi client with it's traffic immediately solved the issue automatically. Later I verified this in detail. Hopefully DENG Qingfang patches help. But I cannot proove atm.
On Tue, Aug 10, 2021 at 11:09:07PM +0200, Andre Valentin wrote: > Am 10.08.21 um 19:53 schrieb Vladimir Oltean: > > On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote: > >> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote: > >>> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote: > >>>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote: > >>>>> Enable assisted learning on CPU port to fix roaming issues. > >>>> > >>>> 'roaming issues' implies to me it suffered from blindness to MAC > >>>> addresses learned on foreign interfaces, which appears to not be true > >>>> since your previous patch removes hardware learning on the CPU port > >>>> (=> hardware learning on the CPU port was supported, so there were no > >>>> roaming issues) > >> > >> The issue is with a wifi AP bridged into dsa and previously learned > >> addresses. > >> > >> Test setup: > >> We have to wifi APs a and b(with qca8k). Client is on AP a. > >> > >> The qca8k switch in AP b sees also the broadcast traffic from the client > >> and takes the address into its fdb. > >> > >> Now the client roams to AP b. > >> The client starts DHCP but does not get an IP. With tcpdump, I see the > >> packets going through the switch (ap->cpu port->ethernet port) and they > >> arrive at the DHCP server. It responds, the response packet reaches the > >> ethernet port of the qca8k, and is not forwarded. > >> > >> After about 3 minutes the fdb entry in the qca8k on AP b is > >> "cleaned up" and the client can immediately get its IP from the DHCP server. > >> > >> I hope this helps understanding the background. > > > > How does this differ from what is described in commit d5f19486cee7 > > ("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign > > bridge neighbors")? > > > I lost a bit, It is a bit different. > > I've been also working a bit on the ipq807x device with such a switch on > OpenWRT. There is a backport of that commit. To fix the problems described > by d5f19486cee7, I enabled assisted_learning on qca8k. And it solves the > problem. > > But initially, the device was unreachable until I created traffic from the device > to a client (cpu port->ethernet). At first, I did not notice this because a wifi client > with it's traffic immediately solved the issue automatically. > Later I verified this in detail. > > Hopefully DENG Qingfang patches help. But I cannot proove atm. I don't understand. You're saying that when the device sends a packet from its new position, the switch learns it on the CPU port, and that fixes the issue? Isn't that always how issues like that get fixed? If hardware learning is supported on the CPU port, it is no different than a device roaming from one switch port to another (but isn't directly connected to that switch port, otherwise the switch might fast age the port when the device roams) - it is inaccessible until it says something. I still have no idea what we're talking about, and why this patch is necessary. Does the qca8k switch support hardware learning on the CPU port or not?
diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c index 798bc548e5b0..de2aa7812d1c 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -1114,6 +1114,8 @@ qca8k_setup(struct dsa_switch *ds) /* We don't have interrupts for link changes, so we need to poll */ ds->pcs_poll = true; + ds->assisted_learning_on_cpu_port = true; + return 0; }
Enable assisted learning on CPU port to fix roaming issues. Although hardware learning is available, it won't work well with software bridging fallback or multiple CPU ports. Signed-off-by: DENG Qingfang <dqfext@gmail.com> --- drivers/net/dsa/qca8k.c | 2 ++ 1 file changed, 2 insertions(+)