Message ID | 20220627201330.45219-1-horatiu.vultur@microchip.com (mailing list archive) |
---|---|
Headers | show |
Series | net: lan966x: Add lag support | expand |
Hi Horatiu, On Mon, Jun 27, 2022 at 10:13:23PM +0200, Horatiu Vultur wrote: > Add lag support for lan966x. > First 4 patches don't do any changes to the current behaviour, they > just prepare for lag support. While the rest is to add the lag support. > > v1->v2: > - fix the LAG PGIDs when ports go down, in this way is not > needed anymore the last patch of the series. > > Horatiu Vultur (7): > net: lan966x: Add reqisters used to configure lag interfaces > net: lan966x: Split lan966x_fdb_event_work > net: lan966x: Expose lan966x_switchdev_nb and > lan966x_switchdev_blocking_nb > net: lan966x: Extend lan966x_foreign_bridging_check > net: lan966x: Add lag support for lan966x. > net: lan966x: Extend FDB to support also lag > net: lan966x: Extend MAC to support also lag interfaces. > > .../net/ethernet/microchip/lan966x/Makefile | 2 +- > .../ethernet/microchip/lan966x/lan966x_fdb.c | 153 ++++++--- > .../ethernet/microchip/lan966x/lan966x_lag.c | 322 ++++++++++++++++++ > .../ethernet/microchip/lan966x/lan966x_mac.c | 66 +++- > .../ethernet/microchip/lan966x/lan966x_main.h | 41 +++ > .../ethernet/microchip/lan966x/lan966x_regs.h | 45 +++ > .../microchip/lan966x/lan966x_switchdev.c | 115 +++++-- > 7 files changed, 654 insertions(+), 90 deletions(-) > create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_lag.c > > -- > 2.33.0 > I've downloaded and applied your patches and I have some general feedback. Some of it relates to changes which were not made and hence I couldn't have commented on the patches themselves, so I'm posting it here. 1. switchdev_bridge_port_offload() returns an error code if object replay failed, or if it couldn't get the port parent id, or if the user tries to join a lan966x port and a port belonging to another switchdev driver to the same LAG. It would be good to propagate this error and not ignore it. Side note: maybe this could help to eliminate the extra logic you need to add to lan966x_foreign_bridging_check(). 2. lan966x_foreign_dev_check() seems wrong/misunderstood. Currently it reports that a LAG upper is a foreign interface (unoffloaded). In turn, this makes switchdev_lower_dev_find() not find any lan966x interface beneath a LAG, and hence, __switchdev_handle_fdb_event_to_device() would not recurse to the lan966x "dev" below a LAG when the "orig_dev" of an FDB event is the bridge itself. Otherwise said, if you have no direct lan966x port under a bridge, but just bridge -> LAG -> lan966x, you will miss all local (host-filtered) FDB event notifications that you should otherwise learn towards the CPU. 3. The implementation of lan966x_lag_mac_add_entry(), with that first call to lan966x_mac_del_entry(), seems a hack. Why do you need to do that? 4. The handling of lan966x->mac_lock seems wrong in general, not just particular to this patch set. In particular, it appears to protect too little in lan966x_mac_add_entry(), i.e. just the list_add_tail. This makes it possible for lan966x_mac_lookup and lan966x_mac_learn to be concurrent with lan966x_mac_del_entry(). In turn, this appears bad first and foremost for the hardware access interface, since the MAC table access is indirect, and if you allow multiple threads to concurrently call lan966x_mac_select(), change the command in ANA_MACACCESS, and poll for command completion, things will go sideways very quickly (one command will inadvertently poll for the completion of another, which inadvertently operates on the row/column selected by yet a third command, all that due to improper serialization). 5. There is a race between lan966x_fdb_lag_event_work() calling lan966x_lag_first_port(), and lan966x_lag_port_leave() changing port->bond = NULL. Specifically, when a lan966x port leaves a LAG, there might still be deferred FDB events (add or del) which are still pending. There exists a dead time during which you will ignore these, because you think that the first lan966x LAG port isn't the first lan966x LAG port, which will lead to a desynchronization between the bridge FDB and the hardware FDB. In DSA we solved this by flushing lan966x->fdb_work inside lan966x_port_prechangeupper() on leave. This waits for the pending events to finish, and the bridge will not emit further events. It's important to do this in prechangeupper() rather than in changeupper() because switchdev_handle_fdb_event_to_device() needs the upper/lower relationship to still exist to function properly, and in changeupper() it has already been destroyed. Side note: if you flush lan966x->fdb_work, then you have an upper bound for how long can lan966x_fdb_event_work be deferred. Specifically, you can remove the dev_hold() and dev_put() calls, since it surely can't be deferred until after the netdev is unregistered. The bounding event is much quicker - the lan966x port leaves the LAG. 6. You are missing LAG FDB migration logic in lan966x_lag_port_join(). Specifically, you assume that the lan966x_lag_first_port() will never change, probably because you just make the switch ports join the LAG in the order 1, 2, 3. But they can also join in the order 3, 2, 1.
The 06/29/2022 12:26, Vladimir Oltean wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > Hi Horatiu, Hi Vladimir, Thanks for review this and for detail explanation. > > On Mon, Jun 27, 2022 at 10:13:23PM +0200, Horatiu Vultur wrote: > > Add lag support for lan966x. > > First 4 patches don't do any changes to the current behaviour, they > > just prepare for lag support. While the rest is to add the lag support. > > > > v1->v2: > > - fix the LAG PGIDs when ports go down, in this way is not > > needed anymore the last patch of the series. > > > > Horatiu Vultur (7): > > net: lan966x: Add reqisters used to configure lag interfaces > > net: lan966x: Split lan966x_fdb_event_work > > net: lan966x: Expose lan966x_switchdev_nb and > > lan966x_switchdev_blocking_nb > > net: lan966x: Extend lan966x_foreign_bridging_check > > net: lan966x: Add lag support for lan966x. > > net: lan966x: Extend FDB to support also lag > > net: lan966x: Extend MAC to support also lag interfaces. > > > > .../net/ethernet/microchip/lan966x/Makefile | 2 +- > > .../ethernet/microchip/lan966x/lan966x_fdb.c | 153 ++++++--- > > .../ethernet/microchip/lan966x/lan966x_lag.c | 322 ++++++++++++++++++ > > .../ethernet/microchip/lan966x/lan966x_mac.c | 66 +++- > > .../ethernet/microchip/lan966x/lan966x_main.h | 41 +++ > > .../ethernet/microchip/lan966x/lan966x_regs.h | 45 +++ > > .../microchip/lan966x/lan966x_switchdev.c | 115 +++++-- > > 7 files changed, 654 insertions(+), 90 deletions(-) > > create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_lag.c > > > > -- > > 2.33.0 > > > > I've downloaded and applied your patches and I have some general feedback. > Some of it relates to changes which were not made and hence I couldn't > have commented on the patches themselves, so I'm posting it here. > > 1. switchdev_bridge_port_offload() returns an error code if object > replay failed, or if it couldn't get the port parent id, or if the user > tries to join a lan966x port and a port belonging to another switchdev > driver to the same LAG. It would be good to propagate this error and not > ignore it. Yes, I will do that. What about the case when the other port is not a switchdev port. For example: ip link set dev eth0 master bond0 ip link set dev dummy master bond0 ip link set dev bond0 master br0 At the last line, I was expecting to get an error. > > Side note: maybe this could help to eliminate the extra logic you need > to add to lan966x_foreign_bridging_check(). > > 2. lan966x_foreign_dev_check() seems wrong/misunderstood. Currently it > reports that a LAG upper is a foreign interface (unoffloaded). In turn, > this makes switchdev_lower_dev_find() not find any lan966x interface > beneath a LAG, and hence, __switchdev_handle_fdb_event_to_device() would > not recurse to the lan966x "dev" below a LAG when the "orig_dev" of an > FDB event is the bridge itself. Otherwise said, if you have no direct > lan966x port under a bridge, but just bridge -> LAG -> lan966x, you will > miss all local (host-filtered) FDB event notifications that you should > otherwise learn towards the CPU. Good observation. I have missed that case. > > 3. The implementation of lan966x_lag_mac_add_entry(), with that first > call to lan966x_mac_del_entry(), seems a hack. Why do you need to do > that? Ah... that is not needed anymore. I forgot to remove it. > > 4. The handling of lan966x->mac_lock seems wrong in general, not just > particular to this patch set. In particular, it appears to protect too > little in lan966x_mac_add_entry(), i.e. just the list_add_tail. > This makes it possible for lan966x_mac_lookup and lan966x_mac_learn to > be concurrent with lan966x_mac_del_entry(). In turn, this appears bad > first and foremost for the hardware access interface, since the MAC > table access is indirect, and if you allow multiple threads to > concurrently call lan966x_mac_select(), change the command in > ANA_MACACCESS, and poll for command completion, things will go sideways > very quickly (one command will inadvertently poll for the completion of > another, which inadvertently operates on the row/column selected by yet > a third command, all that due to improper serialization). Now that you mention it, I can see it. The following functions need to be updated: lan966x_mac_learn, lan966x_mac_ip_learn, lan966x_mac_forget, lan966x_mac_add_entry, lan966x_mac_del_entry. But I think this needs to be in a separate patch for net. > > 5. There is a race between lan966x_fdb_lag_event_work() calling > lan966x_lag_first_port(), and lan966x_lag_port_leave() changing > port->bond = NULL. Specifically, when a lan966x port leaves a LAG, there > might still be deferred FDB events (add or del) which are still pending. > There exists a dead time during which you will ignore these, because you > think that the first lan966x LAG port isn't the first lan966x LAG port, > which will lead to a desynchronization between the bridge FDB and the > hardware FDB. > > In DSA we solved this by flushing lan966x->fdb_work inside > lan966x_port_prechangeupper() on leave. This waits for the pending > events to finish, and the bridge will not emit further events. > It's important to do this in prechangeupper() rather than in > changeupper() because switchdev_handle_fdb_event_to_device() needs the > upper/lower relationship to still exist to function properly, and in > changeupper() it has already been destroyed. > > Side note: if you flush lan966x->fdb_work, then you have an upper bound > for how long can lan966x_fdb_event_work be deferred. Specifically, you > can remove the dev_hold() and dev_put() calls, since it surely can't be > deferred until after the netdev is unregistered. The bounding event is > much quicker - the lan966x port leaves the LAG. Thanks for the observation, it would have been taken a long time to see this. > > 6. You are missing LAG FDB migration logic in lan966x_lag_port_join(). > Specifically, you assume that the lan966x_lag_first_port() will never > change, probably because you just make the switch ports join the LAG in > the order 1, 2, 3. But they can also join in the order 3, 2, 1. It would work, but there will be problems when the ports start to leave the LAG. It would work because all the ports under the LAG will have the same value in PGID_ID for DST_IDX. So if the MAC entry points to any of this entries will be OK. The problem is when the port leaves the LAG, if the MAC entry points to the port that left the LAG then is not working anymore. I will fix this in the next series.
On Thu, Jun 30, 2022 at 09:08:46PM +0200, Horatiu Vultur wrote: > > I've downloaded and applied your patches and I have some general feedback. > > Some of it relates to changes which were not made and hence I couldn't > > have commented on the patches themselves, so I'm posting it here. > > > > 1. switchdev_bridge_port_offload() returns an error code if object > > replay failed, or if it couldn't get the port parent id, or if the user > > tries to join a lan966x port and a port belonging to another switchdev > > driver to the same LAG. It would be good to propagate this error and not > > ignore it. > > Yes, I will do that. > > What about the case when the other port is not a switchdev port. For > example: > ip link set dev eth0 master bond0 > ip link set dev dummy master bond0 > ip link set dev bond0 master br0 > > At the last line, I was expecting to get an error. switchdev_bridge_port_offload() currently only detects mismatched port parent IDs, so it will not detect this condition where one bond slave is switchdev and the other isn't. This is because the non-switchdev bond slave does not even call switchdev_bridge_port_offload(), it's completely silent from the perspective of the bridge. Fact is that we don't have a common layer that enforces all common sense netdev adjacency restrictions with switchdev, and that is one of the big problems of the system. > > 6. You are missing LAG FDB migration logic in lan966x_lag_port_join(). > > Specifically, you assume that the lan966x_lag_first_port() will never > > change, probably because you just make the switch ports join the LAG in > > the order 1, 2, 3. But they can also join in the order 3, 2, 1. > > It would work, but there will be problems when the ports start to leave > the LAG. > It would work because all the ports under the LAG will have the same > value in PGID_ID for DST_IDX. So if the MAC entry points to any of > this entries will be OK. OK, I forgot DEST_IDX selects PGID and not logical port ID directly. > The problem is when the port leaves the LAG, if the MAC entry points > to the port that left the LAG then is not working anymore. > I will fix this in the next series.