Message ID | 20240425153958.2326772-1-lukma@denx.de (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next] hsr: Simplify code for announcing HSR nodes timer setup |
On Thu, 25 Apr 2024 17:39:58 +0200 Lukasz Majewski wrote:
> Up till now the code to start HSR announce timer, which triggers sending
> supervisory frames, was assuming that hsr_netdev_notify() would be called
> at least twice for hsrX interface. This was required to have different
> values for old and current values of network device's operstate.
>
> This is problematic for a case where hsrX interface is already in the
> operational state when hsr_netdev_notify() is called, so timer is not
> configured to trigger and as a result the hsrX is not sending supervisory
> frames to HSR ring.
>
> This error has been discovered when hsr_ping.sh script was run. To be
> more specific - for the hsr1 and hsr2 the hsr_netdev_notify() was
> called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP} states
> assigned in hsr_check_carrier_and_operstate(hsr). As a result there was
> no issue with sending supervisory frames.
> However, with hsr3, the notify function was called only once with
> operstate set to IF_OPER_UP and timer responsible for triggering
> supervisory frames was not fired.
>
> The solution is to use netif_oper_up() helper function to assess if
> network device is up and then setup timer. Otherwise the timer is
> deactivated.

NETDEV_CHANGE can get called for multiple trivial reasons, if the timer
is already running we'll mess with the spacing of the frames, no?

If there is a path where the device may get activated without the
notifier firing - maybe we can check carrier there and schedule the
timer?

Also sounds like a bug fix, so please add a Fixes tag.
Hi Jakub,

> On Thu, 25 Apr 2024 17:39:58 +0200 Lukasz Majewski wrote:
> > Up till now the code to start HSR announce timer, which triggers
> > sending supervisory frames, was assuming that hsr_netdev_notify()
> > would be called at least twice for hsrX interface. This was
> > required to have different values for old and current values of
> > network device's operstate.
> >
> > This is problematic for a case where hsrX interface is already in
> > the operational state when hsr_netdev_notify() is called, so timer
> > is not configured to trigger and as a result the hsrX is not
> > sending supervisory frames to HSR ring.
> >
> > This error has been discovered when hsr_ping.sh script was run. To
> > be more specific - for the hsr1 and hsr2 the hsr_netdev_notify() was
> > called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP}
> > states assigned in hsr_check_carrier_and_operstate(hsr). As a
> > result there was no issue with sending supervisory frames.
> > However, with hsr3, the notify function was called only once with
> > operstate set to IF_OPER_UP and timer responsible for triggering
> > supervisory frames was not fired.
> >
> > The solution is to use netif_oper_up() helper function to assess if
> > network device is up and then setup timer. Otherwise the timer is
> > deactivated.
>
> NETDEV_CHANGE can get called for multiple trivial reasons,

I've assumed that NETDEV_CHANGE would be called when the link has
changed - i.e. it is down/up or carrier is down/up.

The timer shall be running _only_ when the hsrX port is fully
operational (i.e. at least one of 'slave' ports is up and running).

The motivation for this patch was to enable HSR announce timer not only
on state change, but also when the ethernet device is already up (as it
happens with QEMU + netns setup).

> if the
> timer is already running we'll mess with the spacing of the frames,
> no?
When NETDEV_CHANGE is triggered for a reason other than a carrier (or
port state) change and netif_oper_up() returns true, the period for
HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be violated.

What are the potential threats here?

>
> If there is a path where the device may get activated without the
> notifier firing - maybe we can check carrier there and schedule the
> timer?

As I've stated above - IMHO the "announce" supervisory frames shall be
sent only when the HSR interface is up and running.

>
> Also sounds like a bug fix, so please add a Fixes tag.

Ok.

Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
On Mon, 29 Apr 2024 12:09:04 +0200 Lukasz Majewski wrote:
> > if the
> > timer is already running we'll mess with the spacing of the frames,
> > no?
>
> When NETDEV_CHANGE is triggered for a reason other than a carrier (or
> port state) change and netif_oper_up() returns true, the period for
> HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be violated.
>
> What are the potential threats here?

Practically speaking I'm not sure if anyone uses any of the weird
IFF_* flags, but they are defined in uAPI (enum net_device_flags) and
I don't see much validation so presumably it's possible to flip them.
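To make the spacing concern concrete, here is a minimal userspace sketch, not kernel code. The `sim_*` names are hypothetical stand-ins, and the 100 ms interval is an illustrative value assumed for HSR_ANNOUNCE_INTERVAL. It models only the property that matters here: mod_timer() overwrites the deadline even when the timer is already pending.

```c
#include <stdbool.h>

/* Illustrative value, assumed to correspond to HSR_ANNOUNCE_INTERVAL (ms). */
#define SIM_ANNOUNCE_INTERVAL_MS 100UL

/* Hypothetical stand-in for a kernel timer: it only tracks the deadline. */
struct sim_timer {
	bool pending;
	unsigned long expires_ms;
};

/* Like mod_timer(): overwrites the deadline whether or not it is pending. */
static void sim_mod_timer(struct sim_timer *t, unsigned long expires_ms)
{
	t->pending = true;
	t->expires_ms = expires_ms;
}

/* Models the level-triggered check from the patch: every notification
 * seen while the device is up re-arms the timer relative to "now".
 */
static void sim_check_announce(struct sim_timer *t, bool oper_up,
			       unsigned long now_ms)
{
	if (oper_up)
		sim_mod_timer(t, now_ms + SIM_ANNOUNCE_INTERVAL_MS);
	else
		t->pending = false;
}

/* Deadline after two notifications, both with the device already up. */
static unsigned long sim_deadline_after_two_events(unsigned long first_ms,
						   unsigned long second_ms)
{
	struct sim_timer t = { false, 0 };

	sim_check_announce(&t, true, first_ms);
	sim_check_announce(&t, true, second_ms);
	return t.expires_ms;
}
```

With a first event at 0 ms and an unrelated one (say, a flag flip) at 60 ms, the frame that was due at 100 ms moves to 160 ms, which is the drift in frame spacing discussed above.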
Hi Jakub,

> On Mon, 29 Apr 2024 12:09:04 +0200 Lukasz Majewski wrote:
> > > if the
> > > timer is already running we'll mess with the spacing of the
> > > frames, no?
> >
> > When NETDEV_CHANGE is triggered for a reason other than a carrier (or
> > port state) change and netif_oper_up() returns true, the period
> > for HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be
> > violated.
> >
> > What are the potential threats here?
>
> Practically speaking I'm not sure if anyone uses any of the weird
> IFF_* flags, but they are defined in uAPI (enum net_device_flags) and
> I don't see much validation so presumably it's possible to flip them.

Ok, I see.

Then - what would you recommend instead? The approach of manually
checking the previous state has the drawbacks described above.

I've poked around the kernel sources and it looks like netif_oper_up()
is used in conjunction with netif_running():

    netif_running(dev) && netif_oper_up(dev)

so, IMHO netif_running(dev) shall be added to the condition.

In uapi/include/linux/if.h there are several IF_OPER_* states
defined. It looks to me that the HSR interface shall send announcement
supervisory frames only in the IF_OPER_UP state. In all other states
the timer shall be turned off.

Best regards,

Lukasz Majewski
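The combined condition proposed here can be sketched in userspace as follows. All `sim_*` names and the enum are hypothetical stand-ins for the kernel definitions. One assumption worth flagging: as far as I know, the kernel's netif_oper_up() also treats IF_OPER_UNKNOWN as up, so the sketch mirrors that; a check restricted to IF_OPER_UP alone would have to compare operstate directly.

```c
#include <stdbool.h>

/* Hypothetical mirror of the IF_OPER_* operational states. */
enum sim_operstate {
	SIM_OPER_UNKNOWN,
	SIM_OPER_NOTPRESENT,
	SIM_OPER_DOWN,
	SIM_OPER_LOWERLAYERDOWN,
	SIM_OPER_TESTING,
	SIM_OPER_DORMANT,
	SIM_OPER_UP,
};

/* Stand-in for netif_oper_up(); assumed to accept UP and UNKNOWN. */
static bool sim_netif_oper_up(enum sim_operstate state)
{
	return state == SIM_OPER_UP || state == SIM_OPER_UNKNOWN;
}

/* The condition from the mail: administratively running AND
 * operationally up. "running" stands in for netif_running(dev).
 */
static bool sim_should_announce(bool running, enum sim_operstate state)
{
	return running && sim_netif_oper_up(state);
}
```

Under these assumptions a device that is not running never announces, regardless of its operstate, and a running device in IF_OPER_UNKNOWN would still announce unless the check is tightened.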
On Tue, 30 Apr 2024 14:52:43 +0200 Lukasz Majewski wrote:
> > Practically speaking I'm not sure if anyone uses any of the weird
> > IFF_* flags, but they are defined in uAPI (enum net_device_flags) and
> > I don't see much validation so presumably it's possible to flip them.
>
> Ok, I see.
>
> Then - what would you recommend instead? The approach with manual
> checking the previous state has described drawbacks.

Add a bool somewhere to track if the timer has been scheduled?
The NETDEV_ events in question are called under rtnl_lock, so no extra
locking should be needed.
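One possible reading of this suggestion, as a userspace sketch rather than a proposed patch: the field name `announce_timer_armed`, the `sim_*` names, and the 100 ms interval are all assumptions made for illustration. The flag makes the "up" branch idempotent, so repeated notifications leave a running timer's deadline alone. No locking appears because, per the mail, the events are serialized by rtnl_lock.

```c
#include <stdbool.h>

/* Illustrative value, assumed to correspond to HSR_ANNOUNCE_INTERVAL (ms). */
#define SIM_ANNOUNCE_INTERVAL_MS 100UL

/* Hypothetical subset of struct hsr_priv with the extra tracking flag. */
struct sim_hsr {
	bool announce_timer_armed;	/* assumed new field */
	unsigned long expires_ms;	/* stands in for the timer deadline */
	unsigned int announce_count;
};

static void sim_check_announce(struct sim_hsr *hsr, bool oper_up,
			       unsigned long now_ms)
{
	if (oper_up) {
		/* Already scheduled: keep the existing frame spacing. */
		if (hsr->announce_timer_armed)
			return;
		hsr->announce_count = 0;
		hsr->expires_ms = now_ms + SIM_ANNOUNCE_INTERVAL_MS;
		hsr->announce_timer_armed = true;
	} else {
		/* Went down: stop the timer so the next "up" re-arms it. */
		hsr->announce_timer_armed = false;
	}
}
```

This covers both cases from the thread: a single notification with the device already up arms the timer, and a later spurious notification does not shift the deadline.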
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index cd1e7c6d2fc0..e91d897e2cee 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -61,39 +61,34 @@ static bool hsr_check_carrier(struct hsr_port *master)
 	return false;
 }
 
-static void hsr_check_announce(struct net_device *hsr_dev,
-			       unsigned char old_operstate)
+static void hsr_check_announce(struct net_device *hsr_dev)
 {
 	struct hsr_priv *hsr;
 
 	hsr = netdev_priv(hsr_dev);
-
-	if (READ_ONCE(hsr_dev->operstate) == IF_OPER_UP && old_operstate != IF_OPER_UP) {
+	if (netif_oper_up(hsr_dev)) {
 		/* Went up */
 		hsr->announce_count = 0;
 		mod_timer(&hsr->announce_timer,
 			  jiffies + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL));
-	}
-
-	if (READ_ONCE(hsr_dev->operstate) != IF_OPER_UP && old_operstate == IF_OPER_UP)
+	} else {
 		/* Went down */
 		del_timer(&hsr->announce_timer);
+	}
 }
 
 void hsr_check_carrier_and_operstate(struct hsr_priv *hsr)
 {
 	struct hsr_port *master;
-	unsigned char old_operstate;
 	bool has_carrier;
 
 	master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
 	/* netif_stacked_transfer_operstate() cannot be used here since
 	 * it doesn't set IF_OPER_LOWERLAYERDOWN (?)
 	 */
-	old_operstate = READ_ONCE(master->dev->operstate);
 	has_carrier = hsr_check_carrier(master);
 	hsr_set_operstate(master, has_carrier);
-	hsr_check_announce(master->dev, old_operstate);
+	hsr_check_announce(master->dev);
 }
 
 int hsr_get_max_mtu(struct hsr_priv *hsr)
Up till now the code to start the HSR announce timer, which triggers
sending supervisory frames, was assuming that hsr_netdev_notify() would
be called at least twice for the hsrX interface. This was required to
have different values for the old and current values of the network
device's operstate.

This is problematic for a case where the hsrX interface is already in
the operational state when hsr_netdev_notify() is called, so the timer is
not configured to trigger and as a result the hsrX is not sending
supervisory frames to the HSR ring.

This error has been discovered when the hsr_ping.sh script was run. To be
more specific - for hsr1 and hsr2 the hsr_netdev_notify() was called at
least twice with different IF_OPER_{LOWERDOWN|DOWN|UP} states assigned in
hsr_check_carrier_and_operstate(hsr). As a result there was no issue
with sending supervisory frames.
However, with hsr3, the notify function was called only once with
operstate set to IF_OPER_UP and the timer responsible for triggering
supervisory frames was not fired.

The solution is to use the netif_oper_up() helper function to assess if
the network device is up and then set up the timer. Otherwise the timer
is deactivated.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
---
 net/hsr/hsr_device.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
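The difference between the old and the new check can be reduced to two predicates. The following is a userspace sketch with hypothetical function names, where plain booleans stand in for "operstate is IF_OPER_UP"; the patched check is additionally simplified, since netif_oper_up() may accept more states than IF_OPER_UP.

```c
#include <stdbool.h>

/* Old behaviour: edge-triggered. The timer starts only when the state
 * sampled before hsr_set_operstate() was not up and the state after is.
 */
static bool sim_old_check_starts_timer(bool old_up, bool cur_up)
{
	return cur_up && !old_up;
}

/* Patched behaviour: level-triggered. The timer is (re)armed whenever
 * the device is currently up, regardless of the previous state.
 */
static bool sim_new_check_starts_timer(bool cur_up)
{
	return cur_up;
}
```

For the hsr3 case described above (a single notification, state already up before and after) the old predicate is false and the timer never starts, while the new predicate is true. The new predicate is also true on every later notification while the device stays up.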