[net-next] hsr: Simplify code for announcing HSR nodes timer setup

Message ID 20240425153958.2326772-1-lukma@denx.de (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 926 this patch: 926
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 937 this patch: 937
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 44 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-04-26--09-00 (tests: 993)

Commit Message

Lukasz Majewski April 25, 2024, 3:39 p.m. UTC
Up till now the code to start HSR announce timer, which triggers sending
supervisory frames, was assuming that hsr_netdev_notify() would be called
at least twice for hsrX interface. This was required to have different
values for old and current values of network device's operstate.

This is problematic for a case where hsrX interface is already in the
operational state when hsr_netdev_notify() is called, so timer is not
configured to trigger and as a result the hsrX is not sending supervisory
frames to HSR ring.

This error has been discovered when hsr_ping.sh script was run. To be
more specific - for the hsr1 and hsr2 the hsr_netdev_notify() was
called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP} states
assigned in hsr_check_carrier_and_operstate(hsr). As a result there was
no issue with sending supervisory frames.
However, with hsr3, the notify function was called only once with
operstate set to IF_OPER_UP and timer responsible for triggering
supervisory frames was not fired.

The solution is to use the netif_oper_up() helper function to assess
whether the network device is up and, if so, set up the timer. Otherwise
the timer is deactivated.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
---
 net/hsr/hsr_device.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

Comments

Jakub Kicinski April 27, 2024, 12:33 a.m. UTC | #1
On Thu, 25 Apr 2024 17:39:58 +0200 Lukasz Majewski wrote:
> Up till now the code to start HSR announce timer, which triggers sending
> supervisory frames, was assuming that hsr_netdev_notify() would be called
> at least twice for hsrX interface. This was required to have different
> values for old and current values of network device's operstate.
> 
> This is problematic for a case where hsrX interface is already in the
> operational state when hsr_netdev_notify() is called, so timer is not
> configured to trigger and as a result the hsrX is not sending supervisory
> frames to HSR ring.
> 
> This error has been discovered when hsr_ping.sh script was run. To be
> more specific - for the hsr1 and hsr2 the hsr_netdev_notify() was
> called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP} states
> assigned in hsr_check_carrier_and_operstate(hsr). As a result there was
> no issue with sending supervisory frames.
> However, with hsr3, the notify function was called only once with
> operstate set to IF_OPER_UP and timer responsible for triggering
> supervisory frames was not fired.
> 
> The solution is to use the netif_oper_up() helper function to assess
> whether the network device is up and, if so, set up the timer. Otherwise
> the timer is deactivated.

NETDEV_CHANGE can get called for multiple trivial reasons, if the timer
is already running we'll mess with the spacing of the frames, no?

If there is a path where the device may get activated without the
notifier firing - maybe we can check carrier there and schedule the
timer?

Also sounds like a bug fix, so please add a Fixes tag.

Lukasz Majewski April 29, 2024, 10:09 a.m. UTC | #2
Hi Jakub,

> On Thu, 25 Apr 2024 17:39:58 +0200 Lukasz Majewski wrote:
> > Up till now the code to start HSR announce timer, which triggers
> > sending supervisory frames, was assuming that hsr_netdev_notify()
> > would be called at least twice for hsrX interface. This was
> > required to have different values for old and current values of
> > network device's operstate.
> > 
> > This is problematic for a case where hsrX interface is already in
> > the operational state when hsr_netdev_notify() is called, so timer
> > is not configured to trigger and as a result the hsrX is not
> > sending supervisory frames to HSR ring.
> > 
> > This error has been discovered when hsr_ping.sh script was run. To
> > be more specific - for the hsr1 and hsr2 the hsr_netdev_notify() was
> > called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP}
> > states assigned in hsr_check_carrier_and_operstate(hsr). As a
> > result there was no issue with sending supervisory frames.
> > However, with hsr3, the notify function was called only once with
> > operstate set to IF_OPER_UP and timer responsible for triggering
> > supervisory frames was not fired.
> > 
> > The solution is to use the netif_oper_up() helper function to assess
> > whether the network device is up and, if so, set up the timer.
> > Otherwise the timer is deactivated.
> 
> NETDEV_CHANGE can get called for multiple trivial reasons,

I've assumed that NETDEV_CHANGE would be called when the link has
changed - i.e. it is down/up or carrier is down/up.

The timer shall be running _only_ when the hsrX port is fully
operational (i.e. at least one of 'slave' ports is up and running).

The motivation for this patch was to enable HSR announce timer not only
on state change, but also when the ethernet device is already up (as it
happens with QEMU + netns setup).
 

> if the
> timer is already running we'll mess with the spacing of the frames,
> no?

When NETDEV_CHANGE is triggered for a reason other than a carrier (or
port state) change and netif_oper_up() returns true, the period for
HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be violated.

What are the potential threats here?

> 
> If there is a path where the device may get activated without the
> notifier firing - maybe we can check carrier there and schedule the
> timer?

As I've stated above - IMHO the "announce" supervisory frames shall be
sent only when the HSR interface is up and running.

> 
> Also sounds like a bug fix, so please add a Fixes tag.

Ok.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

Jakub Kicinski April 29, 2024, 5:40 p.m. UTC | #3
On Mon, 29 Apr 2024 12:09:04 +0200 Lukasz Majewski wrote:
> > if the
> > timer is already running we'll mess with the spacing of the frames,
> > no?  
> 
> When NETDEV_CHANGE is triggered for a reason other than a carrier (or
> port state) change and netif_oper_up() returns true, the period for
> HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be violated.
> 
> What are the potential threats here?

Practically speaking I'm not sure if anyone uses any of the weird IFF_*
flags, but they are defined in uAPI (enum net_device_flags) and I don't
see much validation so presumably it's possible to flip them.

Lukasz Majewski April 30, 2024, 12:52 p.m. UTC | #4
Hi Jakub,

> On Mon, 29 Apr 2024 12:09:04 +0200 Lukasz Majewski wrote:
> > > if the
> > > timer is already running we'll mess with the spacing of the
> > > frames, no?    
> > 
> > When NETDEV_CHANGE is triggered for a reason other than a carrier
> > (or port state) change and netif_oper_up() returns true, the period
> > for HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be
> > violated.
> > 
> > What are the potential threats here?
> 
> Practically speaking I'm not sure if anyone uses any of the weird
> IFF_* flags, but they are defined in uAPI (enum net_device_flags) and
> I don't see much validation so presumably it's possible to flip them.

Ok, I see.

Then - what would you recommend instead? The approach of manually
checking the previous state has the drawbacks described above.

I've poked around kernel sources and it looks like the netif_oper_up()
is used in conjunction with netif_running():

netif_running(dev) && netif_oper_up(dev)

so, IMHO the netif_running(dev) shall be added to the condition.


In include/uapi/linux/if.h there are several IF_OPER_* states
defined. It looks to me that the HSR interface shall send announcement
supervisory frames only for IF_OPER_UP. In all other states, sending
shall be turned off.
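
For illustration only, the two candidate conditions can be compared in a
small userspace model. This is a sketch, not kernel code: every fake_* name
is a hypothetical stand-in, and the model assumes that netif_oper_up() also
returns true for IF_OPER_UNKNOWN, as its definition in
include/linux/netdevice.h does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the IF_OPER_* states (RFC 2863 order). */
enum fake_operstate {
	FAKE_OPER_UNKNOWN,
	FAKE_OPER_NOTPRESENT,
	FAKE_OPER_DOWN,
	FAKE_OPER_LOWERLAYERDOWN,
	FAKE_OPER_TESTING,
	FAKE_OPER_DORMANT,
	FAKE_OPER_UP,
};

/* Hypothetical stand-in for the bits of struct net_device used here. */
struct fake_netdev {
	bool running;                  /* models netif_running() */
	enum fake_operstate operstate; /* models dev->operstate */
};

/* Models netif_oper_up(): UP and UNKNOWN both count as operational. */
static bool fake_netif_oper_up(const struct fake_netdev *dev)
{
	assert(dev != NULL);
	return dev->operstate == FAKE_OPER_UP ||
	       dev->operstate == FAKE_OPER_UNKNOWN;
}

/* Variant A: netif_running(dev) && netif_oper_up(dev). */
static bool fake_announce_allowed(const struct fake_netdev *dev)
{
	assert(dev != NULL);
	return dev->running && fake_netif_oper_up(dev);
}

/* Variant B (assumption): announce strictly in IF_OPER_UP only. */
static bool fake_announce_allowed_strict(const struct fake_netdev *dev)
{
	assert(dev != NULL);
	return dev->running && dev->operstate == FAKE_OPER_UP;
}
```

In this model the two variants differ only for IF_OPER_UNKNOWN, which
matters if announcements are meant to be sent for IF_OPER_UP alone.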


Best regards,

Lukasz Majewski

Jakub Kicinski April 30, 2024, 2:42 p.m. UTC | #5
On Tue, 30 Apr 2024 14:52:43 +0200 Lukasz Majewski wrote:
> > Practically speaking I'm not sure if anyone uses any of the weird
> > IFF_* flags, but they are defined in uAPI (enum net_device_flags) and
> > I don't see much validation so presumably it's possible to flip them.  
> 
> Ok, I see.
> 
> Then - what would you recommend instead? The approach of manually
> checking the previous state has the drawbacks described above.

Add a bool somewhere to track if the timer has been scheduled?
The NETDEV_ events in question are called under rtnl_lock, so
no extra locking should be needed.
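
A minimal userspace sketch of the bool-tracking idea, under stated
assumptions: the announce_timer_armed flag and all fake_* names are
hypothetical and do not exist in net/hsr. The counter stands in for
mod_timer(), the flag reset stands in for del_timer(), and neither the
timer callback nor rtnl_lock is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct hsr_priv. */
struct fake_hsr {
	bool oper_up;                 /* models netif_oper_up(hsr_dev) */
	bool announce_timer_armed;    /* hypothetical tracking flag */
	unsigned int announce_count;  /* models hsr->announce_count */
	unsigned int timer_arm_count; /* counts simulated mod_timer() calls */
};

/*
 * Models hsr_check_announce(). In the kernel the caller would hold
 * rtnl_lock, so the flag needs no extra locking in that context.
 */
static void fake_check_announce(struct fake_hsr *hsr)
{
	assert(hsr != NULL);

	if (hsr->oper_up) {
		if (!hsr->announce_timer_armed) {
			/* Went up: arm the timer exactly once. */
			hsr->announce_count = 0;
			hsr->timer_arm_count++; /* mod_timer() would go here */
			hsr->announce_timer_armed = true;
		}
		/* Already armed: leave the running timer untouched. */
	} else if (hsr->announce_timer_armed) {
		/* Went down: del_timer() would go here. */
		hsr->announce_timer_armed = false;
	}
}
```

With the flag in place, a repeated NETDEV_CHANGE while the device stays up
does not re-arm the timer, so the spacing of supervisory frames is left
alone. The timer is armed again only after a down/up transition.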

Patch

diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index cd1e7c6d2fc0..e91d897e2cee 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -61,39 +61,34 @@  static bool hsr_check_carrier(struct hsr_port *master)
 	return false;
 }
 
-static void hsr_check_announce(struct net_device *hsr_dev,
-			       unsigned char old_operstate)
+static void hsr_check_announce(struct net_device *hsr_dev)
 {
 	struct hsr_priv *hsr;
 
 	hsr = netdev_priv(hsr_dev);
-
-	if (READ_ONCE(hsr_dev->operstate) == IF_OPER_UP && old_operstate != IF_OPER_UP) {
+	if (netif_oper_up(hsr_dev)) {
 		/* Went up */
 		hsr->announce_count = 0;
 		mod_timer(&hsr->announce_timer,
 			  jiffies + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL));
-	}
-
-	if (READ_ONCE(hsr_dev->operstate) != IF_OPER_UP && old_operstate == IF_OPER_UP)
+	} else {
 		/* Went down */
 		del_timer(&hsr->announce_timer);
+	}
 }
 
 void hsr_check_carrier_and_operstate(struct hsr_priv *hsr)
 {
 	struct hsr_port *master;
-	unsigned char old_operstate;
 	bool has_carrier;
 
 	master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
 	/* netif_stacked_transfer_operstate() cannot be used here since
 	 * it doesn't set IF_OPER_LOWERLAYERDOWN (?)
 	 */
-	old_operstate = READ_ONCE(master->dev->operstate);
 	has_carrier = hsr_check_carrier(master);
 	hsr_set_operstate(master, has_carrier);
-	hsr_check_announce(master->dev, old_operstate);
+	hsr_check_announce(master->dev);
 }
 
 int hsr_get_max_mtu(struct hsr_priv *hsr)