Message ID | 20220122025921.199446-1-sukadev@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | db9f0e8bf79e6da7068b5818fea0ffd9d0d4b4da |
Delegated to: | Netdev Maintainers |
Series | [net,1/4] ibmvnic: Allow extra failures before disabling |
On 2022-01-21 18:59, Sukadev Bhattiprolu wrote:
> If auto-priority-failover (APF) is enabled and there are at least two
> backing devices of different priorities, some resets like fail-over,
> change-param etc can cause at least two back to back failovers. (Failover
> from high priority backing device to lower priority one and then back
> to the higher priority one if that is still functional).
>
> Depending on the timing of the two failovers it is possible to trigger
> a "hard" reset and for the hard reset to fail due to failovers. When this
> occurs, the driver assumes that the network is unstable and disables the
> VNIC for a 60-second "settling time". This in turn can cause the ethtool
> command to fail with "No such device" while the vnic automatically recovers
> a little while later.
>
> Given that it's possible to have two back to back failures, allow for extra
> failures before disabling the vnic for the settling time.
>
> Fixes: f15fde9d47b8 ("ibmvnic: delay next reset if hard reset fails")
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>

Reviewed-by: Dany Madden <drt@linux.ibm.com>

> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 0bb3911dd014..9b2d16ad76f1 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2598,6 +2598,7 @@ static void __ibmvnic_reset(struct work_struct *work)
>  	struct ibmvnic_rwi *rwi;
>  	unsigned long flags;
>  	u32 reset_state;
> +	int num_fails = 0;
>  	int rc = 0;
>
>  	adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
> @@ -2651,11 +2652,23 @@ static void __ibmvnic_reset(struct work_struct *work)
>  				rc = do_hard_reset(adapter, rwi, reset_state);
>  				rtnl_unlock();
>  			}
> -			if (rc) {
> -				/* give backing device time to settle down */
> +			if (rc)
> +				num_fails++;
> +			else
> +				num_fails = 0;
> +
> +			/* If auto-priority-failover is enabled we can get
> +			 * back to back failovers during resets, resulting
> +			 * in at least two failed resets (from high-priority
> +			 * backing device to low-priority one and then back)
> +			 * If resets continue to fail beyond that, give the
> +			 * adapter some time to settle down before retrying.
> +			 */
> +			if (num_fails >= 3) {
>  				netdev_dbg(adapter->netdev,
> -					   "[S:%s] Hard reset failed, waiting 60 secs\n",
> -					   adapter_state_to_string(adapter->state));
> +					   "[S:%s] Hard reset failed %d times, waiting 60 secs\n",
> +					   adapter_state_to_string(adapter->state),
> +					   num_fails);
>  				set_current_state(TASK_UNINTERRUPTIBLE);
>  				schedule_timeout(60 * HZ);
>  			}
Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 21 Jan 2022 18:59:18 -0800 you wrote:
> If auto-priority-failover (APF) is enabled and there are at least two
> backing devices of different priorities, some resets like fail-over,
> change-param etc can cause at least two back to back failovers. (Failover
> from high priority backing device to lower priority one and then back
> to the higher priority one if that is still functional).
>
> Depending on the timing of the two failovers it is possible to trigger
> a "hard" reset and for the hard reset to fail due to failovers. When this
> occurs, the driver assumes that the network is unstable and disables the
> VNIC for a 60-second "settling time". This in turn can cause the ethtool
> command to fail with "No such device" while the vnic automatically recovers
> a little while later.
>
> [...]

Here is the summary with links:
  - [net,1/4] ibmvnic: Allow extra failures before disabling
    https://git.kernel.org/netdev/net/c/db9f0e8bf79e
  - [net,2/4] ibmvnic: init ->running_cap_crqs early
    https://git.kernel.org/netdev/net/c/151b6a5c06b6
  - [net,3/4] ibmvnic: don't spin in tasklet
    https://git.kernel.org/netdev/net/c/48079e7fdd02
  - [net,4/4] ibmvnic: remove unused ->wait_capability
    https://git.kernel.org/netdev/net/c/3a5d9db7fbdf

You are awesome, thank you!
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 0bb3911dd014..9b2d16ad76f1 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2598,6 +2598,7 @@ static void __ibmvnic_reset(struct work_struct *work)
 	struct ibmvnic_rwi *rwi;
 	unsigned long flags;
 	u32 reset_state;
+	int num_fails = 0;
 	int rc = 0;
 
 	adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
@@ -2651,11 +2652,23 @@ static void __ibmvnic_reset(struct work_struct *work)
 				rc = do_hard_reset(adapter, rwi, reset_state);
 				rtnl_unlock();
 			}
-			if (rc) {
-				/* give backing device time to settle down */
+			if (rc)
+				num_fails++;
+			else
+				num_fails = 0;
+
+			/* If auto-priority-failover is enabled we can get
+			 * back to back failovers during resets, resulting
+			 * in at least two failed resets (from high-priority
+			 * backing device to low-priority one and then back)
+			 * If resets continue to fail beyond that, give the
+			 * adapter some time to settle down before retrying.
+			 */
+			if (num_fails >= 3) {
 				netdev_dbg(adapter->netdev,
-					   "[S:%s] Hard reset failed, waiting 60 secs\n",
-					   adapter_state_to_string(adapter->state));
+					   "[S:%s] Hard reset failed %d times, waiting 60 secs\n",
+					   adapter_state_to_string(adapter->state),
+					   num_fails);
 				set_current_state(TASK_UNINTERRUPTIBLE);
 				schedule_timeout(60 * HZ);
 			}
If auto-priority-failover (APF) is enabled and there are at least two
backing devices of different priorities, some resets like fail-over,
change-param etc can cause at least two back to back failovers. (Failover
from high priority backing device to lower priority one and then back
to the higher priority one if that is still functional).

Depending on the timing of the two failovers it is possible to trigger
a "hard" reset and for the hard reset to fail due to failovers. When this
occurs, the driver assumes that the network is unstable and disables the
VNIC for a 60-second "settling time". This in turn can cause the ethtool
command to fail with "No such device" while the vnic automatically recovers
a little while later.

Given that it's possible to have two back to back failures, allow for extra
failures before disabling the vnic for the settling time.

Fixes: f15fde9d47b8 ("ibmvnic: delay next reset if hard reset fails")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)
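
The counting rule this patch introduces can be tried out in userspace. The following is a minimal sketch, not driver code: `struct reset_tracker`, `record_reset_result()` and `SETTLE_THRESHOLD` are hypothetical names chosen for illustration, and the sketch only reports when the settling delay would apply instead of sleeping. It assumes, as in the patch, that any nonzero `rc` counts as a failed reset and that a successful reset clears the count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name for the literal 3 used in the patch. */
#define SETTLE_THRESHOLD 3

/* Userspace stand-in for the num_fails local in __ibmvnic_reset(). */
struct reset_tracker {
	int num_fails;	/* consecutive failed resets seen so far */
};

/*
 * Record the return code of one reset attempt (0 means success).
 * Returns true when the caller should wait out the settling time,
 * which the patch does with schedule_timeout(60 * HZ).
 */
static bool record_reset_result(struct reset_tracker *t, int rc)
{
	assert(t != NULL);

	if (rc)
		t->num_fails++;
	else
		t->num_fails = 0;

	/*
	 * Two back to back failures are tolerated because APF can
	 * cause them on its own; only the third one triggers the wait.
	 */
	return t->num_fails >= SETTLE_THRESHOLD;
}
```

With a zero-initialized tracker, feeding the results fail, fail, success, fail, fail, fail requests the delay only on the last attempt, because the success in the middle clears the count. The count is not cleared after the delay, so each further consecutive failure would request the delay again, which matches the `num_fails >= 3` test in the patch.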