Message ID | 20240430181434.1942751-1-anthony.l.nguyen@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] ice: Do not get coalesce settings while in reset | expand |
On Tue, 30 Apr 2024 11:14:32 -0700 Tony Nguyen wrote: > Getting coalesce settings while reset is in progress can cause NULL > pointer deference bug. > If under reset, abort get coalesce for ethtool. Did you not add locks around reset to allow waiting instead of returning -EBUSY to user space? I feel like we've been over this...
On 02.05.2024 04:56, Jakub Kicinski wrote: > Did you not add locks around reset to allow waiting instead of returning > -EBUSY to user space? I feel like we've been over this... Will use the approach with ice_wait_for_reset() in next revision, thanks --Dawid
On 06.05.2024 15:30, Dawid Osuchowski wrote: > On 02.05.2024 04:56, Jakub Kicinski wrote: >> Did you not add locks around reset to allow waiting instead of returning >> -EBUSY to user space? I feel like we've been over this... > > Will use the approach with ice_wait_for_reset() in next revision, thanks > > --Dawid Hey Jakub, I went ahead with the approach of using ice_wait_for_reset() [1], however this resulted in a new problem in the reset flow. I want to prove why I think returning immediately with -EBUSY (or perhaps -EAGAIN) is the correct way in this particular case. The issue has to deal with the way both the ethtool handler and the adapter reset flow call rtnl_lock() during operation. If we wait for reset completion inside of an ethtool handling function such as ice_get_coalesce(), the wait will always timeout due to reset being blocked by rtnl_lock() inside of ice_queue_set_napi() (which is called during reset process), and in turn we will always return -EBUSY anyways, with the added hang time of the timeout value (in case of [1] it's 10 seconds). There are other places where similar deadlock can occur, not only in ice_queue_set_napi() and Larysa is currently working on an extensive solution to this problem. --Dawid [1] https://lore.kernel.org/netdev/20240506153307.114104-1-dawid.osuchowski@linux.intel.com/
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index 78b833b3e1d7..efdfe46a91ee 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -3825,6 +3825,9 @@ __ice_get_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec, struct ice_netdev_priv *np = netdev_priv(netdev); struct ice_vsi *vsi = np->vsi; + if (ice_is_reset_in_progress(vsi->back->state)) + return -EBUSY; + if (q_num < 0) q_num = 0;