diff mbox series

[net] ice: Do not get coalesce settings while in reset

Message ID 20240430181434.1942751-1-anthony.l.nguyen@intel.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net] ice: Do not get coalesce settings while in reset

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 926 this patch: 926
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 4 of 4 maintainers
netdev/build_clang              success  Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 937 this patch: 937
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 55 this patch: 55
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-05-02--15-00 (tests: 1000)

Commit Message

Tony Nguyen April 30, 2024, 6:14 p.m. UTC
From: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>

Getting coalesce settings while a reset is in progress can cause a NULL
pointer dereference.
If a reset is in progress, abort the ethtool get coalesce operation.

Fixes: 67fe64d78c43 ("ice: Implement getting and setting ethtool coalesce")
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jakub Kicinski May 2, 2024, 2:56 a.m. UTC | #1
On Tue, 30 Apr 2024 11:14:32 -0700 Tony Nguyen wrote:
> Getting coalesce settings while a reset is in progress can cause a NULL
> pointer dereference.
> If a reset is in progress, abort the ethtool get coalesce operation.

Did you not add locks around reset to allow waiting instead of returning
-EBUSY to user space? I feel like we've been over this...

Dawid Osuchowski May 6, 2024, 1:30 p.m. UTC | #2
On 02.05.2024 04:56, Jakub Kicinski wrote:
> Did you not add locks around reset to allow waiting instead of returning
> -EBUSY to user space? I feel like we've been over this...

Will use the approach with ice_wait_for_reset() in next revision, thanks

--Dawid

Dawid Osuchowski May 17, 2024, 1:31 p.m. UTC | #3
On 06.05.2024 15:30, Dawid Osuchowski wrote:
> On 02.05.2024 04:56, Jakub Kicinski wrote:
>> Did you not add locks around reset to allow waiting instead of returning
>> -EBUSY to user space? I feel like we've been over this...
> 
> Will use the approach with ice_wait_for_reset() in next revision, thanks
> 
> --Dawid

Hey Jakub,

I went ahead with the approach of using ice_wait_for_reset() [1],
however this resulted in a new problem in the reset flow. I want to
explain why I think returning immediately with -EBUSY (or perhaps -EAGAIN)
is the correct way in this particular case.

The issue has to do with the way both the ethtool handler and the
adapter reset flow call rtnl_lock() during operation. If we wait for
reset completion inside of an ethtool handling function such as
ice_get_coalesce(), the wait will always time out, because the reset is
blocked on rtnl_lock() inside of ice_queue_set_napi() (which is called
during the reset process). In turn we will always return -EBUSY anyway,
with the added hang time of the timeout value (in case of [1] it's 10
seconds).

There are other places where a similar deadlock can occur, not only in
ice_queue_set_napi(), and Larysa is currently working on an extensive
solution to this problem.

--Dawid

[1] https://lore.kernel.org/netdev/20240506153307.114104-1-dawid.osuchowski@linux.intel.com/

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 78b833b3e1d7..efdfe46a91ee 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3825,6 +3825,9 @@  __ice_get_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec,
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_vsi *vsi = np->vsi;
 
+	if (ice_is_reset_in_progress(vsi->back->state))
+		return -EBUSY;
+
 	if (q_num < 0)
 		q_num = 0;
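
For readers outside the driver, the effect of the three added lines can be illustrated with a standalone model. This is a sketch, not the ice code: struct model_vsi, the state bits and model_get_coalesce() are hypothetical, and the assumption that the per-queue array is unavailable during reset is made only for illustration (the commit message does not say which pointer is NULL).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reset flags; the real driver keeps its own state bits. */
enum model_state_bit {
	MODEL_RESET_RECEIVED  = 1u << 0,
	MODEL_RESET_REQUESTED = 1u << 1,
};

struct model_vsi {
	unsigned int state; /* bitmask of model_state_bit */
	const int *rx_itr;  /* per-queue setting; assumed NULL during reset */
	int num_q;
};

/* True if any reset-related bit is set in the state mask. */
static bool model_reset_in_progress(unsigned int state)
{
	return (state & (MODEL_RESET_RECEIVED | MODEL_RESET_REQUESTED)) != 0;
}

/* Read one queue's setting, bailing out early while a reset is pending. */
static int model_get_coalesce(const struct model_vsi *vsi, int q_num, int *out)
{
	if (model_reset_in_progress(vsi->state))
		return -EBUSY; /* never touch rx_itr during reset */

	if (q_num < 0)
		q_num = 0;
	if (q_num >= vsi->num_q)
		return -EINVAL;

	*out = vsi->rx_itr[q_num];
	return 0;
}
```

The guard comes before any access to per-queue data, so the caller sees -EBUSY instead of the handler following a pointer that the reset path may have cleared.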