diff mbox series

[v2] drivers: ethernet: cpsw: fix panic when intrrupt coaleceing is set via ethtool

Message ID 20220323084725.65864-1-jan.sondhauss@wago.com (mailing list archive)
State New, archived
Headers show
Series [v2] drivers: ethernet: cpsw: fix panic when intrrupt coaleceing is set via ethtool | expand

Commit Message

Sondhauß, Jan March 23, 2022, 8:47 a.m. UTC
cpsw_ethtool_begin directly returns the result of pm_runtime_get_sync
when successful.
pm_runtime_get_sync returns -error code on failure and 0 on successful
resume but also 1 when the device is already active. So the common case
for cpsw_ethtool_begin is to return 1. That leads to inconsistent calls
to pm_runtime_put in the call-chain so that pm_runtime_put is called
one too many times and as result leaving the cpsw dev behind suspended.

The suspended cpsw dev leads to an access violation later on by
different parts of the cpsw driver.

Fix this by calling the return-friendly pm_runtime_resume_and_get
function.

Signed-off-by: Jan Sondhauss <jan.sondhauss@wago.com>
---
 drivers/net/ethernet/ti/cpsw_ethtool.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Vignesh Raghavendra March 23, 2022, 11:10 a.m. UTC | #1
On 23/03/22 2:17 pm, Sondhauß, Jan wrote:
> cpsw_ethtool_begin directly returns the result of pm_runtime_get_sync
> when successful.
> pm_runtime_get_sync returns -error code on failure and 0 on successful
> resume but also 1 when the device is already active. So the common case
> for cpsw_ethtool_begin is to return 1. That leads to inconsistent calls
> to pm_runtime_put in the call-chain so that pm_runtime_put is called
> one too many times and as result leaving the cpsw dev behind suspended.
> 
> The suspended cpsw dev leads to an access violation later on by
> different parts of the cpsw driver.
> 
> Fix this by calling the return-friendly pm_runtime_resume_and_get
> function.
> 
> Signed-off-by: Jan Sondhauss <jan.sondhauss@wago.com>
> ---
>  drivers/net/ethernet/ti/cpsw_ethtool.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw_ethtool.c b/drivers/net/ethernet/ti/cpsw_ethtool.c
> index 158c8d3793f4..b5bae6324970 100644
> --- a/drivers/net/ethernet/ti/cpsw_ethtool.c
> +++ b/drivers/net/ethernet/ti/cpsw_ethtool.c
> @@ -364,11 +364,9 @@ int cpsw_ethtool_op_begin(struct net_device *ndev)
>  	struct cpsw_common *cpsw = priv->cpsw;
>  	int ret;
>  
> -	ret = pm_runtime_get_sync(cpsw->dev);
> -	if (ret < 0) {
> +	ret = pm_runtime_resume_and_get(cpsw->dev);
> +	if (ret < 0)
>  		cpsw_err(priv, drv, "ethtool begin failed %d\n", ret);
> -		pm_runtime_put_noidle(cpsw->dev);
> -	}
>  
>  	return ret;
>  }

Thanks for the fix!

Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>

I am guessing this issue started with d43c65b05b84 (ethtool:
runtime-resume netdev parent in ethnl_ops_begin) ? If so, could you add
a Fixes tag so that patch gets backported to all affected stable kernels?

Regards
Vignesh
Sondhauß, Jan March 23, 2022, 3:03 p.m. UTC | #2
On 23/03/2022 12:10, Vignesh Raghavendra wrote:
> On 23/03/22 2:17 pm, Sondhauß, Jan wrote: > cpsw_ethtool_begin directly 
> returns the result of pm_runtime_get_sync > when successful. > 
> pm_runtime_get_sync returns -error code on failure and 0 on successful > 
> resume but also 1 when ZjQcmQRYFpfptBannerStart
> This Message Is From an External Sender
> Please use caution when clicking on links or opening attachments!
> ZjQcmQRYFpfptBannerEnd
> 
> On 23/03/22 2:17 pm, Sondhauß, Jan wrote:
>> cpsw_ethtool_begin directly returns the result of pm_runtime_get_sync
>> when successful.
>> pm_runtime_get_sync returns -error code on failure and 0 on successful
>> resume but also 1 when the device is already active. So the common case
>> for cpsw_ethtool_begin is to return 1. That leads to inconsistent calls
>> to pm_runtime_put in the call-chain so that pm_runtime_put is called
>> one too many times and as result leaving the cpsw dev behind suspended.
>> 
>> The suspended cpsw dev leads to an access violation later on by
>> different parts of the cpsw driver.
>> 
>> Fix this by calling the return-friendly pm_runtime_resume_and_get
>> function.
>> 
>> Signed-off-by: Jan Sondhauss <jan.sondhauss@wago.com>
>> ---
>>  drivers/net/ethernet/ti/cpsw_ethtool.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/ti/cpsw_ethtool.c b/drivers/net/ethernet/ti/cpsw_ethtool.c
>> index 158c8d3793f4..b5bae6324970 100644
>> --- a/drivers/net/ethernet/ti/cpsw_ethtool.c
>> +++ b/drivers/net/ethernet/ti/cpsw_ethtool.c
>> @@ -364,11 +364,9 @@ int cpsw_ethtool_op_begin(struct net_device *ndev)
>>  	struct cpsw_common *cpsw = priv->cpsw;
>>  	int ret;
>>  
>> -	ret = pm_runtime_get_sync(cpsw->dev);
>> -	if (ret < 0) {
>> +	ret = pm_runtime_resume_and_get(cpsw->dev);
>> +	if (ret < 0)
>>  		cpsw_err(priv, drv, "ethtool begin failed %d\n", ret);
>> -		pm_runtime_put_noidle(cpsw->dev);
>> -	}
>>  
>>  	return ret;
>>  }
> 
> Thanks for the fix!
> 
> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
> 
> I am guessing this issue started with d43c65b05b84 (ethtool:
> runtime-resume netdev parent in ethnl_ops_begin) ? If so, could you add
> a Fixes tag so that patch gets backported to all affected stable kernels?

Yes it started with that commit.

> 
> Regards
> Vignesh
>
patchwork-bot+netdevbpf@kernel.org March 23, 2022, 6 p.m. UTC | #3
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 23 Mar 2022 08:47:33 +0000 you wrote:
> cpsw_ethtool_begin directly returns the result of pm_runtime_get_sync
> when successful.
> pm_runtime_get_sync returns -error code on failure and 0 on successful
> resume but also 1 when the device is already active. So the common case
> for cpsw_ethtool_begin is to return 1. That leads to inconsistent calls
> to pm_runtime_put in the call-chain so that pm_runtime_put is called
> one too many times and as result leaving the cpsw dev behind suspended.
> 
> [...]

Here is the summary with links:
  - [v2] drivers: ethernet: cpsw: fix panic when intrrupt coaleceing is set via ethtool
    https://git.kernel.org/netdev/net-next/c/2844e2434385

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/ti/cpsw_ethtool.c b/drivers/net/ethernet/ti/cpsw_ethtool.c
index 158c8d3793f4..b5bae6324970 100644
--- a/drivers/net/ethernet/ti/cpsw_ethtool.c
+++ b/drivers/net/ethernet/ti/cpsw_ethtool.c
@@ -364,11 +364,9 @@  int cpsw_ethtool_op_begin(struct net_device *ndev)
 	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
-	ret = pm_runtime_get_sync(cpsw->dev);
-	if (ret < 0) {
+	ret = pm_runtime_resume_and_get(cpsw->dev);
+	if (ret < 0)
 		cpsw_err(priv, drv, "ethtool begin failed %d\n", ret);
-		pm_runtime_put_noidle(cpsw->dev);
-	}
 
 	return ret;
 }