diff mbox series

[net,v3,1/1] mlxbf_gige: Fix kernel panic at shutdown

Message ID 20230720205620.7019-1-asmaa@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net,v3,1/1] mlxbf_gige: Fix kernel panic at shutdown | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1342 this patch: 20
netdev/cc_maintainers fail 1 blamed authors not CCed: limings@nvidia.com; 1 maintainers not CCed: limings@nvidia.com
netdev/build_clang fail Errors and warnings before: 1365 this patch: 20
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn fail Errors and warnings before: 1365 this patch: 20
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Asmaa Mnebhi July 20, 2023, 8:56 p.m. UTC
There is a race condition happening during shutdown due to pending
napi transactions. Since mlxbf_gige_poll is still running, it tries
to access a NULL pointer and as a result causes a kernel panic.
To fix this during shutdown, invoke mlxbf_gige_remove to disable and
dequeue napi.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
---
v2-v3:
 - remove mlxbf_gige_shutdown() since it is redundant with mlxbf_gige_remove().
v1->v2:
 - fixed the tag from net-next to net
 - check that the "priv" pointer is not NULL in mlxbf_gige_remove()

 .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c  | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

Comments

Asmaa Mnebhi July 20, 2023, 9:11 p.m. UTC | #1
>  	.probe = mlxbf_gige_probe,
>  	.remove = mlxbf_gige_remove,
> -	.shutdown = mlxbf_gige_shutdown,
> +	.shutdown = mlxbf_gige_remove,

Actually, apologies for this commit in response to Sridhar's comment on v2. This will not work. Mlxbf_gige_remove() returns void while ".shutdown" expects to return an int.
Please advise on how to proceed. Should I send a v4 with the same patch as v2?
Jakub Kicinski July 21, 2023, 3:21 a.m. UTC | #2
On Thu, 20 Jul 2023 21:11:53 +0000 Asmaa Mnebhi wrote:
> >  	.probe = mlxbf_gige_probe,
> >  	.remove = mlxbf_gige_remove,
> > -	.shutdown = mlxbf_gige_shutdown,
> > +	.shutdown = mlxbf_gige_remove,  
> 
> Actually, apologies for this commit in response to Sridhar's comment
> on v2. This will not work. Mlxbf_gige_remove() returns void while
> ".shutdown" expects to return an int. Please advise on how to
> proceed. Should I send a v4 with the same patch as v2?

You can make it work, there is a new version of remove upstream:

https://elixir.bootlin.com/linux/v6.5-rc2/source/include/linux/platform_device.h#L220

You can make them both void.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
index 694de9513b9f..31b84d6417de 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -475,6 +475,9 @@  static int mlxbf_gige_remove(struct platform_device *pdev)
 {
 	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
 
+	if (!priv)
+		return 0;
+
 	unregister_netdev(priv->netdev);
 	phy_disconnect(priv->netdev->phydev);
 	mlxbf_gige_mdio_remove(priv);
@@ -483,14 +486,6 @@  static int mlxbf_gige_remove(struct platform_device *pdev)
 	return 0;
 }
 
-static void mlxbf_gige_shutdown(struct platform_device *pdev)
-{
-	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
-
-	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
-	mlxbf_gige_clean_port(priv);
-}
-
 static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {
 	{ "MLNXBF17", 0 },
 	{},
@@ -500,7 +495,7 @@  MODULE_DEVICE_TABLE(acpi, mlxbf_gige_acpi_match);
 static struct platform_driver mlxbf_gige_driver = {
 	.probe = mlxbf_gige_probe,
 	.remove = mlxbf_gige_remove,
-	.shutdown = mlxbf_gige_shutdown,
+	.shutdown = mlxbf_gige_remove,
 	.driver = {
 		.name = KBUILD_MODNAME,
 		.acpi_match_table = ACPI_PTR(mlxbf_gige_acpi_match),