Message ID | 20230720205620.7019-1-asmaa@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v3,1/1] mlxbf_gige: Fix kernel panic at shutdown | expand |
> .probe = mlxbf_gige_probe, > .remove = mlxbf_gige_remove, > - .shutdown = mlxbf_gige_shutdown, > + .shutdown = mlxbf_gige_remove, Actually, apologies for this commit in response to Sridhar's comment on v2. This will not work. Mlxbf_gige_remove() returns void while ".shutdown" expects to return an int. Please advise on how to proceed. Should I send a v4 with the same patch as v2?
On Thu, 20 Jul 2023 21:11:53 +0000 Asmaa Mnebhi wrote: > > .probe = mlxbf_gige_probe, > > .remove = mlxbf_gige_remove, > > - .shutdown = mlxbf_gige_shutdown, > > + .shutdown = mlxbf_gige_remove, > > Actually, apologies for this commit in response to Sridhar's comment > on v2. This will not work. Mlxbf_gige_remove() returns void while > ".shutdown" expects to return an int. Please advise on how to > proceed. Should I send a v4 with the same patch as v2? You can make it work, there is a new version of remove upstream: https://elixir.bootlin.com/linux/v6.5-rc2/source/include/linux/platform_device.h#L220 You can make them both void.
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c index 694de9513b9f..31b84d6417de 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -475,6 +475,9 @@ static int mlxbf_gige_remove(struct platform_device *pdev) { struct mlxbf_gige *priv = platform_get_drvdata(pdev); + if (!priv) + return 0; + unregister_netdev(priv->netdev); phy_disconnect(priv->netdev->phydev); mlxbf_gige_mdio_remove(priv); @@ -483,14 +486,6 @@ static int mlxbf_gige_remove(struct platform_device *pdev) return 0; } -static void mlxbf_gige_shutdown(struct platform_device *pdev) -{ - struct mlxbf_gige *priv = platform_get_drvdata(pdev); - - writeq(0, priv->base + MLXBF_GIGE_INT_EN); - mlxbf_gige_clean_port(priv); -} - static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = { { "MLNXBF17", 0 }, {}, @@ -500,7 +495,7 @@ MODULE_DEVICE_TABLE(acpi, mlxbf_gige_acpi_match); static struct platform_driver mlxbf_gige_driver = { .probe = mlxbf_gige_probe, .remove = mlxbf_gige_remove, - .shutdown = mlxbf_gige_shutdown, + .shutdown = mlxbf_gige_remove, .driver = { .name = KBUILD_MODNAME, .acpi_match_table = ACPI_PTR(mlxbf_gige_acpi_match),
There is a race condition happening during shutdown due to pending napi transactions. Since mlxbf_gige_poll is still running, it tries to access a NULL pointer and as a result causes a kernel panic. To fix this during shutdown, invoke mlxbf_gige_remove to disable and dequeue napi. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> --- v2-v3: - remove mlxbf_gige_shutdown() since it is redundant with mlxbf_gige_remove(). v1->v2: - fixed the tag from net-next to net - check that the "priv" pointer is not NULL in mlxbf_gige_remove() .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-)