Message ID | 20230721141956.29842-1-asmaa@nvidia.com (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Netdev Maintainers |
Series | [net,v4,1/1] mlxbf_gige: Fix kernel panic at shutdown |
Hi Asmaa,

On Fri, Jul 21, 2023 at 10:19:56AM -0400, Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending
> napi transactions. Since mlxbf_gige_poll is still running, it tries
> to access a NULL pointer and as a result causes a kernel panic.
> To fix this during shutdown, invoke mlxbf_gige_remove to disable and
> dequeue napi.
>
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> ---

What is the race condition; what does the stack trace of the NPD look like?
> What is the race condition; what does the stack trace of the NPD look like?
Hi Vladimir,
[ OK ] Reached target Shutdown.
[ OK ] Reached target Final Step.
[ OK ] Started Reboot.
[ OK ] Reached target Reboot.
...
[ 285.126250] mlxbf_gige MLNXBF17:00: shutdown
[ 285.130669] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
[ 285.139447] Mem abort info:
[ 285.142228] ESR = 0x0000000096000004
[ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
[ 285.151261] SET = 0, FnV = 0
[ 285.154303] EA = 0, S1PTW = 0
[ 285.157430] FSC = 0x04: level 0 translation fault
[ 285.162293] Data abort info:
[ 285.165159] ISV = 0, ISS = 0x00000004
[ 285.168980] CM = 0, WnR = 0
[ 285.171932] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
[ 285.178358] [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
[ 285.185134] Internal error: Oops: 96000004 [#1] SMP
Hi Asmaa,

On Fri, Jul 21, 2023 at 07:06:03PM +0000, Asmaa Mnebhi wrote:
> > What is the race condition; what does the stack trace of the NPD look like?
>
> Hi Vladimir,
>
> [ OK ] Reached target Shutdown.
> [ OK ] Reached target Final Step.
> [ OK ] Started Reboot.
> [ OK ] Reached target Reboot.
> ...
> [ 285.126250] mlxbf_gige MLNXBF17:00: shutdown
> [ 285.130669] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> [ 285.139447] Mem abort info:
> [ 285.142228] ESR = 0x0000000096000004
> [ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 285.151261] SET = 0, FnV = 0
> [ 285.154303] EA = 0, S1PTW = 0
> [ 285.157430] FSC = 0x04: level 0 translation fault
> [ 285.162293] Data abort info:
> [ 285.165159] ISV = 0, ISS = 0x00000004
> [ 285.168980] CM = 0, WnR = 0
> [ 285.171932] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
> [ 285.178358] [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
> [ 285.185134] Internal error: Oops: 96000004 [#1] SMP

That is not a stack trace. The stack trace is what appears between the
lines "Call trace:" and "---[ end trace 0000000000000000 ]---", and it
is missing from what you've posted.

It would be nice if you could also post-process the stack trace you get
through the command below (run in a kernel tree):

cat stack_trace | scripts/decode_stacktrace.sh path/to/vmlinux .

so that we could see the actual C code line numbers, and not just the
offsets within your particular binary image.
> > [ 285.126250] mlxbf_gige MLNXBF17:00: shutdown
> > [ 285.130669] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> > [ 285.139447] Mem abort info:
> > [ 285.142228] ESR = 0x0000000096000004
> > [ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 285.151261] SET = 0, FnV = 0
> > [ 285.154303] EA = 0, S1PTW = 0
> > [ 285.157430] FSC = 0x04: level 0 translation fault
> > [ 285.162293] Data abort info:
> > [ 285.165159] ISV = 0, ISS = 0x00000004
> > [ 285.168980] CM = 0, WnR = 0
> > [ 285.171932] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
> > [ 285.178358] [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
> > [ 285.185134] Internal error: Oops: 96000004 [#1] SMP
>
> That is not a stack trace. The stack trace is what appears between the
> lines "Call trace:" and "---[ end trace 0000000000000000 ]---", and it
> is missing from what you've posted.
>
> It would be nice if you could also post-process the stack trace you get
> through the command below (run in a kernel tree):
>
> cat stack_trace | scripts/decode_stacktrace.sh path/to/vmlinux .
>
> so that we could see the actual C code line numbers, and not just the
> offsets within your particular binary image.

Hi Vladimir,

Do you want me to just send it as a reply or send a v5 and add it to the commit message?
On Fri, Jul 21, 2023 at 11:27:10PM +0000, Asmaa Mnebhi wrote:
> > > [ 285.126250] mlxbf_gige MLNXBF17:00: shutdown
> > > [ 285.130669] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> > > [ 285.139447] Mem abort info:
> > > [ 285.142228] ESR = 0x0000000096000004
> > > [ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
> > > [ 285.151261] SET = 0, FnV = 0
> > > [ 285.154303] EA = 0, S1PTW = 0
> > > [ 285.157430] FSC = 0x04: level 0 translation fault
> > > [ 285.162293] Data abort info:
> > > [ 285.165159] ISV = 0, ISS = 0x00000004
> > > [ 285.168980] CM = 0, WnR = 0
> > > [ 285.171932] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
> > > [ 285.178358] [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
> > > [ 285.185134] Internal error: Oops: 96000004 [#1] SMP
> >
> > That is not a stack trace. The stack trace is what appears between the
> > lines "Call trace:" and "---[ end trace 0000000000000000 ]---", and it
> > is missing from what you've posted.
> >
> > It would be nice if you could also post-process the stack trace you get
> > through the command below (run in a kernel tree):
> >
> > cat stack_trace | scripts/decode_stacktrace.sh path/to/vmlinux .
> >
> > so that we could see the actual C code line numbers, and not just the
> > offsets within your particular binary image.
>
> Hi Vladimir,
>
> Do you want me to just send it as a reply or send a v5 and add it to the commit message?

As a reply, for now, please. I haven't requested any changes to the patch.
On Fri, 21 Jul 2023 10:19:56 -0400 Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending
> napi transactions. Since mlxbf_gige_poll is still running, it tries
> to access a NULL pointer and as a result causes a kernel panic.
> To fix this during shutdown, invoke mlxbf_gige_remove to disable and
> dequeue napi.
>
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>

Asmaa, you need to reply to Vladimir, I'm dropping this patch :(
> > Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> > Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
>
> Asmaa, you need to reply to Vladimir, I'm dropping this patch :(
> --

Hi Jakub,

Apologies for the delay. Actually this worked out for the best, because this
change caused another issue on our systems. I will submit a different patch
once I test it more thoroughly and will attach the panic as requested
previously.

Thanks again for your help and feedback.

Asmaa
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
index 694de9513b9f..5e677bd32956 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -471,24 +471,19 @@ static int mlxbf_gige_probe(struct platform_device *pdev)
 	return err;
 }
 
-static int mlxbf_gige_remove(struct platform_device *pdev)
+static void mlxbf_gige_remove(struct platform_device *pdev)
 {
 	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
 
+	if (!priv)
+		return;
+
 	unregister_netdev(priv->netdev);
 	phy_disconnect(priv->netdev->phydev);
 	mlxbf_gige_mdio_remove(priv);
 	platform_set_drvdata(pdev, NULL);
 
-	return 0;
-}
-
-static void mlxbf_gige_shutdown(struct platform_device *pdev)
-{
-	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
-
-	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
-	mlxbf_gige_clean_port(priv);
+	return;
 }
 
 static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {
@@ -499,8 +494,8 @@ MODULE_DEVICE_TABLE(acpi, mlxbf_gige_acpi_match);
 
 static struct platform_driver mlxbf_gige_driver = {
 	.probe = mlxbf_gige_probe,
-	.remove = mlxbf_gige_remove,
-	.shutdown = mlxbf_gige_shutdown,
+	.remove_new = mlxbf_gige_remove,
+	.shutdown = mlxbf_gige_remove,
 	.driver = {
 		.name = KBUILD_MODNAME,
 		.acpi_match_table = ACPI_PTR(mlxbf_gige_acpi_match),
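
For context on why routing .shutdown through the remove path avoids the race:
unregister_netdev() closes the interface first, and a typical ndo_stop
implementation calls napi_disable(), which waits for any in-flight poll to
finish before the hardware is quiesced. The sketch below illustrates that
generic ordering with a hypothetical foo_* driver; it is not the actual
mlxbf_gige code.

/*
 * Illustrative sketch only: a generic platform/netdev teardown pattern
 * with hypothetical foo_* names, not the actual mlxbf_gige code.
 *
 * The ordering that matters: NAPI must be quiesced (napi_disable()
 * waits for any in-flight poll to finish) before the state the poll
 * function touches is torn down or set to NULL.
 */
#include <linux/netdevice.h>
#include <linux/platform_device.h>

struct foo_priv {
	struct net_device *netdev;
	struct napi_struct napi;
};

/* ndo_stop: invoked by dev_close(), and hence by unregister_netdev()
 * when the interface is still up.
 */
static int foo_stop(struct net_device *netdev)
{
	struct foo_priv *priv = netdev_priv(netdev);

	napi_disable(&priv->napi);	/* blocks until the current poll completes */
	/* ... mask interrupts, free rings ... */
	return 0;
}

/* Serves as both .remove_new and .shutdown, mirroring the patch's approach. */
static void foo_remove(struct platform_device *pdev)
{
	struct foo_priv *priv = platform_get_drvdata(pdev);

	if (!priv)	/* drvdata may already be NULL, e.g. after remove */
		return;

	unregister_netdev(priv->netdev);	/* triggers foo_stop() when up */
	netif_napi_del(&priv->napi);
	platform_set_drvdata(pdev, NULL);
	/* ... free_netdev(), unmap MMIO, etc. ... */
}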
There is a race condition happening during shutdown due to pending
napi transactions. Since mlxbf_gige_poll is still running, it tries
to access a NULL pointer and as a result causes a kernel panic.
To fix this during shutdown, invoke mlxbf_gige_remove to disable and
dequeue napi.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
---
v3->v4:
- replace .remove with .remove_new to be able to return a void to .shutdown as well.

v2->v3:
- remove mlxbf_gige_shutdown() since it is redundant with mlxbf_gige_remove().

v1->v2:
- fixed the tag from net-next to net
- check that the "priv" pointer is not NULL in mlxbf_gige_remove()

 .../mellanox/mlxbf_gige/mlxbf_gige_main.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)
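
On the v3->v4 note about .remove_new: newer kernels add a void-returning
remove callback to struct platform_driver, which is what lets a single
function be wired to both .remove_new and .shutdown (both take a
struct platform_device * and return void). A minimal, hypothetical foo_*
sketch of that wiring, assuming a kernel that provides .remove_new:

/*
 * Illustrative sketch only: hypothetical foo_* driver showing the
 * .remove_new / .shutdown wiring, assuming a kernel whose
 * struct platform_driver has the void-returning remove_new callback.
 */
#include <linux/module.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	/* ... allocate private data, platform_set_drvdata(), register ... */
	return 0;
}

static void foo_teardown(struct platform_device *pdev)
{
	/* ... full teardown; written so it is also safe to run at shutdown ... */
}

static struct platform_driver foo_driver = {
	.probe      = foo_probe,
	.remove_new = foo_teardown,	/* void return, unlike the legacy .remove */
	.shutdown   = foo_teardown,	/* reuse the same teardown at reboot/poweroff */
	.driver = {
		.name = "foo",
	},
};
module_platform_driver(foo_driver);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("foo: sketch of shared remove_new/shutdown teardown");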