Message ID | 20230919221308.30735-2-asmaa@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | mlxbf_gige: Fix several bugs |
On 9/19/2023 3:13 PM, Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending napi transactions.
> Since mlxbf_gige_poll is still running, it tries to access a NULL pointer and as a
> result causes a kernel panic:
>
> [ 284.074822] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> ...
> [ 284.322326] Call trace:
> [ 284.324757] mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
> [ 284.330924] mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
> [ 284.335876] __napi_poll+0x40/0x1c8
> [ 284.339353] net_rx_action+0x314/0x3a0
> [ 284.343086] __do_softirq+0x128/0x334
> [ 284.346734] run_ksoftirqd+0x54/0x6c
> [ 284.350294] smpboot_thread_fn+0x14c/0x190
> [ 284.354375] kthread+0x10c/0x110
> [ 284.357588] ret_from_fork+0x10/0x20
> [ 284.361150] Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002)
> [ 284.367227] ---[ end trace a18340bbb9ea2fa7 ]---
>
> To fix this, return in the case where "priv" is NULL.
>
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> Reviewed-by: David Thompson <davthompson@nvidia.com>

This adds a test in a hot-path when the solution would be to simply make
sure that the interface does not schedule any new NAPI calls, as well as
stops being visible to the system.

In its current form your shutdown function is trying to be as efficient
as possible, I would just make your shutdown function the same as the
remove function which would ensure the network device is torn down using
a well traveled path.
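To illustrate the ordering suggested in the reply, here is a minimal user-space sketch in C. It is not driver code: every name in it (`model_dev`, `model_schedule`, `model_poll`, `model_remove`, `model_shutdown`) is hypothetical, and it is single-threaded, so it does not model waiting for a poll that is already in flight. It only shows the idea that teardown first hides the device and blocks new polls, then releases resources, so the poll routine itself needs no NULL test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a NIC driver's private state (not mlxbf_gige). */
struct model_dev {
	bool visible;      /* models "registered with the network stack" */
	bool poll_enabled; /* models "new polls may be scheduled" */
	int *tx_ring;      /* resource the poll routine dereferences */
	int polls_run;     /* number of polls that actually executed */
};

/* Allocate resources and make the device eligible for polling. */
static int model_init(struct model_dev *dev)
{
	dev->tx_ring = calloc(4, sizeof(*dev->tx_ring));
	if (!dev->tx_ring)
		return -1;
	dev->visible = true;
	dev->poll_enabled = true;
	dev->polls_run = 0;
	return 0;
}

/* Hot path: no NULL handling; the teardown order guarantees the ring exists. */
static void model_poll(struct model_dev *dev)
{
	assert(dev->tx_ring != NULL); /* model-only check of the invariant */
	dev->tx_ring[0]++;
	dev->polls_run++;
}

/* Models the core deciding whether to run a poll; the gate lives here,
 * outside the poll routine. Returns true if a poll was run. */
static bool model_schedule(struct model_dev *dev)
{
	if (!dev->visible || !dev->poll_enabled)
		return false;
	model_poll(dev);
	return true;
}

/* Full teardown: hide the device, block new polls, then free resources. */
static void model_remove(struct model_dev *dev)
{
	dev->visible = false;
	dev->poll_enabled = false;
	free(dev->tx_ring);
	dev->tx_ring = NULL;
}

/* Shutdown reuses the remove path instead of a separate shortcut. */
static void model_shutdown(struct model_dev *dev)
{
	model_remove(dev);
}
```

In kernel terms, the two flags loosely correspond to what unregister_netdev() and napi_disable() provide; that mapping is an assumption made for illustration, and a real change would have to follow the driver's actual remove path.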
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index 0d5a41a2ae01..cfb8fb957f0c 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -298,6 +298,9 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 
 	priv = container_of(napi, struct mlxbf_gige, napi);
 
+	if (!priv)
+		return 0;
+
 	mlxbf_gige_handle_tx_complete(priv);
 
 	do {