Message ID | 20220426153913.6966-1-manishc@marvell.com (mailing list archive) |
---|---|
State | Accepted |
Commit | af68656d66eda219b7f55ce8313a1da0312c79e1 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] bnx2x: fix napi API usage sequence | expand |
Hello: This patch was applied to netdev/net.git (master) by Jakub Kicinski <kuba@kernel.org>: On Tue, 26 Apr 2022 08:39:13 -0700 you wrote: > While handling PCI errors (AER flow) driver tries to > disable NAPI [napi_disable()] after NAPI is deleted > [__netif_napi_del()] which causes unexpected system > hang/crash. > > System message log shows the following: > ======================================= > [ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures. > [ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)' > [ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking > bnx2x->error_detected(IO frozen) > [ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports: > 'need reset' > [ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking > bnx2x->error_detected(IO frozen) > [ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports: > 'need reset' > [ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset' > [ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow: > [ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows: > [ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow: > [ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows: > [ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset' > [ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking > bnx2x->slot_reset() > [ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing... > [ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset > --> driver unload > > [...] Here is the summary with links: - [net] bnx2x: fix napi API usage sequence https://git.kernel.org/netdev/net/c/af68656d66ed You are awesome, thank you!
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index c19b072f3a23..962253db25b8 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -14153,10 +14153,6 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp) /* Stop Tx */ bnx2x_tx_disable(bp); - /* Delete all NAPI objects */ - bnx2x_del_all_napi(bp); - if (CNIC_LOADED(bp)) - bnx2x_del_all_napi_cnic(bp); netdev_reset_tc(bp->dev); del_timer_sync(&bp->timer); @@ -14261,6 +14257,11 @@ static pci_ers_result_t bnx2x_io_slot_reset(struct pci_dev *pdev) bnx2x_drain_tx_queues(bp); bnx2x_send_unload_req(bp, UNLOAD_RECOVERY); bnx2x_netif_stop(bp, 1); + bnx2x_del_all_napi(bp); + + if (CNIC_LOADED(bp)) + bnx2x_del_all_napi_cnic(bp); + bnx2x_free_irq(bp); /* Report UNLOAD_DONE to MCP */