Message ID | 20211123075447.3083579-3-idosch@idosch.org |
---|---|
State | Accepted |
Commit | c1020d3cf4752f61a6a413f632ea2ce2370e150d |
Delegated to: | Netdev Maintainers |
Series | mlxsw: Various updates |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net-next |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Series has a cover letter |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/cc_maintainers | success | CCed 5 of 5 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/verify_fixes | success | No Fixes tag |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 7 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
11/23/21 10:54 AM, Ido Schimmel wrote:
> From: Danielle Ratson <danieller@nvidia.com>
>
> On an arm64 platform with the Spectrum ASIC, after loading and executing
> a new kernel via kexec, the following trace [1] is observed. This seems
> to be caused by the fact that the device is not properly shutdown before
> executing the new kernel.

This should be sent to net tree instead of net-next with Fixes tag added.

> Fix this by implementing a shutdown method which mirrors the remove
> method, as recommended by the kexec maintainer [2][3].
>
> [1]
> BUG: Bad page state in process devlink pfn:22f73d
> page:fffffe00089dcf40 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
> flags: 0x2ffff00000000000()
> raw: 2ffff00000000000 0000000000000000 ffffffff089d0201 0000000000000000
> raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
> page dumped because: nonzero _refcount
> Modules linked in:
> CPU: 1 PID: 16346 Comm: devlink Tainted: G B 5.8.0-rc6-custom-273020-gac6b365b1bf5 #44
> Hardware name: Marvell Armada 7040 TX4810M (DT)
> Call trace:
> dump_backtrace+0x0/0x1d0
> show_stack+0x1c/0x28
> dump_stack+0xbc/0x118
> bad_page+0xcc/0xf8
> check_free_page_bad+0x80/0x88
> __free_pages_ok+0x3f8/0x418
> __free_pages+0x38/0x60
> kmem_freepages+0x200/0x2a8
> slab_destroy+0x28/0x68
> slabs_destroy+0x60/0x90
> ___cache_free+0x1b4/0x358
> kfree+0xc0/0x1d0
> skb_free_head+0x2c/0x38
> skb_release_data+0x110/0x1a0
> skb_release_all+0x2c/0x38
> consume_skb+0x38/0x130
> __dev_kfree_skb_any+0x44/0x50
> mlxsw_pci_rdq_fini+0x8c/0xb0
> mlxsw_pci_queue_fini.isra.0+0x28/0x58
> mlxsw_pci_queue_group_fini+0x58/0x88
> mlxsw_pci_aqs_fini+0x2c/0x60
> mlxsw_pci_fini+0x34/0x50
> mlxsw_core_bus_device_unregister+0x104/0x1d0
> mlxsw_devlink_core_bus_device_reload_down+0x2c/0x48
> devlink_reload+0x44/0x158
> devlink_nl_cmd_reload+0x270/0x290
> genl_rcv_msg+0x188/0x2f0
> netlink_rcv_skb+0x5c/0x118
> genl_rcv+0x3c/0x50
> netlink_unicast+0x1bc/0x278
> netlink_sendmsg+0x194/0x390
> __sys_sendto+0xe0/0x158
> __arm64_sys_sendto+0x2c/0x38
> el0_svc_common.constprop.0+0x70/0x168
> do_el0_svc+0x28/0x88
> el0_sync_handler+0x88/0x190
> el0_sync+0x140/0x180
>
> [2]
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1195432.html
>
> [3]
> https://patchwork.kernel.org/project/linux-scsi/patch/20170212214920.28866-1-anton@ozlabs.org/#20116693
>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Signed-off-by: Danielle Ratson <danieller@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlxsw/pci.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> index a15c95a10bae..cd3331a077bb 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> @@ -1973,6 +1973,7 @@ int mlxsw_pci_driver_register(struct pci_driver *pci_driver)
>  {
>  	pci_driver->probe = mlxsw_pci_probe;
>  	pci_driver->remove = mlxsw_pci_remove;
> +	pci_driver->shutdown = mlxsw_pci_remove;
>  	return pci_register_driver(pci_driver);
>  }
>  EXPORT_SYMBOL(mlxsw_pci_driver_register);
>
On Tue, Nov 23, 2021 at 11:47:34AM +0300, Denis Kirjanov wrote:
>
> 11/23/21 10:54 AM, Ido Schimmel wrote:
> > From: Danielle Ratson <danieller@nvidia.com>
> >
> > On an arm64 platform with the Spectrum ASIC, after loading and executing
> > a new kernel via kexec, the following trace [1] is observed. This seems
> > to be caused by the fact that the device is not properly shutdown before
> > executing the new kernel.
>
> This should be sent to net tree instead of net-next with Fixes tag added.

This is not a regression (never worked) and the system does not crash.
The trace is only observed on a specific platform and only with kexec,
which I assume nobody is using but our team (for development purposes).
Therefore, I prefer to route it via net-next. If users complain
(unlikely), I will send backports to stable kernels.
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index a15c95a10bae..cd3331a077bb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1973,6 +1973,7 @@ int mlxsw_pci_driver_register(struct pci_driver *pci_driver)
 {
 	pci_driver->probe = mlxsw_pci_probe;
 	pci_driver->remove = mlxsw_pci_remove;
+	pci_driver->shutdown = mlxsw_pci_remove;
 	return pci_register_driver(pci_driver);
 }
 EXPORT_SYMBOL(mlxsw_pci_driver_register);