diff mbox series

Revert "net/ncsi: change from ndo_set_mac_address to dev_set_mac_address"

Message ID 20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series Revert "net/ncsi: change from ndo_set_mac_address to dev_set_mac_address" | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 304 this patch: 304
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: Potin Lai <potin.lai@quantatw.com>' != 'Signed-off-by: Potin Lai <potin.lai.pt@gmail.com>'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest warning net-next-2024-11-29--18-00 (tests: 761)

Commit Message

Potin Lai Nov. 29, 2024, 9:12 a.m. UTC
From: Potin Lai <potin.lai@quantatw.com>

This reverts commit 790071347a0a1a89e618eedcd51c687ea783aeb3.

We are seeing kernel panic when enabling two NCSI interfaces at same
time. It looks like mutex lock is being used in softirq caused the
issue.

Kernel panic log:
```
[  224.323380] 8021q: adding VLAN 0 to HW filter on device eth0
[  224.337533] ftgmac100 1e670000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
[  224.358372] BUG: scheduling while atomic: systemd-network/697/0x00000100
[  224.373274] Modules linked in:
[  224.373817] 8021q: adding VLAN 0 to HW filter on device eth1
[  224.380063] CPU: 0 PID: 697 Comm: systemd-network Tainted: G        W          6.6.62-8ea1fc6-dirty-cbd80d0-gcbd80d04d13c #1
[  224.380081] Hardware name: Generic DT based system
[  224.380096]  unwind_backtrace from show_stack+0x18/0x1c
[  224.439407]  show_stack from dump_stack_lvl+0x40/0x4c
[  224.450573]  dump_stack_lvl from __schedule_bug+0x5c/0x70
[  224.462492]  __schedule_bug from __schedule+0x884/0x968
[  224.474026]  __schedule from schedule+0x58/0xa8
[  224.484026]  schedule from schedule_preempt_disabled+0x14/0x18
[  224.496906]  schedule_preempt_disabled from __mutex_lock.constprop.0+0x350/0x76c
[  224.513235]  __mutex_lock.constprop.0 from ncsi_rsp_handler_oem_gma+0x104/0x1a0
[  224.529367]  ncsi_rsp_handler_oem_gma from ncsi_rcv_rsp+0x120/0x2cc
[  224.543195]  ncsi_rcv_rsp from __netif_receive_skb_one_core+0x60/0x84
[  224.557413]  __netif_receive_skb_one_core from netif_receive_skb+0x38/0x148
[  224.572779]  netif_receive_skb from ftgmac100_poll+0x358/0x444
[  224.585656]  ftgmac100_poll from __napi_poll.constprop.0+0x34/0x1d0
[  224.599490]  __napi_poll.constprop.0 from net_rx_action+0x350/0x43c
[  224.613325]  net_rx_action from handle_softirqs+0x114/0x32c
[  224.625624]  handle_softirqs from irq_exit+0x88/0xb8
[  224.636575]  irq_exit from call_with_stack+0x18/0x20
[  224.647530]  call_with_stack from __irq_usr+0x78/0xa0
[  224.658675] Exception stack(0xe075dfb0 to 0xe075dff8)
[  224.669799] dfa0:                                     00000000 00000000 00000000 00000020
[  224.687843] dfc0: 00000069 aefde3e0 00000000 00000000 00000000 00000000 00000000 aefde4e4
[  224.705887] dfe0: 01010101 aefddf20 a6b4331c a6b43618 600f0010 ffffffff
[  224.721100] ------------[ cut here ]------------
```

Signed-off-by: Potin Lai <potin.lai.pt@gmail.com>
---
 net/ncsi/ncsi-rsp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)


---
base-commit: 59b723cd2adbac2a34fc8e12c74ae26ae45bf230
change-id: 20241129-potin-revert-ncsi-set-mac-addr-7122f2896258

Best regards,

Comments

Andrew Lunn Nov. 29, 2024, 3:54 p.m. UTC | #1
On Fri, Nov 29, 2024 at 05:12:56PM +0800, Potin Lai wrote:
> From: Potin Lai <potin.lai@quantatw.com>
> 
> This reverts commit 790071347a0a1a89e618eedcd51c687ea783aeb3.
> 
> We are seeing kernel panic when enabling two NCSI interfaces at same
> time. It looks like mutex lock is being used in softirq caused the
> issue.

So a revert does make sense, you are seeing a real problem from that
commit.

However with the revert, is the code actually correct? Or is it
missing some locking? Normally dev_addr_sem is used to protect against
two calls to change the MAC address at once. Is this protection
needed? It would also be typical to hold RTNL while changing the MAC
address. So it would be nice to see an analysis of the locking, and
maybe the revert commit message says this gets you from a broken state
to a less broken state, and the real fix will be submitted soon?

	Andrew
diff mbox series

Patch

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index e28be33bdf2c..0cd7b916d3f8 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -629,6 +629,7 @@  static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id)
 {
 	struct ncsi_dev_priv *ndp = nr->ndp;
 	struct net_device *ndev = ndp->ndev.dev;
+	const struct net_device_ops *ops = ndev->netdev_ops;
 	struct ncsi_rsp_oem_pkt *rsp;
 	struct sockaddr saddr;
 	u32 mac_addr_off = 0;
@@ -655,9 +656,7 @@  static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id)
 	/* Set the flag for GMA command which should only be called once */
 	ndp->gma_flag = 1;
 
-	rtnl_lock();
-	ret = dev_set_mac_address(ndev, &saddr, NULL);
-	rtnl_unlock();
+	ret = ops->ndo_set_mac_address(ndev, &saddr);
 	if (ret < 0)
 		netdev_warn(ndev, "NCSI: 'Writing mac address to device failed\n");