Message ID | 20240724222106.147744-1-michael.chan@broadcom.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 98ba1d931f611e8f8f519c0405fa0a1a76554bfa |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | bnxt_en: Fix RSS logic in __bnxt_reserve_rings() | expand |
On Wed, 24 Jul 2024 15:21:06 -0700 Michael Chan wrote: > Now, with RSS contexts support, if the user has added or deleted RSS > contexts, we may now enter this path to reserve the new number of VNICs. > However, netif_is_rxfh_configured() will not return the correct state if > we are still in the middle of set_rxfh(). So the existing code may > set the indirection table of the default RSS context to default by > mistake. I feel like my explanation was more clear :S The key point is that ethtool::set_rxfh() calls the "reload" functions and expects the scope of the "reload" to be quite narrow, because only the RSS table has changed. Unfortunately the add / delete of additional contexts de-sync the resource counts, so ethtool::set_rxfh() now ends up "reloading" more than it intended. The "more than intended" includes going down the RSS indir reset path, which calls netif_is_rxfh_configured(). Return value from netif_is_rxfh_configured() during ethtool::set_rxfh() is undefined. Reported tag would have been nice too..
On Wed, 24 Jul 2024 17:25:36 -0700 Jakub Kicinski wrote: > On Wed, 24 Jul 2024 15:21:06 -0700 Michael Chan wrote: > > Now, with RSS contexts support, if the user has added or deleted RSS > > contexts, we may now enter this path to reserve the new number of VNICs. > > However, netif_is_rxfh_configured() will not return the correct state if > > we are still in the middle of set_rxfh(). So the existing code may > > set the indirection table of the default RSS context to default by > > mistake. > > I feel like my explanation was more clear :S > > The key point is that ethtool::set_rxfh() calls the "reload" functions > and expects the scope of the "reload" to be quite narrow, because only > the RSS table has changed. Unfortunately the add / delete of additional > contexts de-sync the resource counts, so ethtool::set_rxfh() now ends > up "reloading" more than it intended. The "more than intended" includes > going down the RSS indir reset path, which calls netif_is_rxfh_configured(). > Return value from netif_is_rxfh_configured() during ethtool::set_rxfh() > is undefined. > > Reported tag would have been nice too.. Reported-and-tested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/20240625010210.2002310-1-kuba@kernel.org There's one more problem. It looks like changing queue count discards existing ntuple filters: # Check| At /root/./ksft/drivers/net/hw/rss_ctx.py, line 387, in test_rss_context_queue_reconfigure: # Check| test_rss_queue_reconfigure(cfg, main_ctx=False) # Check| At /root/./ksft/drivers/net/hw/rss_ctx.py, line 230, in test_rss_queue_reconfigure: # Check| _send_traffic_check(cfg, port, ctx_ref, { 'target': (0, 3), # Check| At /root/./ksft/drivers/net/hw/rss_ctx.py, line 92, in _send_traffic_check: # Check| ksft_lt(sum(cnts[i] for i in params['noise']), directed / 2, # Check failed 1045235 >= 405823.5 traffic on other queues (context 1)':[460068, 351995, 565970, 351579, 127270] # Exception while handling defer / cleanup (callback 1 of 3)! # Defer Exception| Traceback (most recent call last): # Defer Exception| File "/root/ksft/net/lib/py/ksft.py", line 129, in ksft_flush_defer # Defer Exception| entry.exec_only() # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 93, in exec_only # Defer Exception| self.func(*self.args, **self.kwargs) # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 121, in ethtool # Defer Exception| return tool('ethtool', args, json=json, ns=ns, host=host) # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 108, in tool # Defer Exception| cmd_obj = cmd(cmd_str, ns=ns, host=host) # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 32, in __init__ # Defer Exception| self.process(terminate=False, fail=fail, timeout=timeout) # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 50, in process # Defer Exception| raise CmdExitFailure("Command failed: %s\nSTDOUT: %s\nSTDERR: %s" % # Defer Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -N eth0 delete 0 # Defer Exception| STDOUT: b'' # Defer Exception| STDERR: b'rmgr: Cannot delete RX class rule: No such file or directory\nCannot delete classification rule\n' not ok 8 rss_ctx.test_rss_context_queue_reconfigure This is from the following chunk of the test: 225 # We should be able to increase queues, but table should be left untouched 226 ethtool(f"-L {cfg.ifname} combined 5") 227 data = get_rss(cfg, context=ctx_id) 228 ksft_eq({0, 3}, set(data['rss-indirection-table'])) 229 230 _send_traffic_check(cfg, port, ctx_ref, { 'target': (0, 3), 231 other_key: (1, 2, 4) }) The Check failure tells us the traffic was sprayed. The Defer Exception, well, self-explanatory: "Cannot delete RX class rule: No such file or directory"
On Thu, Jul 25, 2024 at 11:19:12AM -0700, Jakub Kicinski wrote: > On Wed, 24 Jul 2024 17:25:36 -0700 Jakub Kicinski wrote: > > On Wed, 24 Jul 2024 15:21:06 -0700 Michael Chan wrote: > > > Now, with RSS contexts support, if the user has added or deleted RSS > > > contexts, we may now enter this path to reserve the new number of VNICs. > > > However, netif_is_rxfh_configured() will not return the correct state if > > > we are still in the middle of set_rxfh(). So the existing code may > > > set the indirection table of the default RSS context to default by > > > mistake. > > > > I feel like my explanation was more clear :S > > > > The key point is that ethtool::set_rxfh() calls the "reload" functions > > and expects the scope of the "reload" to be quite narrow, because only > > the RSS table has changed. Unfortunately the add / delete of additional > > contexts de-sync the resource counts, so ethtool::set_rxfh() now ends > > up "reloading" more than it intended. The "more than intended" includes > > going down the RSS indir reset path, which calls netif_is_rxfh_configured(). > > Return value from netif_is_rxfh_configured() during ethtool::set_rxfh() > > is undefined. > > > > Reported tag would have been nice too.. > Agreed. Sorry that was missed. > Reported-and-tested-by: Jakub Kicinski <kuba@kernel.org> > Link: https://lore.kernel.org/20240625010210.2002310-1-kuba@kernel.org > > There's one more problem. It looks like changing queue count discards > existing ntuple filters: > > # Check| At /root/./ksft/drivers/net/hw/rss_ctx.py, line 387, in test_rss_context_queue_reconfigure: > # Check| test_rss_queue_reconfigure(cfg, main_ctx=False) > # Check| At /root/./ksft/drivers/net/hw/rss_ctx.py, line 230, in test_rss_queue_reconfigure: > # Check| _send_traffic_check(cfg, port, ctx_ref, { 'target': (0, 3), > # Check| At /root/./ksft/drivers/net/hw/rss_ctx.py, line 92, in _send_traffic_check: > # Check| ksft_lt(sum(cnts[i] for i in params['noise']), directed / 2, > # Check failed 1045235 >= 405823.5 traffic on other queues (context 1)':[460068, 351995, 565970, 351579, 127270] > # Exception while handling defer / cleanup (callback 1 of 3)! > # Defer Exception| Traceback (most recent call last): > # Defer Exception| File "/root/ksft/net/lib/py/ksft.py", line 129, in ksft_flush_defer > # Defer Exception| entry.exec_only() > # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 93, in exec_only > # Defer Exception| self.func(*self.args, **self.kwargs) > # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 121, in ethtool > # Defer Exception| return tool('ethtool', args, json=json, ns=ns, host=host) > # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 108, in tool > # Defer Exception| cmd_obj = cmd(cmd_str, ns=ns, host=host) > # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 32, in __init__ > # Defer Exception| self.process(terminate=False, fail=fail, timeout=timeout) > # Defer Exception| File "/root/ksft/net/lib/py/utils.py", line 50, in process > # Defer Exception| raise CmdExitFailure("Command failed: %s\nSTDOUT: %s\nSTDERR: %s" % > # Defer Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -N eth0 delete 0 > # Defer Exception| STDOUT: b'' > # Defer Exception| STDERR: b'rmgr: Cannot delete RX class rule: No such file or directory\nCannot delete classification rule\n' > not ok 8 rss_ctx.test_rss_context_queue_reconfigure > > This is from the following chunk of the test: > > 225 # We should be able to increase queues, but table should be left untouched > 226 ethtool(f"-L {cfg.ifname} combined 5") > 227 data = get_rss(cfg, context=ctx_id) > 228 ksft_eq({0, 3}, set(data['rss-indirection-table'])) > 229 > 230 _send_traffic_check(cfg, port, ctx_ref, { 'target': (0, 3), > 231 other_key: (1, 2, 4) }) > > The Check failure tells us the traffic was sprayed. > The Defer Exception, well, self-explanatory: > "Cannot delete RX class rule: No such file or directory" We can take a look at that, but we currently do this on purpose.
On Thu, 25 Jul 2024 17:33:10 -0400 Andy Gospodarek wrote: > > The Check failure tells us the traffic was sprayed. > > The Defer Exception, well, self-explanatory: > > "Cannot delete RX class rule: No such file or directory" > > We can take a look at that, but we currently do this on purpose. Hm, I thought the rules may get lost if someone ifconfig down's the entire device. Losing rules on a config change is much more of a no-no, especially as long as the queue API remains all but a mirage.
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Wed, 24 Jul 2024 15:21:06 -0700 you wrote: > From: Pavan Chebbi <pavan.chebbi@broadcom.com> > > In __bnxt_reserve_rings(), the existing code unconditionally sets the > default RSS indirection table to default if netif_is_rxfh_configured() > returns false. This used to be correct before we added RSS contexts > support. For example, if the user is changing the number of ethtool > channels, we will enter this path to reserve the new number of rings. > We will then set the RSS indirection table to default to cover the new > number of rings if netif_is_rxfh_configured() is false. > > [...] Here is the summary with links: - bnxt_en: Fix RSS logic in __bnxt_reserve_rings() https://git.kernel.org/netdev/net/c/98ba1d931f61 You are awesome, thank you!
On Fri, Jul 26, 2024 at 4:02 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 25 Jul 2024 17:33:10 -0400 Andy Gospodarek wrote: > > > The Check failure tells us the traffic was sprayed. > > > The Defer Exception, well, self-explanatory: > > > "Cannot delete RX class rule: No such file or directory" > > > > We can take a look at that, but we currently do this on purpose. > > Hm, I thought the rules may get lost if someone ifconfig down's > the entire device. Losing rules on a config change is much more > of a no-no, especially as long as the queue API remains all but > a mirage. The rules won't be lost on ifconfig down. They will be lost in firmware but driver will restore them on ifup. I will work on the fix to not lose them on any non-impacting config change. Thanks.
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index bb3be33c1bbd..f788f114e430 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -7648,8 +7648,8 @@ static int bnxt_get_avail_msix(struct bnxt *bp, int num); static int __bnxt_reserve_rings(struct bnxt *bp) { struct bnxt_hw_rings hwr = {0}; + int rx_rings, old_rx_rings, rc; int cp = bp->cp_nr_rings; - int rx_rings, rc; int ulp_msix = 0; bool sh = false; int tx_cp; @@ -7683,6 +7683,7 @@ static int __bnxt_reserve_rings(struct bnxt *bp) hwr.grp = bp->rx_nr_rings; hwr.rss_ctx = bnxt_get_total_rss_ctxs(bp, &hwr); hwr.stat = bnxt_get_func_stat_ctxs(bp); + old_rx_rings = bp->hw_resc.resv_rx_rings; rc = bnxt_hwrm_reserve_rings(bp, &hwr); if (rc) @@ -7737,7 +7738,8 @@ static int __bnxt_reserve_rings(struct bnxt *bp) if (!bnxt_rings_ok(bp, &hwr)) return -ENOMEM; - if (!netif_is_rxfh_configured(bp->dev)) + if (old_rx_rings != bp->hw_resc.resv_rx_rings && + !netif_is_rxfh_configured(bp->dev)) bnxt_set_dflt_rss_indir_tbl(bp, NULL); if (!bnxt_ulp_registered(bp->edev) && BNXT_NEW_RM(bp)) {