Message ID | 20210615085415.1696103-1-joamaki@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 848ca9182a7d25bb54955c3aab9a3a2742bf9678 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] net: bonding: Use per-cpu rr_tx_counter | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 4 maintainers not CCed: vfalico@gmail.com davem@davemloft.net andy@greyhouse.net kuba@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 11 this patch: 11 |
netdev/kdoc | success | Errors and warnings before: 2 this patch: 2 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 51 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 11 this patch: 11 |
netdev/header_inline | success | Link |
Hello: This patch was applied to netdev/net-next.git (refs/heads/master): On Tue, 15 Jun 2021 08:54:15 +0000 you wrote: > The round-robin rr_tx_counter was shared across CPUs leading to > significant cache thrashing at high packet rates. This patch switches > the round-robin packet counter to use a per-cpu variable to decide > the destination slave. > > On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh > (-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after > this patch. > > [...] Here is the summary with links: - [net-next] net: bonding: Use per-cpu rr_tx_counter https://git.kernel.org/netdev/net-next/c/848ca9182a7d You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index eb79a9f05914..1d9137e77dfc 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4202,16 +4202,16 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond) slave_id = prandom_u32(); break; case 1: - slave_id = bond->rr_tx_counter; + slave_id = this_cpu_inc_return(*bond->rr_tx_counter); break; default: reciprocal_packets_per_slave = bond->params.reciprocal_packets_per_slave; - slave_id = reciprocal_divide(bond->rr_tx_counter, + slave_id = this_cpu_inc_return(*bond->rr_tx_counter); + slave_id = reciprocal_divide(slave_id, reciprocal_packets_per_slave); break; } - bond->rr_tx_counter++; return slave_id; } @@ -4852,6 +4852,9 @@ static void bond_destructor(struct net_device *bond_dev) if (bond->wq) destroy_workqueue(bond->wq); + + if (bond->rr_tx_counter) + free_percpu(bond->rr_tx_counter); } void bond_setup(struct net_device *bond_dev) @@ -5350,6 +5353,15 @@ static int bond_init(struct net_device *bond_dev) if (!bond->wq) return -ENOMEM; + if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN) { + bond->rr_tx_counter = alloc_percpu(u32); + if (!bond->rr_tx_counter) { + destroy_workqueue(bond->wq); + bond->wq = NULL; + return -ENOMEM; + } + } + spin_lock_init(&bond->stats_lock); netdev_lockdep_set_classes(bond_dev); diff --git a/include/net/bonding.h b/include/net/bonding.h index 019e998d944a..15335732e166 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -232,7 +232,7 @@ struct bonding { char proc_file_name[IFNAMSIZ]; #endif /* CONFIG_PROC_FS */ struct list_head bond_list; - u32 rr_tx_counter; + u32 __percpu *rr_tx_counter; struct ad_bond_info ad_info; struct alb_bond_info alb_info; struct bond_params params;
The round-robin rr_tx_counter was shared across CPUs leading to significant cache thrashing at high packet rates. This patch switches the round-robin packet counter to use a per-cpu variable to decide the destination slave. On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh (-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after this patch. "perf top -e cache_misses" before: 12.31% [bonding] [k] bond_xmit_roundrobin_slave_get 10.59% [sch_fq_codel] [k] fq_codel_dequeue 9.34% [kernel] [k] skb_release_data after: 15.42% [sch_fq_codel] [k] fq_codel_dequeue 10.06% [kernel] [k] __memset 9.12% [kernel] [k] skb_release_data Signed-off-by: Jussi Maki <joamaki@gmail.com> --- drivers/net/bonding/bond_main.c | 18 +++++++++++++++--- include/net/bonding.h | 2 +- 2 files changed, 16 insertions(+), 4 deletions(-)