Message ID | 20230123221727.30423-1-nnac123@linux.ibm.com (mailing list archive)
---|---
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next] ibmvnic: Toggle between queue types in affinity mapping
On Mon, 23 Jan 2023 16:17:27 -0600 Nick Child wrote:
> A more optimal algorithm would balance the number of RX and TX IRQs across
> the physical cores. Therefore, to increase performance, distribute RX and
> TX IRQs across cores by alternating between assigning IRQs for RX and TX
> queues to CPUs.
> With a system with 64 CPUs and 32 queues, this results in the following
> pattern (binding is done in reverse order for readable code):
>
> IRQ type | CPU number
> -----------------------
> TX15     | 0-1
> RX15     | 2-3
> TX14     | 4-5
> RX14     | 6-7

Seems sensible but why did you invert the order? To save LoC?
On 1/24/23 20:39, Jakub Kicinski wrote:
> On Mon, 23 Jan 2023 16:17:27 -0600 Nick Child wrote:
>> A more optimal algorithm would balance the number of RX and TX IRQs across
>> the physical cores. Therefore, to increase performance, distribute RX and
>> TX IRQs across cores by alternating between assigning IRQs for RX and TX
>> queues to CPUs.
>> With a system with 64 CPUs and 32 queues, this results in the following
>> pattern (binding is done in reverse order for readable code):
>>
>> IRQ type | CPU number
>> -----------------------
>> TX15     | 0-1
>> RX15     | 2-3
>> TX14     | 4-5
>> RX14     | 6-7
>
> Seems sensible but why did you invert the order? To save LoC?

Thanks for checking this out Jakub.

Correct, the effect on performance is the same and IMO the algorithm
is more readable. Less so about minimizing lines and more about
making the code understandable for the next dev.
On Wed, 25 Jan 2023 10:55:20 -0600 Nick Child wrote:
> On 1/24/23 20:39, Jakub Kicinski wrote:
> > On Mon, 23 Jan 2023 16:17:27 -0600 Nick Child wrote:
> >> A more optimal algorithm would balance the number of RX and TX IRQs across
> >> the physical cores. Therefore, to increase performance, distribute RX and
> >> TX IRQs across cores by alternating between assigning IRQs for RX and TX
> >> queues to CPUs.
> >> With a system with 64 CPUs and 32 queues, this results in the following
> >> pattern (binding is done in reverse order for readable code):
> >>
> >> IRQ type | CPU number
> >> -----------------------
> >> TX15     | 0-1
> >> RX15     | 2-3
> >> TX14     | 4-5
> >> RX14     | 6-7
> >
> > Seems sensible but why did you invert the order? To save LoC?
>
> Thanks for checking this out Jakub.
>
> Correct, the effect on performance is the same and IMO the algorithm
> is more readable. Less so about minimizing lines and more about
> making the code understandable for the next dev.

I spend way too much time explaining IRQ pinning to developers at my
"day job" :( Stuff like threaded NAPI means that more and more people
interact with it. So I think having a more easily understandable mapping
is worth the extra complexity in the driver. By which I mean:

Tx0 -> 0-1
Rx0 -> 2-3
Tx1 -> 4-5

IOW Qn -> n*4+is_rx*2 .. n*4+is_rx*2+1
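To make that layout concrete, here is a minimal stand-alone sketch (user-space C, not driver code) that just prints the Qn -> n*4+is_rx*2 .. n*4+is_rx*2+1 mapping; the 16 queue pairs and 2 CPUs per IRQ are assumptions taken from the 64-CPU / 32-queue example above, not values queried from hardware.

/* Print the "in-order" mapping suggested above: queue pair n gets CPUs
 * n*4..n*4+1 for TX and n*4+2..n*4+3 for RX. Sizes are illustrative.
 */
#include <stdio.h>

int main(void)
{
	int num_queue_pairs = 16;	/* assumed: 16 TX/RX pairs = 32 queues */
	int cpus_per_irq = 2;		/* assumed: 2 CPUs bound to each IRQ */

	for (int n = 0; n < num_queue_pairs; n++) {
		for (int is_rx = 0; is_rx <= 1; is_rx++) {
			int first = n * 2 * cpus_per_irq + is_rx * cpus_per_irq;
			int last = first + cpus_per_irq - 1;

			printf("%s%-2d -> CPUs %d-%d\n",
			       is_rx ? "Rx" : "Tx", n, first, last);
		}
	}
	return 0;
}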
On 1/24/23 18:39, Jakub Kicinski wrote:
> Seems sensible but why did you invert the order? To save LoC?
Proc zero is often the default vector for other interrupts. If we're going to diddle with the IRQs for performance, it would make sense to me to steer around proc 0.
Rick
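As a hypothetical sketch of Rick's suggestion (not part of the posted patch), the starting point of the round-robin could skip CPU 0 by asking for the first online CPU greater than 0 rather than the first online CPU overall:

/* Hypothetical, not in the posted patch: start the per-queue round-robin
 * after CPU 0 so the device's first IRQ is not pinned to the processor
 * that typically fields default-affinity interrupts. cpumask_next(0, ...)
 * returns the first online CPU greater than 0.
 */
cpu = cpumask_next(0, cpu_online_mask);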
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index e19a6bb3f444..314a72cef592 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -254,6 +254,7 @@ static void ibmvnic_set_affinity(struct ibmvnic_adapter *adapter)
         int num_txqs = adapter->num_active_tx_scrqs;
         int total_queues, stride, stragglers, i;
         unsigned int num_cpu, cpu;
+        bool is_rx_queue;
         int rc = 0;
 
         netdev_dbg(adapter->netdev, "%s: Setting irq affinity hints", __func__);
@@ -273,14 +274,22 @@ static void ibmvnic_set_affinity(struct ibmvnic_adapter *adapter)
         /* next available cpu to assign irq to */
         cpu = cpumask_next(-1, cpu_online_mask);
 
-        for (i = 0; i < num_txqs; i++) {
-                queue = txqs[i];
+        for (i = 0; i < total_queues; i++) {
+                is_rx_queue = false;
+                /* balance core load by alternating rx and tx assignments */
+                if ((i % 2 == 1 && num_rxqs > 0) || num_txqs == 0) {
+                        queue = rxqs[--num_rxqs];
+                        is_rx_queue = true;
+                } else {
+                        queue = txqs[--num_txqs];
+                }
+
                 rc = ibmvnic_set_queue_affinity(queue, &cpu, &stragglers,
                                                 stride);
                 if (rc)
                         goto out;
 
-                if (!queue)
+                if (!queue || is_rx_queue)
                         continue;
 
                 rc = __netif_set_xps_queue(adapter->netdev,
@@ -291,14 +300,6 @@ static void ibmvnic_set_affinity(struct ibmvnic_adapter *adapter)
                                    __func__, i, rc);
         }
 
-        for (i = 0; i < num_rxqs; i++) {
-                queue = rxqs[i];
-                rc = ibmvnic_set_queue_affinity(queue, &cpu, &stragglers,
-                                                stride);
-                if (rc)
-                        goto out;
-        }
-
 out:
         if (rc) {
                 netdev_warn(adapter->netdev,
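For readers who want to see the resulting layout without running the driver, the following stand-alone sketch mimics the alternating, reverse-order assignment in the hunk above; the cpumask/IRQ plumbing is replaced with plain integers, the straggler handling is simplified, and the 64-CPU, 16+16-queue sizes are just the example from the commit message.

/* Simulate the patch's loop: walk total_queues slots, taking an RX queue
 * on odd slots while any remain and a TX queue otherwise, handing each
 * queue a contiguous block of 'stride' CPUs (plus one straggler, if any).
 */
#include <stdio.h>

int main(void)
{
	int num_cpu = 64;			/* assumed, from the example */
	int num_txqs = 16, num_rxqs = 16;	/* assumed, 32 queues total */
	int total_queues = num_txqs + num_rxqs;
	int stride = num_cpu / total_queues;
	int stragglers = num_cpu % total_queues;
	int cpu = 0;

	for (int i = 0; i < total_queues; i++) {
		int is_rx_queue = 0;
		int q;

		/* same alternation rule as the patch: odd slots take an RX
		 * queue while any remain, otherwise fall back to the other
		 * list once one side runs out
		 */
		if ((i % 2 == 1 && num_rxqs > 0) || num_txqs == 0) {
			q = --num_rxqs;
			is_rx_queue = 1;
		} else {
			q = --num_txqs;
		}

		/* this queue's IRQ gets 'stride' CPUs, plus one straggler */
		int count = stride + (stragglers > 0 ? 1 : 0);
		if (stragglers > 0)
			stragglers--;

		printf("%s%-2d | %d-%d\n", is_rx_queue ? "RX" : "TX", q,
		       cpu, cpu + count - 1);
		cpu += count;
	}
	return 0;
}

With these sizes the output starts TX15 | 0-1, RX15 | 2-3, TX14 | 4-5, RX14 | 6-7, matching the table in the commit message.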