diff mbox series

[v1,net,2/4] net: ena: Wrong missing IO completions check order

Message ID 20240410091358.16289-3-darinzon@amazon.com (mailing list archive)
State Accepted
Commit f7e417180665234fdb7af2ebe33d89aaa434d16f
Delegated to: Netdev Maintainers
Headers show
Series ENA driver bug fixes | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 942 this patch: 942
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 2 maintainers not CCed: pabeni@redhat.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 953 this patch: 953
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 953 this patch: 953
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 49 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-11--00-00 (tests: 959)

Commit Message

Arinzon, David April 10, 2024, 9:13 a.m. UTC
From: David Arinzon <darinzon@amazon.com>

Missing IO completions check is called every second (HZ jiffies).
This commit fixes several issues with this check:

1. Duplicate queues check:
   Max of 4 queues are scanned on each check due to monitor budget.
   Once reaching the budget, this check exits under the assumption that
   the next check will continue to scan the remainder of the queues,
   but in practice, next check will first scan the last already scanned
   queue which is not necessary and may cause the full queue scan to
   last a couple of seconds longer.
   The fix is to start every check with the next queue to scan.
   For example, on 8 IO queues:
   Bug: [0,1,2,3], [3,4,5,6], [6,7]
   Fix: [0,1,2,3], [4,5,6,7]

2. Unbalanced queues check:
   In case the number of active IO queues is not a multiple of budget,
   there will be checks which don't utilize the full budget
   because the full scan exits when reaching the last queue id.
   The fix is to run every TX completion check with exact queue budget
   regardless of the queue id.
   For example, on 7 IO queues:
   Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
   Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
   The budget may be lowered in case the number of IO queues is less
   than the budget (4) to make sure there are no duplicate queues on
   the same check.
   For example, on 3 IO queues:
   Bug: [0,1,2,0], [1,2,0,1]
   Fix: [0,1,2], [0,1,2]

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 21 +++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

Comments

Nelson, Shannon April 10, 2024, 9:51 p.m. UTC | #1
On 4/10/2024 2:13 AM, darinzon@amazon.com wrote:
> From: David Arinzon <darinzon@amazon.com>
> 
> Missing IO completions check is called every second (HZ jiffies).
> This commit fixes several issues with this check:
> 
> 1. Duplicate queues check:
>     Max of 4 queues are scanned on each check due to monitor budget.
>     Once reaching the budget, this check exits under the assumption that
>     the next check will continue to scan the remainder of the queues,
>     but in practice, next check will first scan the last already scanned
>     queue which is not necessary and may cause the full queue scan to
>     last a couple of seconds longer.
>     The fix is to start every check with the next queue to scan.
>     For example, on 8 IO queues:
>     Bug: [0,1,2,3], [3,4,5,6], [6,7]
>     Fix: [0,1,2,3], [4,5,6,7]
> 
> 2. Unbalanced queues check:
>     In case the number of active IO queues is not a multiple of budget,
>     there will be checks which don't utilize the full budget
>     because the full scan exits when reaching the last queue id.
>     The fix is to run every TX completion check with exact queue budget
>     regardless of the queue id.
>     For example, on 7 IO queues:
>     Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
>     Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
>     The budget may be lowered in case the number of IO queues is less
>     than the budget (4) to make sure there are no duplicate queues on
>     the same check.
>     For example, on 3 IO queues:
>     Bug: [0,1,2,0], [1,2,0,1]
>     Fix: [0,1,2], [0,1,2]
> 
> Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
> Signed-off-by: Amit Bernstein <amitbern@amazon.com>
> Signed-off-by: David Arinzon <darinzon@amazon.com>


Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>


> ---
>   drivers/net/ethernet/amazon/ena/ena_netdev.c | 21 +++++++++++---------
>   1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 09e7da1a..59befc0f 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -3481,10 +3481,11 @@ static void check_for_missing_completions(struct ena_adapter *adapter)
>   {
>          struct ena_ring *tx_ring;
>          struct ena_ring *rx_ring;
> -       int i, budget, rc;
> +       int qid, budget, rc;
>          int io_queue_count;
> 
>          io_queue_count = adapter->xdp_num_queues + adapter->num_io_queues;
> +
>          /* Make sure the driver doesn't turn the device in other process */
>          smp_rmb();
> 
> @@ -3497,27 +3498,29 @@ static void check_for_missing_completions(struct ena_adapter *adapter)
>          if (adapter->missing_tx_completion_to == ENA_HW_HINTS_NO_TIMEOUT)
>                  return;
> 
> -       budget = ENA_MONITORED_TX_QUEUES;
> +       budget = min_t(u32, io_queue_count, ENA_MONITORED_TX_QUEUES);
> 
> -       for (i = adapter->last_monitored_tx_qid; i < io_queue_count; i++) {
> -               tx_ring = &adapter->tx_ring[i];
> -               rx_ring = &adapter->rx_ring[i];
> +       qid = adapter->last_monitored_tx_qid;
> +
> +       while (budget) {
> +               qid = (qid + 1) % io_queue_count;
> +
> +               tx_ring = &adapter->tx_ring[qid];
> +               rx_ring = &adapter->rx_ring[qid];
> 
>                  rc = check_missing_comp_in_tx_queue(adapter, tx_ring);
>                  if (unlikely(rc))
>                          return;
> 
> -               rc =  !ENA_IS_XDP_INDEX(adapter, i) ?
> +               rc =  !ENA_IS_XDP_INDEX(adapter, qid) ?
>                          check_for_rx_interrupt_queue(adapter, rx_ring) : 0;
>                  if (unlikely(rc))
>                          return;
> 
>                  budget--;
> -               if (!budget)
> -                       break;
>          }
> 
> -       adapter->last_monitored_tx_qid = i % io_queue_count;
> +       adapter->last_monitored_tx_qid = qid;
>   }
> 
>   /* trigger napi schedule after 2 consecutive detections */
> --
> 2.40.1
> 
>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 09e7da1a..59befc0f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3481,10 +3481,11 @@  static void check_for_missing_completions(struct ena_adapter *adapter)
 {
 	struct ena_ring *tx_ring;
 	struct ena_ring *rx_ring;
-	int i, budget, rc;
+	int qid, budget, rc;
 	int io_queue_count;
 
 	io_queue_count = adapter->xdp_num_queues + adapter->num_io_queues;
+
 	/* Make sure the driver doesn't turn the device in other process */
 	smp_rmb();
 
@@ -3497,27 +3498,29 @@  static void check_for_missing_completions(struct ena_adapter *adapter)
 	if (adapter->missing_tx_completion_to == ENA_HW_HINTS_NO_TIMEOUT)
 		return;
 
-	budget = ENA_MONITORED_TX_QUEUES;
+	budget = min_t(u32, io_queue_count, ENA_MONITORED_TX_QUEUES);
 
-	for (i = adapter->last_monitored_tx_qid; i < io_queue_count; i++) {
-		tx_ring = &adapter->tx_ring[i];
-		rx_ring = &adapter->rx_ring[i];
+	qid = adapter->last_monitored_tx_qid;
+
+	while (budget) {
+		qid = (qid + 1) % io_queue_count;
+
+		tx_ring = &adapter->tx_ring[qid];
+		rx_ring = &adapter->rx_ring[qid];
 
 		rc = check_missing_comp_in_tx_queue(adapter, tx_ring);
 		if (unlikely(rc))
 			return;
 
-		rc =  !ENA_IS_XDP_INDEX(adapter, i) ?
+		rc =  !ENA_IS_XDP_INDEX(adapter, qid) ?
 			check_for_rx_interrupt_queue(adapter, rx_ring) : 0;
 		if (unlikely(rc))
 			return;
 
 		budget--;
-		if (!budget)
-			break;
 	}
 
-	adapter->last_monitored_tx_qid = i % io_queue_count;
+	adapter->last_monitored_tx_qid = qid;
 }
 
 /* trigger napi schedule after 2 consecutive detections */