diff mbox series

[for-next] RDMA/efa: Properly handle unexpected AQ completions

Message ID 20240513064630.6247-1-mrgolin@amazon.com (mailing list archive)
State Accepted
Headers show
Series [for-next] RDMA/efa: Properly handle unexpected AQ completions | expand

Commit Message

Michael Margolin May 13, 2024, 6:46 a.m. UTC
Do not try to handle admin command completion if it has an unexpected
command id and print a relevant error message.

Reviewed-by: Firas Jahjah <firasj@amazon.com>
Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
 drivers/infiniband/hw/efa/efa_com.c | 30 ++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

Comments

Gal Pressman May 16, 2024, 10:54 a.m. UTC | #1
On 13/05/2024 9:46, Michael Margolin wrote:
> Do not try to handle admin command completion if it has an unexpected
> command id and print a relevant error message.
> 
> Reviewed-by: Firas Jahjah <firasj@amazon.com>
> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
> Signed-off-by: Michael Margolin <mrgolin@amazon.com>
> ---
>  static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>  {
>  	struct efa_admin_acq_entry *cqe;
>  	u16 queue_size_mask;
> -	u16 comp_num = 0;
> +	u16 comp_cmds = 0;
>  	u8 phase;
> +	int err;
>  	u16 ci;
>  
>  	queue_size_mask = aq->depth - 1;
> @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>  		 * phase bit was validated
>  		 */
>  		dma_rmb();
> -		efa_com_handle_single_admin_completion(aq, cqe);
> +		err = efa_com_handle_single_admin_completion(aq, cqe);
> +		if (!err)
> +			comp_cmds++;

I would count the unexpected completions as well.
Regardless, I would definitely add a counter to track these (hopefully)
rare cases.

Whatever you decide:
Reviewed-by: Gal Pressman <gal.pressman@linux.dev>
Michael Margolin May 22, 2024, 11:02 a.m. UTC | #2
Thanks Gal, we indeed shouldn't see such errors on regular basis and the 
print is sufficient to help tracking and debugging in case it appears.


Michael

On 5/16/2024 1:54 PM, Gal Pressman wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On 13/05/2024 9:46, Michael Margolin wrote:
>> Do not try to handle admin command completion if it has an unexpected
>> command id and print a relevant error message.
>>
>> Reviewed-by: Firas Jahjah <firasj@amazon.com>
>> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
>> Signed-off-by: Michael Margolin <mrgolin@amazon.com>
>> ---
>>   static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>>   {
>>        struct efa_admin_acq_entry *cqe;
>>        u16 queue_size_mask;
>> -     u16 comp_num = 0;
>> +     u16 comp_cmds = 0;
>>        u8 phase;
>> +     int err;
>>        u16 ci;
>>
>>        queue_size_mask = aq->depth - 1;
>> @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>>                 * phase bit was validated
>>                 */
>>                dma_rmb();
>> -             efa_com_handle_single_admin_completion(aq, cqe);
>> +             err = efa_com_handle_single_admin_completion(aq, cqe);
>> +             if (!err)
>> +                     comp_cmds++;
> I would count the unexpected completions as well.
> Regardless, I would definitely add a counter to track these (hopefully)
> rare cases.
>
> Whatever you decide:
> Reviewed-by: Gal Pressman <gal.pressman@linux.dev>
Leon Romanovsky May 30, 2024, 12:49 p.m. UTC | #3
On Mon, 13 May 2024 06:46:30 +0000, Michael Margolin wrote:
> Do not try to handle admin command completion if it has an unexpected
> command id and print a relevant error message.
> 
> 

Applied, thanks!

[1/1] RDMA/efa: Properly handle unexpected AQ completions
      https://git.kernel.org/rdma/rdma/c/2d0e7ba468eae3

Best regards,
diff mbox series

Patch

diff --git a/drivers/infiniband/hw/efa/efa_com.c b/drivers/infiniband/hw/efa/efa_com.c
index 16a24a05fc2a..bafd210dd43e 100644
--- a/drivers/infiniband/hw/efa/efa_com.c
+++ b/drivers/infiniband/hw/efa/efa_com.c
@@ -1,6 +1,6 @@ 
 // SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
 /*
- * Copyright 2018-2021 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2024 Amazon.com, Inc. or its affiliates. All rights reserved.
  */
 
 #include "efa_com.h"
@@ -406,8 +406,8 @@  static struct efa_comp_ctx *efa_com_submit_admin_cmd(struct efa_com_admin_queue
 	return comp_ctx;
 }
 
-static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq,
-						   struct efa_admin_acq_entry *cqe)
+static int efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq,
+						  struct efa_admin_acq_entry *cqe)
 {
 	struct efa_comp_ctx *comp_ctx;
 	u16 cmd_id;
@@ -416,11 +416,11 @@  static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a
 			 EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID);
 
 	comp_ctx = efa_com_get_comp_ctx(aq, cmd_id, false);
-	if (!comp_ctx) {
+	if (comp_ctx->status != EFA_CMD_SUBMITTED) {
 		ibdev_err(aq->efa_dev,
-			  "comp_ctx is NULL. Changing the admin queue running state\n");
-		clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
-		return;
+			  "Received completion with unexpected command id[%d], sq producer: %d, sq consumer: %d, cq consumer: %d\n",
+			  cmd_id, aq->sq.pc, aq->sq.cc, aq->cq.cc);
+		return -EINVAL;
 	}
 
 	comp_ctx->status = EFA_CMD_COMPLETED;
@@ -428,14 +428,17 @@  static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a
 
 	if (!test_bit(EFA_AQ_STATE_POLLING_BIT, &aq->state))
 		complete(&comp_ctx->wait_event);
+
+	return 0;
 }
 
 static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
 {
 	struct efa_admin_acq_entry *cqe;
 	u16 queue_size_mask;
-	u16 comp_num = 0;
+	u16 comp_cmds = 0;
 	u8 phase;
+	int err;
 	u16 ci;
 
 	queue_size_mask = aq->depth - 1;
@@ -453,10 +456,12 @@  static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
 		 * phase bit was validated
 		 */
 		dma_rmb();
-		efa_com_handle_single_admin_completion(aq, cqe);
+		err = efa_com_handle_single_admin_completion(aq, cqe);
+		if (!err)
+			comp_cmds++;
 
+		aq->cq.cc++;
 		ci++;
-		comp_num++;
 		if (ci == aq->depth) {
 			ci = 0;
 			phase = !phase;
@@ -465,10 +470,9 @@  static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
 		cqe = &aq->cq.entries[ci];
 	}
 
-	aq->cq.cc += comp_num;
 	aq->cq.phase = phase;
-	aq->sq.cc += comp_num;
-	atomic64_add(comp_num, &aq->stats.completed_cmd);
+	aq->sq.cc += comp_cmds;
+	atomic64_add(comp_cmds, &aq->stats.completed_cmd);
 }
 
 static int efa_com_comp_status_to_errno(u8 comp_status)