Message ID | 20240513064630.6247-1-mrgolin@amazon.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [for-next] RDMA/efa: Properly handle unexpected AQ completions | expand |
On 13/05/2024 9:46, Michael Margolin wrote: > Do not try to handle admin command completion if it has an unexpected > command id and print a relevant error message. > > Reviewed-by: Firas Jahjah <firasj@amazon.com> > Reviewed-by: Yehuda Yitschak <yehuday@amazon.com> > Signed-off-by: Michael Margolin <mrgolin@amazon.com> > --- > static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq) > { > struct efa_admin_acq_entry *cqe; > u16 queue_size_mask; > - u16 comp_num = 0; > + u16 comp_cmds = 0; > u8 phase; > + int err; > u16 ci; > > queue_size_mask = aq->depth - 1; > @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq) > * phase bit was validated > */ > dma_rmb(); > - efa_com_handle_single_admin_completion(aq, cqe); > + err = efa_com_handle_single_admin_completion(aq, cqe); > + if (!err) > + comp_cmds++; I would count the unexpected completions as well. Regardless, I would definitely add a counter to track these (hopefully) rare cases. Whatever you decide: Reviewed-by: Gal Pressman <gal.pressman@linux.dev>
Thanks Gal, we indeed shouldn't see such errors on regular basis and the print is sufficient to help tracking and debugging in case it appears. Michael On 5/16/2024 1:54 PM, Gal Pressman wrote: > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > On 13/05/2024 9:46, Michael Margolin wrote: >> Do not try to handle admin command completion if it has an unexpected >> command id and print a relevant error message. >> >> Reviewed-by: Firas Jahjah <firasj@amazon.com> >> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com> >> Signed-off-by: Michael Margolin <mrgolin@amazon.com> >> --- >> static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq) >> { >> struct efa_admin_acq_entry *cqe; >> u16 queue_size_mask; >> - u16 comp_num = 0; >> + u16 comp_cmds = 0; >> u8 phase; >> + int err; >> u16 ci; >> >> queue_size_mask = aq->depth - 1; >> @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq) >> * phase bit was validated >> */ >> dma_rmb(); >> - efa_com_handle_single_admin_completion(aq, cqe); >> + err = efa_com_handle_single_admin_completion(aq, cqe); >> + if (!err) >> + comp_cmds++; > I would count the unexpected completions as well. > Regardless, I would definitely add a counter to track these (hopefully) > rare cases. > > Whatever you decide: > Reviewed-by: Gal Pressman <gal.pressman@linux.dev>
On Mon, 13 May 2024 06:46:30 +0000, Michael Margolin wrote: > Do not try to handle admin command completion if it has an unexpected > command id and print a relevant error message. > > Applied, thanks! [1/1] RDMA/efa: Properly handle unexpected AQ completions https://git.kernel.org/rdma/rdma/c/2d0e7ba468eae3 Best regards,
diff --git a/drivers/infiniband/hw/efa/efa_com.c b/drivers/infiniband/hw/efa/efa_com.c index 16a24a05fc2a..bafd210dd43e 100644 --- a/drivers/infiniband/hw/efa/efa_com.c +++ b/drivers/infiniband/hw/efa/efa_com.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause /* - * Copyright 2018-2021 Amazon.com, Inc. or its affiliates. All rights reserved. + * Copyright 2018-2024 Amazon.com, Inc. or its affiliates. All rights reserved. */ #include "efa_com.h" @@ -406,8 +406,8 @@ static struct efa_comp_ctx *efa_com_submit_admin_cmd(struct efa_com_admin_queue return comp_ctx; } -static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq, - struct efa_admin_acq_entry *cqe) +static int efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq, + struct efa_admin_acq_entry *cqe) { struct efa_comp_ctx *comp_ctx; u16 cmd_id; @@ -416,11 +416,11 @@ static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID); comp_ctx = efa_com_get_comp_ctx(aq, cmd_id, false); - if (!comp_ctx) { + if (comp_ctx->status != EFA_CMD_SUBMITTED) { ibdev_err(aq->efa_dev, - "comp_ctx is NULL. Changing the admin queue running state\n"); - clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state); - return; + "Received completion with unexpected command id[%d], sq producer: %d, sq consumer: %d, cq consumer: %d\n", + cmd_id, aq->sq.pc, aq->sq.cc, aq->cq.cc); + return -EINVAL; } comp_ctx->status = EFA_CMD_COMPLETED; @@ -428,14 +428,17 @@ static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a if (!test_bit(EFA_AQ_STATE_POLLING_BIT, &aq->state)) complete(&comp_ctx->wait_event); + + return 0; } static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq) { struct efa_admin_acq_entry *cqe; u16 queue_size_mask; - u16 comp_num = 0; + u16 comp_cmds = 0; u8 phase; + int err; u16 ci; queue_size_mask = aq->depth - 1; @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq) * phase bit was validated */ dma_rmb(); - efa_com_handle_single_admin_completion(aq, cqe); + err = efa_com_handle_single_admin_completion(aq, cqe); + if (!err) + comp_cmds++; + aq->cq.cc++; ci++; - comp_num++; if (ci == aq->depth) { ci = 0; phase = !phase; @@ -465,10 +470,9 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq) cqe = &aq->cq.entries[ci]; } - aq->cq.cc += comp_num; aq->cq.phase = phase; - aq->sq.cc += comp_num; - atomic64_add(comp_num, &aq->stats.completed_cmd); + aq->sq.cc += comp_cmds; + atomic64_add(comp_cmds, &aq->stats.completed_cmd); } static int efa_com_comp_status_to_errno(u8 comp_status)