From patchwork Thu Oct 5 05:31:28 2023
X-Patchwork-Submitter: Selvin Xavier
X-Patchwork-Id: 13410120
From: Selvin Xavier
To: jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org, andrew.gospodarek@broadcom.com,
    Chandramohan Akula, Selvin Xavier
Subject: [PATCH for-next 2/3] RDMA/bnxt_re: Report async events and errors
Date: Wed, 4 Oct 2023 22:31:28 -0700
Message-Id: <1696483889-17427-3-git-send-email-selvin.xavier@broadcom.com>
In-Reply-To: <1696483889-17427-1-git-send-email-selvin.xavier@broadcom.com>
References: <1696483889-17427-1-git-send-email-selvin.xavier@broadcom.com>

From: Chandramohan Akula

Report QP, SRQ and CQ async events and errors.

Signed-off-by: Chandramohan Akula
Signed-off-by: Selvin Xavier
---
 drivers/infiniband/hw/bnxt_re/main.c | 165 +++++++++++++++++++++++++++++++++--
 1 file changed, 156 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index c9066aa..7bd18ec 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -970,6 +970,9 @@ static int bnxt_re_handle_unaffi_async_event(struct creq_func_event
 static int bnxt_re_handle_qp_async_event(struct creq_qp_event *qp_event,
 					 struct bnxt_re_qp *qp)
 {
+	struct bnxt_re_srq *srq = container_of(qp->qplib_qp.srq, struct bnxt_re_srq,
+					       qplib_srq);
+	struct creq_qp_error_notification *err_event;
 	struct ib_event event = {};
 	unsigned int flags;

@@ -980,14 +983,146 @@ static int bnxt_re_handle_qp_async_event(struct creq_qp_event *qp_event,
 		bnxt_re_unlock_cqs(qp, flags);
 	}

-	if (qp->qplib_qp.srq) {
-		event.device = &qp->rdev->ibdev;
-		event.element.qp = &qp->ib_qp;
-		event.event = IB_EVENT_QP_LAST_WQE_REACHED;
+	event.device = &qp->rdev->ibdev;
+	event.element.qp = &qp->ib_qp;
+	event.event = IB_EVENT_QP_FATAL;
+
+	err_event = (struct creq_qp_error_notification *)qp_event;
+
+	switch (err_event->req_err_state_reason) {
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_OPCODE_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_TIMEOUT_RETRY_LIMIT:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_RNR_TIMEOUT_RETRY_LIMIT:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_NAK_ARRIVAL_2:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_NAK_ARRIVAL_3:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_INVALID_READ_RESP:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_ILLEGAL_BIND:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_ILLEGAL_FAST_REG:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_ILLEGAL_INVALIDATE:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_RETRAN_LOCAL_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_AV_DOMAIN_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_PROD_WQE_MSMTCH_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_PSN_RANGE_CHECK_ERROR:
+		event.event = IB_EVENT_QP_ACCESS_ERR;
+		break;
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_NAK_ARRIVAL_1:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_NAK_ARRIVAL_4:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_READ_RESP_LENGTH:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_WQE_FORMAT_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_ORRQ_FORMAT_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_INVALID_AVID_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_SERV_TYPE_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_INVALID_OP_ERROR:
+		event.event = IB_EVENT_QP_REQ_ERR;
+		break;
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_RX_MEMORY_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_TX_MEMORY_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_CMP_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_CQ_LOAD_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_TX_PCI_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_RX_PCI_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_REQ_ERR_STATE_REASON_REQ_RETX_SETUP_ERROR:
+		event.event = IB_EVENT_QP_FATAL;
+		break;
+
+	default:
+		break;
 	}

-	if (event.device && qp->ib_qp.event_handler)
+	switch (err_event->res_err_state_reason) {
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_EXCEED_MAX:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_PAYLOAD_LENGTH_MISMATCH:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_PSN_SEQ_ERROR_RETRY_LIMIT:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_RX_INVALID_R_KEY:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_RX_DOMAIN_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_RX_NO_PERMISSION:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_RX_RANGE_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_TX_INVALID_R_KEY:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_TX_DOMAIN_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_TX_NO_PERMISSION:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_TX_RANGE_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_UNALIGN_ATOMIC:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_PSN_NOT_FOUND:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_INVALID_DUP_RKEY:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_IRRQ_FORMAT_ERROR:
+		event.event = IB_EVENT_QP_ACCESS_ERR;
+		break;
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_EXCEEDS_WQE:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_WQE_FORMAT_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_UNSUPPORTED_OPCODE:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_REM_INVALIDATE:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_OPCODE_ERROR:
+		event.event = IB_EVENT_QP_REQ_ERR;
+		break;
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_IRRQ_OFLOW:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_CMP_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_CQ_LOAD_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_TX_PCI_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_RX_PCI_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_MEMORY_ERROR:
+		event.event = IB_EVENT_QP_FATAL;
+		break;
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_SRQ_LOAD_ERROR:
+	case CREQ_QP_ERROR_NOTIFICATION_RES_ERR_STATE_REASON_RES_SRQ_ERROR:
+		if (srq)
+			event.event = IB_EVENT_SRQ_ERR;
+		break;
+	default:
+		break;
+	}
+
+	if (err_event->res_err_state_reason || err_event->req_err_state_reason) {
+		dev_err_once(rdev_to_dev(qp->rdev),
+			     "%s %s qp_id: %d cons (%d %d) req (%d %d) res (%d %d)\n",
+			     __func__, rdma_is_kernel_res(&qp->ib_qp.res) ? "kernel" : "user",
+			     qp->qplib_qp.id,
+			     err_event->sq_cons_idx,
+			     err_event->rq_cons_idx,
+			     err_event->req_slow_path_state,
+			     err_event->req_err_state_reason,
+			     err_event->res_slow_path_state,
+			     err_event->res_err_state_reason);
+	} else {
+		if (srq)
+			event.event = IB_EVENT_QP_LAST_WQE_REACHED;
+	}
+
+	if (event.event == IB_EVENT_SRQ_ERR && srq->ib_srq.event_handler) {
+		(*srq->ib_srq.event_handler)(&event,
+					     srq->ib_srq.srq_context);
+	} else if (event.device && qp->ib_qp.event_handler) {
 		qp->ib_qp.event_handler(&event, qp->ib_qp.qp_context);
+	}
+
+	return 0;
+}
+
+static int bnxt_re_handle_cq_async_error(void *event, struct bnxt_re_cq *cq)
+{
+	struct creq_cq_error_notification *cqerr;
+	struct ib_event ibevent = {};
+
+	cqerr = event;
+	switch (cqerr->cq_err_reason) {
+	case CREQ_CQ_ERROR_NOTIFICATION_CQ_ERR_REASON_REQ_CQ_INVALID_ERROR:
+	case CREQ_CQ_ERROR_NOTIFICATION_CQ_ERR_REASON_REQ_CQ_OVERFLOW_ERROR:
+	case CREQ_CQ_ERROR_NOTIFICATION_CQ_ERR_REASON_REQ_CQ_LOAD_ERROR:
+	case CREQ_CQ_ERROR_NOTIFICATION_CQ_ERR_REASON_RES_CQ_INVALID_ERROR:
+	case CREQ_CQ_ERROR_NOTIFICATION_CQ_ERR_REASON_RES_CQ_OVERFLOW_ERROR:
+	case CREQ_CQ_ERROR_NOTIFICATION_CQ_ERR_REASON_RES_CQ_LOAD_ERROR:
+		ibevent.event = IB_EVENT_CQ_ERR;
+	default:
+		break;
+	}
+
+	if (ibevent.event == IB_EVENT_CQ_ERR && cq->ib_cq.event_handler) {
+		ibevent.element.cq = &cq->ib_cq;
+		ibevent.device = &cq->rdev->ibdev;
+
+		dev_err_once(rdev_to_dev(cq->rdev),
+			     "%s err reason %d\n", __func__, cqerr->cq_err_reason);
+		cq->ib_cq.event_handler(&ibevent, cq->ib_cq.cq_context);
+	}

 	return 0;
 }
@@ -995,6 +1130,10 @@ static int bnxt_re_handle_qp_async_event(struct creq_qp_event *qp_event,
 static int bnxt_re_handle_affi_async_event(struct creq_qp_event *affi_async,
 					   void *obj)
 {
+	struct bnxt_qplib_qp *lib_qp;
+	struct bnxt_qplib_cq *lib_cq;
+	struct bnxt_re_qp *qp;
+	struct bnxt_re_cq *cq;
 	int rc = 0;
 	u8 event;

@@ -1002,11 +1141,19 @@ static int bnxt_re_handle_affi_async_event(struct creq_qp_event *affi_async,
 		return rc; /* QP was already dead, still return success */

 	event = affi_async->event;
-	if (event == CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION) {
-		struct bnxt_qplib_qp *lib_qp = obj;
-		struct bnxt_re_qp *qp = container_of(lib_qp, struct bnxt_re_qp,
-						     qplib_qp);
+	switch (event) {
+	case CREQ_QP_EVENT_EVENT_QP_ERROR_NOTIFICATION:
+		lib_qp = obj;
+		qp = container_of(lib_qp, struct bnxt_re_qp, qplib_qp);
 		rc = bnxt_re_handle_qp_async_event(affi_async, qp);
+		break;
+	case CREQ_QP_EVENT_EVENT_CQ_ERROR_NOTIFICATION:
+		lib_cq = obj;
+		cq = container_of(lib_cq, struct bnxt_re_cq, qplib_cq);
+		rc = bnxt_re_handle_cq_async_error(affi_async, cq);
+		break;
+	default:
+		rc = -EINVAL;
 	}
 	return rc;
 }