From patchwork Mon Mar 18 16:55:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 10858109 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0B8F6C2 for ; Mon, 18 Mar 2019 16:55:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 98E0F28FAB for ; Mon, 18 Mar 2019 16:55:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 895EA28FDC; Mon, 18 Mar 2019 16:55:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7ACF528FAB for ; Mon, 18 Mar 2019 16:55:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727368AbfCRQzK (ORCPT ); Mon, 18 Mar 2019 12:55:10 -0400 Received: from mga06.intel.com ([134.134.136.31]:29037 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726765AbfCRQzK (ORCPT ); Mon, 18 Mar 2019 12:55:10 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Mar 2019 09:55:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,494,1544515200"; d="scan'208";a="128003754" Received: from scymds01.sc.intel.com ([10.82.194.37]) by orsmga006.jf.intel.com with ESMTP; 18 Mar 2019 09:55:09 -0700 Received: from scvm10.sc.intel.com (scvm10.sc.intel.com [10.82.195.27]) by scymds01.sc.intel.com with ESMTP id x2IGt9fl020790; Mon, 18 Mar 2019 09:55:09 -0700 Received: from scvm10.sc.intel.com (localhost [127.0.0.1]) by scvm10.sc.intel.com with ESMTP id x2IGt93C025195; Mon, 18 Mar 2019 09:55:09 -0700 Subject: [PATCH for-rc 1/5] IB/hfi1: Fix WQ_MEM_RECLAIM warning From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, "Michael J. Ruhl" , Mike Marciniszyn Date: Mon, 18 Mar 2019 09:55:09 -0700 Message-ID: <20190318165501.23550.24989.stgit@scvm10.sc.intel.com> In-Reply-To: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> References: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Mike Marciniszyn The work_item cancels that occur when a QP is destroyed can elicit the following trace: [ 708.997199] workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1] [ 708.997209] WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100 [ 709.227743] Call Trace: [ 709.230852] __flush_work.isra.29+0x8c/0x1a0 [ 709.235779] ? __switch_to_asm+0x40/0x70 [ 709.240335] __cancel_work_timer+0x103/0x190 [ 709.245253] ? schedule+0x32/0x80 [ 709.249216] iowait_cancel_work+0x15/0x30 [hfi1] [ 709.254475] rvt_reset_qp+0x1f8/0x3e0 [rdmavt] [ 709.259554] rvt_destroy_qp+0x65/0x1f0 [rdmavt] [ 709.264703] ? _cond_resched+0x15/0x30 [ 709.269081] ib_destroy_qp+0xe9/0x230 [ib_core] [ 709.274223] ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib] [ 709.279799] process_one_work+0x171/0x370 [ 709.284425] worker_thread+0x49/0x3f0 [ 709.288695] kthread+0xf8/0x130 [ 709.292450] ? max_active_store+0x80/0x80 [ 709.297050] ? kthread_bind+0x10/0x10 [ 709.301293] ret_from_fork+0x35/0x40 [ 709.305441] ---[ end trace f0e973737146499b ]--- Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM. Fixes: 0a226edd203f ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines") Reviewed-by: Michael J. Ruhl Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/init.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index 7841a0a..3cd6f07 100644 --- a/drivers/infiniband/hw/hfi1/init.c +++ b/drivers/infiniband/hw/hfi1/init.c @@ -800,7 +800,8 @@ static int create_workqueues(struct hfi1_devdata *dd) ppd->hfi1_wq = alloc_workqueue( "hfi%d_%d", - WQ_SYSFS | WQ_HIGHPRI | WQ_CPU_INTENSIVE, + WQ_SYSFS | WQ_HIGHPRI | WQ_CPU_INTENSIVE | + WQ_MEM_RECLAIM, HFI1_MAX_ACTIVE_WORKQUEUE_ENTRIES, dd->unit, pidx); if (!ppd->hfi1_wq) From patchwork Mon Mar 18 16:55:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 10858111 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E2D31390 for ; Mon, 18 Mar 2019 16:55:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4A84F28F9E for ; Mon, 18 Mar 2019 16:55:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3E49A28FC5; Mon, 18 Mar 2019 16:55:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CCE9F28F9E for ; Mon, 18 Mar 2019 16:55:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727111AbfCRQzV (ORCPT ); Mon, 18 Mar 2019 12:55:21 -0400 Received: from mga04.intel.com ([192.55.52.120]:37976 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726765AbfCRQzU (ORCPT ); Mon, 18 Mar 2019 12:55:20 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Mar 2019 09:55:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,494,1544515200"; d="scan'208";a="283700800" Received: from scymds01.sc.intel.com ([10.82.194.37]) by orsmga004.jf.intel.com with ESMTP; 18 Mar 2019 09:55:19 -0700 Received: from scvm10.sc.intel.com (scvm10.sc.intel.com [10.82.195.27]) by scymds01.sc.intel.com with ESMTP id x2IGtJQM022324; Mon, 18 Mar 2019 09:55:19 -0700 Received: from scvm10.sc.intel.com (localhost [127.0.0.1]) by scvm10.sc.intel.com with ESMTP id x2IGtJjU025261; Mon, 18 Mar 2019 09:55:19 -0700 Subject: [PATCH for-rc 2/5] IB/hfi1: Failed to drain send queue when QP is put into error state From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Alex Estrin , stable@vger.kernel.org, Kaike Wan Date: Mon, 18 Mar 2019 09:55:19 -0700 Message-ID: <20190318165514.23550.89083.stgit@scvm10.sc.intel.com> In-Reply-To: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> References: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Kaike Wan When a QP is put into error state, all pending requests in the send work queue should be drained. The following sequence of events could lead to a failure, causing a request to hang: (1) The QP builds a packet and tries to send through SDMA engine. However, PIO engine is still busy. Consequently, this packet is put on the QP's tx list and the QP is put on the PIO waiting list. The field qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN; (2) The QP is put into error state by the user application and notify_error_qp() is called, which removes the QP from the PIO waiting list and the packet from the QP's tx list. In addition, qp->s_flags is cleared of RVT_S_ANY_WAIT_IO bits, which does not include HFI1_S_WAIT_PIO_DRAIN bit; (3) The hfi1_schdule_send() function is called to drain the QP's send queue. Subsequently, hfi1_do_send() is called. Since the flag bit HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As a result, hfi1_do_send() bails out without draining any request from the send queue; (4) The PIO engine completes the sending and tries to wake up any QP on its waiting list. But the QP has been removed from the PIO waiting list and therefore is kept in sleep forever. The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2). HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN. Fixes: 2e2ba09e48b7 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags") Cc: # 4.19.x+ Reviewed-by: Mike Marciniszyn Reviewed-by: Alex Estrin Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/qp.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c index 9b643c2..64889ed 100644 --- a/drivers/infiniband/hw/hfi1/qp.c +++ b/drivers/infiniband/hw/hfi1/qp.c @@ -898,7 +898,7 @@ void notify_error_qp(struct rvt_qp *qp) if (!list_empty(&priv->s_iowait.list) && !(qp->s_flags & RVT_S_BUSY) && !(priv->s_flags & RVT_S_BUSY)) { - qp->s_flags &= ~RVT_S_ANY_WAIT_IO; + qp->s_flags &= ~HFI1_S_ANY_WAIT_IO; list_del_init(&priv->s_iowait.list); priv->s_iowait.lock = NULL; rvt_put_qp(qp); From patchwork Mon Mar 18 16:55:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 10858113 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E3A51390 for ; Mon, 18 Mar 2019 16:55:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3AD8828FAB for ; Mon, 18 Mar 2019 16:55:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2EB5428F9E; Mon, 18 Mar 2019 16:55:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C0F4728F9E for ; Mon, 18 Mar 2019 16:55:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726835AbfCRQza (ORCPT ); Mon, 18 Mar 2019 12:55:30 -0400 Received: from mga12.intel.com ([192.55.52.136]:22737 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726765AbfCRQza (ORCPT ); Mon, 18 Mar 2019 12:55:30 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Mar 2019 09:55:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,494,1544515200"; d="scan'208";a="328355411" Received: from scymds01.sc.intel.com ([10.82.194.37]) by fmsmga006.fm.intel.com with ESMTP; 18 Mar 2019 09:55:29 -0700 Received: from scvm10.sc.intel.com (scvm10.sc.intel.com [10.82.195.27]) by scymds01.sc.intel.com with ESMTP id x2IGtTGG022801; Mon, 18 Mar 2019 09:55:29 -0700 Received: from scvm10.sc.intel.com (localhost [127.0.0.1]) by scvm10.sc.intel.com with ESMTP id x2IGtTTX025348; Mon, 18 Mar 2019 09:55:29 -0700 Subject: [PATCH for-rc 3/5] IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Alex Estrin , Kaike Wan Date: Mon, 18 Mar 2019 09:55:29 -0700 Message-ID: <20190318165524.23550.64023.stgit@scvm10.sc.intel.com> In-Reply-To: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> References: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Kaike Wan When a QP is put into error state, it may be waiting for send engine resources. In this case, the QP will be removed from the send engine's waiting list, but its IOWAIT pending bits are not cleared. This will normally not have any major impact as the QP is being destroyed. However, the QP still needs to wind down its operations, such as draining the send queue by scheduling the send engine. Clearing the pending bits will avoid any potential complications. In addition, if the QP will eventually hang, clearing the pending bits can help debugging by presenting a consistent picture if the user dumps the qp_stats. This patch clears a QP's IOWAIT_PENDING_IB and IO_PENDING_TID bits in priv->s_iowait.flags in this case. Fixes: 5da0fc9dbf89 ("IB/hfi1: Prepare resource waits for dual leg") Reviewed-by: Mike Marciniszyn Reviewed-by: Alex Estrin Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/qp.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c index 64889ed..eba3003 100644 --- a/drivers/infiniband/hw/hfi1/qp.c +++ b/drivers/infiniband/hw/hfi1/qp.c @@ -899,6 +899,8 @@ void notify_error_qp(struct rvt_qp *qp) !(qp->s_flags & RVT_S_BUSY) && !(priv->s_flags & RVT_S_BUSY)) { qp->s_flags &= ~HFI1_S_ANY_WAIT_IO; + iowait_clear_flag(&priv->s_iowait, IOWAIT_PENDING_IB); + iowait_clear_flag(&priv->s_iowait, IOWAIT_PENDING_TID); list_del_init(&priv->s_iowait.list); priv->s_iowait.lock = NULL; rvt_put_qp(qp); From patchwork Mon Mar 18 16:55:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 10858115 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 587456C2 for ; Mon, 18 Mar 2019 16:55:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32E1428F9E for ; Mon, 18 Mar 2019 16:55:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 25DBF28FDC; Mon, 18 Mar 2019 16:55:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD11728F9E for ; Mon, 18 Mar 2019 16:55:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727534AbfCRQzm (ORCPT ); Mon, 18 Mar 2019 12:55:42 -0400 Received: from mga01.intel.com ([192.55.52.88]:23000 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726765AbfCRQzl (ORCPT ); Mon, 18 Mar 2019 12:55:41 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Mar 2019 09:55:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,494,1544515200"; d="scan'208";a="123674130" Received: from scymds01.sc.intel.com ([10.82.194.37]) by orsmga007.jf.intel.com with ESMTP; 18 Mar 2019 09:55:39 -0700 Received: from scvm10.sc.intel.com (scvm10.sc.intel.com [10.82.195.27]) by scymds01.sc.intel.com with ESMTP id x2IGtdV3023907; Mon, 18 Mar 2019 09:55:39 -0700 Received: from scvm10.sc.intel.com (localhost [127.0.0.1]) by scvm10.sc.intel.com with ESMTP id x2IGtdfP025421; Mon, 18 Mar 2019 09:55:39 -0700 Subject: [PATCH for-rc 4/5] IB/hfi1: Eliminate opcode tests on mr deref From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Kaike Wan Date: Mon, 18 Mar 2019 09:55:39 -0700 Message-ID: <20190318165534.23550.6324.stgit@scvm10.sc.intel.com> In-Reply-To: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> References: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Kaike Wan When an old ack_queue entry is used to store an incoming request, it may need to clean up the old entry if it is still referencing the mr. Originally only RDMA READ request needed to reference mr on the responder side and therefore the opcode was tested when cleaning up the old entry. The introduction of tid rdma specific operations in the ack_queue makes the specific opcode tests wrong. Multiple opcodes (RDMA READ, TID RDMA READ, and TID RDMA WRITE) may need mr ref cleanup. Remove the opcode specific tests associated with the ack_queue. Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging") Signed-off-by: Mike Marciniszyn Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/rc.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c index e6726c1..5991211 100644 --- a/drivers/infiniband/hw/hfi1/rc.c +++ b/drivers/infiniband/hw/hfi1/rc.c @@ -3088,7 +3088,7 @@ void hfi1_rc_rcv(struct hfi1_packet *packet) update_ack_queue(qp, next); } e = &qp->s_ack_queue[qp->r_head_ack_queue]; - if (e->opcode == OP(RDMA_READ_REQUEST) && e->rdma_sge.mr) { + if (e->rdma_sge.mr) { rvt_put_mr(e->rdma_sge.mr); e->rdma_sge.mr = NULL; } @@ -3166,7 +3166,7 @@ void hfi1_rc_rcv(struct hfi1_packet *packet) update_ack_queue(qp, next); } e = &qp->s_ack_queue[qp->r_head_ack_queue]; - if (e->opcode == OP(RDMA_READ_REQUEST) && e->rdma_sge.mr) { + if (e->rdma_sge.mr) { rvt_put_mr(e->rdma_sge.mr); e->rdma_sge.mr = NULL; } From patchwork Mon Mar 18 16:55:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 10858117 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 301311390 for ; Mon, 18 Mar 2019 16:55:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A4A828F9E for ; Mon, 18 Mar 2019 16:55:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F2AFC28FDC; Mon, 18 Mar 2019 16:55:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86C5B28F9E for ; Mon, 18 Mar 2019 16:55:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726922AbfCRQzw (ORCPT ); Mon, 18 Mar 2019 12:55:52 -0400 Received: from mga06.intel.com ([134.134.136.31]:29061 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726765AbfCRQzw (ORCPT ); Mon, 18 Mar 2019 12:55:52 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Mar 2019 09:55:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,494,1544515200"; d="scan'208";a="156070460" Received: from scymds01.sc.intel.com ([10.82.194.37]) by fmsmga001.fm.intel.com with ESMTP; 18 Mar 2019 09:55:50 -0700 Received: from scvm10.sc.intel.com (scvm10.sc.intel.com [10.82.195.27]) by scymds01.sc.intel.com with ESMTP id x2IGtn1F023945; Mon, 18 Mar 2019 09:55:49 -0700 Received: from scvm10.sc.intel.com (localhost [127.0.0.1]) by scvm10.sc.intel.com with ESMTP id x2IGtnli025487; Mon, 18 Mar 2019 09:55:49 -0700 Subject: [PATCH for-rc 5/5] IB/hfi1: Fix the allocation of RSM table From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, "Michael J. Ruhl" , Mike Marciniszyn , Kaike Wan Date: Mon, 18 Mar 2019 09:55:49 -0700 Message-ID: <20190318165544.23550.96951.stgit@scvm10.sc.intel.com> In-Reply-To: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> References: <20190318165205.23550.97894.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Kaike Wan The receive side mapping (RSM) on hfi1 hardware is a special matching mechanism to direct an incoming packet to a given hardware receive context. It has 4 instances of matching capabilities (RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of 256 entries, each of which points to a receive context. Currently, three instances of RSM have been used: 1. RSM0 by QOS; 2. RSM1 by PSM FECN; 3. RSM2 by VNIC. Each RSM instance should reserve enough entries in RMT to function properly. Since both PSM and VNIC could allocate any receive context between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must reserve enough RMT entries to cover the entire receive context index range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only the user receive contexts allocated for PSM (dd->num_user_contexts). Consequently, the sizing of dd->num_user_contexts in set_up_context_variables is incorrect. This patch will fix the calculation. Fixes: 2280740f01ae ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support") Reviewed-by: Mike Marciniszyn Reviewed-by: Michael J. Ruhl Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/chip.c | 26 +++++++++++++++++++------- 1 files changed, 19 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c index 612f041..9784c6c 100644 --- a/drivers/infiniband/hw/hfi1/chip.c +++ b/drivers/infiniband/hw/hfi1/chip.c @@ -13232,7 +13232,7 @@ static int set_up_context_variables(struct hfi1_devdata *dd) int total_contexts; int ret; unsigned ngroups; - int qos_rmt_count; + int rmt_count; int user_rmt_reduced; u32 n_usr_ctxts; u32 send_contexts = chip_send_contexts(dd); @@ -13294,10 +13294,20 @@ static int set_up_context_variables(struct hfi1_devdata *dd) n_usr_ctxts = rcv_contexts - total_contexts; } - /* each user context requires an entry in the RMT */ - qos_rmt_count = qos_rmt_entries(dd, NULL, NULL); - if (qos_rmt_count + n_usr_ctxts > NUM_MAP_ENTRIES) { - user_rmt_reduced = NUM_MAP_ENTRIES - qos_rmt_count; + /* + * The RMT entries are currently allocated as shown below: + * 1. QOS (0 to 128 entries); + * 2. FECN for PSM (num_user_contexts + num_vnic_contexts); + * 3. VNIC (num_vnic_contexts). + * It should be noted that PSM FECN oversubscribe num_vnic_contexts + * entries of RMT because both VNIC and PSM could allocate any receive + * context between dd->first_dyn_alloc_text and dd->num_rcv_contexts, + * and PSM FECN must reserve an RMT entry for each possible PSM receive + * context. + */ + rmt_count = qos_rmt_entries(dd, NULL, NULL) + (num_vnic_contexts * 2); + if (rmt_count + n_usr_ctxts > NUM_MAP_ENTRIES) { + user_rmt_reduced = NUM_MAP_ENTRIES - rmt_count; dd_dev_err(dd, "RMT size is reducing the number of user receive contexts from %u to %d\n", n_usr_ctxts, @@ -14285,9 +14295,11 @@ static void init_user_fecn_handling(struct hfi1_devdata *dd, u64 reg; int i, idx, regoff, regidx; u8 offset; + u32 total_cnt; /* there needs to be enough room in the map table */ - if (rmt->used + dd->num_user_contexts >= NUM_MAP_ENTRIES) { + total_cnt = dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt; + if (rmt->used + total_cnt >= NUM_MAP_ENTRIES) { dd_dev_err(dd, "User FECN handling disabled - too many user contexts allocated\n"); return; } @@ -14341,7 +14353,7 @@ static void init_user_fecn_handling(struct hfi1_devdata *dd, /* add rule 1 */ add_rsm_rule(dd, RSM_INS_FECN, &rrd); - rmt->used += dd->num_user_contexts; + rmt->used += total_cnt; } /* Initialize RSM for VNIC */