From patchwork Fri Jun 7 12:25:38 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dennis Dalessandro
X-Patchwork-Id: 10981613
X-Patchwork-Delegate: jgg@ziepe.ca
Subject: [PATCH for-rc 3/3] IB/hfi1: Correct tid qp rcd to match verbs context
From: Dennis Dalessandro
To: jgg@ziepe.ca, dledford@redhat.com
Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn, stable@vger.kernel.org, Kaike Wan
Date: Fri, 07 Jun 2019 08:25:38 -0400
Message-ID: <20190607122538.158478.62945.stgit@awfm-01.aw.intel.com>
In-Reply-To: <20190607113807.157915.48581.stgit@awfm-01.aw.intel.com>
References: <20190607113807.157915.48581.stgit@awfm-01.aw.intel.com>
User-Agent: StGit/0.17.1-dirty
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

From: Mike Marciniszyn

The qp priv rcd pointer doesn't match the context being used for verbs
causing issues when 9B and kdeth packets are processed by different
receive contexts and hence different CPUs.

When running on different CPUs the following panic can occur:

 [476262.398106] WARNING: CPU: 3 PID: 2584 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
 [476262.398109] list_del corruption. prev->next should be ffff9a7ac31f7a30, but was ffff9a7c3bc89230
 [476262.398266] CPU: 3 PID: 2584 Comm: z_wr_iss Kdump: loaded Tainted: P OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1
 [476262.398272] Call Trace:
 [476262.398277] [] dump_stack+0x19/0x1b
 [476262.398314] [] __warn+0xd8/0x100
 [476262.398317] [] warn_slowpath_fmt+0x5f/0x80
 [476262.398320] [] __list_del_entry+0xa1/0xd0
 [476262.398402] [] process_rcv_qp_work+0xb5/0x160 [hfi1]
 [476262.398424] [] handle_receive_interrupt_nodma_rtail+0x20b/0x2b0 [hfi1]
 [476262.398438] [] receive_context_interrupt+0x23/0x40 [hfi1]
 [476262.398447] [] __handle_irq_event_percpu+0x44/0x1c0
 [476262.398450] [] handle_irq_event_percpu+0x32/0x80
 [476262.398454] [] handle_irq_event+0x3c/0x60
 [476262.398460] [] handle_edge_irq+0x7f/0x150
 [476262.398469] [] handle_irq+0xe4/0x1a0
 [476262.398475] [] do_IRQ+0x4d/0xf0
 [476262.398481] [] common_interrupt+0x162/0x162
 [476262.398482] [] ? memcpy+0x6/0x110
 [476262.398645] [] ? abd_copy_from_buf_off_cb+0x1d/0x30 [zfs]
 [476262.398678] [] ? abd_copy_to_buf_off_cb+0x30/0x30 [zfs]
 [476262.398696] [] abd_iterate_func+0x97/0x120 [zfs]
 [476262.398710] [] abd_copy_from_buf_off+0x39/0x60 [zfs]
 [476262.398726] [] arc_write_ready+0x178/0x300 [zfs]
 [476262.398732] [] ? mutex_lock+0x12/0x2f
 [476262.398734] [] ? mutex_lock+0x12/0x2f
 [476262.398837] [] zio_ready+0x65/0x3d0 [zfs]
 [476262.398884] [] ? tsd_get_by_thread+0x2e/0x50 [spl]
 [476262.398893] [] ? taskq_member+0x18/0x30 [spl]
 [476262.398968] [] zio_execute+0xa2/0x100 [zfs]
 [476262.398982] [] taskq_thread+0x2ac/0x4f0 [spl]
 [476262.399001] [] ? wake_up_state+0x20/0x20
 [476262.399043] [] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
 [476262.399055] [] ? taskq_thread_spawn+0x60/0x60 [spl]
 [476262.399067] [] kthread+0xd1/0xe0
 [476262.399072] [] ? insert_kthread_work+0x40/0x40
 [476262.399082] [] ret_from_fork_nospec_begin+0x21/0x21
 [476262.399087] [] ? insert_kthread_work+0x40/0x40

Fix by reading the map entry in the same manner as the hardware so that
the kdeth and verbs contexts match.

Fixes: 5190f052a365 ("IB/hfi1: Allow the driver to initialize QP priv struct")
Cc:
Reviewed-by: Kaike Wan
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
---
 drivers/infiniband/hw/hfi1/chip.c     |   13 +++++++++++++
 drivers/infiniband/hw/hfi1/chip.h     |    1 +
 drivers/infiniband/hw/hfi1/tid_rdma.c |    5 ++---
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 4221a99e..674f62a 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -14032,6 +14032,19 @@ static void init_kdeth_qp(struct hfi1_devdata *dd)
 }
 
 /**
+ * hfi1_get_qp_map
+ * @dd: device data
+ * @idx: index to read
+ */
+u8 hfi1_get_qp_map(struct hfi1_devdata *dd, u8 idx)
+{
+	u64 reg = read_csr(dd, RCV_QP_MAP_TABLE + (idx / 8) * 8);
+
+	reg >>= (idx % 8) * 8;
+	return (u8)reg;
+}
+
+/**
  * init_qpmap_table
  * @dd - device data
  * @first_ctxt - first context
diff --git a/drivers/infiniband/hw/hfi1/chip.h b/drivers/infiniband/hw/hfi1/chip.h
index 4e6c355..b76cf81 100644
--- a/drivers/infiniband/hw/hfi1/chip.h
+++ b/drivers/infiniband/hw/hfi1/chip.h
@@ -1445,6 +1445,7 @@ int hfi1_set_ctxt_pkey(struct hfi1_devdata *dd, struct hfi1_ctxtdata *ctxt,
 void remap_intr(struct hfi1_devdata *dd, int isrc, int msix_intr);
 void remap_sdma_interrupts(struct hfi1_devdata *dd, int engine, int msix_intr);
 void reset_interrupts(struct hfi1_devdata *dd);
+u8 hfi1_get_qp_map(struct hfi1_devdata *dd, u8 idx);
 
 /*
  * Interrupt source table.
diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c
index 6fb9303..d77276d 100644
--- a/drivers/infiniband/hw/hfi1/tid_rdma.c
+++ b/drivers/infiniband/hw/hfi1/tid_rdma.c
@@ -312,9 +312,8 @@ static struct hfi1_ctxtdata *qp_to_rcd(struct rvt_dev_info *rdi,
 	if (qp->ibqp.qp_num == 0)
 		ctxt = 0;
 	else
-		ctxt = ((qp->ibqp.qp_num >> dd->qos_shift) %
-			(dd->n_krcv_queues - 1)) + 1;
-
+		ctxt = hfi1_get_qp_map(dd,
+				       (u8)(qp->ibqp.qp_num >> dd->qos_shift));
 	return dd->rcd[ctxt];
 }
 
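
Note for readers of the patch: the lookup in hfi1_get_qp_map() treats each 64-bit map register as eight one-byte entries, selects the register holding entry idx, and shifts the wanted byte down. The user-space sketch below reproduces only that arithmetic and sets it beside the modulo formula the patch removes. It is an illustration under stated assumptions, not driver code: the qp_map_regs array stands in for the RCV_QP_MAP_TABLE CSRs (so it is indexed by register number rather than by byte offset), its size of 32 registers is assumed, and set_qp_map(), old_ctxt() and the table contents in the demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the map CSRs: 32 registers (assumed size),
 * each holding eight one-byte receive context numbers. */
#define QP_MAP_REGS 32
static uint64_t qp_map_regs[QP_MAP_REGS];

/* Same arithmetic as hfi1_get_qp_map() in the patch: pick the 64-bit
 * register that holds entry idx, then move its byte down to bits 7:0. */
static uint8_t get_qp_map(uint8_t idx)
{
	uint64_t reg = qp_map_regs[idx / 8];

	reg >>= (idx % 8) * 8;
	return (uint8_t)reg;
}

/* Hypothetical helper that programs one entry of the simulated table. */
static void set_qp_map(uint8_t idx, uint8_t ctxt)
{
	unsigned int shift = (idx % 8) * 8;

	qp_map_regs[idx / 8] &= ~((uint64_t)0xff << shift);
	qp_map_regs[idx / 8] |= (uint64_t)ctxt << shift;
}

/* The computation removed from qp_to_rcd(), kept here for comparison. */
static unsigned int old_ctxt(uint32_t qpn, unsigned int qos_shift,
			     unsigned int n_krcv_queues)
{
	return ((qpn >> qos_shift) % (n_krcv_queues - 1)) + 1;
}

#ifdef QP_MAP_DEMO_MAIN
int main(void)
{
	unsigned int i;

	/* Assumed table contents: entries cycle over contexts 1..3. */
	for (i = 0; i < 256; i++)
		set_qp_map((uint8_t)i, (uint8_t)(i % 3 + 1));

	/* qpn 6 with qos_shift 1 selects entry 3 of the table. */
	printf("map lookup:  ctxt %u\n", (unsigned int)get_qp_map(6 >> 1));
	printf("old formula: ctxt %u\n", old_ctxt(6, 1, 5));
	return 0;
}
#endif
```

Building with -DQP_MAP_DEMO_MAIN enables the small demo. Whenever the programmed table and the modulo formula disagree for some QP number, the two functions name different receive contexts for that QP, which is the mismatch between the kdeth and verbs contexts described in the commit message.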