From patchwork Mon Feb 10 13:10:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 11373225 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B76DE924 for ; Mon, 10 Feb 2020 13:10:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A267D20708 for ; Mon, 10 Feb 2020 13:10:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729400AbgBJNKn (ORCPT ); Mon, 10 Feb 2020 08:10:43 -0500 Received: from mga07.intel.com ([134.134.136.100]:55385 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729347AbgBJNKb (ORCPT ); Mon, 10 Feb 2020 08:10:31 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Feb 2020 05:10:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,425,1574150400"; d="scan'208";a="431595858" Received: from sedona.ch.intel.com ([10.2.136.157]) by fmsmga005.fm.intel.com with ESMTP; 10 Feb 2020 05:10:29 -0800 Received: from awfm-01.aw.intel.com (awfm-01.aw.intel.com [10.228.212.213]) by sedona.ch.intel.com (8.14.3/8.14.3/Standard MailSET/Hub) with ESMTP id 01ADATnF007521; Mon, 10 Feb 2020 06:10:29 -0700 Received: from awfm-01.aw.intel.com (localhost [127.0.0.1]) by awfm-01.aw.intel.com (8.14.7/8.14.7) with ESMTP id 01ADAQ0Z087642; Mon, 10 Feb 2020 08:10:27 -0500 Subject: [PATCH for-rc 1/3] IB/hfi1: Acquire lock to release TID entries when user file is closed From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Kaike Wan Date: Mon, 10 Feb 2020 08:10:26 -0500 Message-ID: <20200210131026.87408.86853.stgit@awfm-01.aw.intel.com> In-Reply-To: <20200210130712.87408.34564.stgit@awfm-01.aw.intel.com> References: <20200210130712.87408.34564.stgit@awfm-01.aw.intel.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org From: Kaike Wan Each user context is allocated a certain number of RcvArray (TID) entries and these entries are managed through TID groups. These groups are put into one of three lists in each user context: tid_group_list, tid_used_list, and tid_full_list, depending on the number of used TID entries within each group. When TID packets are expected, one or more TID groups will be allocated. After the packets are received, the TID groups will be freed. Since multiple user threads may access the TID groups simultaneously, a mutex exp_mutex is used to synchronize the access. However, when the user file is closed, it tries to release all TID groups without acquiring the mutex first, which risks a race condition with another thread that may be releasing its TID groups, leading to data corruption. This patch addresses the issue by acquiring the mutex first before releasing the TID groups when the file is closed. Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs") Reviewed-by: Mike Marciniszyn Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/user_exp_rcv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c index f05742a..2443423 100644 --- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c +++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c @@ -142,10 +142,12 @@ void hfi1_user_exp_rcv_free(struct hfi1_filedata *fd) { struct hfi1_ctxtdata *uctxt = fd->uctxt; + mutex_lock(&uctxt->exp_mutex); if (!EXP_TID_SET_EMPTY(uctxt->tid_full_list)) unlock_exp_tids(uctxt, &uctxt->tid_full_list, fd); if (!EXP_TID_SET_EMPTY(uctxt->tid_used_list)) unlock_exp_tids(uctxt, &uctxt->tid_used_list, fd); + mutex_unlock(&uctxt->exp_mutex); kfree(fd->invalid_tids); fd->invalid_tids = NULL; From patchwork Mon Feb 10 13:10:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 11373223 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 34567924 for ; Mon, 10 Feb 2020 13:10:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0ED3320714 for ; Mon, 10 Feb 2020 13:10:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729338AbgBJNKj (ORCPT ); Mon, 10 Feb 2020 08:10:39 -0500 Received: from mga11.intel.com ([192.55.52.93]:4880 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730486AbgBJNKh (ORCPT ); Mon, 10 Feb 2020 08:10:37 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Feb 2020 05:10:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,425,1574150400"; d="scan'208";a="233098939" Received: from sedona.ch.intel.com ([10.2.136.157]) by orsmga003.jf.intel.com with ESMTP; 10 Feb 2020 05:10:36 -0800 Received: from awfm-01.aw.intel.com (awfm-01.aw.intel.com [10.228.212.213]) by sedona.ch.intel.com (8.14.3/8.14.3/Standard MailSET/Hub) with ESMTP id 01ADAZol007524; Mon, 10 Feb 2020 06:10:35 -0700 Received: from awfm-01.aw.intel.com (localhost [127.0.0.1]) by awfm-01.aw.intel.com (8.14.7/8.14.7) with ESMTP id 01ADAX7Z087658; Mon, 10 Feb 2020 08:10:34 -0500 Subject: [PATCH for-rc 2/3] IB/hfi1: Close window for pq and request coliding From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Kaike Wan Date: Mon, 10 Feb 2020 08:10:33 -0500 Message-ID: <20200210131033.87408.81174.stgit@awfm-01.aw.intel.com> In-Reply-To: <20200210130712.87408.34564.stgit@awfm-01.aw.intel.com> References: <20200210130712.87408.34564.stgit@awfm-01.aw.intel.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org From: Mike Marciniszyn Cleaning up a pq can result in the following warning and panic: [29386.970819] WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0 [29386.970821] list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100) [29386.970823] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables [29386.970877] nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] [29386.970891] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 [29386.970893] Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 [29386.970894] Call Trace: [29386.970905] [] dump_stack+0x19/0x1b [29386.970911] [] __warn+0xd8/0x100 [29386.970913] [] warn_slowpath_fmt+0x5f/0x80 [29386.970916] [] __list_del_entry+0x63/0xd0 [29386.970919] [] list_del+0xd/0x30 [29386.970924] [] kmem_cache_destroy+0x50/0x110 [29386.970969] [] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [29386.970984] [] hfi1_file_close+0x70/0x1e0 [hfi1] [29386.970988] [] __fput+0xec/0x260 [29386.970991] [] ____fput+0xe/0x10 [29386.970995] [] task_work_run+0xbb/0xe0 [29386.971000] [] do_notify_resume+0xa5/0xc0 [29386.971004] [] int_signal+0x12/0x17 [29386.971024] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [29386.978897] IP: [] kmem_cache_close+0x7e/0x300 [29386.984928] PGD 2cdab19067 PUD 2f7bfdb067 PMD 0 [29386.989627] Oops: 0000 [#1] SMP [29386.992902] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables [29387.065320] nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] [29387.078246] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 [29387.089796] Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 [29387.099097] task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000 [29387.106576] RIP: 0010:[] [] kmem_cache_close+0x7e/0x300 [29387.115033] RSP: 0018:ffff88b5393abd60 EFLAGS: 00010287 [29387.120346] RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003 [29387.127479] RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800 [29387.134612] RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19 [29387.141746] R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000 [29387.148877] R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000 [29387.156010] FS: 00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000 [29387.164097] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [29387.169839] CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0 [29387.176966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [29387.184100] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [29387.191232] PKRU: 55555554 [29387.193944] Call Trace: [29387.196400] [] __kmem_cache_shutdown+0x14/0x80 [29387.202492] [] kmem_cache_destroy+0x58/0x110 [29387.208461] [] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [29387.215694] [] hfi1_file_close+0x70/0x1e0 [hfi1] [29387.221957] [] __fput+0xec/0x260 [29387.226837] [] ____fput+0xe/0x10 [29387.231718] [] task_work_run+0xbb/0xe0 [29387.237116] [] do_notify_resume+0xa5/0xc0 [29387.242778] [] int_signal+0x12/0x17 [29387.247922] Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39 [29387.268313] RIP [] kmem_cache_close+0x7e/0x300 [29387.274440] RSP [29387.277932] CR2: 0000000000000010 The panic is the result of slab entries being freed during the destruction of the pq slab. The code attempts to quiesce the pq, but looking for n_req == 0 doesn't account for new requests. Fix the issue by using SRCU to get a pq pointer and adjust the pq free logic to NULL the fd pq pointer prior to the quiesce. Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Reviewed-by: Kaike Wan Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/file_ops.c | 52 ++++++++++++++++++----------- drivers/infiniband/hw/hfi1/hfi.h | 5 ++- drivers/infiniband/hw/hfi1/user_exp_rcv.c | 3 -- drivers/infiniband/hw/hfi1/user_sdma.c | 17 +++++++-- 4 files changed, 48 insertions(+), 29 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c index bef6946..2591158 100644 --- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -200,23 +200,24 @@ static int hfi1_file_open(struct inode *inode, struct file *fp) fd = kzalloc(sizeof(*fd), GFP_KERNEL); - if (fd) { - fd->rec_cpu_num = -1; /* no cpu affinity by default */ - fd->mm = current->mm; - mmgrab(fd->mm); - fd->dd = dd; - kobject_get(&fd->dd->kobj); - fp->private_data = fd; - } else { - fp->private_data = NULL; - - if (atomic_dec_and_test(&dd->user_refcount)) - complete(&dd->user_comp); - - return -ENOMEM; - } - + if (!fd || init_srcu_struct(&fd->pq_srcu)) + goto nomem; + spin_lock_init(&fd->pq_rcu_lock); + spin_lock_init(&fd->tid_lock); + spin_lock_init(&fd->invalid_lock); + fd->rec_cpu_num = -1; /* no cpu affinity by default */ + fd->mm = current->mm; + mmgrab(fd->mm); + fd->dd = dd; + kobject_get(&fd->dd->kobj); + fp->private_data = fd; return 0; +nomem: + kfree(fd); + fp->private_data = NULL; + if (atomic_dec_and_test(&dd->user_refcount)) + complete(&dd->user_comp); + return -ENOMEM; } static long hfi1_file_ioctl(struct file *fp, unsigned int cmd, @@ -301,21 +302,30 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd, static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from) { struct hfi1_filedata *fd = kiocb->ki_filp->private_data; - struct hfi1_user_sdma_pkt_q *pq = fd->pq; + struct hfi1_user_sdma_pkt_q *pq; struct hfi1_user_sdma_comp_q *cq = fd->cq; int done = 0, reqs = 0; unsigned long dim = from->nr_segs; + int idx; - if (!cq || !pq) + idx = srcu_read_lock(&fd->pq_srcu); + pq = srcu_dereference(fd->pq, &fd->pq_srcu); + if (!cq || !pq) { + srcu_read_unlock(&fd->pq_srcu, idx); return -EIO; + } - if (!iter_is_iovec(from) || !dim) + if (!iter_is_iovec(from) || !dim) { + srcu_read_unlock(&fd->pq_srcu, idx); return -EINVAL; + } trace_hfi1_sdma_request(fd->dd, fd->uctxt->ctxt, fd->subctxt, dim); - if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) + if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) { + srcu_read_unlock(&fd->pq_srcu, idx); return -ENOSPC; + } while (dim) { int ret; @@ -333,6 +343,7 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from) reqs++; } + srcu_read_unlock(&fd->pq_srcu, idx); return reqs; } @@ -707,6 +718,7 @@ static int hfi1_file_close(struct inode *inode, struct file *fp) if (atomic_dec_and_test(&dd->user_refcount)) complete(&dd->user_comp); + cleanup_srcu_struct(&fdata->pq_srcu); kfree(fdata); return 0; } diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index 6365e8f..cae12f4 100644 --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1444,10 +1444,13 @@ static inline bool hfi1_vnic_is_rsm_full(struct hfi1_devdata *dd, int spare) /* Private data for file operations */ struct hfi1_filedata { + struct srcu_struct pq_srcu; struct hfi1_devdata *dd; struct hfi1_ctxtdata *uctxt; struct hfi1_user_sdma_comp_q *cq; - struct hfi1_user_sdma_pkt_q *pq; + /* update side lock for SRCU */ + spinlock_t pq_rcu_lock; + struct hfi1_user_sdma_pkt_q __rcu *pq; u16 subctxt; /* for cpu affinity; -1 if none */ int rec_cpu_num; diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c index 2443423..4da03f8 100644 --- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c +++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c @@ -87,9 +87,6 @@ int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd, { int ret = 0; - spin_lock_init(&fd->tid_lock); - spin_lock_init(&fd->invalid_lock); - fd->entry_to_rb = kcalloc(uctxt->expected_count, sizeof(struct rb_node *), GFP_KERNEL); diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c index fd754a1..c2f0d9b 100644 --- a/drivers/infiniband/hw/hfi1/user_sdma.c +++ b/drivers/infiniband/hw/hfi1/user_sdma.c @@ -179,7 +179,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt, pq = kzalloc(sizeof(*pq), GFP_KERNEL); if (!pq) return -ENOMEM; - pq->dd = dd; pq->ctxt = uctxt->ctxt; pq->subctxt = fd->subctxt; @@ -236,7 +235,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt, goto pq_mmu_fail; } - fd->pq = pq; + rcu_assign_pointer(fd->pq, pq); fd->cq = cq; return 0; @@ -264,8 +263,14 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd, trace_hfi1_sdma_user_free_queues(uctxt->dd, uctxt->ctxt, fd->subctxt); - pq = fd->pq; + spin_lock(&fd->pq_rcu_lock); + pq = srcu_dereference_check(fd->pq, &fd->pq_srcu, + lockdep_is_held(&fd->pq_rcu_lock)); if (pq) { + rcu_assign_pointer(fd->pq, NULL); + spin_unlock(&fd->pq_rcu_lock); + synchronize_srcu(&fd->pq_srcu); + /* at this point there can be no more new requests */ if (pq->handler) hfi1_mmu_rb_unregister(pq->handler); iowait_sdma_drain(&pq->busy); @@ -277,7 +282,8 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd, kfree(pq->req_in_use); kmem_cache_destroy(pq->txreq_cache); kfree(pq); - fd->pq = NULL; + } else { + spin_unlock(&fd->pq_rcu_lock); } if (fd->cq) { vfree(fd->cq->comps); @@ -321,7 +327,8 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd, { int ret = 0, i; struct hfi1_ctxtdata *uctxt = fd->uctxt; - struct hfi1_user_sdma_pkt_q *pq = fd->pq; + struct hfi1_user_sdma_pkt_q *pq = + srcu_dereference(fd->pq, &fd->pq_srcu); struct hfi1_user_sdma_comp_q *cq = fd->cq; struct hfi1_devdata *dd = pq->dd; unsigned long idx = 0; From patchwork Mon Feb 10 13:10:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 11373227 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9A05138D for ; Mon, 10 Feb 2020 13:10:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9428920715 for ; Mon, 10 Feb 2020 13:10:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729495AbgBJNKp (ORCPT ); Mon, 10 Feb 2020 08:10:45 -0500 Received: from mga03.intel.com ([134.134.136.65]:12280 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729347AbgBJNKo (ORCPT ); Mon, 10 Feb 2020 08:10:44 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Feb 2020 05:10:43 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,425,1574150400"; d="scan'208";a="226144313" Received: from sedona.ch.intel.com ([10.2.136.157]) by orsmga008.jf.intel.com with ESMTP; 10 Feb 2020 05:10:42 -0800 Received: from awfm-01.aw.intel.com (awfm-01.aw.intel.com [10.228.212.213]) by sedona.ch.intel.com (8.14.3/8.14.3/Standard MailSET/Hub) with ESMTP id 01ADAfNh007530; Mon, 10 Feb 2020 06:10:42 -0700 Received: from awfm-01.aw.intel.com (localhost [127.0.0.1]) by awfm-01.aw.intel.com (8.14.7/8.14.7) with ESMTP id 01ADAevu087674; Mon, 10 Feb 2020 08:10:40 -0500 Subject: [PATCH for-rc 3/3] IB/rdmavt: Reset all QPs when the device is shut down From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Kaike Wan Date: Mon, 10 Feb 2020 08:10:40 -0500 Message-ID: <20200210131040.87408.38161.stgit@awfm-01.aw.intel.com> In-Reply-To: <20200210130712.87408.34564.stgit@awfm-01.aw.intel.com> References: <20200210130712.87408.34564.stgit@awfm-01.aw.intel.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org From: Kaike Wan When the hfi1 device is shut down during a system reboot, it is possible that some QPs might have not not freed by ULPs. More requests could be post sent and a lingering timer could be triggered to schedule more packet sends, leading to a crash: [ 188.570075] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 [ 188.570114] IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0 [ 188.570142] PGD 0 [ 188.570154] Oops: 0000 1 SMP [ 188.570171] Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en [ 188.570501] sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod [ 188.570641] CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1 [ 188.570674] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018 [ 188.570708] task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000 [ 188.570733] RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0 [ 188.570763] RSP: 0018:ffff88105df43d48 EFLAGS: 00010046 [ 188.570782] RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000 [ 188.570806] RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f [ 188.570830] RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000 [ 188.570854] R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0 [ 188.570877] R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000 [ 188.570901] FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000 [ 188.570929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 188.570948] CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0 [ 188.570972] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 188.570996] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 188.571020] Stack: [ 188.571029] ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000 [ 188.571060] ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98 [ 188.571090] ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde [ 188.571119] Call Trace: [ 188.571130] IRQ [ 188.571138] [ 188.571148] [ffffffff810a6b85] queue_work_on+0x45/0x50 [ 188.571186] [ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1] [ 188.571215] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ 188.571248] [ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1] [ 188.571275] [ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt] [ 188.571301] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ 188.571324] [ffffffff81097316] call_timer_fn+0x36/0x110 [ 188.571348] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ 188.571370] [ffffffff8109982d] run_timer_softirq+0x22d/0x310 [ 188.571393] [ffffffff81090b3f] __do_softirq+0xef/0x280 [ 188.571415] [ffffffff816b6a5c] call_softirq+0x1c/0x30 [ 188.571435] [ffffffff8102d3c5] do_softirq+0x65/0xa0 [ 188.571454] [ffffffff81090ec5] irq_exit+0x105/0x110 [ 188.572263] [ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50 [ 188.573060] [ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80 [ 188.573846] EOI [ 188.573854] [ 188.574631] [ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0 [ 188.575425] [ffffffff81527b48] cpuidle_idle_call+0xd8/0x210 [ 188.576203] [ffffffff81034fee] arch_cpu_idle+0xe/0x30 [ 188.576959] [ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0 [ 188.577697] [ffffffff81051af6] start_secondary+0x1b6/0x230 [ 188.578412] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00 [ 188.579926] RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0 [ 188.580623] RSP ffff88105df43d48 [ 188.581293] CR2: 0000000000000102 The solution is to reset the QPs before the device resources are freed. This reset will change the QP state to prevent post sends and delete timers to prevent callbacks. Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table") Reviewed-by: Mike Marciniszyn Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro --- drivers/infiniband/sw/rdmavt/qp.c | 84 ++++++++++++++++++++++--------------- 1 file changed, 51 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c index 3cdf75d..7858d49 100644 --- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -61,6 +61,8 @@ #define RVT_RWQ_COUNT_THRESHOLD 16 static void rvt_rc_timeout(struct timer_list *t); +static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, + enum ib_qp_type type); /* * Convert the AETH RNR timeout code into the number of microseconds. @@ -452,40 +454,41 @@ int rvt_driver_qp_init(struct rvt_dev_info *rdi) } /** - * free_all_qps - check for QPs still in use + * rvt_free_qp_cb - callback function to reset a qp + * @qp: the qp to reset + * @v: a 64-bit value + * + * This function resets the qp and removes it from the + * qp hash table. + */ +static void rvt_free_qp_cb(struct rvt_qp *qp, u64 v) +{ + unsigned int *qp_inuse = (unsigned int *)v; + struct rvt_dev_info *rdi = ib_to_rvt(qp->ibqp.device); + + /* Reset the qp and remove it from the qp hash list */ + rvt_reset_qp(rdi, qp, qp->ibqp.qp_type); + + /* Increment the qp_inuse count */ + (*qp_inuse)++; +} + +/** + * rvt_free_all_qps - check for QPs still in use * @rdi: rvt device info structure * * There should not be any QPs still in use. * Free memory for table. + * Return the number of QPs still in use. */ static unsigned rvt_free_all_qps(struct rvt_dev_info *rdi) { - unsigned long flags; - struct rvt_qp *qp; - unsigned n, qp_inuse = 0; - spinlock_t *ql; /* work around too long line below */ - - if (rdi->driver_f.free_all_qps) - qp_inuse = rdi->driver_f.free_all_qps(rdi); + unsigned int qp_inuse = 0; qp_inuse += rvt_mcast_tree_empty(rdi); - if (!rdi->qp_dev) - return qp_inuse; - - ql = &rdi->qp_dev->qpt_lock; - spin_lock_irqsave(ql, flags); - for (n = 0; n < rdi->qp_dev->qp_table_size; n++) { - qp = rcu_dereference_protected(rdi->qp_dev->qp_table[n], - lockdep_is_held(ql)); - RCU_INIT_POINTER(rdi->qp_dev->qp_table[n], NULL); + rvt_qp_iter(rdi, (u64)&qp_inuse, rvt_free_qp_cb); - for (; qp; qp = rcu_dereference_protected(qp->next, - lockdep_is_held(ql))) - qp_inuse++; - } - spin_unlock_irqrestore(ql, flags); - synchronize_rcu(); return qp_inuse; } @@ -902,14 +905,14 @@ static void rvt_init_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, } /** - * rvt_reset_qp - initialize the QP state to the reset state + * _rvt_reset_qp - initialize the QP state to the reset state * @qp: the QP to reset * @type: the QP type * * r_lock, s_hlock, and s_lock are required to be held by the caller */ -static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, - enum ib_qp_type type) +static void _rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, + enum ib_qp_type type) __must_hold(&qp->s_lock) __must_hold(&qp->s_hlock) __must_hold(&qp->r_lock) @@ -955,6 +958,27 @@ static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, lockdep_assert_held(&qp->s_lock); } +/** + * rvt_reset_qp - initialize the QP state to the reset state + * @rdi: the device info + * @qp: the QP to reset + * @type: the QP type + * + * This is the wrapper function to acquire the r_lock, s_hlock, and s_lock + * before calling _rvt_reset_qp(). + */ +static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, + enum ib_qp_type type) +{ + spin_lock_irq(&qp->r_lock); + spin_lock(&qp->s_hlock); + spin_lock(&qp->s_lock); + _rvt_reset_qp(rdi, qp, type); + spin_unlock(&qp->s_lock); + spin_unlock(&qp->s_hlock); + spin_unlock_irq(&qp->r_lock); +} + /** rvt_free_qpn - Free a qpn from the bit map * @qpt: QP table * @qpn: queue pair number to free @@ -1546,7 +1570,7 @@ int rvt_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, switch (new_state) { case IB_QPS_RESET: if (qp->state != IB_QPS_RESET) - rvt_reset_qp(rdi, qp, ibqp->qp_type); + _rvt_reset_qp(rdi, qp, ibqp->qp_type); break; case IB_QPS_RTR: @@ -1695,13 +1719,7 @@ int rvt_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata) struct rvt_qp *qp = ibqp_to_rvtqp(ibqp); struct rvt_dev_info *rdi = ib_to_rvt(ibqp->device); - spin_lock_irq(&qp->r_lock); - spin_lock(&qp->s_hlock); - spin_lock(&qp->s_lock); rvt_reset_qp(rdi, qp, ibqp->qp_type); - spin_unlock(&qp->s_lock); - spin_unlock(&qp->s_hlock); - spin_unlock_irq(&qp->r_lock); wait_event(qp->wait, !atomic_read(&qp->refcount)); /* qpn is now available for use again */