Message ID | 20190117204148.30826.53799.stgit@scvm10.sc.intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | hfi1 and qib patches for rc | expand |
On Thu, Jan 17, 2019 at 12:41:54PM -0800, Dennis Dalessandro wrote: > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > When disabling and removing a receive context, it is possible for an > asynchronous event (i.e IRQ) to occur. Because of this there is a > race between cleaning up the context, and the context being used by > the asynchronous event. > > cpu 0 (context cleanup) > rc->ref_count-- (ref_count == 0) > hfi1_rcd_free() > cpu 1 (IRQ (with rcd index)) > rcd_get_by_index() > lock > ref_count+++ <-- reference count race (WARNING) > return rcd > unlock > cpu 0 > hfi1_free_ctxtdata() <-- incorrect free location > lock > remove rcd from array > unlock > free rcd > > This race will cause the following WARNING trace: > > WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] > CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 > Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 > Call Trace: > dump_stack+0x19/0x1b > __warn+0xd8/0x100 > warn_slowpath_null+0x1d/0x20 > hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] > is_rcv_urgent_int+0x24/0x90 [hfi1] > general_interrupt+0x1b6/0x210 [hfi1] > __handle_irq_event_percpu+0x44/0x1c0 > handle_irq_event_percpu+0x32/0x80 > handle_irq_event+0x3c/0x60 > handle_edge_irq+0x7f/0x150 > handle_irq+0xe4/0x1a0 > do_IRQ+0x4d/0xf0 > common_interrupt+0x162/0x162 > > The race can also lead to a use after free which could be similar > to: > > general protection fault: 0000 1 SMP > CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 > Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 > task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230 > Call Trace: > ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] > hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] > hfi1_aio_write+0xba/0x110 [hfi1] > do_sync_readv_writev+0x7b/0xd0 > do_readv_writev+0xce/0x260 > ? handle_mm_fault+0x39d/0x9b0 > ? pick_next_task_fair+0x5f/0x1b0 > ? sched_clock_cpu+0x85/0xc0 > ? __schedule+0x13a/0x890 > vfs_writev+0x35/0x60 > SyS_writev+0x7f/0x110 > system_call_fastpath+0x22/0x27 > > Introduce a context enable flag to indicate if the context is enabled. > > Reorder context cleanup to ensure context removal before cleanup > occurs correctly. > > Cc: stable@vger.kernel.org # v4.14.0+ > Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting receive contexts") > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> > Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> > Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> > drivers/infiniband/hw/hfi1/file_ops.c | 2 ++ > drivers/infiniband/hw/hfi1/hfi.h | 3 ++- > drivers/infiniband/hw/hfi1/init.c | 8 +++++--- > 3 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c > index c22ebc7..0ba0cf5 100644 > +++ b/drivers/infiniband/hw/hfi1/file_ops.c > @@ -671,6 +671,8 @@ static int hfi1_file_close(struct inode *inode, struct file *fp) > } > spin_unlock_irqrestore(&dd->uctxt_lock, flags); > > + uctxt->del_pend = 1; Not going to get good results if you update shared data outside the lock that is reading it.. > @@ -326,7 +325,10 @@ struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct hfi1_devdata *dd, u16 ctxt) > spin_lock_irqsave(&dd->uctxt_lock, flags); > if (dd->rcd[ctxt]) { > rcd = dd->rcd[ctxt]; > - hfi1_rcd_get(rcd); > + if (rcd->del_pend) > + rcd = NULL; > + else > + hfi1_rcd_get(rcd); This probably wants to be kref_get_not_zero() - why add a confusing del_pend? Jason
>-----Original Message----- >From: Jason Gunthorpe [mailto:jgg@ziepe.ca] >Sent: Thursday, January 17, 2019 4:12 PM >To: Dalessandro, Dennis <dennis.dalessandro@intel.com> >Cc: dledford@redhat.com; linux-rdma@vger.kernel.org; Ruhl, Michael J ><michael.j.ruhl@intel.com>; Marciniszyn, Mike ><mike.marciniszyn@intel.com>; stable@vger.kernel.org >Subject: Re: [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context >disable and close > >On Thu, Jan 17, 2019 at 12:41:54PM -0800, Dennis Dalessandro wrote: >> From: Michael J. Ruhl <michael.j.ruhl@intel.com> >> >> When disabling and removing a receive context, it is possible for an >> asynchronous event (i.e IRQ) to occur. Because of this there is a >> race between cleaning up the context, and the context being used by >> the asynchronous event. >> >> cpu 0 (context cleanup) >> rc->ref_count-- (ref_count == 0) >> hfi1_rcd_free() >> cpu 1 (IRQ (with rcd index)) >> rcd_get_by_index() >> lock >> ref_count+++ <-- reference count race (WARNING) >> return rcd >> unlock >> cpu 0 >> hfi1_free_ctxtdata() <-- incorrect free location >> lock >> remove rcd from array >> unlock >> free rcd >> >> This race will cause the following WARNING trace: >> >> WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 >hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] >> CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ----------- >- 3.10.0-957.el7.x86_64 #1 >> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS >SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 >> Call Trace: >> dump_stack+0x19/0x1b >> __warn+0xd8/0x100 >> warn_slowpath_null+0x1d/0x20 >> hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] >> is_rcv_urgent_int+0x24/0x90 [hfi1] >> general_interrupt+0x1b6/0x210 [hfi1] >> __handle_irq_event_percpu+0x44/0x1c0 >> handle_irq_event_percpu+0x32/0x80 >> handle_irq_event+0x3c/0x60 >> handle_edge_irq+0x7f/0x150 >> handle_irq+0xe4/0x1a0 >> do_IRQ+0x4d/0xf0 >> common_interrupt+0x162/0x162 >> >> The race can also lead to a use after free which could be similar >> to: >> >> general protection fault: 0000 1 SMP >> CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------ >------ 3.10.0-957.el7.x86_64 #1 >> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS >SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 >> task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 >__kmalloc+0x94/0x230 >> Call Trace: >> ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] >> hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] >> hfi1_aio_write+0xba/0x110 [hfi1] >> do_sync_readv_writev+0x7b/0xd0 >> do_readv_writev+0xce/0x260 >> ? handle_mm_fault+0x39d/0x9b0 >> ? pick_next_task_fair+0x5f/0x1b0 >> ? sched_clock_cpu+0x85/0xc0 >> ? __schedule+0x13a/0x890 >> vfs_writev+0x35/0x60 >> SyS_writev+0x7f/0x110 >> system_call_fastpath+0x22/0x27 >> >> Introduce a context enable flag to indicate if the context is enabled. >> >> Reorder context cleanup to ensure context removal before cleanup >> occurs correctly. >> >> Cc: stable@vger.kernel.org # v4.14.0+ >> Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting >receive contexts") >> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> >> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> >> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> >> drivers/infiniband/hw/hfi1/file_ops.c | 2 ++ >> drivers/infiniband/hw/hfi1/hfi.h | 3 ++- >> drivers/infiniband/hw/hfi1/init.c | 8 +++++--- >> 3 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/infiniband/hw/hfi1/file_ops.c >b/drivers/infiniband/hw/hfi1/file_ops.c >> index c22ebc7..0ba0cf5 100644 >> +++ b/drivers/infiniband/hw/hfi1/file_ops.c >> @@ -671,6 +671,8 @@ static int hfi1_file_close(struct inode *inode, struct >file *fp) >> } >> spin_unlock_irqrestore(&dd->uctxt_lock, flags); >> >> + uctxt->del_pend = 1; > >Not going to get good results if you update shared data outside the >lock that is reading it.. >> @@ -326,7 +325,10 @@ struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct >hfi1_devdata *dd, u16 ctxt) >> spin_lock_irqsave(&dd->uctxt_lock, flags); >> if (dd->rcd[ctxt]) { >> rcd = dd->rcd[ctxt]; >> - hfi1_rcd_get(rcd); >> + if (rcd->del_pend) >> + rcd = NULL; >> + else >> + hfi1_rcd_get(rcd); > >This probably wants to be kref_get_not_zero() - why add a confusing >del_pend? kref_get_unless_zero()? /sigh. I missed that function completely. I will examine this some more and work up a new patch. M >Jason
Hi, This patch needs to be redone. Please drop. Thanks, Mike >-----Original Message----- >From: Sasha Levin [mailto:sashal@kernel.org] >Sent: Tuesday, January 22, 2019 10:56 AM >To: Sasha Levin <sashal@kernel.org>; Dalessandro, Dennis ><dennis.dalessandro@intel.com>; Ruhl, Michael J ><michael.j.ruhl@intel.com>; jgg@ziepe.ca; dledford@redhat.com >Cc: linux-rdma@vger.kernel.org; stable@vger.kernel.org >Subject: Re: [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context >disable and close > >Hi, > >[This is an automated email] > >This commit has been processed because it contains a "Fixes:" tag, >fixing commit: f683c80ca68e IB/hfi1: Resolve kernel panics by reference >counting receive contexts. > >The bot has tested the following trees: v4.20.3, v4.19.16, v4.14.94. > >v4.20.3: Build OK! >v4.19.16: Build OK! >v4.14.94: Failed to apply! Possible dependencies: > 071e4fec8e4d ("IB/hfi1: Reorg ctxtdata and rightsize fields") > 1b311f8931cf ("IB/hfi1: Add tx_opcode_stats like the opcode_stats") > 40442b30aad0 ("IB/hfi1: Move rhf_offset from devdata to ctxtdata") > b0ba3c18d6bf ("IB/hfi1: Move normal functions from hfi1_devdata to const >array") > c8314811f9b2 ("IB/hfi1: Cleanup of exp_rcv") > ed71e86a8d66 ("IB/hfi1: Rename exp_lock to exp_mutex") > > >How should we proceed with this patch? > >-- >Thanks, >Sasha
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c index c22ebc7..0ba0cf5 100644 --- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -671,6 +671,8 @@ static int hfi1_file_close(struct inode *inode, struct file *fp) } spin_unlock_irqrestore(&dd->uctxt_lock, flags); + uctxt->del_pend = 1; + /* * Disable receive context and interrupt available, reset all * RcvCtxtCtrl bits to default values. diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index dc8ba43..ea99768 100644 --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -284,7 +284,8 @@ struct hfi1_ctxtdata { u16 expected_base; /* Device context index */ u8 ctxt; - + /* ctxt deletion is pending */ + u8 del_pend; /* PSM Specific fields */ /* lock protecting all Expected TID data */ struct mutex exp_mutex; diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index 044d9a8..d2ac3ef 100644 --- a/drivers/infiniband/hw/hfi1/init.c +++ b/drivers/infiniband/hw/hfi1/init.c @@ -215,12 +215,11 @@ static void hfi1_rcd_free(struct kref *kref) struct hfi1_ctxtdata *rcd = container_of(kref, struct hfi1_ctxtdata, kref); - hfi1_free_ctxtdata(rcd->dd, rcd); - spin_lock_irqsave(&rcd->dd->uctxt_lock, flags); rcd->dd->rcd[rcd->ctxt] = NULL; spin_unlock_irqrestore(&rcd->dd->uctxt_lock, flags); + hfi1_free_ctxtdata(rcd->dd, rcd); kfree(rcd); } @@ -326,7 +325,10 @@ struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct hfi1_devdata *dd, u16 ctxt) spin_lock_irqsave(&dd->uctxt_lock, flags); if (dd->rcd[ctxt]) { rcd = dd->rcd[ctxt]; - hfi1_rcd_get(rcd); + if (rcd->del_pend) + rcd = NULL; + else + hfi1_rcd_get(rcd); } spin_unlock_irqrestore(&dd->uctxt_lock, flags);