Message ID | 20201126202720.2304559-1-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | Superseded |
Series | IB/iser: Remove in_interrupt() usage. |
On Thu, Nov 26, 2020 at 09:27:20PM +0100, Sebastian Andrzej Siewior wrote:
> iser_initialize_task_headers() uses in_interrupt() to find out if it is
> safe to acquire a mutex.
>
> in_interrupt() is deprecated as it is ill defined and does not provide what
> it suggests. Aside of that it covers only parts of the contexts in which
> a mutex may not be acquired.
>
> The following callchains exist:
>
> iscsi_queuecommand() *locks* iscsi_session::frwd_lock
> -> iscsi_prep_scsi_cmd_pdu()
>    -> session->tt->init_task() (iscsi_iser_task_init())
>       -> iser_initialize_task_headers()
> -> iscsi_iser_task_xmit() (iscsi_transport::xmit_task)
>    -> iscsi_iser_task_xmit_unsol_data()
>       -> iser_send_data_out()
>          -> iser_initialize_task_headers()
>
> iscsi_data_xmit() *locks* iscsi_session::frwd_lock
> -> iscsi_prep_mgmt_task()
>    -> session->tt->init_task() (iscsi_iser_task_init())
>       -> iser_initialize_task_headers()
> -> iscsi_prep_scsi_cmd_pdu()
>    -> session->tt->init_task() (iscsi_iser_task_init())
>       -> iser_initialize_task_headers()
>
> __iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock
> -> iscsi_prep_mgmt_task()
>    -> session->tt->init_task() (iscsi_iser_task_init())
>       -> iser_initialize_task_headers()
> -> session->tt->xmit_task() (
>
> The only callchain that is close to be invoked in preemptible context:
> iscsi_xmitworker() worker
> -> iscsi_data_xmit()
>    -> iscsi_xmit_task()
>       -> conn->session->tt->xmit_task() (iscsi_iser_task_xmit()
>
> In iscsi_iser_task_xmit() there is this check:
>    if (!task->sc)
>       return iscsi_iser_mtask_xmit(conn, task);
>
> so it does end up in iser_initialize_task_headers() and
> iser_initialize_task_headers() relies on iscsi_task::sc == NULL.
>
> Remove conditional locking of iser_conn::state_mutex because there is no
> call chain to do so.

AFAIK, there is no way to get into a hard IRQ from
drivers/infiniband/ulp/*

The closest it gets to real HW is a soft IRQ from the CQ handler,
starting at these functions:

drivers/infiniband/ulp/iser/iser_initiator.c: tx_desc->cqe.done = iser_cmd_comp;
drivers/infiniband/ulp/iser/iser_initiator.c: tx_desc->cqe.done = iser_dataout_comp;
drivers/infiniband/ulp/iser/iser_initiator.c: mdesc->cqe.done = iser_ctrl_comp;
drivers/infiniband/ulp/iser/iser_verbs.c: desc->cqe.done = iser_login_rsp;
drivers/infiniband/ulp/iser/iser_verbs.c: rx_desc->cqe.done = iser_task_rsp;

So, I can't see any way in_interrupt() was ever detecting actual
interrupts.

I wonder if it was some hacky way to detect non-preemption from a
softirq or something?

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Max Gurtovoy <maxg@nvidia.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.c | 7 -------
>  1 file changed, 7 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
> index 3690e28cc7ea2..b34a1881c4cad 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
>  	struct iser_device *device = iser_conn->ib_conn.device;
>  	struct iscsi_iser_task *iser_task = task->dd_data;
>  	u64 dma_addr;
> -	const bool mgmt_task = !task->sc && !in_interrupt();
>  	int ret = 0;

Why do you think the task->sc doesn't matter?

>
> -	if (unlikely(mgmt_task))
> -		mutex_lock(&iser_conn->state_mutex);
> -
>  	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
>  		ret = -ENODEV;
>  		goto out;
> @@ -215,9 +211,6 @@ iser_initialize_task_headers(struct iscsi_task *task,
>
>  	iser_task->iser_conn = iser_conn;
>  out:
> -	if (unlikely(mgmt_task))
> -		mutex_unlock(&iser_conn->state_mutex);
> -
>  	return ret;
>  }

Sagi, you added this code, any remembrance of what it is for?

commit 7414dde0a6c3a958e26141991bf5c75dc58d28b2
Author: Sagi Grimberg <sagig@mellanox.com>
Date:   Sun Dec 7 16:09:59 2014 +0200

    IB/iser: Fix race between iser connection teardown and scsi TMFs

    In certain scenarios (target kill with live IO) scsi TMFs may race
    with iser RDMA teardown, which might cause NULL dereference on iser IB
    device handle (which might have been freed). In this case we take a
    conditional lock for TMFs and check the connection state (avoid
    introducing lock contention in the IO path). This is indeed best
    effort approach, but sufficient to survive multi targets sudden death
    while heavy IO is inflight. While we are on it, add a nice kernel-doc
    style documentation.

Max, can you do a test with this patch and we might luck into a
lockdep splat that will be informative?

Jason
On 2020-11-26 16:53:57 [-0400], Jason Gunthorpe wrote:
> > --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> > +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> > @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
> >  	struct iser_device *device = iser_conn->ib_conn.device;
> >  	struct iscsi_iser_task *iser_task = task->dd_data;
> >  	u64 dma_addr;
> > -	const bool mgmt_task = !task->sc && !in_interrupt();
> >  	int ret = 0;
>
> Why do you think the task->sc doesn't matter?

Based on the call paths I checked, there was no evidence that
state_mutex can be acquired. If I remove locking here then `mgmt_task'
is no longer needed. How should task->sc matter?

> > -	if (unlikely(mgmt_task))
> > -		mutex_lock(&iser_conn->state_mutex);
> > -
> >  	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
> >  		ret = -ENODEV;
> >  		goto out;
…
> Jason

Sebastian
On Fri, Nov 27, 2020 at 01:34:55PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-26 16:53:57 [-0400], Jason Gunthorpe wrote:
> > > +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> > > @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
> > >  	struct iser_device *device = iser_conn->ib_conn.device;
> > >  	struct iscsi_iser_task *iser_task = task->dd_data;
> > >  	u64 dma_addr;
> > > -	const bool mgmt_task = !task->sc && !in_interrupt();
> > >  	int ret = 0;
> >
> > Why do you think the task->sc doesn't matter?
>
> Based on the call paths I checked, there was no evidence that
> state_mutex can be acquired. If I remove locking here then `mgmt_task'
> is no longer needed.

That only says there is no recursive deadlock..

> How should task->sc matter?

I was able to get the internal bug report that caused the
7414dde0a6c3a commit.

The issue here is that the state_mutex is protecting

This:

	if (unlikely(iser_conn->state != ISER_CONN_UP)) {

Which indicates that this:

	dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,

Won't crash because iser_conn->ib_conn is invalid. The notes say that
the iSCSI stack is in some state where data traffic won't flow but
management traffic is still possible. I suppose this is some fast path
so it was "optimized" to eliminate the lock for data traffic.

A call chain of interest for the lock at least is:

Nov 3 12:24:37 rsws10 BUG: unable to handle kernel
Nov 3 12:24:37 NULL pointer dereference
Nov 3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
[..]
Nov 3 12:24:37 rsws10 [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
Nov 3 12:24:37 rsws10 [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffff813c2694>] ? scsi_send_eh_cmnd+0xd4/0x3a0
Nov 3 12:24:37 rsws10 [<ffffffff810c39df>] ? module_refcount+0x9f/0xc0
Nov 3 12:24:37 rsws10 [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c3cbe>] scsi_eh_ready_devs+0x5e/0x110
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c3e5d>] scsi_unjam_host+0xed/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c40c8>] scsi_error_handler+0x168/0x1c0
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff81082a6e>] kthread+0xce/0xe0
Nov 3 12:24:37 rsws10 [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70
Nov 3 12:24:37 rsws10 [<ffffffff8159b66c>] ret_from_fork+0x7c/0xb0
Nov 3 12:24:37 rsws10 [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70

So, I think the usual 'pass in atomic context flag' is really needed
here

Jason
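The "pass in atomic context flag" pattern mentioned above could look roughly like the following userspace sketch. It is not code from the iser driver: model_conn, model_init_task_headers() and the may_sleep parameter are hypothetical names, and a pthread mutex stands in for state_mutex. The point is only that the caller, which knows its own context, tells the callee whether sleeping is allowed, rather than the callee guessing via in_interrupt().

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the iser driver. */
enum model_conn_state { MODEL_CONN_DOWN, MODEL_CONN_UP };

struct model_conn {
	pthread_mutex_t state_mutex;	/* plays the role of state_mutex */
	enum model_conn_state state;
	int locked_checks;		/* instrumentation: checks done under the mutex */
};

/*
 * The caller states whether it may sleep, so the callee does not have to
 * infer it. Only a caller that may sleep takes the (sleeping) mutex.
 */
int model_init_task_headers(struct model_conn *conn, bool may_sleep)
{
	int ret = 0;

	if (may_sleep) {
		pthread_mutex_lock(&conn->state_mutex);
		conn->locked_checks++;
	}

	if (conn->state != MODEL_CONN_UP)
		ret = -ENODEV;

	if (may_sleep)
		pthread_mutex_unlock(&conn->state_mutex);

	return ret;
}
```

A caller that holds a spinlock would pass may_sleep = false; a caller in plain process context would pass true. The flag is only as good as the callers' knowledge of their own context.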
On 2020-11-27 09:03:14 [-0400], Jason Gunthorpe wrote:
> I was able to get the internal bug report that caused the
> 7414dde0a6c3a commit.
>
> The issue here is that the state_mutex is protecting
>
> This:
>
> 	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
>
> Which indicates that this:
>
> 	dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,
>
> Won't crash because iser_conn->ib_conn is invalid. The notes say that
> the iSCSI stack is in some state where data traffic won't flow but
> management traffic is still possible. I suppose this is some fast path
> so it was "optimized" to eliminate the lock for data traffic.
>
> A call chain of interest for the lock at least is:
>
> Nov 3 12:24:37 rsws10 BUG: unable to handle kernel
> Nov 3 12:24:37 NULL pointer dereference
> Nov 3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
> [..]
> Nov 3 12:24:37 rsws10 [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
> Nov 3 12:24:37 rsws10 [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
> Nov 3 12:24:37 rsws10 [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
> Nov 3 12:24:37 rsws10 [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
> Nov 3 12:24:37 rsws10 [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]

preemptible until here and this function has:

| mutex_lock(&session->eh_mutex);
| spin_lock_bh(&session->frwd_lock);

I don't see the lock dropped between here and iscsi_iser_task_init().

> Nov 3 12:24:37 rsws10 [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0
> Nov 3 12:24:37 rsws10 [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70
>
> So, I think the usual 'pass in atomic context flag' is really needed
> here

Sure, I would do that but as noted above, the `frwd_lock' is acquired
so you can't acquire the mutex here.

> Jason

Sebastian
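The observation about spin_lock_bh() above can be illustrated with a toy model of the preempt counter. This is a simplified userspace sketch, not kernel code: the model_* names are hypothetical and the bit layout is an assumption chosen to resemble the kernel's preempt_count (preempt-disable depth in the low bits, softirq and hardirq counts above it). Under those assumptions, a BH-disabling spinlock makes the in_interrupt()-style test fire even though no interrupt is being served, while a plain spinlock leaves it clear even though sleeping is still not allowed.

```c
#include <stdbool.h>

/* Assumed, simplified bit layout; not copied from any kernel tree. */
#define MODEL_PREEMPT_OFFSET		0x00000001u	/* one preempt_disable() level */
#define MODEL_SOFTIRQ_OFFSET		0x00000100u	/* one softirq level */
#define MODEL_SOFTIRQ_MASK		0x0000ff00u
#define MODEL_HARDIRQ_MASK		0x000f0000u
/* Disabling bottom halves is modelled as bumping the softirq field by two. */
#define MODEL_SOFTIRQ_DISABLE_OFFSET	(2 * MODEL_SOFTIRQ_OFFSET)

unsigned int model_preempt_count;

void model_spin_lock(void)	{ model_preempt_count += MODEL_PREEMPT_OFFSET; }
void model_spin_unlock(void)	{ model_preempt_count -= MODEL_PREEMPT_OFFSET; }

void model_spin_lock_bh(void)
{
	model_preempt_count += MODEL_SOFTIRQ_DISABLE_OFFSET + MODEL_PREEMPT_OFFSET;
}

void model_spin_unlock_bh(void)
{
	model_preempt_count -= MODEL_SOFTIRQ_DISABLE_OFFSET + MODEL_PREEMPT_OFFSET;
}

/* Looks only at the hardirq and softirq fields, like an in_interrupt() test. */
bool model_in_interrupt(void)
{
	return (model_preempt_count & (MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_MASK)) != 0;
}

/* Sleeping (and thus taking a mutex) is only fine when nothing is held. */
bool model_may_sleep(void)
{
	return model_preempt_count == 0;
}
```

In this model the old `!in_interrupt()` condition would be false whenever the BH-disabling lock is held, so the conditional mutex path would never be taken from such a caller; and it would be true under a plain spinlock, where taking a mutex would be wrong.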
On Fri, Nov 27, 2020 at 03:14:32PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-27 09:03:14 [-0400], Jason Gunthorpe wrote:
> > I was able to get the internal bug report that caused the
> > 7414dde0a6c3a commit.
> >
> > The issue here is that the state_mutex is protecting
> >
> > This:
> >
> > 	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
> >
> > Which indicates that this:
> >
> > 	dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,
> >
> > Won't crash because iser_conn->ib_conn is invalid. The notes say that
> > the iSCSI stack is in some state where data traffic won't flow but
> > management traffic is still possible. I suppose this is some fast path
> > so it was "optimized" to eliminate the lock for data traffic.
> >
> > A call chain of interest for the lock at least is:
> >
> > Nov 3 12:24:37 rsws10 BUG: unable to handle kernel
> > Nov 3 12:24:37 NULL pointer dereference
> > Nov 3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
> > [..]
> > Nov 3 12:24:37 rsws10 [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
> > Nov 3 12:24:37 rsws10 [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
> > Nov 3 12:24:37 rsws10 [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
> > Nov 3 12:24:37 rsws10 [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
> > Nov 3 12:24:37 rsws10 [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]
>
> preemptible until here and this function has:
>
> | mutex_lock(&session->eh_mutex);
> | spin_lock_bh(&session->frwd_lock);
>
> I don't see the lock dropped between here and iscsi_iser_task_init().

Hmm, nor do I

This whole thing does look broken.

So.. it looks like the "fix" in 7414dde0a6c3a was adding the:

+	if (unlikely(iser_conn->state != ISER_CONN_UP)) {

Without any locking. Which is a pretty typical mistake :\

> Sure, I would do that but as noted above, the `frwd_lock' is acquired
> so you can't acquire the mutex here.

Ok, well, I'm thinking this patch is OK as is. Let's wait for Max and Sagi

Jason
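The "check without any locking" mistake described above can be sketched in userspace as follows. This is an illustration under assumptions, not the driver's code: the model_* names are hypothetical and a pthread mutex stands in for state_mutex. A state check only protects the later pointer use if the teardown path and the check-plus-use path hold the same lock for the whole sequence.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the iser driver. */
enum model_conn_state { MODEL_CONN_DOWN, MODEL_CONN_UP };

struct model_device { int id; };

struct model_conn {
	pthread_mutex_t state_mutex;
	enum model_conn_state state;
	struct model_device *device;	/* invalid once the connection is down */
};

/* Teardown changes state and drops the device under the mutex. */
void model_conn_teardown(struct model_conn *conn)
{
	pthread_mutex_lock(&conn->state_mutex);
	conn->state = MODEL_CONN_DOWN;
	conn->device = NULL;
	pthread_mutex_unlock(&conn->state_mutex);
}

/*
 * Racy variant: nothing prevents model_conn_teardown() from running on
 * another thread between the state check and the dereference.
 */
int model_use_device_unlocked(struct model_conn *conn)
{
	if (conn->state != MODEL_CONN_UP)
		return -ENODEV;
	return conn->device->id;
}

/* Check and use happen under the same lock that teardown takes. */
int model_use_device_locked(struct model_conn *conn)
{
	int ret;

	pthread_mutex_lock(&conn->state_mutex);
	if (conn->state != MODEL_CONN_UP)
		ret = -ENODEV;
	else
		ret = conn->device->id;
	pthread_mutex_unlock(&conn->state_mutex);

	return ret;
}
```

In a single-threaded run both variants behave the same, which is part of why the unlocked check looks like a fix; the difference only shows when teardown runs concurrently.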
On 2020-11-27 10:31:38 [-0400], Jason Gunthorpe wrote:
> > Sure, I would do that but as noted above, the `frwd_lock' is acquired
> > so you can't acquire the mutex here.
>
> Ok, well, I'm thinking this patch is OK as is. Let's wait for Max and Sagi

a gentle ping to Max and Sagi in case we still wait for them here.

> Jason

Sebastian
>>> Sure, I would do that but as noted above, the `frwd_lock' is acquired
>>> so you can't acquire the mutex here.
>>
>> Ok, well, I'm thinking this patch is OK as is. Let's wait for Max and Sagi
>
> a gentle ping to Max and Sagi in case we still wait for them here.

Hey, I agree with the change, it was a while back, and advisory anyways.

But while touching it, you can remove the now redundant goto out tag
because there is no finalization of the routine now.
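The cleanup suggested above (dropping the `out:` label once nothing is left to undo at the end of the function) might take a shape like this. It is a minimal userspace sketch with hypothetical model_* types, not the real iser structures and not the follow-up patch; the header setup and DMA mapping of the real function are left out.

```c
#include <errno.h>

/* Hypothetical stand-ins; none of these names come from the iser driver. */
enum model_conn_state { MODEL_CONN_DOWN, MODEL_CONN_UP };

struct model_conn { enum model_conn_state state; };
struct model_task { struct model_conn *conn; };

/*
 * With the conditional unlock gone there is no common exit work,
 * so each path can return directly instead of jumping to a label.
 */
int model_initialize_task_headers(struct model_conn *conn,
				  struct model_task *task)
{
	if (conn->state != MODEL_CONN_UP)
		return -ENODEV;

	/* header setup and DMA mapping elided in this sketch */

	task->conn = conn;
	return 0;
}
```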
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 3690e28cc7ea2..b34a1881c4cad 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
 	struct iser_device *device = iser_conn->ib_conn.device;
 	struct iscsi_iser_task *iser_task = task->dd_data;
 	u64 dma_addr;
-	const bool mgmt_task = !task->sc && !in_interrupt();
 	int ret = 0;
 
-	if (unlikely(mgmt_task))
-		mutex_lock(&iser_conn->state_mutex);
-
 	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
 		ret = -ENODEV;
 		goto out;
@@ -215,9 +211,6 @@ iser_initialize_task_headers(struct iscsi_task *task,
 
 	iser_task->iser_conn = iser_conn;
 out:
-	if (unlikely(mgmt_task))
-		mutex_unlock(&iser_conn->state_mutex);
-
 	return ret;
 }
iser_initialize_task_headers() uses in_interrupt() to find out if it is
safe to acquire a mutex.

in_interrupt() is deprecated as it is ill defined and does not provide what
it suggests. Aside of that it covers only parts of the contexts in which
a mutex may not be acquired.

The following callchains exist:

iscsi_queuecommand() *locks* iscsi_session::frwd_lock
-> iscsi_prep_scsi_cmd_pdu()
   -> session->tt->init_task() (iscsi_iser_task_init())
      -> iser_initialize_task_headers()
-> iscsi_iser_task_xmit() (iscsi_transport::xmit_task)
   -> iscsi_iser_task_xmit_unsol_data()
      -> iser_send_data_out()
         -> iser_initialize_task_headers()

iscsi_data_xmit() *locks* iscsi_session::frwd_lock
-> iscsi_prep_mgmt_task()
   -> session->tt->init_task() (iscsi_iser_task_init())
      -> iser_initialize_task_headers()
-> iscsi_prep_scsi_cmd_pdu()
   -> session->tt->init_task() (iscsi_iser_task_init())
      -> iser_initialize_task_headers()

__iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock
-> iscsi_prep_mgmt_task()
   -> session->tt->init_task() (iscsi_iser_task_init())
      -> iser_initialize_task_headers()
-> session->tt->xmit_task() (

The only callchain that is close to be invoked in preemptible context:
iscsi_xmitworker() worker
-> iscsi_data_xmit()
   -> iscsi_xmit_task()
      -> conn->session->tt->xmit_task() (iscsi_iser_task_xmit()

In iscsi_iser_task_xmit() there is this check:
   if (!task->sc)
      return iscsi_iser_mtask_xmit(conn, task);

so it does end up in iser_initialize_task_headers() and
iser_initialize_task_headers() relies on iscsi_task::sc == NULL.

Remove conditional locking of iser_conn::state_mutex because there is no
call chain to do so.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@nvidia.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c | 7 -------
 1 file changed, 7 deletions(-)