IB/iser: Remove in_interrupt() usage.

Message ID	20201126202720.2304559-1-bigeasy@linutronix.de (mailing list archive)
State	Superseded
Headers	Return-Path: <linux-rdma-owner@kernel.org>
	From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
	To: linux-rdma@vger.kernel.org
	Cc: Thomas Gleixner <tglx@linutronix.de>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Sagi Grimberg <sagi@grimberg.me>, Max Gurtovoy <maxg@nvidia.com>, Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@ziepe.ca>
	Subject: [PATCH] IB/iser: Remove in_interrupt() usage.
	Date: Thu, 26 Nov 2020 21:27:20 +0100
	Message-Id: <20201126202720.2304559-1-bigeasy@linutronix.de>
	MIME-Version: 1.0
	Content-Transfer-Encoding: quoted-printable
	Precedence: bulk
Series	IB/iser: Remove in_interrupt() usage.

Sebastian Andrzej Siewior Nov. 26, 2020, 8:27 p.m. UTC

iser_initialize_task_headers() uses in_interrupt() to find out if it is
safe to acquire a mutex.

in_interrupt() is deprecated as it is ill-defined and does not provide what
it suggests. Aside from that, it covers only part of the contexts in which
a mutex may not be acquired.
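
To illustrate the point about partial coverage, here is a small userspace toy model, not kernel code. It assumes a simplified preempt-count layout (the real one lives in include/linux/preempt.h and has varied between versions), and every toy_* name is made up for this sketch. It shows a check in the style of in_interrupt() staying false under a plain spinlock even though sleeping is not allowed there, while it turns true once bottom halves are disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed, simplified bit layout of the preempt counter (toy model only). */
#define TOY_SOFTIRQ_MASK           0x0000ff00u
#define TOY_HARDIRQ_MASK           0x000f0000u

#define TOY_PREEMPT_OFFSET         0x00000001u
#define TOY_SOFTIRQ_OFFSET         0x00000100u
#define TOY_SOFTIRQ_DISABLE_OFFSET (2 * TOY_SOFTIRQ_OFFSET)

/* Stand-in for the per-CPU preempt counter. */
static unsigned int toy_preempt_count;

/* Looks only at the softirq and hardirq bits, like in_interrupt(). */
static inline bool toy_in_interrupt(void)
{
	return (toy_preempt_count & (TOY_SOFTIRQ_MASK | TOY_HARDIRQ_MASK)) != 0;
}

/* Sleeping is only fine when nothing at all is counted. */
static inline bool toy_may_sleep(void)
{
	return toy_preempt_count == 0;
}

/* A plain spinlock only bumps the preemption part of the counter. */
static inline void toy_spin_lock(void)
{
	toy_preempt_count += TOY_PREEMPT_OFFSET;
}

static inline void toy_spin_unlock(void)
{
	toy_preempt_count -= TOY_PREEMPT_OFFSET;
}

/* The _bh variant additionally marks softirqs as disabled. */
static inline void toy_spin_lock_bh(void)
{
	toy_preempt_count += TOY_SOFTIRQ_DISABLE_OFFSET + TOY_PREEMPT_OFFSET;
}

static inline void toy_spin_unlock_bh(void)
{
	toy_preempt_count -= TOY_SOFTIRQ_DISABLE_OFFSET + TOY_PREEMPT_OFFSET;
}

/* A sleeping lock: acquiring it in atomic context is a bug. */
static inline void toy_mutex_lock(void)
{
	assert(toy_may_sleep());
}
```

In this model, code guarded by !toy_in_interrupt() would still try to take the mutex while holding toy_spin_lock(), which is exactly the kind of context the check misses.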

The following callchains exist:

iscsi_queuecommand() *locks* iscsi_session::frwd_lock
-> iscsi_prep_scsi_cmd_pdu()
   -> session->tt->init_task() (iscsi_iser_task_init())
      -> iser_initialize_task_headers()
-> iscsi_iser_task_xmit() (iscsi_transport::xmit_task)
  -> iscsi_iser_task_xmit_unsol_data()
    -> iser_send_data_out()
      -> iser_initialize_task_headers()

iscsi_data_xmit() *locks* iscsi_session::frwd_lock
-> iscsi_prep_mgmt_task()
   -> session->tt->init_task() (iscsi_iser_task_init())
      -> iser_initialize_task_headers()
-> iscsi_prep_scsi_cmd_pdu()
   -> session->tt->init_task() (iscsi_iser_task_init())
      -> iser_initialize_task_headers()

__iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock
  -> iscsi_prep_mgmt_task()
     -> session->tt->init_task() (iscsi_iser_task_init())
        -> iser_initialize_task_headers()
  -> session->tt->xmit_task() (

The only callchain that comes close to being invoked in preemptible context is:
iscsi_xmitworker() worker
-> iscsi_data_xmit()
   -> iscsi_xmit_task()
      -> conn->session->tt->xmit_task() (iscsi_iser_task_xmit())

In iscsi_iser_task_xmit() there is this check:
   if (!task->sc)
      return iscsi_iser_mtask_xmit(conn, task);

so it does end up in iser_initialize_task_headers() and
iser_initialize_task_headers() relies on iscsi_task::sc == NULL.

Remove the conditional locking of iser_conn::state_mutex because no call
chain reaches it in a context where the mutex may be acquired.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@nvidia.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c | 7 -------
 1 file changed, 7 deletions(-)

Jason Gunthorpe Nov. 26, 2020, 8:53 p.m. UTC | #1

On Thu, Nov 26, 2020 at 09:27:20PM +0100, Sebastian Andrzej Siewior wrote:
> iser_initialize_task_headers() uses in_interrupt() to find out if it is
> safe to acquire a mutex.
> 
> in_interrupt() is deprecated as it is ill-defined and does not provide what
> it suggests. Aside from that, it covers only part of the contexts in which
> a mutex may not be acquired.
> 
> The following callchains exist:
> 
> iscsi_queuecommand() *locks* iscsi_session::frwd_lock
> -> iscsi_prep_scsi_cmd_pdu()
>    -> session->tt->init_task() (iscsi_iser_task_init())
>       -> iser_initialize_task_headers()
> -> iscsi_iser_task_xmit() (iscsi_transport::xmit_task)
>   -> iscsi_iser_task_xmit_unsol_data()
>     -> iser_send_data_out()
>       -> iser_initialize_task_headers()
> 
> iscsi_data_xmit() *locks* iscsi_session::frwd_lock
> -> iscsi_prep_mgmt_task()
>    -> session->tt->init_task() (iscsi_iser_task_init())
>       -> iser_initialize_task_headers()
> -> iscsi_prep_scsi_cmd_pdu()
>    -> session->tt->init_task() (iscsi_iser_task_init())
>       -> iser_initialize_task_headers()
> 
> __iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock
>   -> iscsi_prep_mgmt_task()
>      -> session->tt->init_task() (iscsi_iser_task_init())
>         -> iser_initialize_task_headers()
>   -> session->tt->xmit_task() (
> 
> The only callchain that comes close to being invoked in preemptible context is:
> iscsi_xmitworker() worker
> -> iscsi_data_xmit()
>    -> iscsi_xmit_task()
>       -> conn->session->tt->xmit_task() (iscsi_iser_task_xmit())
> 
> In iscsi_iser_task_xmit() there is this check:
>    if (!task->sc)
>       return iscsi_iser_mtask_xmit(conn, task);
> 
> so it does end up in iser_initialize_task_headers() and
> iser_initialize_task_headers() relies on iscsi_task::sc == NULL.
> 
> Remove the conditional locking of iser_conn::state_mutex because no call
> chain reaches it in a context where the mutex may be acquired.

AFAIK, there is no way to get into a hard IRQ from
drivers/infiniband/ulp/*

The closest it gets to real HW is a soft IRQ from the CQ handler,
starting at these functions:

drivers/infiniband/ulp/iser/iser_initiator.c:   tx_desc->cqe.done = iser_cmd_comp;
drivers/infiniband/ulp/iser/iser_initiator.c:   tx_desc->cqe.done = iser_dataout_comp;
drivers/infiniband/ulp/iser/iser_initiator.c:   mdesc->cqe.done = iser_ctrl_comp;
drivers/infiniband/ulp/iser/iser_verbs.c:       desc->cqe.done = iser_login_rsp;
drivers/infiniband/ulp/iser/iser_verbs.c:               rx_desc->cqe.done = iser_task_rsp;

So, I can't see any way in_interrupt() was ever detecting actual
interrupts. I wonder if it was some hacky way to detect
non-preemption from a softirq or something?

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Max Gurtovoy <maxg@nvidia.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
> index 3690e28cc7ea2..b34a1881c4cad 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
>  	struct iser_device *device = iser_conn->ib_conn.device;
>  	struct iscsi_iser_task *iser_task = task->dd_data;
>  	u64 dma_addr;
> -	const bool mgmt_task = !task->sc && !in_interrupt();
>  	int ret = 0;

Why do you think the task->sc doesn't matter?

> -	if (unlikely(mgmt_task))
> -		mutex_lock(&iser_conn->state_mutex);
> -
>  	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
>  		ret = -ENODEV;
>  		goto out;
> @@ -215,9 +211,6 @@ iser_initialize_task_headers(struct iscsi_task *task,
>  
>  	iser_task->iser_conn = iser_conn;
>  out:
> -	if (unlikely(mgmt_task))
> -		mutex_unlock(&iser_conn->state_mutex);
> -
>  	return ret;
>  }

Sagi, you added this code, any recollection of what it is for?

commit 7414dde0a6c3a958e26141991bf5c75dc58d28b2
Author: Sagi Grimberg <sagig@mellanox.com>
Date:   Sun Dec 7 16:09:59 2014 +0200

    IB/iser: Fix race between iser connection teardown and scsi TMFs
    
    In certain scenarios (target kill with live IO) scsi TMFs may race
    with iser RDMA teardown, which might cause NULL dereference on iser IB
    device handle (which might have been freed). In this case we take a
    conditional lock for TMFs and check the connection state (avoid
    introducing lock contention in the IO path). This is indeed best
    effort approach, but sufficient to survive multi targets sudden death
    while heavy IO is inflight.
    
    While we are on it, add a nice kernel-doc style documentation.

Max, can you do a test with this patch and we might luck into a
lockdep splat that will be informative?

Jason

Sebastian Andrzej Siewior Nov. 27, 2020, 12:34 p.m. UTC | #2

On 2020-11-26 16:53:57 [-0400], Jason Gunthorpe wrote:
> > --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> > +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> > @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
> >  	struct iser_device *device = iser_conn->ib_conn.device;
> >  	struct iscsi_iser_task *iser_task = task->dd_data;
> >  	u64 dma_addr;
> > -	const bool mgmt_task = !task->sc && !in_interrupt();
> >  	int ret = 0;
> 
> Why do you think the task->sc doesn't matter?

Based on the call paths I checked, there was no evidence that
state_mutex can be acquired. If I remove locking here then `mgmt_task'
is no longer needed.
How should task->sc matter?

> > -	if (unlikely(mgmt_task))
> > -		mutex_lock(&iser_conn->state_mutex);
> > -
> >  	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
> >  		ret = -ENODEV;
> >  		goto out;
…
> Jason

Sebastian

Jason Gunthorpe Nov. 27, 2020, 1:03 p.m. UTC | #3

On Fri, Nov 27, 2020 at 01:34:55PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-26 16:53:57 [-0400], Jason Gunthorpe wrote:
> > > +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> > > @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
> > >  	struct iser_device *device = iser_conn->ib_conn.device;
> > >  	struct iscsi_iser_task *iser_task = task->dd_data;
> > >  	u64 dma_addr;
> > > -	const bool mgmt_task = !task->sc && !in_interrupt();
> > >  	int ret = 0;
> > 
> > Why do you think the task->sc doesn't matter?
> 
> Based on the call paths I checked, there was no evidence that
> state_mutex can be acquired. If I remove locking here then `mgmt_task'
> is no longer needed.

That only says there is no recursive deadlock..

> How should task->sc matter?

I was able to get the internal bug report that caused the
7414dde0a6c3a commit.

The issue here is that the state_mutex is protecting 

This:

	if (unlikely(iser_conn->state != ISER_CONN_UP)) {

Which indicates that this:

        dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,

Won't crash because iser_conn->ib_conn is invalid. The notes say that
the iSCSI stack is in some state where data traffic won't flow but
management traffic is still possible. I suppose this is some fast path
so it was "optimized" to eliminate the lock for data traffic.

A call chain of interest for the lock at least is:

Nov  3 12:24:37 rsws10 BUG: unable to handle kernel 
Nov  3 12:24:37 NULL pointer dereference
Nov  3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF          O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
[..]
Nov  3 12:24:37 rsws10  [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
Nov  3 12:24:37 rsws10  [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffff813c2694>] ? scsi_send_eh_cmnd+0xd4/0x3a0
Nov  3 12:24:37 rsws10  [<ffffffff810c39df>] ? module_refcount+0x9f/0xc0
Nov  3 12:24:37 rsws10  [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]
Nov  3 12:24:37 rsws10  [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c3cbe>] scsi_eh_ready_devs+0x5e/0x110
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c3e5d>] scsi_unjam_host+0xed/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff813c40c8>] scsi_error_handler+0x168/0x1c0
Nov  3 12:24:37 rsws10  [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov  3 12:24:37 rsws10  [<ffffffff81082a6e>] kthread+0xce/0xe0
Nov  3 12:24:37 rsws10  [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70
Nov  3 12:24:37 rsws10  [<ffffffff8159b66c>] ret_from_fork+0x7c/0xb0
Nov  3 12:24:37 rsws10  [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70

So, I think the usual 'pass in atomic context flag' is really needed
here

Jason
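
For readers unfamiliar with the 'pass in atomic context flag' idea mentioned above, a minimal userspace sketch follows. It is not the iser code: the toy_* types and the may_sleep parameter are hypothetical stand-ins, and the toy mutex only records whether it was taken. The point is that the caller, which knows its own context, tells the callee whether a sleeping lock is allowed, instead of the callee guessing with in_interrupt(). Whether any real caller could legitimately pass true is a separate question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_conn_state { TOY_CONN_DOWN, TOY_CONN_UP };

/* Toy stand-in for a mutex; it only tracks usage for illustration. */
struct toy_mutex {
	int locked;
	int lock_count;
};

struct toy_conn {
	struct toy_mutex state_mutex;
	enum toy_conn_state state;
};

static inline void toy_mutex_lock(struct toy_mutex *m)
{
	assert(!m->locked);	/* no recursion in this single-threaded toy */
	m->locked = 1;
	m->lock_count++;
}

static inline void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->locked);
	m->locked = 0;
}

/*
 * The caller states explicitly whether sleeping is allowed (may_sleep is a
 * hypothetical parameter). The state check is serialized only when it is.
 */
static inline int toy_init_task_headers(struct toy_conn *conn, bool may_sleep)
{
	int ret = 0;

	if (may_sleep)
		toy_mutex_lock(&conn->state_mutex);

	if (conn->state != TOY_CONN_UP)
		ret = -ENODEV;

	if (may_sleep)
		toy_mutex_unlock(&conn->state_mutex);

	return ret;
}
```

A caller holding a spinlock would pass false and get the unlocked, best-effort check; a caller in process context with no locks held would pass true.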

Sebastian Andrzej Siewior Nov. 27, 2020, 2:14 p.m. UTC | #4

On 2020-11-27 09:03:14 [-0400], Jason Gunthorpe wrote:
> I was able to get the internal bug report that caused the
> 7414dde0a6c3a commit.
> 
> The issue here is that the state_mutex is protecting 
> 
> This:
> 
> 	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
> 
> Which indicates that this:
> 
>         dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,
> 
> Won't crash because iser_conn->ib_conn is invalid. The notes say that
> the iSCSI stack is in some state where data traffic won't flow but
> management traffic is still possible. I suppose this is some fast path
> so it was "optimized" to eliminate the lock for data traffic.
> 
> A call chain of interest for the lock at least is:
> 
> Nov  3 12:24:37 rsws10 BUG: unable to handle kernel 
> Nov  3 12:24:37 NULL pointer dereference
> Nov  3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF          O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
> [..]
> Nov  3 12:24:37 rsws10  [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
> Nov  3 12:24:37 rsws10  [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
> Nov  3 12:24:37 rsws10  [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
> Nov  3 12:24:37 rsws10  [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
> Nov  3 12:24:37 rsws10  [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]

preemptible until here and this function has:

|	mutex_lock(&session->eh_mutex);
|	spin_lock_bh(&session->frwd_lock);

I don't see the lock dropped between here and iscsi_iser_task_init().

> Nov  3 12:24:37 rsws10  [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0

> Nov  3 12:24:37 rsws10  [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70
> 
> So, I think the usual 'pass in atomic context flag' is really needed
> here

Sure, I would do that but as noted above, the `frwd_lock' is acquired
so you can't acquire the mutex here.

> Jason

Sebastian

Jason Gunthorpe Nov. 27, 2020, 2:31 p.m. UTC | #5

On Fri, Nov 27, 2020 at 03:14:32PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-27 09:03:14 [-0400], Jason Gunthorpe wrote:
> > I was able to get the internal bug report that caused the
> > 7414dde0a6c3a commit.
> > 
> > The issue here is that the state_mutex is protecting 
> > 
> > This:
> > 
> > 	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
> > 
> > Which indicates that this:
> > 
> >         dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,
> > 
> > Won't crash because iser_conn->ib_conn is invalid. The notes say that
> > the iSCSI stack is in some state where data traffic won't flow but
> > management traffic is still possible. I suppose this is some fast path
> > so it was "optimized" to eliminate the lock for data traffic.
> > 
> > A call chain of interest for the lock at least is:
> > 
> > Nov  3 12:24:37 rsws10 BUG: unable to handle kernel 
> > Nov  3 12:24:37 NULL pointer dereference
> > Nov  3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF          O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
> > [..]
> > Nov  3 12:24:37 rsws10  [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
> > Nov  3 12:24:37 rsws10  [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
> > Nov  3 12:24:37 rsws10  [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
> > Nov  3 12:24:37 rsws10  [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
> > Nov  3 12:24:37 rsws10  [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]
> 
> preemptible until here and this function has:
> 
> |	mutex_lock(&session->eh_mutex);
> |	spin_lock_bh(&session->frwd_lock);
> 
> I don't see the lock dropped between here and iscsi_iser_task_init().

Hmm, nor do I

This whole thing does look broken.

So.. it looks like the "fix" in 7414dde0a6c3a was adding the:

+       if (unlikely(iser_conn->state != ISER_CONN_UP)) {

Without any locking. Which is a pretty typical mistake :\

> Sure, I would do that but as noted above, the `frwd_lock' is acquired
> so you can't acquire the mutex here.

Ok, well, I'm thinking this patch is OK as is. Let's wait for Max and Sagi.

Jason

Sebastian Andrzej Siewior Dec. 3, 2020, 1:56 p.m. UTC | #6

On 2020-11-27 10:31:38 [-0400], Jason Gunthorpe wrote:
> > Sure, I would do that but as noted above, the `frwd_lock' is acquired
> > so you can't acquire the mutex here.
> 
> Ok, well, I'm thinking this patch is OK as is. Let's wait for Max and Sagi.

A gentle ping to Max and Sagi in case we are still waiting for them here.

> Jason

Sebastian

Sagi Grimberg Dec. 3, 2020, 7:30 p.m. UTC | #7

>>> Sure, I would do that but as noted above, the `frwd_lock' is acquired
>>> so you can't acquire the mutex here.
>>
>> Ok, well, I'm thinking this patch is OK as is. Let's wait for Max and Sagi.
> 
> A gentle ping to Max and Sagi in case we are still waiting for them here.

Hey, I agree with the change, it was a while back, and advisory anyways.

But while touching it, you can remove the now-redundant `goto out' and its
label, because the routine no longer has any finalization to do there.
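
As a rough illustration of that cleanup, the sketch below shows the general shape of a function once the unlock at the exit label is gone: the error path can return directly. This is a userspace toy with made-up toy_* types, not the actual iser function; the header setup and DMA mapping are reduced to a single flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_state { TOY_CONN_DOWN, TOY_CONN_UP };

struct toy_conn {
	enum toy_state state;
};

struct toy_task {
	struct toy_conn *conn;
	int headers_ready;	/* stands in for the real header/DMA setup */
};

static inline int toy_initialize_task_headers(struct toy_task *task,
					      struct toy_conn *conn)
{
	assert(task != NULL && conn != NULL);

	/* Previously: ret = -ENODEV; goto out; with an unlock at "out". */
	if (conn->state != TOY_CONN_UP)
		return -ENODEV;

	task->headers_ready = 1;
	task->conn = conn;

	return 0;
}
```

With nothing left to undo on exit, an early return keeps the error path obvious and avoids a label whose only job is `return ret`.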
