blk-mq: Fix several SCSI request queue lockups

Message ID	20171204173032.16330-1-bart.vanassche@wdc.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-block-owner@kernel.org> From: Bart Van Assche <bart.vanassche@wdc.com> To: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>, Bart Van Assche <bart.vanassche@wdc.com>, Ming Lei <ming.lei@redhat.com>, Hannes Reinecke <hare@suse.com>, Johannes Thumshirn <jthumshirn@suse.de>, "James E . J . Bottomley" <jejb@linux.vnet.ibm.com>, "Martin K . Petersen" <martin.petersen@oracle.com>, linux-scsi@vger.kernel.org Subject: [PATCH] blk-mq: Fix several SCSI request queue lockups Date: Mon, 4 Dec 2017 09:30:32 -0800 Message-Id: <20171204173032.16330-1-bart.vanassche@wdc.com> Sender: linux-block-owner@vger.kernel.org Precedence: bulk

Bart Van Assche Dec. 4, 2017, 5:30 p.m. UTC

Commit 0df21c86bdbf introduced several bugs:
* A SCSI queue stall for queue depths > 1, addressed by commit
  88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
* A systematic lockup for SCSI queues with queue depth 1. The
  following test reproduces that bug systematically:
  - Change the SRP initiator such that SCSI target queue depth is
    limited to 1.
  - Run the following command:
      srp-test/run_tests -f xfs -d -e none -r 60 -t 01
  See also "[PATCH 4/7] blk-mq: Avoid that request processing
  stalls when sharing tags"
  (https://marc.info/?l=linux-block&m=151208695316857). Note:
  reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
  queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
  before all blk_mq_dispatch_rq_list() calls only fixes the
  systematic lockup for queue depth 1.
* A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
  device is blocked in scsi_dev_queue_ready()"
  (https://marc.info/?l=linux-block&m=151223233407154).

I think the above means that it is too risky to try to fix all bugs
introduced by commit 0df21c86bdbf before kernel v4.15 is released.
Hence revert that commit.

Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
---
 drivers/scsi/scsi_lib.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Ming Lei Dec. 4, 2017, 10:42 p.m. UTC | #1

On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> Commit 0df21c86bdbf introduced several bugs:
> * A SCSI queue stall for queue depths > 1, addressed by commit
>   88022d7201e9 ("blk-mq: don't handle failure in .get_budget")

This one is committed already.

> * A systematic lockup for SCSI queues with queue depth 1. The
>   following test reproduces that bug systematically:
>   - Change the SRP initiator such that SCSI target queue depth is
>     limited to 1.
>   - Run the following command:
>       srp-test/run_tests -f xfs -d -e none -r 60 -t 01
>   See also "[PATCH 4/7] blk-mq: Avoid that request processing
>   stalls when sharing tags"
>   (https://marc.info/?l=linux-block&m=151208695316857). Note:
>   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
>   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
>   before all blk_mq_dispatch_rq_list() calls only fixes the
>   systematic lockup for queue depth 1.

You are the only reproducer, and you don't want to provide any kernel
log about this issue, so how can we help you fix your issue?

You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
improve dispatching from sw queue")', but you don't mention any issue
about that commit, and your patch is actually nothing to do with
commit b347689ffbca, and seems your work style is just try and guess.

Also both Jens and I have run tests on null_blk and scsi_debug by setting
queue_depth as one, and we all can't see IO hang with current blk-mq.

> * A scsi_debug lockup - see also "[PATCH] SCSI: delay run queue if
>   device is blocked in scsi_dev_queue_ready()"
>   (https://marc.info/?l=linux-block&m=151223233407154).

This issue is clearly explained in theory, and can be reproduced/verified
by scsi_debug, so why can't we apply it to fix the issue? And the fix is
simply and can be thought as cleanup too, since the handling for this case
becomes same with non-mq path now.

> 
> I think the above means that it is too risky to try to fix all bugs
> introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> Hence revert that commit.

What is the risk?

> 
> Fixes: commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: linux-scsi@vger.kernel.org

This commit fixes one important SCSI_MQ performance issue, we can't
simply revert it just because of one un-confirmed report from you
only(without any kernel log provided).

So Nak.

Bart Van Assche Dec. 4, 2017, 10:48 p.m. UTC | #2

On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:

> > * A systematic lockup for SCSI queues with queue depth 1. The

> >   following test reproduces that bug systematically:

> >   - Change the SRP initiator such that SCSI target queue depth is

> >     limited to 1.

> >   - Run the following command:

> >       srp-test/run_tests -f xfs -d -e none -r 60 -t 01

> >   See also "[PATCH 4/7] blk-mq: Avoid that request processing

> >   stalls when sharing tags"

> >   (https://marc.info/?l=linux-block&m=151208695316857). Note:

> >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request

> >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()

> >   before all blk_mq_dispatch_rq_list() calls only fixes the

> >   systematic lockup for queue depth 1.

> 

> You are the only reproducer [ ... ]


That's not correct. I'm pretty sure if you try to reproduce this that
you will see the same hang I ran into. Does this mean that you have not
yet tried to reproduce the hang I reported?

> You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:

> improve dispatching from sw queue")', but you don't mention any issue

> about that commit.


That's not correct either. From the commit message "A systematic lockup
for SCSI queues with queue depth 1."

> > I think the above means that it is too risky to try to fix all bugs

> > introduced by commit 0df21c86bdbf before kernel v4.15 is released.

> > Hence revert that commit.

> 

> What is the risk?


That more bugs were introduced by commit 0df21c86bdbf than the ones that
have been discovered so far.

Bart.

Ming Lei Dec. 4, 2017, 11:01 p.m. UTC | #3

On Mon, Dec 04, 2017 at 10:48:18PM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > > * A systematic lockup for SCSI queues with queue depth 1. The
> > >   following test reproduces that bug systematically:
> > >   - Change the SRP initiator such that SCSI target queue depth is
> > >     limited to 1.
> > >   - Run the following command:
> > >       srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> > >   See also "[PATCH 4/7] blk-mq: Avoid that request processing
> > >   stalls when sharing tags"
> > >   (https://marc.info/?l=linux-block&m=151208695316857). Note:
> > >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> > >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> > >   before all blk_mq_dispatch_rq_list() calls only fixes the
> > >   systematic lockup for queue depth 1.
> > 
> > You are the only reproducer [ ... ]
> 
> That's not correct. I'm pretty sure if you try to reproduce this that
> you will see the same hang I ran into. Does this mean that you have not
> yet tried to reproduce the hang I reported?

Do you mean every kernel developer has to own one SRP/IB hardware?
I don't have your hardware to reproduce that, and I don't think most
of guys have that. Otherwise, there should have be such similar reports
from others, not from only you.

More importantly I don't understand why you can't share the kernel
log/debugfs log when IO hang happens?

Without any kernel log, how can we confirm that it is a valid report?

> 
> > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> > improve dispatching from sw queue")', but you don't mention any issue
> > about that commit.
> 
> That's not correct either. From the commit message "A systematic lockup
> for SCSI queues with queue depth 1."

I mean you mentioned your patch can fix 'commit b347689ffbca
("blk-mq-sched: improve dispatching from sw queue")', but you never
point where the commit b347689ffbca is wrong, how your patch fixes
the mistake of that commit.

> 
> > > I think the above means that it is too risky to try to fix all bugs
> > > introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> > > Hence revert that commit.
> > 
> > What is the risk?
> 
> That more bugs were introduced by commit 0df21c86bdbf than the ones that
> have been discovered so far.

If you don't provide any log, I have to ignore your report simply.
So there is only one real issue which can be addressed easily by
the following patch:

https://marc.info/?l=linux-scsi&m=151223234607157&w=2

Bart Van Assche Dec. 4, 2017, 11:32 p.m. UTC | #4

On Tue, 2017-12-05 at 07:01 +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 10:48:18PM +0000, Bart Van Assche wrote:

> > On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:

> > > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:

> > > > * A systematic lockup for SCSI queues with queue depth 1. The

> > > >   following test reproduces that bug systematically:

> > > >   - Change the SRP initiator such that SCSI target queue depth is

> > > >     limited to 1.

> > > >   - Run the following command:

> > > >       srp-test/run_tests -f xfs -d -e none -r 60 -t 01

> > > >   See also "[PATCH 4/7] blk-mq: Avoid that request processing

> > > >   stalls when sharing tags"

> > > >   (https://marc.info/?l=linux-block&m=151208695316857). Note:

> > > >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request

> > > >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()

> > > >   before all blk_mq_dispatch_rq_list() calls only fixes the

> > > >   systematic lockup for queue depth 1.

> > > 

> > > You are the only reproducer [ ... ]

> > 

> > That's not correct. I'm pretty sure if you try to reproduce this that

> > you will see the same hang I ran into. Does this mean that you have not

> > yet tried to reproduce the hang I reported?

> 

> Do you mean every kernel developer has to own one SRP/IB hardware?

When I have the time I will make it possible to run this test on any system
equipped with at least one Ethernet port. But the fact that the test I
mentioned requires IB hardware should not prevent you from running this test
since you have run this test software before.

> I don't have your hardware to reproduce that,

That's not true. Your employer definitely owns IB hardware. E.g. the
following message shows that you have run the srp-test yourself on IB hardware
only four weeks ago:

https://www.spinics.net/lists/linux-block/msg19511.html

> Otherwise, there should have be such similar reports from others, not from

> only you.

That's not correct either. How long was it ago that kernel v4.15-rc1 was
released? One week? How many SRP users do you think have tried to trigger a
queue full condition with that kernel version?

> More importantly I don't understand why you can't share the kernel

> log/debugfs log when IO hang happens?

> 

> Without any kernel log, how can we confirm that it is a valid report?

It's really unfortunate that you are focussing on denying that the bug I
reported exists instead of trying to fix the bugs introduced by commit
b347689ffbca. BTW, I have more than enough experience to decide myself what
a valid report is and what not. I can easily send you several MB of kernel
logs. The reason I have not yet done this is because I'm 99.9% sure that
these won't help to root cause the reported issue. But I can tell you what
I learned from analyzing the information under /sys/kernel/debug/block:
every time a hang occurred I noticed that no requests were "busy", that two
requests occurred in rq_lists and one request occurred in a hctx dispatch
list. This is enough to conclude that a queue run was missing. And I think
that the patch at the start of this e-mail thread not only shows that the
root cause is in the block layer but also that this bug was introduced by
commit b347689ffbca.

> > > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:

> > > improve dispatching from sw queue")', but you don't mention any issue

> > > about that commit.

> > 

> > That's not correct either. From the commit message "A systematic lockup

> > for SCSI queues with queue depth 1."

> 

> I mean you mentioned your patch can fix 'commit b347689ffbca

> ("blk-mq-sched: improve dispatching from sw queue")', but you never

> point where the commit b347689ffbca is wrong, how your patch fixes

> the mistake of that commit.

You should know that it is not required to perform a root cause analysis
before posting a revert. Having performed a bisect is sufficient.

BTW, it seems like you forgot that last Friday I explained to you that there
is an obvious bug in commit b347689ffbca, namely that a blk_mq_sched_mark_restart_hctx()
call is missing in blk_mq_sched_dispatch_requests() before the blk_mq_do_dispatch_ctx()
call. See also https://marc.info/?l=linux-block&m=151215794224401.

Bart.

Ming Lei Dec. 5, 2017, 12:20 a.m. UTC | #5

On Mon, Dec 04, 2017 at 11:32:27PM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 07:01 +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2017 at 10:48:18PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> > > > On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > > > > * A systematic lockup for SCSI queues with queue depth 1. The
> > > > >   following test reproduces that bug systematically:
> > > > >   - Change the SRP initiator such that SCSI target queue depth is
> > > > >     limited to 1.
> > > > >   - Run the following command:
> > > > >       srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> > > > >   See also "[PATCH 4/7] blk-mq: Avoid that request processing
> > > > >   stalls when sharing tags"
> > > > >   (https://marc.info/?l=linux-block&m=151208695316857). Note:
> > > > >   reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> > > > >   queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> > > > >   before all blk_mq_dispatch_rq_list() calls only fixes the
> > > > >   systematic lockup for queue depth 1.
> > > > 
> > > > You are the only reproducer [ ... ]
> > > 
> > > That's not correct. I'm pretty sure if you try to reproduce this that
> > > you will see the same hang I ran into. Does this mean that you have not
> > > yet tried to reproduce the hang I reported?
> > 
> > Do you mean every kernel developer has to own one SRP/IB hardware?
> 
> When I have the time I will make it possible to run this test on any system
> equipped with at least one Ethernet port. But the fact that the test I
> mentioned requires IB hardware should not prevent you from running this test
> since you have run this test software before.
> 
> > I don't have your hardware to reproduce that,
> 
> That's not true. Your employer definitely owns IB hardware. E.g. the
> following message shows that you have run the srp-test yourself on IB hardware
> only four weeks ago:
> 
> https://www.spinics.net/lists/linux-block/msg19511.html

The hardware belongs to Laurence, at that time I can borrow from him, and
now I am not sure if it is available.

> 
> > Otherwise, there should have be such similar reports from others, not from
> > only you.
> 
> That's not correct either. How long was it ago that kernel v4.15-rc1 was
> released? One week? How many SRP users do you think have tried to trigger a
> queue full condition with that kernel version?

OK, we can wait for further reporters to provide kernel log if you
don't want.

> 
> > More importantly I don't understand why you can't share the kernel
> > log/debugfs log when IO hang happens?
> > 
> > Without any kernel log, how can we confirm that it is a valid report?
> 
> It's really unfortunate that you are focussing on denying that the bug I
> reported exists instead of trying to fix the bugs introduced by commit

If you look at bug reports in kenrel mail list, you will see most of
reports includes some kind of log, that is a common practice to report
issue with log attached. It can save us much time for talking in mails.

> b347689ffbca. BTW, I have more than enough experience to decide myself what
> a valid report is and what not. I can easily send you several MB of kernel

As I mentioned, only dmesg with hang trace and debugfs log should be enough,
both can't be so big, right?

> logs. The reason I have not yet done this is because I'm 99.9% sure that
> these won't help to root cause the reported issue. But I can tell you what

That is your opinion, most of times, I can find some clue from debugfs
about hang issue, then I can try to add trace just in some possible
places for narrowing down the issue.

> I learned from analyzing the information under /sys/kernel/debug/block:
> every time a hang occurred I noticed that no requests were "busy", that two
> requests occurred in rq_lists and one request occurred in a hctx dispatch

Then what do other attributes show? Like queue/hctx state?

The following script can get all this info easily:

	http://people.redhat.com/minlei/tests/tools/dump-blk-info

Also it is a bit odd to see request in hctx->dispatch now, and it can only
happen now when scsi_target_queue_ready() returns false, so I guess you apply
some change on target->can_queue(such as setting it as 1 in srp/ib code
manually)?

Please reply, if yes, I will try to see if I can reproduce it with this
kind of change on scsi_debug.

> list. This is enough to conclude that a queue run was missing. And I think

In this case, seems it isn't related with both commit b347689ff and 0df21c86bdbf,
since both don't change RESTART for hctx->dispatch, and shouldn't affect
run queue.

> that the patch at the start of this e-mail thread not only shows that the
> root cause is in the block layer but also that this bug was introduced by
> commit b347689ffbca.
> 
> > > > You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> > > > improve dispatching from sw queue")', but you don't mention any issue
> > > > about that commit.
> > > 
> > > That's not correct either. From the commit message "A systematic lockup
> > > for SCSI queues with queue depth 1."
> > 
> > I mean you mentioned your patch can fix 'commit b347689ffbca
> > ("blk-mq-sched: improve dispatching from sw queue")', but you never
> > point where the commit b347689ffbca is wrong, how your patch fixes
> > the mistake of that commit.
> 
> You should know that it is not required to perform a root cause analysis
> before posting a revert. Having performed a bisect is sufficient.

I don't think your issue can't be fixed before releasing V4.15 if enough
log are provided, and reverting can cause performance regression for all
SCSI_MQ users.

Also more importantly you may revert a wrong commit, because sometimes
some commits may make some issue happen easily, not the direct reason
of the issue.

> 
> BTW, it seems like you forgot that last Friday I explained to you that there
> is an obvious bug in commit b347689ffbca, namely that a blk_mq_sched_mark_restart_hctx()
> call is missing in blk_mq_sched_dispatch_requests() before the blk_mq_do_dispatch_ctx()
> call. See also https://marc.info/?l=linux-block&m=151215794224401.

No, I have explained to you that your change isn't necessary, see:

 https://marc.info/?l=linux-block&m=151217500929341&w=2

Bart Van Assche Dec. 5, 2017, 12:29 a.m. UTC | #6

On Tue, 2017-12-05 at 08:20 +0800, Ming Lei wrote:
> Also it is a bit odd to see request in hctx->dispatch now, and it can only

> happen now when scsi_target_queue_ready() returns false, so I guess you apply

> some change on target->can_queue(such as setting it as 1 in srp/ib code

> manually)?

Yes, but that had already been mentioned. From the e-mail at the start of
this e-mail thread: "Change the SRP initiator such that SCSI target queue
depth is limited to 1." The changes I made in the SRP initiator are the same
as those described in the following message from about one month ago:
https://www.spinics.net/lists/linux-scsi/msg114720.html.

Bart.

Ming Lei Dec. 5, 2017, 1:04 a.m. UTC | #7

On Tue, Dec 05, 2017 at 12:29:59AM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 08:20 +0800, Ming Lei wrote:
> > Also it is a bit odd to see request in hctx->dispatch now, and it can only
> > happen now when scsi_target_queue_ready() returns false, so I guess you apply
> > some change on target->can_queue(such as setting it as 1 in srp/ib code
> > manually)?
> 
> Yes, but that had already been mentioned. From the e-mail at the start of
> this e-mail thread: "Change the SRP initiator such that SCSI target queue
> depth is limited to 1." The changes I made in the SRP initiator are the same
> as those described in the following message from about one month ago:
> https://www.spinics.net/lists/linux-scsi/msg114720.html.

OK, got it.

Then no reason to revert commit(0df21c86bdbf scsi: implement .get_budget an
.put_budget for blk-mq) for one issue which may never happen in reality since
this reproducer need out-of-tree patch.

I don't mean it isn't a issue, but I don't think it has top priority
for reverting commit 0df21c86bdbf. Especially there isn't proof shown
that 0df21c86bdbf causes this issue since this commit won't change run
queue for requests in hctx->dispatch_list.

I's like to take a look if someone'd like to cooperate, such as
providing kernel log, test debug patch, and kind of things. Or when
I get this hardware to reproduce.

--
Ming

Bart Van Assche Dec. 5, 2017, 1:13 a.m. UTC | #8

On Tue, 2017-12-05 at 09:04 +0800, Ming Lei wrote:
> Then no reason to revert commit(0df21c86bdbf scsi: implement .get_budget an

> .put_budget for blk-mq) for one issue which may never happen in reality since

> this reproducer need out-of-tree patch.

Sorry but I disagree completely. You seem to overlook that there may be other
circumstances that trigger the same lockup, e.g. a SCSI queue full condition.

Bart.

Ming Lei Dec. 5, 2017, 1:18 a.m. UTC | #9

On Tue, Dec 05, 2017 at 01:13:43AM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-05 at 09:04 +0800, Ming Lei wrote:
> > Then no reason to revert commit(0df21c86bdbf scsi: implement .get_budget an
> > .put_budget for blk-mq) for one issue which may never happen in reality since
> > this reproducer need out-of-tree patch.
> 
> Sorry but I disagree completely. You seem to overlook that there may be other
> circumstances that trigger the same lockup, e.g. a SCSI queue full condition.

If the scsi_dev_queue_ready() returns false, .get_budget() catches that
and never add request to hctx->dispatch. And scsi_host_queue_ready()
always returns true, since we respect per-host queue depth by
blk_mq_get_driver_tag() before calling .queue_rq().

Or if I miss other cases, please point it out.

blk-mq: Fix several SCSI request queue lockups

Commit Message

Comments

Patch