Message ID | 20180207170612.GB695913@devbig577.frc2.facebook.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote: > On Tue, Feb 06, 2018 at 05:11:33PM -0800, Bart Van Assche wrote: > > The following race can occur between the code that resets the timer > > and completion handling: > > - The code that handles BLK_EH_RESET_TIMER resets aborted_gstate. > > - A completion occurs and blk_mq_complete_request() calls > > __blk_mq_complete_request(). > > - The timeout code calls blk_add_timer() and that function sets the > > request deadline and adjusts the timer. > > - __blk_mq_complete_request() frees the request tag. > > - The timer fires and the timeout handler gets called for a freed > > request. > > Can you see whether by any chance the following patch fixes the issue? > If not, can you share the repro case? > > Thanks. > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index df93102..651d18c 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -836,8 +836,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) > * ->aborted_gstate is set, this may lead to ignored > * completions and further spurious timeouts. > */ > - blk_mq_rq_update_aborted_gstate(req, 0); > blk_add_timer(req); > + blk_mq_rq_update_aborted_gstate(req, 0); > break; > case BLK_EH_NOT_HANDLED: > break; Hello Tejun, Even with the above change I think that there is still a race between the code that handles timer resets and the completion handler. Anyway, the test with which I triggered these races is as follows: - Start from what will become kernel v4.16-rc1 and apply the patch that adds SRP over RoCE support to the ib_srpt driver. See also the "[PATCH v2 00/14] IB/srpt: Add RDMA/CM support" patch series (https://www.spinics.net/lists/linux-rdma/msg59589.html). - Apply my patch series that fixes a race between the SCSI error handler and SCSI transport recovery. - Apply my patch series that improves the stability of the SCSI target core (LIO). - Build and install that kernel. - Clone the following repository: https://github.com/bvanassche/srp-test. - Run the following test: while true; do srp-test/run_tests -c -t 02-mq; done - While the test is running, check whether or not something weird happens. Sometimes I see that scsi_times_out() crashes. Sometimes I see while running this test that a soft lockup is reported inside blk_mq_do_dispatch_ctx(). If you want I can share the tree on github that I use myself for my tests. Thanks, Bart.
Hello, Bart. On Wed, Feb 07, 2018 at 05:27:10PM +0000, Bart Van Assche wrote: > Even with the above change I think that there is still a race between the > code that handles timer resets and the completion handler. Anyway, the test Can you elaborate the scenario a bit further? If you're referring to lost completions, we've always had that and while we can try to close that hole too, I don't think it's a critical issue. > with which I triggered these races is as follows: > - Start from what will become kernel v4.16-rc1 and apply the patch that adds > SRP over RoCE support to the ib_srpt driver. See also the "[PATCH v2 00/14] > IB/srpt: Add RDMA/CM support" patch series > (https://www.spinics.net/lists/linux-rdma/msg59589.html). > - Apply my patch series that fixes a race between the SCSI error handler and > SCSI transport recovery. > - Apply my patch series that improves the stability of the SCSI target core > (LIO). > - Build and install that kernel. > - Clone the following repository: https://github.com/bvanassche/srp-test. > - Run the following test: > while true; do srp-test/run_tests -c -t 02-mq; done > - While the test is running, check whether or not something weird happens. > Sometimes I see that scsi_times_out() crashes. Sometimes I see while running > this test that a soft lockup is reported inside blk_mq_do_dispatch_ctx(). > > If you want I can share the tree on github that I use myself for my tests. Heh, that's quite a bit. I'll take up on that git tree later but for now I'd really appreciate if you can test the patch. Thank you very much.
On Wed, 2018-02-07 at 09:35 -0800, tj@kernel.org wrote: > On Wed, Feb 07, 2018 at 05:27:10PM +0000, Bart Van Assche wrote: > > Even with the above change I think that there is still a race between the > > code that handles timer resets and the completion handler. > > Can you elaborate the scenario a bit further? If you're referring to > lost completions, we've always had that and while we can try to close > that hole too, I don't think it's a critical issue. Hello Tejun, When I wrote my comment I was not sure whether or not non-reentrancy is guaranteed for work queue items. However, according to what I found in the workqueue implementation I think that is guaranteed. So it shouldn't be possible that the timer activated by blk_add_timer() gets handled before aborted_gstate is reset. But since the timeout handler and completion handlers can be executed by different CPUs, shouldn't a memory barrier be inserted between the blk_add_timer() call and resetting aborted_gsync to guarantee that a completion cannot occur before blk_add_timer() has reset RQF_MQ_TIMEOUT_EXPIRED? Thanks, Bart.
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote: > Can you see whether by any chance the following patch fixes the issue? > If not, can you share the repro case? > > Thanks. > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index df93102..651d18c 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -836,8 +836,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) > * ->aborted_gstate is set, this may lead to ignored > * completions and further spurious timeouts. > */ > - blk_mq_rq_update_aborted_gstate(req, 0); > blk_add_timer(req); > + blk_mq_rq_update_aborted_gstate(req, 0); > break; > case BLK_EH_NOT_HANDLED: > break; Hello Tejun, I tried the above patch but already during the first iteration of the test I noticed that the test hung, probably due to the following request that got stuck: $ (cd /sys/kernel/debug/block && grep -aH . */*/*/rq_list) 00000000a98cff60 {.op=SCSI_IN, .cmd_flags=, .rq_flags=MQ_INFLIGHT|PREEMPT|QUIET|IO_STAT|PM, .state=idle, .tag=22, .internal_tag=-1, .cmd=Synchronize Cache(10) 35 00 00 00 00 00, .retries=0, .result = 0x0, .flags=TAGGED, .timeout=60.000, allocated 872.690 s ago} Thanks, Bart.
Hello, On Wed, Feb 07, 2018 at 07:03:56PM +0000, Bart Van Assche wrote: > I tried the above patch but already during the first iteration of the test I > noticed that the test hung, probably due to the following request that got stuck: > > $ (cd /sys/kernel/debug/block && grep -aH . */*/*/rq_list) > 00000000a98cff60 {.op=SCSI_IN, .cmd_flags=, .rq_flags=MQ_INFLIGHT|PREEMPT|QUIET|IO_STAT|PM, > .state=idle, .tag=22, .internal_tag=-1, .cmd=Synchronize Cache(10) 35 00 00 00 00 00, .retries=0, > .result = 0x0, .flags=TAGGED, .timeout=60.000, allocated 872.690 s ago} I'm wonder how this happened, so we can lose a completion when it races against BLK_EH_RESET_TIMER; however, the command should timeout later cuz the timer is running again now. Maybe we actually had the memory barrier race that you pointed out in the other message? Thanks.
diff --git a/block/blk-mq.c b/block/blk-mq.c index df93102..651d18c 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -836,8 +836,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) * ->aborted_gstate is set, this may lead to ignored * completions and further spurious timeouts. */ - blk_mq_rq_update_aborted_gstate(req, 0); blk_add_timer(req); + blk_mq_rq_update_aborted_gstate(req, 0); break; case BLK_EH_NOT_HANDLED: break;