diff mbox

[v2] blk-mq: Fix race between resetting the timer and completion handling

Message ID 20180207170612.GB695913@devbig577.frc2.facebook.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tejun Heo Feb. 7, 2018, 5:06 p.m. UTC
Hello, Bart.

On Tue, Feb 06, 2018 at 05:11:33PM -0800, Bart Van Assche wrote:
> The following race can occur between the code that resets the timer
> and completion handling:
> - The code that handles BLK_EH_RESET_TIMER resets aborted_gstate.
> - A completion occurs and blk_mq_complete_request() calls
>   __blk_mq_complete_request().
> - The timeout code calls blk_add_timer() and that function sets the
>   request deadline and adjusts the timer.
> - __blk_mq_complete_request() frees the request tag.
> - The timer fires and the timeout handler gets called for a freed
>   request.

Can you see whether by any chance the following patch fixes the issue?
If not, can you share the repro case?

Thanks.

Comments

Bart Van Assche Feb. 7, 2018, 5:27 p.m. UTC | #1
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote:
> On Tue, Feb 06, 2018 at 05:11:33PM -0800, Bart Van Assche wrote:

> > The following race can occur between the code that resets the timer

> > and completion handling:

> > - The code that handles BLK_EH_RESET_TIMER resets aborted_gstate.

> > - A completion occurs and blk_mq_complete_request() calls

> >   __blk_mq_complete_request().

> > - The timeout code calls blk_add_timer() and that function sets the

> >   request deadline and adjusts the timer.

> > - __blk_mq_complete_request() frees the request tag.

> > - The timer fires and the timeout handler gets called for a freed

> >   request.

> 

> Can you see whether by any chance the following patch fixes the issue?

> If not, can you share the repro case?

> 

> Thanks.

> 

> diff --git a/block/blk-mq.c b/block/blk-mq.c

> index df93102..651d18c 100644

> --- a/block/blk-mq.c

> +++ b/block/blk-mq.c

> @@ -836,8 +836,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)

>  		 * ->aborted_gstate is set, this may lead to ignored

>  		 * completions and further spurious timeouts.

>  		 */

> -		blk_mq_rq_update_aborted_gstate(req, 0);

>  		blk_add_timer(req);

> +		blk_mq_rq_update_aborted_gstate(req, 0);

>  		break;

>  	case BLK_EH_NOT_HANDLED:

>  		break;


Hello Tejun,

Even with the above change I think that there is still a race between the
code that handles timer resets and the completion handler. Anyway, the test
with which I triggered these races is as follows:
- Start from what will become kernel v4.16-rc1 and apply the patch that adds
  SRP over RoCE support to the ib_srpt driver. See also the "[PATCH v2 00/14]
  IB/srpt: Add RDMA/CM support" patch series
  (https://www.spinics.net/lists/linux-rdma/msg59589.html).
- Apply my patch series that fixes a race between the SCSI error handler and
  SCSI transport recovery.
- Apply my patch series that improves the stability of the SCSI target core
  (LIO).
- Build and install that kernel.
- Clone the following repository: https://github.com/bvanassche/srp-test.
- Run the following test:
  while true; do srp-test/run_tests -c -t 02-mq; done
- While the test is running, check whether or not something weird happens.
  Sometimes I see that scsi_times_out() crashes. Sometimes I see while running
  this test that a soft lockup is reported inside blk_mq_do_dispatch_ctx().

If you want I can share the tree on github that I use myself for my tests.

Thanks,

Bart.
Tejun Heo Feb. 7, 2018, 5:35 p.m. UTC | #2
Hello, Bart.

On Wed, Feb 07, 2018 at 05:27:10PM +0000, Bart Van Assche wrote:
> Even with the above change I think that there is still a race between the
> code that handles timer resets and the completion handler. Anyway, the test

Can you elaborate the scenario a bit further?  If you're referring to
lost completions, we've always had that and while we can try to close
that hole too, I don't think it's a critical issue.

> with which I triggered these races is as follows:
> - Start from what will become kernel v4.16-rc1 and apply the patch that adds
>   SRP over RoCE support to the ib_srpt driver. See also the "[PATCH v2 00/14]
>   IB/srpt: Add RDMA/CM support" patch series
>   (https://www.spinics.net/lists/linux-rdma/msg59589.html).
> - Apply my patch series that fixes a race between the SCSI error handler and
>   SCSI transport recovery.
> - Apply my patch series that improves the stability of the SCSI target core
>   (LIO).
> - Build and install that kernel.
> - Clone the following repository: https://github.com/bvanassche/srp-test.
> - Run the following test:
>   while true; do srp-test/run_tests -c -t 02-mq; done
> - While the test is running, check whether or not something weird happens.
>   Sometimes I see that scsi_times_out() crashes. Sometimes I see while running
>   this test that a soft lockup is reported inside blk_mq_do_dispatch_ctx().
> 
> If you want I can share the tree on github that I use myself for my tests.

Heh, that's quite a bit.  I'll take up on that git tree later but for
now I'd really appreciate if you can test the patch.

Thank you very much.
Bart Van Assche Feb. 7, 2018, 6:14 p.m. UTC | #3
On Wed, 2018-02-07 at 09:35 -0800, tj@kernel.org wrote:
> On Wed, Feb 07, 2018 at 05:27:10PM +0000, Bart Van Assche wrote:

> > Even with the above change I think that there is still a race between the

> > code that handles timer resets and the completion handler.

> 

> Can you elaborate the scenario a bit further?  If you're referring to

> lost completions, we've always had that and while we can try to close

> that hole too, I don't think it's a critical issue.


Hello Tejun,

When I wrote my comment I was not sure whether or not non-reentrancy is
guaranteed for work queue items. However, according to what I found in the
workqueue implementation I think that is guaranteed. So it shouldn't be
possible that the timer activated by blk_add_timer() gets handled before
aborted_gstate is reset. But since the timeout handler and completion
handlers can be executed by different CPUs, shouldn't a memory barrier be
inserted between the blk_add_timer() call and resetting aborted_gsync to
guarantee that a completion cannot occur before blk_add_timer() has reset
RQF_MQ_TIMEOUT_EXPIRED?

Thanks,

Bart.
Bart Van Assche Feb. 7, 2018, 7:03 p.m. UTC | #4
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote:
> Can you see whether by any chance the following patch fixes the issue?

> If not, can you share the repro case?

> 

> Thanks.

> 

> diff --git a/block/blk-mq.c b/block/blk-mq.c

> index df93102..651d18c 100644

> --- a/block/blk-mq.c

> +++ b/block/blk-mq.c

> @@ -836,8 +836,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)

>  		 * ->aborted_gstate is set, this may lead to ignored

>  		 * completions and further spurious timeouts.

>  		 */

> -		blk_mq_rq_update_aborted_gstate(req, 0);

>  		blk_add_timer(req);

> +		blk_mq_rq_update_aborted_gstate(req, 0);

>  		break;

>  	case BLK_EH_NOT_HANDLED:

>  		break;


Hello Tejun,

I tried the above patch but already during the first iteration of the test I
noticed that the test hung, probably due to the following request that got stuck:

$ (cd /sys/kernel/debug/block && grep -aH . */*/*/rq_list)
00000000a98cff60 {.op=SCSI_IN, .cmd_flags=, .rq_flags=MQ_INFLIGHT|PREEMPT|QUIET|IO_STAT|PM,
 .state=idle, .tag=22, .internal_tag=-1, .cmd=Synchronize Cache(10) 35 00 00 00 00 00, .retries=0,
 .result = 0x0, .flags=TAGGED, .timeout=60.000, allocated 872.690 s ago}

Thanks,

Bart.
Tejun Heo Feb. 7, 2018, 8:09 p.m. UTC | #5
Hello,

On Wed, Feb 07, 2018 at 07:03:56PM +0000, Bart Van Assche wrote:
> I tried the above patch but already during the first iteration of the test I
> noticed that the test hung, probably due to the following request that got stuck:
> 
> $ (cd /sys/kernel/debug/block && grep -aH . */*/*/rq_list)
> 00000000a98cff60 {.op=SCSI_IN, .cmd_flags=, .rq_flags=MQ_INFLIGHT|PREEMPT|QUIET|IO_STAT|PM,
>  .state=idle, .tag=22, .internal_tag=-1, .cmd=Synchronize Cache(10) 35 00 00 00 00 00, .retries=0,
>  .result = 0x0, .flags=TAGGED, .timeout=60.000, allocated 872.690 s ago}

I'm wonder how this happened, so we can lose a completion when it
races against BLK_EH_RESET_TIMER; however, the command should timeout
later cuz the timer is running again now.  Maybe we actually had the
memory barrier race that you pointed out in the other message?

Thanks.
diff mbox

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102..651d18c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -836,8 +836,8 @@  static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		 * ->aborted_gstate is set, this may lead to ignored
 		 * completions and further spurious timeouts.
 		 */
-		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
+		blk_mq_rq_update_aborted_gstate(req, 0);
 		break;
 	case BLK_EH_NOT_HANDLED:
 		break;