Message ID | 20180228002830.3052-1-bart.vanassche@wdc.com (mailing list archive)
---|---
State | New, archived
On Tue, Feb 27, 2018 at 04:28:30PM -0800, Bart Van Assche wrote:
> If a request times out the .completed_request() method is not called

If BLK_EH_HANDLED is returned from .timeout(), __blk_mq_complete_request()
should have called .completed_request(). Otherwise, something may be wrong
in the timeout handling. Could you investigate why .completed_request()
isn't called in this situation?

> and the .finish_request() method is only called if RQF_ELVPRIV has

.finish_request() is the counterpart of .prepare_request(), and neither is
implemented by mq-deadline, so RQF_ELVPRIV need not be set; the current rule
is that this flag is managed by the block core.

> been set. Hence this patch that sets RQF_ELVPRIV and that adds a
> .finish_request() method. Without this patch, if a request times out
> the zone that request applies to remains locked forever and no further
> writes are accepted for that zone.
>
> Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> ---
>  block/mq-deadline.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index c56f211c8440..55d5b7a02d62 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -367,7 +367,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd)
>  	 * If the request needs its target zone locked, do it.
>  	 */
>  	blk_req_zone_write_lock(rq);
> -	rq->rq_flags |= RQF_STARTED;
> +	/* Set RQF_ELVPRIV to ensure that .finish_request() gets called */
> +	rq->rq_flags |= RQF_STARTED | RQF_ELVPRIV;
>  	return rq;
>  }
>
> @@ -539,7 +540,9 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
>   * For zoned block devices, write unlock the target zone of
>   * completed write requests. Do this while holding the zone lock
>   * spinlock so that the zone is never unlocked while deadline_fifo_request()
> - * while deadline_next_request() are executing.
> + * while deadline_next_request() are executing. This function is called twice
> + * for requests that complete in a normal way and once for requests that time
> + * out.
>   */
>  static void dd_completed_request(struct request *rq)
>  {
> @@ -757,6 +760,7 @@ static struct elevator_type mq_deadline = {
>  		.insert_requests	= dd_insert_requests,
>  		.dispatch_request	= dd_dispatch_request,
>  		.completed_request	= dd_completed_request,
> +		.finish_request		= dd_completed_request,
>  		.next_request		= elv_rb_latter_request,
>  		.former_request		= elv_rb_former_request,
>  		.bio_merge		= dd_bio_merge,
> --
> 2.16.2
>
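[Editor's note] For context on Ming's point about RQF_ELVPRIV: in the 4.16-era block core, blk_mq_free_request() only invokes the scheduler's .finish_request() hook when that flag is set on the request. The sketch below is paraphrased from memory of block/blk-mq.c of that era and trimmed to the relevant branch; treat the exact field spellings (e->type->ops.mq.finish_request) as approximate rather than authoritative.

```c
/* Paraphrased sketch of the 4.16-era blk_mq_free_request() (block/blk-mq.c),
 * trimmed to the branch Ming refers to: the elevator's .finish_request()
 * hook only runs when RQF_ELVPRIV is set, and the block core normally
 * manages that flag itself as part of .prepare_request() handling. */
void blk_mq_free_request(struct request *rq)
{
	struct request_queue *q = rq->q;
	struct elevator_queue *e = q->elevator;

	if (rq->rq_flags & RQF_ELVPRIV) {
		/* Without RQF_ELVPRIV, a failed request freed via
		 * blk_mq_end_request() never reaches the scheduler again. */
		if (e && e->type->ops.mq.finish_request)
			e->type->ops.mq.finish_request(rq);
		if (rq->elv.icq) {
			put_io_context(rq->elv.icq->ioc);
			rq->elv.icq = NULL;
		}
	}
	/* ... tag and queue-reference release elided ... */
}
```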
Ming,

On 2018/02/27 17:35, Ming Lei wrote:
> On Tue, Feb 27, 2018 at 04:28:30PM -0800, Bart Van Assche wrote:
>> If a request times out the .completed_request() method is not called
>
> If BLK_EH_HANDLED is returned from .timeout(), __blk_mq_complete_request()
> should have called .completed_request(). Otherwise, something may be wrong
> in the timeout handling. Could you investigate why .completed_request()
> isn't called in this situation?

Actually, the commit message is a little misleading. The problem is not only
with timeouts but also with commands that complete with a failure. This is
very easy to reproduce by simply doing an unaligned write to a sequential
zone of an ATA zoned block device. In that case, the scheduler
.completed_request() method is not called, which results in the target zone
of the failed write remaining locked. Hence the addition of a
.finish_request() method in mq-deadline pointing to the same function as
.completed_request(), to ensure that the command target zone is unlocked.
To ensure that the .finish_request() method is called, the RQF_ELVPRIV flag
is set when the request is dispatched, after the target zone is locked.

>> and the .finish_request() method is only called if RQF_ELVPRIV has
>
> .finish_request() is the counterpart of .prepare_request(), and neither is
> implemented by mq-deadline, so RQF_ELVPRIV need not be set; the current rule
> is that this flag is managed by the block core.

Indeed. So do you think it would be better to instead force a call to
.completed_request() for failed commands in the ATA case? Currently, it is
not called after all retries for the command have failed.

>
>> been set. Hence this patch that sets RQF_ELVPRIV and that adds a
>> .finish_request() method. Without this patch, if a request times out
>> the zone that request applies to remains locked forever and no further
>> writes are accepted for that zone.
>>
>> Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
>> Cc: Hannes Reinecke <hare@suse.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> ---
>>  block/mq-deadline.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index c56f211c8440..55d5b7a02d62 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -367,7 +367,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd)
>>  	 * If the request needs its target zone locked, do it.
>>  	 */
>>  	blk_req_zone_write_lock(rq);
>> -	rq->rq_flags |= RQF_STARTED;
>> +	/* Set RQF_ELVPRIV to ensure that .finish_request() gets called */
>> +	rq->rq_flags |= RQF_STARTED | RQF_ELVPRIV;
>>  	return rq;
>>  }
>>
>> @@ -539,7 +540,9 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
>>   * For zoned block devices, write unlock the target zone of
>>   * completed write requests. Do this while holding the zone lock
>>   * spinlock so that the zone is never unlocked while deadline_fifo_request()
>> - * while deadline_next_request() are executing.
>> + * while deadline_next_request() are executing. This function is called twice
>> + * for requests that complete in a normal way and once for requests that time
>> + * out.
>>   */
>>  static void dd_completed_request(struct request *rq)
>>  {
>> @@ -757,6 +760,7 @@ static struct elevator_type mq_deadline = {
>>  		.insert_requests	= dd_insert_requests,
>>  		.dispatch_request	= dd_dispatch_request,
>>  		.completed_request	= dd_completed_request,
>> +		.finish_request		= dd_completed_request,
>>  		.next_request		= elv_rb_latter_request,
>>  		.former_request		= elv_rb_former_request,
>>  		.bio_merge		= dd_bio_merge,
>> --
>> 2.16.2
>>
>

-- 
Damien Le Moal
Western Digital Research
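[Editor's note] To make Damien's reproducer concrete, here is a minimal userspace sketch of the failure injection he describes. The device path and offsets are hypothetical; a real sequential zone start must be substituted (e.g. from `blkzone report`).

```c
/* Hypothetical reproducer for the failure mode Damien describes: an
 * unaligned write (one not at the zone write pointer) to a sequential
 * zone of a zoned block device is rejected by the drive, and before the
 * fix the target zone stayed write-locked in mq-deadline. The device
 * path and offsets below are made up for illustration. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *dev = "/dev/sdX";             /* hypothetical ZBC/ZAC disk */
	const off_t zone_start = 256ULL << 20;    /* hypothetical zone start */
	const off_t bad_off = zone_start + 4096;  /* past the write pointer */
	void *buf;
	int fd;

	fd = open(dev, O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, 4096)) {
		close(fd);
		return 1;
	}
	memset(buf, 0, 4096);
	/* The drive fails this with an unaligned-write error: the request
	 * completes with a failure rather than timing out. */
	if (pwrite(fd, buf, 4096, bad_off) < 0)
		perror("pwrite");
	free(buf);
	close(fd);
	return 0;
}
```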
Hi Damien,

On Wed, Feb 28, 2018 at 02:21:49AM +0000, Damien Le Moal wrote:
> Ming,
>
> On 2018/02/27 17:35, Ming Lei wrote:
> > On Tue, Feb 27, 2018 at 04:28:30PM -0800, Bart Van Assche wrote:
> >> If a request times out the .completed_request() method is not called
> >
> > If BLK_EH_HANDLED is returned from .timeout(), __blk_mq_complete_request()
> > should have called .completed_request(). Otherwise, something may be wrong
> > in the timeout handling. Could you investigate why .completed_request()
> > isn't called in this situation?
>
> Actually, the commit message is a little misleading. The problem is not only
> with timeouts but also with commands that complete with a failure. This is
> very easy to reproduce by simply doing an unaligned write to a sequential
> zone of an ATA zoned block device. In that case, the scheduler
> .completed_request() method is not called, which results in the target zone
> of the failed write remaining locked.

Actually a timed-out request should have been completed, otherwise the race
between timeout and normal completion can't be avoided. But a dispatch
failure is dealt with by calling blk_mq_end_request(IOERR) directly (see
blk_mq_dispatch_rq_list()), so the failed request is freed without
completion.

>
> Hence the addition of a .finish_request() method in mq-deadline pointing to
> the same function as .completed_request(), to ensure that the command target
> zone is unlocked. To ensure that the .finish_request() method is called, the
> RQF_ELVPRIV flag is set when the request is dispatched, after the target
> zone is locked.
>
> >> and the .finish_request() method is only called if RQF_ELVPRIV has
> >
> > .finish_request() is the counterpart of .prepare_request(), and neither is
> > implemented by mq-deadline, so RQF_ELVPRIV need not be set; the current rule
> > is that this flag is managed by the block core.
>
> Indeed. So do you think it would be better to instead force a call to
> .completed_request() for failed commands in the ATA case? Currently, it is
> not called after all retries for the command have failed.

Now that we know the reason, either way seems fine:

1) handle it before calling blk_mq_end_request(IOERR), or

2) introduce both .prepare_request()/.finish_request(), take the request's
   zone write lock in .dispatch_request() but release it in
   .finish_request(), just like what kyber does.

Thanks,
Ming
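[Editor's note] For reference, a sketch of what Ming's option (2) might look like in mq-deadline: the zone is still write-locked at dispatch, but unlocked from a .finish_request() hook, which the block core also reaches when it frees a request that failed without completing normally. This is only an illustration of the suggestion, assuming the 4.16 mq-deadline internals (dd->zone_lock, blk_req_zone_write_unlock()); it is not the patch under review.

```c
/* Sketch of Ming's option (2), assuming 4.16 mq-deadline internals:
 * instead of unlocking the zone from .completed_request(), hook
 * .finish_request() so the unlock also runs when the block core frees
 * a request that failed without completing normally. */
static void dd_finish_request(struct request *rq)
{
	struct request_queue *q = rq->q;

	if (blk_queue_is_zoned(q)) {
		struct deadline_data *dd = q->elevator->elevator_data;
		unsigned long flags;

		/* Hold dd->zone_lock so the zone is never unlocked while
		 * deadline_fifo_request() or deadline_next_request() run. */
		spin_lock_irqsave(&dd->zone_lock, flags);
		blk_req_zone_write_unlock(rq);
		spin_unlock_irqrestore(&dd->zone_lock, flags);
	}
}

/* Wired up in struct elevator_type, alongside the existing methods:
 *	.finish_request = dd_finish_request,
 */
```

This is the pattern Ming points to in kyber, which releases its scheduler domain tokens from its .finish_request() hook regardless of how the request ended.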
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index c56f211c8440..55d5b7a02d62 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -367,7 +367,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd)
 	 * If the request needs its target zone locked, do it.
 	 */
 	blk_req_zone_write_lock(rq);
-	rq->rq_flags |= RQF_STARTED;
+	/* Set RQF_ELVPRIV to ensure that .finish_request() gets called */
+	rq->rq_flags |= RQF_STARTED | RQF_ELVPRIV;
 	return rq;
 }

@@ -539,7 +540,9 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
  * For zoned block devices, write unlock the target zone of
  * completed write requests. Do this while holding the zone lock
  * spinlock so that the zone is never unlocked while deadline_fifo_request()
- * while deadline_next_request() are executing.
+ * while deadline_next_request() are executing. This function is called twice
+ * for requests that complete in a normal way and once for requests that time
+ * out.
  */
 static void dd_completed_request(struct request *rq)
 {
@@ -757,6 +760,7 @@ static struct elevator_type mq_deadline = {
 		.insert_requests	= dd_insert_requests,
 		.dispatch_request	= dd_dispatch_request,
 		.completed_request	= dd_completed_request,
+		.finish_request		= dd_completed_request,
 		.next_request		= elv_rb_latter_request,
 		.former_request		= elv_rb_former_request,
 		.bio_merge		= dd_bio_merge,