Message ID | 20171216120726.517153-6-tj@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Hi Tejun,

On 12/16/2017 08:07 PM, Tejun Heo wrote:
> After the recent updates to use generation number and state based
> synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
> to avoid firing the same timeout multiple times.
>
> Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
> RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
> times. This removes atomic bitops from hot paths too.
>
> v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
>
> v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
> ---
>  block/blk-mq.c         | 18 ++++++++----------
>  block/blk-timeout.c    |  1 +
>  include/linux/blkdev.h |  2 ++
>  3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 88baa82..47e722b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -607,14 +607,12 @@ void blk_mq_complete_request(struct request *rq)
>  	 */
>  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
>  		rcu_read_lock();
> -		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
> -		    !blk_mark_rq_complete(rq))
> +		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
>  			__blk_mq_complete_request(rq);
>  		rcu_read_unlock();
>  	} else {
>  		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
> -		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
> -		    !blk_mark_rq_complete(rq))
> +		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
>  			__blk_mq_complete_request(rq);
>  		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
>  	}

It's worrying that even though the blk_mark_rq_complete() here is intended
to synchronize with the timeout path, it also gives blk_mq_complete_request()
the ability to exclude against itself. Maybe that capability should be
preserved.

Thanks
Jianchao
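For context, the gate being discussed is essentially an atomic test-and-set on REQ_ATOM_COMPLETE. A simplified sketch of the pre-patch behavior follows (the helper lives in block/blk.h around this era; the wrapper function name below is illustrative, and the exact code may differ slightly in the tree this series is based on):

```c
/*
 * Simplified sketch, not the literal tree contents: blk_mark_rq_complete()
 * is an atomic test-and-set, so of two concurrent completion attempts for
 * the same in-flight request only the first one proceeds.
 */
static inline int blk_mark_rq_complete(struct request *rq)
{
	/* returns 0 to the winner, non-zero to any later caller */
	return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
}

/* Hypothetical name; mirrors the pre-patch blk_mq_complete_request() gating. */
static void example_complete_prepatch(struct request *rq)
{
	/* pre-patch gating: the gstate check AND the atomic bit */
	if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
	    !blk_mark_rq_complete(rq))
		__blk_mq_complete_request(rq);
}
```

Dropping the blk_mark_rq_complete() check is what removes the "exclude against itself" behavior Jianchao describes; only the gstate comparison against the timeout path remains.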
Hello,

On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
> It's worrying that even though the blk_mark_rq_complete() here is intended
> to synchronize with the timeout path, it also gives blk_mq_complete_request()
> the ability to exclude against itself. Maybe that capability should be
> preserved.

Can you explain further how that'd help? The problem there is that if
you have two competing completions, where one isn't a timeout, there's
nothing synchronizing the reuse of the request. IOW, the losing one
can easily complete the next recycle instance. The atomic bitops might
feel like more protection, but that's just a feeling.

Thanks.
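To spell out the recycle problem Tejun describes, here is a hypothetical interleaving; the two completion sources and their timing are illustrative, not taken from the thread, and the clearing of REQ_ATOM_COMPLETE on reuse refers to the pre-patch blk_mq_start_request() hunk removed by this series:

```c
/*
 * Hypothetical race between two non-timeout completions of the same tag:
 *
 *   CPU0 (driver-initiated cancel)      CPU1 (normal IRQ completion)
 *   ------------------------------      ----------------------------
 *   blk_mq_complete_request(rq)
 *     blk_mark_rq_complete()  -> wins
 *     __blk_mq_complete_request(rq)
 *   rq completes, its tag is freed and
 *   reallocated for a new I/O;
 *   REQ_ATOM_COMPLETE is cleared when
 *   the new request is started
 *                                       blk_mq_complete_request(rq)
 *                                         blk_mark_rq_complete() -> "wins" again
 *                                         __blk_mq_complete_request(rq)
 *                                           completes the *recycled* request
 *
 * The atomic bit only serializes completions of one incarnation of the
 * request; it cannot protect the next incarnation from a stale completion.
 */
```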
Sorry for my non-detailed description.

On 12/21/2017 09:50 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
>> It's worrying that even though the blk_mark_rq_complete() here is intended
>> to synchronize with the timeout path, it also gives blk_mq_complete_request()
>> the ability to exclude against itself.

There could be a scenario where the driver stops a request itself with
blk_mq_complete_request() (or some other interface that invokes it), racing
with the normal completion path for the same request.

For example, a reset could be triggered through sysfs on nvme-rdma. The
driver will then cancel all requests, including in-flight ones:

    nvme_rdma_reset_ctrl_work()
      nvme_rdma_shutdown_ctrl()
        if (ctrl->ctrl.queue_count > 1) {
            nvme_stop_queues(&ctrl->ctrl);              // quiesce the queue
            blk_mq_tagset_busy_iter(&ctrl->tag_set,
                    nvme_cancel_request, &ctrl->ctrl);  // invokes blk_mq_complete_request()
            nvme_rdma_destroy_io_queues(ctrl, shutdown);
        }

These operations could race with the normal completion path of the in-flight
requests. All in-flight requests should be drained first here, and there may
be other places similar to this.

>> Maybe that capability should be preserved.
>
> Can you explain further how that'd help? The problem there is that if
> you have two competing completions, where one isn't a timeout, there's
> nothing synchronizing the reuse of the request. IOW, the losing one
> can easily complete the next recycle instance. The atomic bitops might
> feel like more protection, but that's just a feeling.

In the case above, the request may enter the requeue and end paths
simultaneously.

Thanks
Jianchao
Hello, Jianchao.

On Fri, Dec 22, 2017 at 12:02:20PM +0800, jianchao.wang wrote:
> > On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
> >> It's worrying that even though the blk_mark_rq_complete() here is
> >> intended to synchronize with the timeout path, it also gives
> >> blk_mq_complete_request() the ability to exclude against itself.
>
> There could be a scenario where the driver stops a request itself with
> blk_mq_complete_request() (or some other interface that invokes it),
> racing with the normal completion path for the same request.

But what would prevent the completion from reinitializing the request and then
the actual completion path coming in and completing the request again?

> For example, a reset could be triggered through sysfs on nvme-rdma. The
> driver will then cancel all requests, including in-flight ones:
>
>     nvme_rdma_reset_ctrl_work()
>       nvme_rdma_shutdown_ctrl()
>         if (ctrl->ctrl.queue_count > 1) {
>             nvme_stop_queues(&ctrl->ctrl);              // quiesce the queue
>             blk_mq_tagset_busy_iter(&ctrl->tag_set,
>                     nvme_cancel_request, &ctrl->ctrl);  // invokes blk_mq_complete_request()
>             nvme_rdma_destroy_io_queues(ctrl, shutdown);
>         }
>
> These operations could race with the normal completion path of the in-flight
> requests. All in-flight requests should be drained first here, and there may
> be other places similar to this.

If there are any such places, they should be using an interface which is
properly synchronized, like blk_abort_request(), which btw is what libata
already does. Otherwise, it's racy with or without these patches.

Thanks.
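As a concrete illustration of the interface Tejun points at: a driver that wants to force in-flight requests into error handling can call blk_abort_request(), which pushes the request through the block layer's timeout machinery rather than completing it directly. The sketch below is hypothetical; the callback and function names are made up for illustration, and whether aborting every busy request is appropriate depends entirely on the driver:

```c
#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/*
 * Hypothetical busy_tag_iter_fn: instead of completing the request
 * directly (and racing with the normal completion path), expire it via
 * the timeout path, which is synchronized against normal completions.
 */
static void example_abort_request(struct request *rq, void *priv, bool reserved)
{
	blk_abort_request(rq);	/* force the request into timeout handling */
}

/* e.g. from a reset/teardown path, after quiescing the queues */
static void example_teardown(struct blk_mq_tag_set *set, void *priv)
{
	blk_mq_tagset_busy_iter(set, example_abort_request, priv);
}
```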
Hi Tejun,

Many thanks for your kind response.

On 01/09/2018 01:27 AM, Tejun Heo wrote:
> Hello, Jianchao.
>
> On Fri, Dec 22, 2017 at 12:02:20PM +0800, jianchao.wang wrote:
>>> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
>>>> It's worrying that even though the blk_mark_rq_complete() here is
>>>> intended to synchronize with the timeout path, it also gives
>>>> blk_mq_complete_request() the ability to exclude against itself.
>>
>> There could be a scenario where the driver stops a request itself with
>> blk_mq_complete_request() (or some other interface that invokes it),
>> racing with the normal completion path for the same request.
>
> But what would prevent the completion from reinitializing the request and
> then the actual completion path coming in and completing the request again?

blk_mark_rq_complete() will gate this and ensure that only one
__blk_mq_complete_request() is invoked.

>> For example, a reset could be triggered through sysfs on nvme-rdma. The
>> driver will then cancel all requests, including in-flight ones:
>>
>>     nvme_rdma_reset_ctrl_work()
>>       nvme_rdma_shutdown_ctrl()
>>         if (ctrl->ctrl.queue_count > 1) {
>>             nvme_stop_queues(&ctrl->ctrl);              // quiesce the queue
>>             blk_mq_tagset_busy_iter(&ctrl->tag_set,
>>                     nvme_cancel_request, &ctrl->ctrl);  // invokes blk_mq_complete_request()
>>             nvme_rdma_destroy_io_queues(ctrl, shutdown);
>>         }
>>
>> These operations could race with the normal completion path of the in-flight
>> requests. All in-flight requests should be drained first here, and there may
>> be other places similar to this.
>
> If there are any such places, they should be using an interface which is
> properly synchronized, like blk_abort_request(), which btw is what libata
> already does. Otherwise, it's racy with or without these patches.

Yes, that is the case. Thanks again for your kind response.

Jianchao
Hello,

On Tue, Jan 09, 2018 at 11:08:04AM +0800, jianchao.wang wrote:
>> But what would prevent the completion from reinitializing the request and
>> then the actual completion path coming in and completing the request again?
>
> blk_mark_rq_complete() will gate this and ensure that only one
> __blk_mq_complete_request() is invoked.

Yeah, but then the complete flag will be cleared once completion is done and
the request is reinitialized.

Thanks.
Hi Tejun,

Many thanks for your kind response.

On 01/09/2018 11:37 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jan 09, 2018 at 11:08:04AM +0800, jianchao.wang wrote:
>>> But what would prevent the completion from reinitializing the request and
>>> then the actual completion path coming in and completing the request again?
>>
>> blk_mark_rq_complete() will gate this and ensure that only one
>> __blk_mq_complete_request() is invoked.
>
> Yeah, but then the complete flag will be cleared once completion is done and
> the request is reinitialized.

Yes, it is. What I mean is not to keep the blk_mark_rq_complete() and
REQ_ATOM_COMPLETE usages in blk-mq, but rather the side effect that, after
this change, blk_mq_complete_request() can no longer exclude against itself.
As you said, that scenario is racy and should be fixed. :)

Thanks
Jianchao
```diff
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 88baa82..47e722b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -607,14 +607,12 @@ void blk_mq_complete_request(struct request *rq)
 	 */
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
 		rcu_read_lock();
-		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
-		    !blk_mark_rq_complete(rq))
+		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
 			__blk_mq_complete_request(rq);
 		rcu_read_unlock();
 	} else {
 		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
-		    !blk_mark_rq_complete(rq))
+		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
 			__blk_mq_complete_request(rq);
 		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
 	}
@@ -665,8 +663,6 @@ void blk_mq_start_request(struct request *rq)
 	preempt_enable();
 
 	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
-	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
-		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
 
 	if (q->dma_drain_size && blk_rq_bytes(rq)) {
 		/*
@@ -817,6 +813,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 	if (!test_bit(REQ_ATOM_STARTED, &req->atomic_flags))
 		return;
 
+	req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
+
 	if (ops->timeout)
 		ret = ops->timeout(req, reserved);
 
@@ -832,7 +830,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		 */
 		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
-		blk_clear_rq_complete(req);
 		break;
 	case BLK_EH_NOT_HANDLED:
 		break;
@@ -851,7 +848,8 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 
 	might_sleep();
 
-	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	if ((rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) ||
+	    !test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
 		return;
 
 	/* read coherent snapshots of @rq->state_gen and @rq->deadline */
@@ -886,8 +884,8 @@ static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
 	 * now guaranteed to see @rq->aborted_gstate and yield.  If
 	 * @rq->aborted_gstate still matches @rq->gstate, @rq is ours.
 	 */
-	if (READ_ONCE(rq->gstate) == rq->aborted_gstate &&
-	    !blk_mark_rq_complete(rq))
+	if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
+	    READ_ONCE(rq->gstate) == rq->aborted_gstate)
 		blk_mq_rq_timed_out(rq, reserved);
 }
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index d580af3..25c4ffa 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -209,6 +209,7 @@ void blk_add_timer(struct request *req)
 		req->timeout = q->rq_timeout;
 
 	req->deadline = jiffies + req->timeout;
+	req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
 
 	/*
 	 * Only the non-mq case needs to add the request to a protected list.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2d6fd11..13186a7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -123,6 +123,8 @@ typedef __u32 __bitwise req_flags_t;
 /* Look at ->special_vec for the actual data payload instead of the bio chain. */
 #define RQF_SPECIAL_PAYLOAD	((__force req_flags_t)(1 << 18))
+/* timeout is expired */
+#define RQF_MQ_TIMEOUT_EXPIRED	((__force req_flags_t)(1 << 19))
 
 /* flags that prevent us from merging requests:	*/
 #define RQF_NOMERGE_FLAGS \
```
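A compressed reading of how the new flag replaces REQ_ATOM_COMPLETE for its one remaining purpose (firing a given timeout only once), distilled from the hunks above; this is a reading aid, not code from the patch:

```c
/*
 * RQF_MQ_TIMEOUT_EXPIRED lifecycle introduced by this patch:
 *
 * blk_add_timer(req)            - clears the flag whenever the request is
 *                                 (re)armed with a fresh deadline
 * blk_mq_check_expired(...)     - skips requests that already have the flag
 *                                 set (or that were never started), so an
 *                                 expired request is only gathered once
 * blk_mq_rq_timed_out(req)      - sets the flag before calling ->timeout(),
 *                                 so a later scan cannot fire it again
 * blk_mq_terminate_expired(...) - only times out requests whose flag is
 *                                 still clear and whose gstate matches the
 *                                 aborted-gstate snapshot
 */
```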
After the recent updates to use generation number and state based
synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
to avoid firing the same timeout multiple times.

Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
times. This removes atomic bitops from hot paths too.

v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().

v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
---
 block/blk-mq.c         | 18 ++++++++----------
 block/blk-timeout.c    |  1 +
 include/linux/blkdev.h |  2 ++
 3 files changed, 11 insertions(+), 10 deletions(-)