Message ID | 20190308174006.5032-1-keith.busch@intel.com (mailing list archive)
---|---
State | New, archived
Series | [1/5] blk-mq: Export reading mq request state
On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> Drivers may need to know the state of their requests.

Hi Keith,

What makes you think that drivers should be able to check the state of
their requests? Please elaborate.

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index faed9d9eb84c..db113aee48bb 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -241,6 +241,15 @@ struct request {
>  	struct request *next_rq;
>  };
> 
> +/**
> + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> + * @rq: target request.
> + */
> +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> +{
> +	return READ_ONCE(rq->state);
> +}

Please also explain how drivers can use this function without triggering a
race condition with the code that modifies rq->state.

Thanks,

Bart.
On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote:
> On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> > Drivers may need to know the state of their requests.
>
> Hi Keith,
>
> What makes you think that drivers should be able to check the state of
> their requests? Please elaborate.

Patches 4 and 5 in this series.

> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index faed9d9eb84c..db113aee48bb 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -241,6 +241,15 @@ struct request {
> >  	struct request *next_rq;
> >  };
> > 
> > +/**
> > + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> > + * @rq: target request.
> > + */
> > +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> > +{
> > +	return READ_ONCE(rq->state);
> > +}
>
> Please also explain how drivers can use this function without triggering a
> race condition with the code that modifies rq->state.

Either quiesced or within a timeout handler that already locks the
request lifetime.
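To make the second case concrete, here is a minimal sketch of a timeout
handler that reads the request state. The driver names are hypothetical
and this is not code from the posted series; the point is only that the
block layer pins the request's lifetime while the handler runs, so the
read cannot race with the request being freed or recycled.

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/*
 * Hypothetical driver timeout handler.  The block layer guarantees the
 * request stays valid while this runs, so reading its state is safe.
 */
static enum blk_eh_timer_return my_driver_timeout(struct request *rq,
						  bool reserved)
{
	/* A real completion may have raced with the timer firing. */
	if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
		return BLK_EH_DONE;

	my_driver_abort_cmd(rq);	/* made-up, driver-specific escalation */
	return BLK_EH_RESET_TIMER;
}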
On Fri, 2019-03-08 at 11:15 -0700, Keith Busch wrote:
> On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote:
> > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> > > Drivers may need to know the state of their requests.
> >
> > Hi Keith,
> >
> > What makes you think that drivers should be able to check the state of
> > their requests? Please elaborate.
>
> Patches 4 and 5 in this series.
>
> > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > > index faed9d9eb84c..db113aee48bb 100644
> > > --- a/include/linux/blkdev.h
> > > +++ b/include/linux/blkdev.h
> > > @@ -241,6 +241,15 @@ struct request {
> > >  	struct request *next_rq;
> > >  };
> > > 
> > > +/**
> > > + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> > > + * @rq: target request.
> > > + */
> > > +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> > > +{
> > > +	return READ_ONCE(rq->state);
> > > +}
> >
> > Please also explain how drivers can use this function without triggering a
> > race condition with the code that modifies rq->state.
>
> Either quiesced or within a timeout handler that already locks the
> request lifetime.

Hi Keith,

For future patch series submissions please include a cover letter. The two
patch series that you posted today don't have a cover letter so I can only
guess what the purpose of these two patch series is. Is the purpose of this
patch series perhaps to speed up error handling? If so, why did you choose
the approach of iterating over outstanding requests and telling the block
layer to terminate these requests? I think that the NVMe spec provides a
more elegant mechanism, namely deleting the I/O submission queues. According
to what I read in the 1.3c spec, deleting an I/O submission queue forces an
NVMe controller to post a completion for every outstanding request. See also
section 5.6 in the NVMe 1.3c spec.

Bart.
On Fri, Mar 08, 2019 at 10:42:17AM -0800, Bart Van Assche wrote:
> On Fri, 2019-03-08 at 11:15 -0700, Keith Busch wrote:
> > On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote:
> > > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> > > > Drivers may need to know the state of their requests.
> > >
> > > Hi Keith,
> > >
> > > What makes you think that drivers should be able to check the state of
> > > their requests? Please elaborate.
> >
> > Patches 4 and 5 in this series.
> >
> > > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > > > index faed9d9eb84c..db113aee48bb 100644
> > > > --- a/include/linux/blkdev.h
> > > > +++ b/include/linux/blkdev.h
> > > > @@ -241,6 +241,15 @@ struct request {
> > > >  	struct request *next_rq;
> > > >  };
> > > > 
> > > > +/**
> > > > + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> > > > + * @rq: target request.
> > > > + */
> > > > +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> > > > +{
> > > > +	return READ_ONCE(rq->state);
> > > > +}
> > >
> > > Please also explain how drivers can use this function without triggering a
> > > race condition with the code that modifies rq->state.
> >
> > Either quiesced or within a timeout handler that already locks the
> > request lifetime.
>
> Hi Keith,
>
> For future patch series submissions please include a cover letter. The two
> patch series that you posted today don't have a cover letter so I can only
> guess what the purpose of these two patch series is. Is the purpose of this
> patch series perhaps to speed up error handling? If so, why did you choose
> the approach of iterating over outstanding requests and telling the block
> layer to terminate these requests?

Okay, good point. Will do.

> I think that the NVMe spec provides a more elegant mechanism, namely
> deleting the I/O submission queues. According to what I read in the
> 1.3c spec, deleting an I/O submission queue forces an NVMe controller to
> post a completion for every outstanding request. See also section 5.6 in
> the NVMe 1.3c spec.

That's actually not what it says. The controller may or may not post a
completion entry with a delete SQ command. The first behavior is defined
in the spec as "explicit" and the second as "implicit". For implicit,
we have to iterate inflight tags.
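To make the "implicit" case concrete, a rough sketch of what iterating
inflight tags and completing them from software can look like. The names
are made up, the callback signature assumes the v5.0-era
blk_mq_tagset_busy_iter() prototype, and a real driver would record an
error status on the request before completing it.

#include <linux/blk-mq.h>

/* Invoked once for every request that still holds a busy tag. */
static void my_cancel_request(struct request *rq, void *data, bool reserved)
{
	/* Driver-specific error status would be set on rq here first. */
	blk_mq_complete_request(rq);
}

/* Walk every inflight tag once it is clear the device will not answer. */
static void my_terminate_inflight(struct blk_mq_tag_set *set)
{
	blk_mq_tagset_busy_iter(set, my_cancel_request, NULL);
}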
For some reason I didn't get patches 2/5 and 3/5...
On Fri, Mar 08, 2019 at 12:21:16PM -0800, Sagi Grimberg wrote:
> For some reason I didn't get patches 2/5 and 3/5...
Unreliable 'git send-email'?! :)
They're copied to patchwork too:
https://patchwork.kernel.org/patch/10845225/
https://patchwork.kernel.org/patch/10845229/
On Fri, 2019-03-08 at 12:19 -0700, Keith Busch wrote:
> On Fri, Mar 08, 2019 at 10:42:17AM -0800, Bart Van Assche wrote:
> > I think that the NVMe spec provides a more elegant mechanism, namely
> > deleting the I/O submission queues. According to what I read in the
> > 1.3c spec, deleting an I/O submission queue forces an NVMe controller to
> > post a completion for every outstanding request. See also section 5.6 in
> > the NVMe 1.3c spec.
>
> That's actually not what it says. The controller may or may not post a
> completion entry with a delete SQ command. The first behavior is defined
> in the spec as "explicit" and the second as "implicit". For implicit,
> we have to iterate inflight tags.

Hi Keith,

Thanks for the clarification. Are you aware of any mechanism in the NVMe spec
that causes all outstanding requests to fail? With RDMA this is easy - all
one has to do is to change the queue pair state into IB_QPS_ERR. See also
ib_drain_qp() in the RDMA core.

If no such mechanism has been defined in the NVMe spec: have you considered
cancelling all outstanding requests instead of calling blk_mq_end_request()
for all outstanding requests?

Thanks,

Bart.
On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote:
> Thanks for the clarification. Are you aware of any mechanism in the NVMe spec
> that causes all outstanding requests to fail? With RDMA this is easy - all
> one has to do is to change the queue pair state into IB_QPS_ERR. See also
> ib_drain_qp() in the RDMA core.

Well, we can't always rely on hardware to provide completions. The driver
must be able to make forward progress if the device decides to stop
responding, or maybe it was disconnected, asserts, or experiences an unsafe
shutdown/powerloss.

> If no such mechanism has been defined in the NVMe spec: have you considered
> cancelling all outstanding requests instead of calling blk_mq_end_request()
> for all outstanding requests?

Isn't this cancelling requests? Is there an existing block interface
that accomplishes this?
On Fri, 2019-03-08 at 14:14 -0700, Keith Busch wrote:
> On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote:
> > If no such mechanism has been defined in the NVMe spec: have you considered
> > cancelling all outstanding requests instead of calling blk_mq_end_request()
> > for all outstanding requests?
>
> Isn't this cancelling requests? Is there an existing block interface
> that accomplishes this?

Hi Keith,

Sorry if I was unclear. With "cancelling outstanding requests" I was
referring to the NVMe abort command. I think aborting outstanding requests
has the advantage that no new explicit call to blk_mq_end_request() has to
be added.

Bart.
On Fri, Mar 08, 2019 at 01:25:16PM -0800, Bart Van Assche wrote:
> On Fri, 2019-03-08 at 14:14 -0700, Keith Busch wrote:
> > On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote:
> > > If no such mechanism has been defined in the NVMe spec: have you considered
> > > cancelling all outstanding requests instead of calling blk_mq_end_request()
> > > for all outstanding requests?
> >
> > Isn't this cancelling requests? Is there an existing block interface
> > that accomplishes this?
>
> Hi Keith,
>
> Sorry if I was unclear. With "cancelling outstanding requests" I was
> referring to the NVMe abort command. I think aborting outstanding requests
> has the advantage that no new explicit call to blk_mq_end_request() has to
> be added.

Ah, gotchya. NVMe abort usage is different than what this series wants to
do: we've determined the device is no longer usable, and we need to
terminate all requests that are queued in software but haven't been
dispatched to hardware.
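As a contrast with NVMe abort (which asks the device to finish a command it
already owns), a minimal sketch of ending a request purely from software.
The helper name is made up and this is not the code from patches 4/5; it
only illustrates the kind of termination Keith describes for a device that
will never answer.

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/*
 * Hypothetical helper: the device is gone, so no completion will ever
 * arrive for this request.  End it from software with an I/O error.
 */
static void my_driver_terminate_rq(struct request *rq)
{
	if (blk_mq_rq_state(rq) != MQ_RQ_COMPLETE)
		blk_mq_end_request(rq, BLK_STS_IOERR);
}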
diff --git a/block/blk-mq.h b/block/blk-mq.h
index c11353a3749d..99ab7e472e62 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -128,15 +128,6 @@ extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_release(struct request_queue *q);
 
-/**
- * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
- * @rq: target request.
- */
-static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
-{
-	return READ_ONCE(rq->state);
-}
-
 static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
 					   unsigned int cpu)
 {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index faed9d9eb84c..db113aee48bb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -241,6 +241,15 @@ struct request {
 	struct request *next_rq;
 };
 
+/**
+ * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
+ * @rq: target request.
+ */
+static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
+{
+	return READ_ONCE(rq->state);
+}
+
 static inline bool blk_op_is_scsi(unsigned int op)
 {
 	return op == REQ_OP_SCSI_IN || op == REQ_OP_SCSI_OUT;
Drivers may need to know the state of their requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.h         | 9 ---------
 include/linux/blkdev.h | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)