Message ID | 20200831153127.3561733-1-krisman@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: Fix inflight statistic for MQ submission with !elevator | expand |
On 8/31/20 9:31 AM, Gabriel Krisman Bertazi wrote: > According to Documentation/block/stat.rst, inflight should not include > I/O requests that are in the queue but not yet dispatched to the device, > but blk-mq identifies as inflight any request that has a tag allocated, > which, for queues without elevator, happens at request allocation time > and before it is queued in the ctx (default case in blk_mq_submit_bio). > > A more precise approach would be to only consider requests with state > MQ_RQ_IN_FLIGHT. We've had some churn here, last change in this area was: commit 6131837b1de66116459ef4413e26fdbc70d066dc Author: Omar Sandoval <osandov@fb.com> Date: Thu Apr 26 00:21:58 2018 -0700 blk-mq: count allocated but not started requests in iostats inflight which your patch basically just reverts. So more testing/explanation needed on why it's necessary.
Jens Axboe <axboe@kernel.dk> writes: > On 8/31/20 9:31 AM, Gabriel Krisman Bertazi wrote: >> According to Documentation/block/stat.rst, inflight should not include >> I/O requests that are in the queue but not yet dispatched to the device, >> but blk-mq identifies as inflight any request that has a tag allocated, >> which, for queues without elevator, happens at request allocation time >> and before it is queued in the ctx (default case in blk_mq_submit_bio). >> >> A more precise approach would be to only consider requests with state >> MQ_RQ_IN_FLIGHT. > > We've had some churn here, last change in this area was: Hi Jens, Thanks for the quick reply. I wasn't aware of that patch. I'm collecting statistics on the queue depth of a NCQ disk, and my end goal is to have a time_in_driver clock, which is the time spent by IOs in the driver, similarly to what is shown in diskstats, but the latter includes the block layer time. I stumbled this issue with inflight, where we noticed a difference between inflight and what was actually dispatched to the driver. The problem is the current behavior doesn't seem to match the documentation in Documentation/block/stat.sh. I went back to history.git and found that this difference in behavior from the documentation has always been there for legacy path, unless I am misreading the documentation. The patch I proposed consolidates inflight in favor of documentation instead of the legacy patch. The documentation reads: """ This value counts the number of I/O requests that have been issued to the device driver but have not yet completed. It does not include I/O requests that are in the queue but not yet issued to the device driver. """ Should I patch the documentation instead? Thinking about semantics, it seem more useful to have inflight include only the time between dispatching and completion, but if you think there is a concern about stable abi here, would you accept a new in_device file tracking this metric?
On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi <krisman@collabora.com> wrote: > > According to Documentation/block/stat.rst, inflight should not include > I/O requests that are in the queue but not yet dispatched to the device, > but blk-mq identifies as inflight any request that has a tag allocated, > which, for queues without elevator, happens at request allocation time > and before it is queued in the ctx (default case in blk_mq_submit_bio). > > A more precise approach would be to only consider requests with state > MQ_RQ_IN_FLIGHT. > > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> > --- > block/blk-mq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 0015a1892153..997b3327eaa8 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx, > { > struct mq_inflight *mi = priv; > > - if (rq->part == mi->part) > + if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT) > mi->inflight[rq_data_dir(rq)]++; The fix looks fine. However, we have helper of blk_mq_request_started() for this purpose. Thanks, Ming Lei
On 8/31/20 7:18 PM, Ming Lei wrote: > On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi > <krisman@collabora.com> wrote: >> >> According to Documentation/block/stat.rst, inflight should not include >> I/O requests that are in the queue but not yet dispatched to the device, >> but blk-mq identifies as inflight any request that has a tag allocated, >> which, for queues without elevator, happens at request allocation time >> and before it is queued in the ctx (default case in blk_mq_submit_bio). >> >> A more precise approach would be to only consider requests with state >> MQ_RQ_IN_FLIGHT. >> >> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> >> --- >> block/blk-mq.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 0015a1892153..997b3327eaa8 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx, >> { >> struct mq_inflight *mi = priv; >> >> - if (rq->part == mi->part) >> + if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT) >> mi->inflight[rq_data_dir(rq)]++; > > The fix looks fine. However, we have helper of > blk_mq_request_started() for this purpose. Why does it look fine? Did you see the older commit I referenced? I'm not saying the change is wrong per se, just that this is the behavior we've always had, and making this change would deviate from that. As Gabriel states in the follow up, it's either changing the documentation or the patch. When replying to patches that have had previous discussion, please reference that when making followups. That makes for a more productive discussion on how to proceed.
Hi Jens, On Mon, Aug 31, 2020 at 09:42:05PM -0600, Jens Axboe wrote: > On 8/31/20 7:18 PM, Ming Lei wrote: > > On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi > > <krisman@collabora.com> wrote: > >> > >> According to Documentation/block/stat.rst, inflight should not include > >> I/O requests that are in the queue but not yet dispatched to the device, > >> but blk-mq identifies as inflight any request that has a tag allocated, > >> which, for queues without elevator, happens at request allocation time > >> and before it is queued in the ctx (default case in blk_mq_submit_bio). > >> > >> A more precise approach would be to only consider requests with state > >> MQ_RQ_IN_FLIGHT. > >> > >> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> > >> --- > >> block/blk-mq.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/block/blk-mq.c b/block/blk-mq.c > >> index 0015a1892153..997b3327eaa8 100644 > >> --- a/block/blk-mq.c > >> +++ b/block/blk-mq.c > >> @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx, > >> { > >> struct mq_inflight *mi = priv; > >> > >> - if (rq->part == mi->part) > >> + if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT) > >> mi->inflight[rq_data_dir(rq)]++; > > > > The fix looks fine. However, we have helper of > > blk_mq_request_started() for this purpose. > > Why does it look fine? Did you see the older commit I referenced? I'm Looks my gmail inbox has problem, and I didn't see your referenced commit. but I can see your reply just now in my redhat email box, sorry for that. BTW, commit 6131837b1de6 ("blk-mq: count allocated but not started requests in iostats inflight") didn't does what it claimed. blk_mq_queue_tag_busy_iter() iterates over driver tags, so for real io scheduler, blk_mq_check_inflight() basically returns count of inflight request, instead of allocated request. Even worse, since commit 6131837b1de6 blk_mq_in_flight() behaves inconsistently between q->elevator and !q->elevator. > not saying the change is wrong per se, just that this is the behavior > we've always had, and making this change would deviate from that. As > Gabriel states in the follow up, it's either changing the documentation > or the patch. Looks iostat doesn't use the 'inflight' count, so what is the userspace's expectation on this counter? If it counts allocated request, it is easy for userspace to observe different statistics if someone updates nr_requests via sysfs. Thanks, Ming
Ming Lei <tom.leiming@gmail.com> writes: > On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi > <krisman@collabora.com> wrote: >> - if (rq->part == mi->part) >> + if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT) >> mi->inflight[rq_data_dir(rq)]++; > > The fix looks fine. However, we have helper of > blk_mq_request_started() for this purpose. Thanks for the review. I was aware of blk_mq_request_started, but it forces a READ_ONCE which on Alpha includes a mb() for every tagged request, which doesn't seem necessary or desired here. I might be wrong though, memory barriers are hard. :) let's see what Jens says about the other points, so I don't spin this unnecessarily.
On 9/1/20 12:36 AM, Ming Lei wrote: > Hi Jens, > > On Mon, Aug 31, 2020 at 09:42:05PM -0600, Jens Axboe wrote: >> On 8/31/20 7:18 PM, Ming Lei wrote: >>> On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi >>> <krisman@collabora.com> wrote: >>>> >>>> According to Documentation/block/stat.rst, inflight should not include >>>> I/O requests that are in the queue but not yet dispatched to the device, >>>> but blk-mq identifies as inflight any request that has a tag allocated, >>>> which, for queues without elevator, happens at request allocation time >>>> and before it is queued in the ctx (default case in blk_mq_submit_bio). >>>> >>>> A more precise approach would be to only consider requests with state >>>> MQ_RQ_IN_FLIGHT. >>>> >>>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> >>>> --- >>>> block/blk-mq.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 0015a1892153..997b3327eaa8 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx, >>>> { >>>> struct mq_inflight *mi = priv; >>>> >>>> - if (rq->part == mi->part) >>>> + if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT) >>>> mi->inflight[rq_data_dir(rq)]++; >>> >>> The fix looks fine. However, we have helper of >>> blk_mq_request_started() for this purpose. >> >> Why does it look fine? Did you see the older commit I referenced? I'm > > Looks my gmail inbox has problem, and I didn't see your referenced > commit. but I can see your reply just now in my redhat email box, > sorry for that. Ah ok, that explains it, just felt like it was being ignored. > BTW, commit 6131837b1de6 ("blk-mq: count allocated but not started requests > in iostats inflight") didn't does what it claimed. blk_mq_queue_tag_busy_iter() > iterates over driver tags, so for real io scheduler, blk_mq_check_inflight() > basically returns count of inflight request, instead of allocated request. > > Even worse, since commit 6131837b1de6 blk_mq_in_flight() behaves inconsistently > between q->elevator and !q->elevator. I agree, that is definitely a problem, and I've run into this internally at FB in fact. >> not saying the change is wrong per se, just that this is the behavior >> we've always had, and making this change would deviate from that. As >> Gabriel states in the follow up, it's either changing the documentation >> or the patch. > > Looks iostat doesn't use the 'inflight' count, so what is the > userspace's expectation on this counter? > > If it counts allocated request, it is easy for userspace to observe > different statistics if someone updates nr_requests via sysfs. I don't think there's necessarily an expectation, but if we change the scope then that might cause problems for folks. All of a sudden the iowait looks less, for example, and that can be hard to explain. That said, I do think the patch makes sense, obviously, since that's how I originally did the mq accounting. But it's hard to argue with history and expectations, and that is my worry. I don't want to apply this patch, then have to revert the behavior a few revisions later. And then we end up with different kernels that have different metrics, and that's somewhat of a mess.
On 9/1/20 12:37 PM, Gabriel Krisman Bertazi wrote: > Ming Lei <tom.leiming@gmail.com> writes: > >> On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi >> <krisman@collabora.com> wrote: > >>> - if (rq->part == mi->part) >>> + if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT) >>> mi->inflight[rq_data_dir(rq)]++; >> >> The fix looks fine. However, we have helper of >> blk_mq_request_started() for this purpose. > > Thanks for the review. > > I was aware of blk_mq_request_started, but it forces a READ_ONCE which > on Alpha includes a mb() for every tagged request, which doesn't seem > necessary or desired here. I might be wrong though, memory barriers > are hard. :) > > let's see what Jens says about the other points, so I don't spin this > unnecessarily. On the READ_ONCE() part, I don't really care about alpha to the extent that I'll sacrifice readability and code unification to cater to that specific architecture :-) We just need to decide if this makes sense or not. I think we should apply this for 5.10, with Ming's suggestion of using blk_mq_request_started(). Then I guess we'll see what happens...
diff --git a/block/blk-mq.c b/block/blk-mq.c index 0015a1892153..997b3327eaa8 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx, { struct mq_inflight *mi = priv; - if (rq->part == mi->part) + if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT) mi->inflight[rq_data_dir(rq)]++; return true;
According to Documentation/block/stat.rst, inflight should not include I/O requests that are in the queue but not yet dispatched to the device, but blk-mq identifies as inflight any request that has a tag allocated, which, for queues without elevator, happens at request allocation time and before it is queued in the ctx (default case in blk_mq_submit_bio). A more precise approach would be to only consider requests with state MQ_RQ_IN_FLIGHT. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> --- block/blk-mq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)