diff mbox series

block: Fix inflight statistic for MQ submission with !elevator

Message ID 20200831153127.3561733-1-krisman@collabora.com (mailing list archive)
State New, archived
Headers show
Series block: Fix inflight statistic for MQ submission with !elevator | expand

Commit Message

Gabriel Krisman Bertazi Aug. 31, 2020, 3:31 p.m. UTC
According to Documentation/block/stat.rst, inflight should not include
I/O requests that are in the queue but not yet dispatched to the device,
but blk-mq identifies as inflight any request that has a tag allocated,
which, for queues without elevator, happens at request allocation time
and before it is queued in the ctx (default case in blk_mq_submit_bio).

A more precise approach would be to only consider requests with state
MQ_RQ_IN_FLIGHT.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jens Axboe Aug. 31, 2020, 3:33 p.m. UTC | #1
On 8/31/20 9:31 AM, Gabriel Krisman Bertazi wrote:
> According to Documentation/block/stat.rst, inflight should not include
> I/O requests that are in the queue but not yet dispatched to the device,
> but blk-mq identifies as inflight any request that has a tag allocated,
> which, for queues without elevator, happens at request allocation time
> and before it is queued in the ctx (default case in blk_mq_submit_bio).
> 
> A more precise approach would be to only consider requests with state
> MQ_RQ_IN_FLIGHT.

We've had some churn here, last change in this area was:

commit 6131837b1de66116459ef4413e26fdbc70d066dc
Author: Omar Sandoval <osandov@fb.com>
Date:   Thu Apr 26 00:21:58 2018 -0700

    blk-mq: count allocated but not started requests in iostats inflight

which your patch basically just reverts. So more testing/explanation
needed on why it's necessary.
Gabriel Krisman Bertazi Aug. 31, 2020, 3:50 p.m. UTC | #2
Jens Axboe <axboe@kernel.dk> writes:

> On 8/31/20 9:31 AM, Gabriel Krisman Bertazi wrote:
>> According to Documentation/block/stat.rst, inflight should not include
>> I/O requests that are in the queue but not yet dispatched to the device,
>> but blk-mq identifies as inflight any request that has a tag allocated,
>> which, for queues without elevator, happens at request allocation time
>> and before it is queued in the ctx (default case in blk_mq_submit_bio).
>> 
>> A more precise approach would be to only consider requests with state
>> MQ_RQ_IN_FLIGHT.
>
> We've had some churn here, last change in this area was:

Hi Jens,

Thanks for the quick reply.  I wasn't aware of that patch.

I'm collecting statistics on the queue depth of a NCQ disk, and my end
goal is to have a time_in_driver clock, which is the time spent by IOs
in the driver, similarly to what is shown in diskstats, but the latter
includes the block layer time.  I stumbled this issue with inflight,
where we noticed a difference between inflight and what was actually
dispatched to the driver.  The problem is the current behavior doesn't
seem to match the documentation in Documentation/block/stat.sh.

I went back to history.git and found that this difference in behavior
from the documentation has always been there for legacy path, unless I
am misreading the documentation.  The patch I proposed consolidates
inflight in favor of documentation instead of the legacy patch.

The documentation reads:

"""
This value counts the number of I/O requests that have been issued to
the device driver but have not yet completed.  It does not include I/O
requests that are in the queue but not yet issued to the device driver.
"""

Should I patch the documentation instead?

Thinking about semantics, it seem more useful to have inflight include
only the time between dispatching and completion, but if you think there
is a concern about stable abi here, would you accept a new in_device file
tracking this metric?
Ming Lei Sept. 1, 2020, 1:18 a.m. UTC | #3
On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> According to Documentation/block/stat.rst, inflight should not include
> I/O requests that are in the queue but not yet dispatched to the device,
> but blk-mq identifies as inflight any request that has a tag allocated,
> which, for queues without elevator, happens at request allocation time
> and before it is queued in the ctx (default case in blk_mq_submit_bio).
>
> A more precise approach would be to only consider requests with state
> MQ_RQ_IN_FLIGHT.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  block/blk-mq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0015a1892153..997b3327eaa8 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>  {
>         struct mq_inflight *mi = priv;
>
> -       if (rq->part == mi->part)
> +       if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT)
>                 mi->inflight[rq_data_dir(rq)]++;

The fix looks fine. However, we have helper of
blk_mq_request_started() for this purpose.


Thanks,
Ming Lei
Jens Axboe Sept. 1, 2020, 3:42 a.m. UTC | #4
On 8/31/20 7:18 PM, Ming Lei wrote:
> On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>>
>> According to Documentation/block/stat.rst, inflight should not include
>> I/O requests that are in the queue but not yet dispatched to the device,
>> but blk-mq identifies as inflight any request that has a tag allocated,
>> which, for queues without elevator, happens at request allocation time
>> and before it is queued in the ctx (default case in blk_mq_submit_bio).
>>
>> A more precise approach would be to only consider requests with state
>> MQ_RQ_IN_FLIGHT.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> ---
>>  block/blk-mq.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 0015a1892153..997b3327eaa8 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>>  {
>>         struct mq_inflight *mi = priv;
>>
>> -       if (rq->part == mi->part)
>> +       if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT)
>>                 mi->inflight[rq_data_dir(rq)]++;
> 
> The fix looks fine. However, we have helper of
> blk_mq_request_started() for this purpose.

Why does it look fine? Did you see the older commit I referenced? I'm
not saying the change is wrong per se, just that this is the behavior
we've always had, and making this change would deviate from that. As
Gabriel states in the follow up, it's either changing the documentation
or the patch.

When replying to patches that have had previous discussion, please
reference that when making followups. That makes for a more productive
discussion on how to proceed.
Ming Lei Sept. 1, 2020, 6:36 a.m. UTC | #5
Hi Jens,

On Mon, Aug 31, 2020 at 09:42:05PM -0600, Jens Axboe wrote:
> On 8/31/20 7:18 PM, Ming Lei wrote:
> > On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> >>
> >> According to Documentation/block/stat.rst, inflight should not include
> >> I/O requests that are in the queue but not yet dispatched to the device,
> >> but blk-mq identifies as inflight any request that has a tag allocated,
> >> which, for queues without elevator, happens at request allocation time
> >> and before it is queued in the ctx (default case in blk_mq_submit_bio).
> >>
> >> A more precise approach would be to only consider requests with state
> >> MQ_RQ_IN_FLIGHT.
> >>
> >> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> >> ---
> >>  block/blk-mq.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 0015a1892153..997b3327eaa8 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
> >>  {
> >>         struct mq_inflight *mi = priv;
> >>
> >> -       if (rq->part == mi->part)
> >> +       if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT)
> >>                 mi->inflight[rq_data_dir(rq)]++;
> > 
> > The fix looks fine. However, we have helper of
> > blk_mq_request_started() for this purpose.
> 
> Why does it look fine? Did you see the older commit I referenced? I'm

Looks my gmail inbox has problem, and I didn't see your referenced commit.
but I can see your reply just now in my redhat email box, sorry for that.

BTW, commit 6131837b1de6 ("blk-mq: count allocated but not started requests
in iostats inflight") didn't does what it claimed. blk_mq_queue_tag_busy_iter()
iterates over driver tags, so for real io scheduler, blk_mq_check_inflight()
basically returns count of inflight request, instead of allocated request.

Even worse, since commit 6131837b1de6 blk_mq_in_flight() behaves inconsistently
between q->elevator and !q->elevator.

> not saying the change is wrong per se, just that this is the behavior
> we've always had, and making this change would deviate from that. As
> Gabriel states in the follow up, it's either changing the documentation
> or the patch.

Looks iostat doesn't use the 'inflight' count, so what is the userspace's
expectation on this counter?

If it counts allocated request, it is easy for userspace to observe different
statistics if someone updates nr_requests via sysfs.


Thanks,
Ming
Gabriel Krisman Bertazi Sept. 1, 2020, 6:37 p.m. UTC | #6
Ming Lei <tom.leiming@gmail.com> writes:

> On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:

>> -       if (rq->part == mi->part)
>> +       if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT)
>>                 mi->inflight[rq_data_dir(rq)]++;
>
> The fix looks fine. However, we have helper of
> blk_mq_request_started() for this purpose.

Thanks for the review.

I was aware of blk_mq_request_started, but it forces a READ_ONCE which
on Alpha includes a mb() for every tagged request, which doesn't seem
necessary or desired here.  I might be wrong though, memory barriers
are hard. :)

let's see what Jens says about the other points, so I don't spin this
unnecessarily.
Jens Axboe Sept. 1, 2020, 10:37 p.m. UTC | #7
On 9/1/20 12:36 AM, Ming Lei wrote:
> Hi Jens,
> 
> On Mon, Aug 31, 2020 at 09:42:05PM -0600, Jens Axboe wrote:
>> On 8/31/20 7:18 PM, Ming Lei wrote:
>>> On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi
>>> <krisman@collabora.com> wrote:
>>>>
>>>> According to Documentation/block/stat.rst, inflight should not include
>>>> I/O requests that are in the queue but not yet dispatched to the device,
>>>> but blk-mq identifies as inflight any request that has a tag allocated,
>>>> which, for queues without elevator, happens at request allocation time
>>>> and before it is queued in the ctx (default case in blk_mq_submit_bio).
>>>>
>>>> A more precise approach would be to only consider requests with state
>>>> MQ_RQ_IN_FLIGHT.
>>>>
>>>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>>>> ---
>>>>  block/blk-mq.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 0015a1892153..997b3327eaa8 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -105,7 +105,7 @@ static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>>>>  {
>>>>         struct mq_inflight *mi = priv;
>>>>
>>>> -       if (rq->part == mi->part)
>>>> +       if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT)
>>>>                 mi->inflight[rq_data_dir(rq)]++;
>>>
>>> The fix looks fine. However, we have helper of
>>> blk_mq_request_started() for this purpose.
>>
>> Why does it look fine? Did you see the older commit I referenced? I'm
> 
> Looks my gmail inbox has problem, and I didn't see your referenced
> commit. but I can see your reply just now in my redhat email box,
> sorry for that.

Ah ok, that explains it, just felt like it was being ignored.

> BTW, commit 6131837b1de6 ("blk-mq: count allocated but not started requests
> in iostats inflight") didn't does what it claimed. blk_mq_queue_tag_busy_iter()
> iterates over driver tags, so for real io scheduler, blk_mq_check_inflight()
> basically returns count of inflight request, instead of allocated request.
> 
> Even worse, since commit 6131837b1de6 blk_mq_in_flight() behaves inconsistently
> between q->elevator and !q->elevator.

I agree, that is definitely a problem, and I've run into this internally
at FB in fact.

>> not saying the change is wrong per se, just that this is the behavior
>> we've always had, and making this change would deviate from that. As
>> Gabriel states in the follow up, it's either changing the documentation
>> or the patch.
> 
> Looks iostat doesn't use the 'inflight' count, so what is the
> userspace's expectation on this counter?
> 
> If it counts allocated request, it is easy for userspace to observe
> different statistics if someone updates nr_requests via sysfs.

I don't think there's necessarily an expectation, but if we change the
scope then that might cause problems for folks. All of a sudden the
iowait looks less, for example, and that can be hard to explain.

That said, I do think the patch makes sense, obviously, since that's how
I originally did the mq accounting. But it's hard to argue with history
and expectations, and that is my worry. I don't want to apply this
patch, then have to revert the behavior a few revisions later. And then
we end up with different kernels that have different metrics, and that's
somewhat of a mess.
Jens Axboe Sept. 1, 2020, 10:39 p.m. UTC | #8
On 9/1/20 12:37 PM, Gabriel Krisman Bertazi wrote:
> Ming Lei <tom.leiming@gmail.com> writes:
> 
>> On Mon, Aug 31, 2020 at 11:37 PM Gabriel Krisman Bertazi
>> <krisman@collabora.com> wrote:
> 
>>> -       if (rq->part == mi->part)
>>> +       if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT)
>>>                 mi->inflight[rq_data_dir(rq)]++;
>>
>> The fix looks fine. However, we have helper of
>> blk_mq_request_started() for this purpose.
> 
> Thanks for the review.
> 
> I was aware of blk_mq_request_started, but it forces a READ_ONCE which
> on Alpha includes a mb() for every tagged request, which doesn't seem
> necessary or desired here.  I might be wrong though, memory barriers
> are hard. :)
> 
> let's see what Jens says about the other points, so I don't spin this
> unnecessarily.

On the READ_ONCE() part, I don't really care about alpha to the extent
that I'll sacrifice readability and code unification to cater to that
specific architecture :-)

We just need to decide if this makes sense or not. I think we should
apply this for 5.10, with Ming's suggestion of using
blk_mq_request_started(). Then I guess we'll see what happens...
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0015a1892153..997b3327eaa8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -105,7 +105,7 @@  static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
 {
 	struct mq_inflight *mi = priv;
 
-	if (rq->part == mi->part)
+	if (rq->part == mi->part && rq->state == MQ_RQ_IN_FLIGHT)
 		mi->inflight[rq_data_dir(rq)]++;
 
 	return true;