
[1/3] bfq: Avoid false bfq queue merging

Message ID 20200605141629.15347-1-jack@suse.cz (mailing list archive)
State New, archived
Series [1/3] bfq: Avoid false bfq queue merging

Commit Message

Jan Kara June 5, 2020, 2:16 p.m. UTC
bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
makes sense to merge the current bfq queue with the in-service queue.
However, if the in-service queue is freshly scheduled and has not dispatched
any requests yet, bfqd->in_serv_last_pos is stale and still contains the value
from the previously scheduled bfq queue, which can result in a bogus
decision that the two queues should be merged. This bug can be observed,
for example, with the following fio jobfile:

[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read

[reader]
numjobs=4
directory=/mnt

where the 4 processes end up in a single shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with the slice_idle=1ms setting).

Fix the problem by invalidating bfqd->in_serv_last_pos when switching the
in-service queue.

Fixes: 058fdecc6de7 ("block, bfq: fix in-service-queue check for queue merging")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 1 +
 1 file changed, 1 insertion(+)
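
For context, the check that consumes bfqd->in_serv_last_pos lives in
bfq_setup_cooperator(). The sketch below is abridged and the helper names are
only approximate, but it shows how a stale position can satisfy the proximity
test and trigger a merge with an unrelated in-service queue:

/* Abridged sketch; not the exact upstream code. */
static bool bfq_rq_close_to_sector(void *io_struct, bool request,
				   sector_t sector)
{
	/* "Close" means within the seek threshold of the given sector. */
	return abs(bfq_io_struct_pos(io_struct, request) - sector) <=
		BFQQ_SEEK_THR;
}

/* In bfq_setup_cooperator(): merge with the in-service queue only if the
 * new IO is close to where that queue last did IO. If in_serv_last_pos
 * still holds the previous queue's position, this test can spuriously
 * succeed for a freshly scheduled in-service queue. */
if (in_service_bfqq && in_service_bfqq != bfqq &&
    bfq_rq_close_to_sector(io_struct, request, bfqd->in_serv_last_pos) &&
    bfq_may_be_close_cooperator(bfqq, in_service_bfqq))
	return bfq_setup_merge(bfqq, in_service_bfqq);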

Comments

Paolo Valente June 11, 2020, 7:13 a.m. UTC | #1
> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
> 
> bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
> makes sense to merge current bfq queue with the in-service queue.
> However if the in-service queue is freshly scheduled and didn't dispatch
> any requests yet, bfqd->in_serv_last_pos is stale and contains value
> from the previously scheduled bfq queue which can thus result in a bogus
> decision that the two queues should be merged.

Good catch! 

> This bug can be observed
> for example with the following fio jobfile:
> 
> [global]
> direct=0
> ioengine=sync
> invalidate=1
> size=1g
> rw=read
> 
> [reader]
> numjobs=4
> directory=/mnt
> 
> where the 4 processes will end up in the one shared bfq queue although
> they do IO to physically very distant files (for some reason I was able to
> observe this only with slice_idle=1ms setting).
> 
> Fix the problem by invalidating bfqd->in_serv_last_pos when switching
> in-service queue.
> 

Apart from the nonexistent problem that even 0 is a valid LBA :)

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Jan Kara June 11, 2020, 8:31 a.m. UTC | #2
On Thu 11-06-20 09:13:07, Paolo Valente wrote:
> 
> 
> > Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
> > 
> > bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
> > makes sense to merge current bfq queue with the in-service queue.
> > However if the in-service queue is freshly scheduled and didn't dispatch
> > any requests yet, bfqd->in_serv_last_pos is stale and contains value
> > from the previously scheduled bfq queue which can thus result in a bogus
> > decision that the two queues should be merged.
> 
> Good catch! 
> 
> > This bug can be observed
> > for example with the following fio jobfile:
> > 
> > [global]
> > direct=0
> > ioengine=sync
> > invalidate=1
> > size=1g
> > rw=read
> > 
> > [reader]
> > numjobs=4
> > directory=/mnt
> > 
> > where the 4 processes will end up in the one shared bfq queue although
> > they do IO to physically very distant files (for some reason I was able to
> > observe this only with slice_idle=1ms setting).
> > 
> > Fix the problem by invalidating bfqd->in_serv_last_pos when switching
> > in-service queue.
> > 
> 
> Apart from the nonexistent problem that even 0 is a valid LBA :)

Yes, I was also thinking about that and decided 0 is "good enough" :). But
I can just as well switch to (sector_t)-1 if you think it would be better.

> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Thanks!

								Honza
Paolo Valente June 11, 2020, 2:12 p.m. UTC | #3
> Il giorno 11 giu 2020, alle ore 10:31, Jan Kara <jack@suse.cz> ha scritto:
> 
> On Thu 11-06-20 09:13:07, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
>>> 
>>> bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
>>> makes sense to merge current bfq queue with the in-service queue.
>>> However if the in-service queue is freshly scheduled and didn't dispatch
>>> any requests yet, bfqd->in_serv_last_pos is stale and contains value
>>> from the previously scheduled bfq queue which can thus result in a bogus
>>> decision that the two queues should be merged.
>> 
>> Good catch! 
>> 
>>> This bug can be observed
>>> for example with the following fio jobfile:
>>> 
>>> [global]
>>> direct=0
>>> ioengine=sync
>>> invalidate=1
>>> size=1g
>>> rw=read
>>> 
>>> [reader]
>>> numjobs=4
>>> directory=/mnt
>>> 
>>> where the 4 processes will end up in the one shared bfq queue although
>>> they do IO to physically very distant files (for some reason I was able to
>>> observe this only with slice_idle=1ms setting).
>>> 
>>> Fix the problem by invalidating bfqd->in_serv_last_pos when switching
>>> in-service queue.
>>> 
>> 
>> Apart from the nonexistent problem that even 0 is a valid LBA :)
> 
> Yes, I was also thinking about that and decided 0 is "good enough" :). But
> I just as well just switch to (sector_t)-1 if you think it would be better.
> 

0 is ok :)

Thanks,
Paolo

>> Acked-by: Paolo Valente <paolo.valente@linaro.org>
> 
> Thanks!
> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
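
On the 0-versus-sentinel question above: the proximity test only treats
positions within the seek threshold of in_serv_last_pos as close, so resetting
to 0 can at worst cause a false match for IO near LBA 0. The alternative
floated in the thread would look roughly like the hypothetical sketch below
(this is not what was merged):

	bfqd->in_service_queue = bfqq;
	/* Hypothetical alternative: an out-of-range sentinel instead of 0,
	 * so no real request position can ever appear "close" to it. */
	bfqd->in_serv_last_pos = (sector_t)-1;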
Paolo Valente Jan. 10, 2021, 9:21 a.m. UTC | #4
> Il giorno 11 giu 2020, alle ore 16:12, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> 
> 
> 
>> Il giorno 11 giu 2020, alle ore 10:31, Jan Kara <jack@suse.cz> ha scritto:
>> 
>> On Thu 11-06-20 09:13:07, Paolo Valente wrote:
>>> 
>>> 
>>>> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
>>>> 
>>>> bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
>>>> makes sense to merge current bfq queue with the in-service queue.
>>>> However if the in-service queue is freshly scheduled and didn't dispatch
>>>> any requests yet, bfqd->in_serv_last_pos is stale and contains value
>>>> from the previously scheduled bfq queue which can thus result in a bogus
>>>> decision that the two queues should be merged.
>>> 
>>> Good catch! 
>>> 
>>>> This bug can be observed
>>>> for example with the following fio jobfile:
>>>> 
>>>> [global]
>>>> direct=0
>>>> ioengine=sync
>>>> invalidate=1
>>>> size=1g
>>>> rw=read
>>>> 
>>>> [reader]
>>>> numjobs=4
>>>> directory=/mnt
>>>> 
>>>> where the 4 processes will end up in the one shared bfq queue although
>>>> they do IO to physically very distant files (for some reason I was able to
>>>> observe this only with slice_idle=1ms setting).
>>>> 
>>>> Fix the problem by invalidating bfqd->in_serv_last_pos when switching
>>>> in-service queue.
>>>> 
>>> 
>>> Apart from the nonexistent problem that even 0 is a valid LBA :)
>> 
>> Yes, I was also thinking about that and decided 0 is "good enough" :). But
>> I just as well just switch to (sector_t)-1 if you think it would be better.
>> 
> 
> 0 is ok :)
> 

Hi Jan,
I've finally tested this patch of yours. No regression.

Once again:
Acked-by: Paolo Valente <paolo.valente@linaro.org>

Thanks,
Paolo

> Thanks,
> Paolo
> 
>>> Acked-by: Paolo Valente <paolo.valente@linaro.org>
>> 
>> Thanks!
>> 
>> 								Honza
>> 
>> -- 
>> Jan Kara <jack@suse.com>
>> SUSE Labs, CR

Patch

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 3d411716d7ee..50017275915f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2937,6 +2937,7 @@  static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
 	}
 
 	bfqd->in_service_queue = bfqq;
+	bfqd->in_serv_last_pos = 0;
 }
 
 /*
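
For completeness, the reset above only matters until the freshly scheduled
queue dispatches its first request: on each dispatch bfq records where the
in-service queue last did IO, along the lines of the simplified sketch below
(based on the update introduced by the commit referenced in the Fixes tag;
details omitted):

	/* Simplified: remember the position right after the dispatched
	 * request; once the new in-service queue dispatches anything,
	 * this overwrites the 0 written at queue switch. */
	bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
	if (RQ_BFQQ(rq) == bfqd->in_service_queue)
		bfqd->in_serv_last_pos = bfqd->last_position;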