[3/3] block: introducing a bias over deadline's fifo_time

Message ID 20240208093136.178797-3-zhaoyang.huang@unisoc.com (mailing list archive)
State New, archived
Series [1/3] sched: fix compiling error on kernel/sched/sched.h

Commit Message

zhaoyang.huang Feb. 8, 2024, 9:31 a.m. UTC
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Under the current policy, RT tasks are privileged in both the CPU
scheduler and the IO scheduler, which can make preempted CFS tasks suffer
unfairly large IO latency. This commit introduces an approximate method
to compensate for the effect of that preemption.

TaskA
sched in
|
|
|
submit_bio
|
|
|
fifo_time = jiffies + expire
(insert_request)

TaskB
sched in
|
|
preempted by RT task
|\
| This period is unfair to TaskB's IO request and should be adjusted for
|/
submit_bio
|
|
|
fifo_time = jiffies + expire * CFS_PROPORTION(rq)
(insert_request)

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 block/mq-deadline.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Feb. 8, 2024, 5:46 p.m. UTC | #1
On 2/8/24 01:31, zhaoyang.huang wrote:
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..43c08c3d6f18 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -15,6 +15,7 @@
>   #include <linux/compiler.h>
>   #include <linux/rbtree.h>
>   #include <linux/sbitmap.h>
> +#include "../kernel/sched/sched.h"

Is kernel/sched/sched.h perhaps a private scheduler kernel header file? Shouldn't
block layer code only include public scheduler header files?

> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>   		/*
>   		 * set expire time and add to fifo list
>   		 */
> -		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> +		fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> +			CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> +		rq->fifo_time = jiffies + fifo_expire;
>   		insert_before = &per_prio->fifo_list[data_dir];
>   #ifdef CONFIG_BLK_DEV_ZONED
>   		/*

Making the mq-deadline request expiry time dependent on the task priority seems wrong
to me.

Thanks,

Bart.
Jens Axboe Feb. 8, 2024, 5:49 p.m. UTC | #2
On 2/8/24 2:31 AM, zhaoyang.huang wrote:
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..43c08c3d6f18 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -15,6 +15,7 @@
>  #include <linux/compiler.h>
>  #include <linux/rbtree.h>
>  #include <linux/sbitmap.h>
> +#include "../kernel/sched/sched.h"
>  
>  #include <trace/events/block.h>
>  
> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>  	struct dd_per_prio *per_prio;
>  	enum dd_prio prio;
> +	int fifo_expire;
>  
>  	lockdep_assert_held(&dd->lock);
>  
> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  		/*
>  		 * set expire time and add to fifo list
>  		 */
> -		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> +		fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> +			CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> +		rq->fifo_time = jiffies + fifo_expire;
>  		insert_before = &per_prio->fifo_list[data_dir];
>  #ifdef CONFIG_BLK_DEV_ZONED
>  		/*

Hard pass on this blatant layering violation. Just like the priority
changes, this utterly fails to understand how things are properly
designed.
Zhaoyang Huang Feb. 8, 2024, 11:52 p.m. UTC | #3
On Fri, Feb 9, 2024 at 1:46 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2/8/24 01:31, zhaoyang.huang wrote:
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index f958e79277b8..43c08c3d6f18 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -15,6 +15,7 @@
> >   #include <linux/compiler.h>
> >   #include <linux/rbtree.h>
> >   #include <linux/sbitmap.h>
> > +#include "../kernel/sched/sched.h"
>
> Is kernel/sched/sched.h perhaps a private scheduler kernel header file? Shouldn't
> block layer code only include public scheduler header files?
>
> > @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >               /*
> >                * set expire time and add to fifo list
> >                */
> > -             rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> > +             fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> > +                     CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> > +             rq->fifo_time = jiffies + fifo_expire;
> >               insert_before = &per_prio->fifo_list[data_dir];
> >   #ifdef CONFIG_BLK_DEV_ZONED
> >               /*
>
> Making the mq-deadline request expiry time dependent on the task priority seems wrong
> to me.
But bio_set_ioprio has done this before
>
> Thanks,
>
> Bart.
Zhaoyang Huang Feb. 9, 2024, 12:02 a.m. UTC | #4
On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index f958e79277b8..43c08c3d6f18 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/compiler.h>
> >  #include <linux/rbtree.h>
> >  #include <linux/sbitmap.h>
> > +#include "../kernel/sched/sched.h"
> >
> >  #include <trace/events/block.h>
> >
> > @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >       u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> >       struct dd_per_prio *per_prio;
> >       enum dd_prio prio;
> > +     int fifo_expire;
> >
> >       lockdep_assert_held(&dd->lock);
> >
> > @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >               /*
> >                * set expire time and add to fifo list
> >                */
> > -             rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> > +             fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> > +                     CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> > +             rq->fifo_time = jiffies + fifo_expire;
> >               insert_before = &per_prio->fifo_list[data_dir];
> >  #ifdef CONFIG_BLK_DEV_ZONED
> >               /*
>
> Hard pass on this blatant layering violation. Just like the priority
> changes, this utterly fails to understand how things are properly
> designed.
IMHO, I don't think this is a layering violation. bio_set_ioprio is
the one which introduces the scheduler thing into the block layer,
this commit just wants to do a little improvement based on that. This
commit helps CFS task save some IO time when preempted by RT heavily.

PS: [PATCHv9 1/1] block: introduce content activity based ioprio has
solved layering violation issue. Could you please have a look.
>
> --
> Jens Axboe
>
Jens Axboe Feb. 9, 2024, 12:10 a.m. UTC | #5
On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
> On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>> index f958e79277b8..43c08c3d6f18 100644
>>> --- a/block/mq-deadline.c
>>> +++ b/block/mq-deadline.c
>>> @@ -15,6 +15,7 @@
>>>  #include <linux/compiler.h>
>>>  #include <linux/rbtree.h>
>>>  #include <linux/sbitmap.h>
>>> +#include "../kernel/sched/sched.h"
>>>
>>>  #include <trace/events/block.h>
>>>
>>> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>       u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>>>       struct dd_per_prio *per_prio;
>>>       enum dd_prio prio;
>>> +     int fifo_expire;
>>>
>>>       lockdep_assert_held(&dd->lock);
>>>
>>> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>               /*
>>>                * set expire time and add to fifo list
>>>                */
>>> -             rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
>>> +             fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
>>> +                     CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
>>> +             rq->fifo_time = jiffies + fifo_expire;
>>>               insert_before = &per_prio->fifo_list[data_dir];
>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>               /*
>>
>> Hard pass on this blatant layering violation. Just like the priority
>> changes, this utterly fails to understand how things are properly
>> designed.
> IMHO, I don't think this is a layering violation. bio_set_ioprio is
> the one which introduces the scheduler thing into the block layer,
> this commit just wants to do a little improvement based on that. This
> commit helps CFS task save some IO time when preempted by RT heavily.

Listen, both this and the previous content ioprio thing show a glaring
misunderstanding of how to design these kinds of things. You have no
grasp of what the different layers do, or how they interact. I'm not
sure how to put this kindly, but it's really an awful idea to hardcode
some CFS helper into the IO scheduler. The fact that you had to fiddle
around with headers to make it work was the first warning sign, and the
fact that you didn't stop at that point to consider how it could be
properly done makes it even worse.

You need to stop sending kernel patches until you understand basic
software design. Neither of these patches are going anywhere until this
happens. There's been plenty of feedback telling you that, but you
seem to just ignore it and plow on ahead. Stop.
Zhaoyang Huang Feb. 9, 2024, 12:28 a.m. UTC | #6
On Fri, Feb 9, 2024 at 8:11 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
> > On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
> >>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> >>> index f958e79277b8..43c08c3d6f18 100644
> >>> --- a/block/mq-deadline.c
> >>> +++ b/block/mq-deadline.c
> >>> @@ -15,6 +15,7 @@
> >>>  #include <linux/compiler.h>
> >>>  #include <linux/rbtree.h>
> >>>  #include <linux/sbitmap.h>
> >>> +#include "../kernel/sched/sched.h"
> >>>
> >>>  #include <trace/events/block.h>
> >>>
> >>> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >>>       u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> >>>       struct dd_per_prio *per_prio;
> >>>       enum dd_prio prio;
> >>> +     int fifo_expire;
> >>>
> >>>       lockdep_assert_held(&dd->lock);
> >>>
> >>> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >>>               /*
> >>>                * set expire time and add to fifo list
> >>>                */
> >>> -             rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> >>> +             fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> >>> +                     CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> >>> +             rq->fifo_time = jiffies + fifo_expire;
> >>>               insert_before = &per_prio->fifo_list[data_dir];
> >>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>               /*
> >>
> >> Hard pass on this blatant layering violation. Just like the priority
> >> changes, this utterly fails to understand how things are properly
> >> designed.
> > IMHO, I don't think this is a layering violation. bio_set_ioprio is
> > the one which introduces the scheduler thing into the block layer,
> > this commit just wants to do a little improvement based on that. This
> > commit helps CFS task save some IO time when preempted by RT heavily.
>
> Listen, both this and the previous content ioprio thing show a glaring
> misunderstanding of how to design these kinds of things. You have no
> grasp of what the different layers do, or how they interact. I'm not
> sure how to put this kindly, but it's really an awful idea to hardcode
> some CFS helper into the IO scheduler. The fact that you had to fiddle
> around with headers to make it work was the first warning sign, and the
> fact that you didn't stop at that point to consider how it could be
> properly done makes it even worse.
>
> You need to stop sending kernel patches until you understand basic
> software design. Neither of these patches are going anywhere until this
> happens. There's been plenty of feedback telling you that, but you
> seem to just ignore it and plow on ahead. Stop.
Ok, thanks for pointing this out, I will follow your advice. But I
have to say that '[PATCHv9 1/1] block: introduce content activity
based ioprio' really solved layering violation things. I would like to
humbly ask for your kind patience to have a look, as it is really
helpful.
>
> --
> Jens Axboe
>
Damien Le Moal Feb. 9, 2024, 1:58 a.m. UTC | #7
On 2/9/24 09:28, Zhaoyang Huang wrote:
> On Fri, Feb 9, 2024 at 8:11 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
>>> On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
>>>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>>>> index f958e79277b8..43c08c3d6f18 100644
>>>>> --- a/block/mq-deadline.c
>>>>> +++ b/block/mq-deadline.c
>>>>> @@ -15,6 +15,7 @@
>>>>>  #include <linux/compiler.h>
>>>>>  #include <linux/rbtree.h>
>>>>>  #include <linux/sbitmap.h>
>>>>> +#include "../kernel/sched/sched.h"
>>>>>
>>>>>  #include <trace/events/block.h>
>>>>>
>>>>> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>>>       u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>>>>>       struct dd_per_prio *per_prio;
>>>>>       enum dd_prio prio;
>>>>> +     int fifo_expire;
>>>>>
>>>>>       lockdep_assert_held(&dd->lock);
>>>>>
>>>>> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>>>               /*
>>>>>                * set expire time and add to fifo list
>>>>>                */
>>>>> -             rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
>>>>> +             fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
>>>>> +                     CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
>>>>> +             rq->fifo_time = jiffies + fifo_expire;
>>>>>               insert_before = &per_prio->fifo_list[data_dir];
>>>>>  #ifdef CONFIG_BLK_DEV_ZONED
>>>>>               /*
>>>>
>>>> Hard pass on this blatant layering violation. Just like the priority
>>>> changes, this utterly fails to understand how things are properly
>>>> designed.
>>> IMHO, I don't think this is a layering violation. bio_set_ioprio is
>>> the one which introduces the scheduler thing into the block layer,
>>> this commit just wants to do a little improvement based on that. This
>>> commit helps CFS task save some IO time when preempted by RT heavily.
>>
>> Listen, both this and the previous content ioprio thing show a glaring
>> misunderstanding of how to design these kinds of things. You have no
>> grasp of what the different layers do, or how they interact. I'm not
>> sure how to put this kindly, but it's really an awful idea to hardcode
>> some CFS helper into the IO scheduler. The fact that you had to fiddle
>> around with headers to make it work was the first warning sign, and the
>> fact that you didn't stop at that point to consider how it could be
>> properly done makes it even worse.
>>
>> You need to stop sending kernel patches until you understand basic
>> software design. Neither of these patches are going anywhere until this
>> happens. There's been plenty of feedback telling you that, but you
>> seem to just ignore it and plow on ahead. Stop.
> Ok, thanks for pointing this out, I will follow your advice. But I
> have to say that '[PATCHv9 1/1] block: introduce content activity
> based ioprio' really solved layering violation things. I would like to
> humbly ask for your kind patience to have a look, as it is really
> helpful.

If properly designed, that patch would *not* be a block layer API/function and
so does not need review by block layer folks/Jens as it would simply set an IO
prio for a BIO issued by an FS. So that patch needs to be accepted by FS
people, for the FS you are interested in.
Zhaoyang Huang Feb. 9, 2024, 3:08 a.m. UTC | #8
On Fri, Feb 9, 2024 at 9:58 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>
> On 2/9/24 09:28, Zhaoyang Huang wrote:
> > On Fri, Feb 9, 2024 at 8:11 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> On 2/8/24 5:02 PM, Zhaoyang Huang wrote:
> >>> On Fri, Feb 9, 2024 at 1:49 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>
> >>>> On 2/8/24 2:31 AM, zhaoyang.huang wrote:
> >>>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> >>>>> index f958e79277b8..43c08c3d6f18 100644
> >>>>> --- a/block/mq-deadline.c
> >>>>> +++ b/block/mq-deadline.c
> >>>>> @@ -15,6 +15,7 @@
> >>>>>  #include <linux/compiler.h>
> >>>>>  #include <linux/rbtree.h>
> >>>>>  #include <linux/sbitmap.h>
> >>>>> +#include "../kernel/sched/sched.h"
> >>>>>
> >>>>>  #include <trace/events/block.h>
> >>>>>
> >>>>> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >>>>>       u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> >>>>>       struct dd_per_prio *per_prio;
> >>>>>       enum dd_prio prio;
> >>>>> +     int fifo_expire;
> >>>>>
> >>>>>       lockdep_assert_held(&dd->lock);
> >>>>>
> >>>>> @@ -840,7 +842,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >>>>>               /*
> >>>>>                * set expire time and add to fifo list
> >>>>>                */
> >>>>> -             rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> >>>>> +             fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
> >>>>> +                     CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
> >>>>> +             rq->fifo_time = jiffies + fifo_expire;
> >>>>>               insert_before = &per_prio->fifo_list[data_dir];
> >>>>>  #ifdef CONFIG_BLK_DEV_ZONED
> >>>>>               /*
> >>>>
> >>>> Hard pass on this blatant layering violation. Just like the priority
> >>>> changes, this utterly fails to understand how things are properly
> >>>> designed.
> >>> IMHO, I don't think this is a layering violation. bio_set_ioprio is
> >>> the one which introduces the scheduler thing into the block layer,
> >>> this commit just wants to do a little improvement based on that. This
> >>> commit helps CFS task save some IO time when preempted by RT heavily.
> >>
> >> Listen, both this and the previous content ioprio thing show a glaring
> >> misunderstanding of how to design these kinds of things. You have no
> >> grasp of what the different layers do, or how they interact. I'm not
> >> sure how to put this kindly, but it's really an awful idea to hardcode
> >> some CFS helper into the IO scheduler. The fact that you had to fiddle
> >> around with headers to make it work was the first warning sign, and the
> >> fact that you didn't stop at that point to consider how it could be
> >> properly done makes it even worse.
> >>
> >> You need to stop sending kernel patches until you understand basic
> >> software design. Neither of these patches are going anywhere until this
> >> happens. There's been plenty of feedback telling you that, but you
> >> seem to just ignore it and plow on ahead. Stop.
> > Ok, thanks for pointing this out, I will follow your advice. But I
> > have to say that '[PATCHv9 1/1] block: introduce content activity
> > based ioprio' really solved layering violation things. I would like to
> > humbly ask for your kind patience to have a look, as it is really
> > helpful.
>
> If properly designed, that patch would *not* be a block layer API/function and
> so does not need review by block layer folks/Jens as it would simply set an IO
> prio for a BIO issued by an FS. So that patch needs to be accepted by FS
> people, for the FS you are interested in.
Thanks for the heads-up, and sorry for my lack of understanding of what
maintaining the whole framework requires. IMHO, the newly introduced API is a
little bit like bio_set_pages_dirty which is mainly related to bio and
the pages inside. Patchv9 has changed a lot to meet your kind advice.
I would be grateful to you if you could review it.
>
>
> --
> Damien Le Moal
> Western Digital Research
>

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..43c08c3d6f18 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -15,6 +15,7 @@ 
 #include <linux/compiler.h>
 #include <linux/rbtree.h>
 #include <linux/sbitmap.h>
+#include "../kernel/sched/sched.h"
 
 #include <trace/events/block.h>
 
@@ -802,6 +803,7 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
+	int fifo_expire;
 
 	lockdep_assert_held(&dd->lock);
 
@@ -840,7 +842,9 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		/*
 		 * set expire time and add to fifo list
 		 */
-		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
+		fifo_expire = task_is_realtime(current) ? dd->fifo_expire[data_dir] :
+			CFS_PROPORTION(current, dd->fifo_expire[data_dir]);
+		rq->fifo_time = jiffies + fifo_expire;
 		insert_before = &per_prio->fifo_list[data_dir];
 #ifdef CONFIG_BLK_DEV_ZONED
 		/*
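
The expiry calculation in the last hunk above can be sketched in userspace C
as follows. This is only an illustration under stated assumptions:
CFS_PROPORTION() is defined in patch 2/3 of the series, which is not shown
here, so scale_by_cfs_share() and its cfs_percent parameter are hypothetical
stand-ins that assume the macro scales the expiry by the share of CPU time
CFS tasks received. compute_fifo_time() and its arguments are likewise
illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for CFS_PROPORTION() from patch 2/3 (definition not
 * shown in this thread). Assumption: the expiry is scaled by the percentage
 * of CPU time that CFS tasks received, so heavier RT preemption gives a
 * shorter expiry.
 */
unsigned long scale_by_cfs_share(unsigned long expire, unsigned int cfs_percent)
{
	assert(cfs_percent <= 100);
	return expire * cfs_percent / 100;
}

/*
 * Mirrors the shape of the patched assignment in dd_insert_request():
 * RT submitters keep the configured expiry, all others get the scaled one.
 * "now" plays the role of jiffies and "expire" of dd->fifo_expire[data_dir].
 */
unsigned long compute_fifo_time(unsigned long now, unsigned long expire,
				bool is_rt, unsigned int cfs_percent)
{
	unsigned long fifo_expire;

	fifo_expire = is_rt ? expire : scale_by_cfs_share(expire, cfs_percent);
	return now + fifo_expire;
}
```

Under these assumptions, an RT submitter keeps the full expiry, while a
non-RT submitter with a 40% CFS share gets an expiry shortened to 40% of the
configured value: compute_fifo_time(1000, 500, false, 40) gives 1200 instead
of 1500. The sketch also makes the reviewers' objection concrete, since the
deadline now depends on CPU scheduler state rather than only on the IO
scheduler's own tunables.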