
[PATCHv2,2/2] block: adjust CFS request expire time

Message ID 20240220114536.513494-1-zhaoyang.huang@unisoc.com (mailing list archive)
State New, archived

Commit Message

zhaoyang.huang Feb. 20, 2024, 11:45 a.m. UTC
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Under the current policy, CFS tasks may suffer involuntary I/O latency when
they are preempted by RT/DL tasks or IRQs, since those hold the privilege in
both the CPU and the I/O scheduler. This commit introduces an approximate
and lightweight method to reduce this effect by adjusting the expire time
according to the CFS proportion of the whole CPU active time.
The average utilization of the CPU's run queue reflects the historical
active proportion of the different task types, which can be shown valid for
this goal from the following three perspectives:

1. The load (util) of every sched class is tracked and calculated in the
same way (using a geometric series, known as PELT).
2. The legacy policy is kept by NOT adjusting the rq's position in the
fifo_list; only the expire_time is changed.
3. The fixed expire time (hundreds of ms) is in the same range as the CPU
avg_load's accounting series (the utilization decays to 0.5 in 32 ms).
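
As a quick check of the 32 ms figure: PELT decays the tracked utilization by
a constant factor y per 1 ms period, with y chosen so that y^32 = 1/2. A
minimal user-space sketch of that decay (illustrative only, not kernel code):

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* PELT decay factor: halves a contribution every 32 periods. */
	double y = pow(0.5, 1.0 / 32.0);	/* ~0.97857 */
	double util = 1024.0;			/* starting utilization */

	/* prints 1024, 512, 256, 128, 64 */
	for (int ms = 0; ms <= 128; ms += 32)
		printf("after %3d ms: %4.0f\n", ms, util * pow(y, ms));
	return 0;
}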

TaskA
sched in
|
|
|
submit_bio
|
|
|
fifo_time = jiffies + expire
(insert_request)

TaskB
sched in
|
|
vfs_xxx
|
|preempted by RT,DL,IRQ
|\
| This period is unfair to TaskB's IO request; it should be adjusted for
|/
|
submit_bio
|
|
|
fifo_time = jiffies + expire * CFS_PROPORTION(rq)
(insert_request)
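
For example, assuming the mq-deadline default read expire of HZ/2 (500 ms):
if CFS tasks accounted for only 40% of the CPU's active time, TaskB's request
gets fifo_time = jiffies + 500 ms * 40% = jiffies + 200 ms, so its deadline
comes up 300 ms earlier to compensate for the CPU time lost to RT/DL/IRQ.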

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
changes in v2: introduce a direction and a threshold so that the hack works
as a guard against CFS over-preemption.
---
 block/mq-deadline.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Zhaoyang Huang Feb. 20, 2024, 11:56 a.m. UTC | #1
Patch v2 makes the adjustment work as a guard against CFS over-preemption,
which only takes effect for READ requests.
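
With CFS_PROP_THRESHOLD at 60, the expire time is only shrunk when CFS tasks
received less than 60% of the CPU's tracked time, and only for READ requests;
a CFS share at or above the threshold leaves dd->fifo_expire[DD_READ]
untouched, so the adjustment can only move a deadline earlier, never later.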

On Tue, Feb 20, 2024 at 7:46 PM zhaoyang.huang
<zhaoyang.huang@unisoc.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Under the current policy, CFS tasks may suffer involuntary I/O latency when
> they are preempted by RT/DL tasks or IRQs, since those hold the privilege in
> both the CPU and the I/O scheduler. This commit introduces an approximate
> and lightweight method to reduce this effect by adjusting the expire time
> according to the CFS proportion of the whole CPU active time.
> The average utilization of the CPU's run queue reflects the historical
> active proportion of the different task types, which can be shown valid for
> this goal from the following three perspectives:
>
> 1. The load (util) of every sched class is tracked and calculated in the
> same way (using a geometric series, known as PELT).
> 2. The legacy policy is kept by NOT adjusting the rq's position in the
> fifo_list; only the expire_time is changed.
> 3. The fixed expire time (hundreds of ms) is in the same range as the CPU
> avg_load's accounting series (the utilization decays to 0.5 in 32 ms).
>
> TaskA
> sched in
> |
> |
> |
> submit_bio
> |
> |
> |
> fifo_time = jiffies + expire
> (insert_request)
>
> TaskB
> sched in
> |
> |
> vfs_xxx
> |
> |preempted by RT,DL,IRQ
> |\
> | This period is unfair to TaskB's IO request; it should be adjusted for
> |/
> |
> submit_bio
> |
> |
> |
> fifo_time = jiffies + expire * CFS_PROPORTION(rq)
> (insert_request)
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
> changes in v2: introduce a direction and a threshold so that the hack works
> as a guard against CFS over-preemption.
> ---
>  block/mq-deadline.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f958e79277b8..b5aa544d69a3 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -54,6 +54,7 @@ enum dd_prio {
>
>  enum { DD_PRIO_COUNT = 3 };
>
> +#define CFS_PROP_THRESHOLD 60
>  /*
>   * I/O statistics per I/O priority. It is fine if these counters overflow.
>   * What matters is that these counters are at least as wide as
> @@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>         u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>         struct dd_per_prio *per_prio;
>         enum dd_prio prio;
> +       int fifo_expire;
>
>         lockdep_assert_held(&dd->lock);
>
> @@ -839,8 +841,20 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>
>                 /*
>                  * set expire time and add to fifo list
> +                * The expire time is adjusted when the current CFS task is
> +                * over-preempted by RT/DL/IRQ, as measured by the proportion
> +                * of CFS activation among the whole CPU time during the last
> +                * several dozen ms. This does NOT affect the rq's position
> +                * in the fifo_list; it only takes effect when the rq is
> +                * checked for its expire time at the head of the list.
>                  */
> -               rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
> +               fifo_expire = dd->fifo_expire[data_dir];
> +               if (data_dir == DD_READ &&
> +                       (cfs_prop_by_util(current, 100) < CFS_PROP_THRESHOLD))
> +                       fifo_expire = cfs_prop_by_util(current, dd->fifo_expire[data_dir]);
> +
> +               rq->fifo_time = jiffies + fifo_expire;
> +
>                 insert_before = &per_prio->fifo_list[data_dir];
>  #ifdef CONFIG_BLK_DEV_ZONED
>                 /*
> --
> 2.25.1
>
Bart Van Assche Feb. 20, 2024, 5:12 p.m. UTC | #2
Where is patch 1/2 of this series? I don't see it in my mailbox.

On 2/20/24 03:45, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Under the current policy, CFS tasks may suffer involuntary I/O latency when

Did you perhaps mean "cause" instead of "suffer"?

> they are preempted by RT/DL tasks or IRQs.

For which workloads? Sequential I/O or random I/O? If it is for random I/O,
please take a look at patch "scsi: ufs: core: Add CPU latency QoS support
for UFS driver"
(https://lore.kernel.org/all/20231219123706.6463-1-quic_mnaresh@quicinc.com/)
and let us know whether or not the Power Management Quality of Service (PM QoS)
framework is perhaps a better solution.

Thanks,

Bart.

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..b5aa544d69a3 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -54,6 +54,7 @@  enum dd_prio {
 
 enum { DD_PRIO_COUNT = 3 };
 
+#define CFS_PROP_THRESHOLD 60
 /*
  * I/O statistics per I/O priority. It is fine if these counters overflow.
  * What matters is that these counters are at least as wide as
@@ -802,6 +803,7 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
+	int fifo_expire;
 
 	lockdep_assert_held(&dd->lock);
 
@@ -839,8 +841,20 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 		/*
 		 * set expire time and add to fifo list
+		 * The expire time is adjusted when the current CFS task is
+		 * over-preempted by RT/DL/IRQ, as measured by the proportion
+		 * of CFS activation among the whole CPU time during the last
+		 * several dozen ms. This does NOT affect the rq's position
+		 * in the fifo_list; it only takes effect when the rq is
+		 * checked for its expire time at the head of the list.
 		 */
-		rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
+		fifo_expire = dd->fifo_expire[data_dir];
+		if (data_dir == DD_READ &&
+			(cfs_prop_by_util(current, 100) < CFS_PROP_THRESHOLD))
+			fifo_expire = cfs_prop_by_util(current, dd->fifo_expire[data_dir]);
+
+		rq->fifo_time = jiffies + fifo_expire;
+
 		insert_before = &per_prio->fifo_list[data_dir];
 #ifdef CONFIG_BLK_DEV_ZONED
 		/*
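
Note: patch 1/2, which presumably introduces the cfs_prop_by_util() helper
used above, is not part of this thread (see Bart's comment #2). A purely
hypothetical sketch of what it might look like, inferred only from its call
sites: it scales @val by the CFS share of the run queue's PELT-tracked
utilization. All field accesses and placement are assumptions, not the
author's actual code (rq->avg_irq, for instance, only exists with
CONFIG_HAVE_SCHED_AVG_IRQ):

unsigned long cfs_prop_by_util(struct task_struct *p, unsigned long val)
{
	struct rq *rq = task_rq(p);
	unsigned long cfs = READ_ONCE(rq->cfs.avg.util_avg);
	unsigned long total = cfs + READ_ONCE(rq->avg_rt.util_avg) +
			      READ_ONCE(rq->avg_dl.util_avg) +
			      READ_ONCE(rq->avg_irq.util_avg);

	/* No tracked utilization yet: leave the value unscaled. */
	return total ? val * cfs / total : val;
}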