From patchwork Mon Apr 14 13:27:25 2025
From: Zizhi Wo
Subject: [PATCH 1/7] blk-throttle: Rename tg_may_dispatch() to tg_dispatch_time()
Date: Mon, 14 Apr 2025 21:27:25 +0800
Message-ID: <20250414132731.167620-2-wozizhi@huawei.com>
In-Reply-To: <20250414132731.167620-1-wozizhi@huawei.com>
References: <20250414132731.167620-1-wozizhi@huawei.com>

tg_may_dispatch() can directly indicate whether a bio can be dispatched by
returning the time to wait, so the redundant "wait" output parameter is not
needed. Remove it and change the function's return type accordingly. Since
the return value alone now determines whether the bio can be dispatched,
rename tg_may_dispatch() to tg_dispatch_time().
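To illustrate the calling-convention change, here is a minimal sketch
(illustrative only; dispatch()/defer() are hypothetical placeholders for
the caller's dispatch and requeue logic):

	/* Before: boolean result plus an optional out-parameter. */
	unsigned long wait;

	if (tg_may_dispatch(tg, bio, &wait))
		dispatch(bio);		/* wait is 0 */
	else
		defer(bio, wait);	/* retry after 'wait' jiffies */

	/* After: the wait time itself is the result; 0 means dispatch now. */
	unsigned long wait = tg_dispatch_time(tg, bio);

	if (wait == 0)
		dispatch(bio);
	else
		defer(bio, wait);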
Signed-off-by: Zizhi Wo
Reviewed-by: Yu Kuai
---
 block/blk-throttle.c | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 91dab43c65ab..dc23c961c028 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -743,14 +743,13 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 }
 
 /*
- * Returns whether one can dispatch a bio or not. Also returns approx number
- * of jiffies to wait before this bio is with-in IO rate and can be dispatched
+ * Returns approx number of jiffies to wait before this bio is with-in IO rate
+ * and can be dispatched.
  */
-static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
-			    unsigned long *wait)
+static unsigned long tg_dispatch_time(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
-	unsigned long bps_wait = 0, iops_wait = 0, max_wait = 0;
+	unsigned long bps_wait, iops_wait, max_wait;
 	u64 bps_limit = tg_bps_limit(tg, rw);
 	u32 iops_limit = tg_iops_limit(tg, rw);
 
@@ -765,11 +764,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 
 	/* If tg->bps = -1, then BW is unlimited */
 	if ((bps_limit == U64_MAX && iops_limit == UINT_MAX) ||
-	    tg->flags & THROTL_TG_CANCELING) {
-		if (wait)
-			*wait = 0;
-		return true;
-	}
+	    tg->flags & THROTL_TG_CANCELING)
+		return 0;
 
 	/*
 	 * If previous slice expired, start a new one otherwise renew/extend
@@ -789,21 +785,15 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	bps_wait = tg_within_bps_limit(tg, bio, bps_limit);
 	iops_wait = tg_within_iops_limit(tg, bio, iops_limit);
-	if (bps_wait + iops_wait == 0) {
-		if (wait)
-			*wait = 0;
-		return true;
-	}
+	if (bps_wait + iops_wait == 0)
+		return 0;
 
 	max_wait = max(bps_wait, iops_wait);
 
-	if (wait)
-		*wait = max_wait;
-
 	if (time_before(tg->slice_end[rw], jiffies + max_wait))
 		throtl_extend_slice(tg, rw, jiffies + max_wait);
 
-	return false;
+	return max_wait;
 }
 
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
@@ -854,16 +844,16 @@ static void throtl_add_bio_tg(struct bio *bio, struct throtl_qnode *qn,
 static void tg_update_disptime(struct throtl_grp *tg)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
-	unsigned long read_wait = -1, write_wait = -1, min_wait = -1, disptime;
+	unsigned long read_wait = -1, write_wait = -1, min_wait, disptime;
 	struct bio *bio;
 
 	bio = throtl_peek_queued(&sq->queued[READ]);
 	if (bio)
-		tg_may_dispatch(tg, bio, &read_wait);
+		read_wait = tg_dispatch_time(tg, bio);
 
 	bio = throtl_peek_queued(&sq->queued[WRITE]);
 	if (bio)
-		tg_may_dispatch(tg, bio, &write_wait);
+		write_wait = tg_dispatch_time(tg, bio);
 
 	min_wait = min(read_wait, write_wait);
 	disptime = jiffies + min_wait;
@@ -941,7 +931,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg)
 
 	/* Try to dispatch 75% READS and 25% WRITES */
 	while ((bio = throtl_peek_queued(&sq->queued[READ])) &&
-	       tg_may_dispatch(tg, bio, NULL)) {
+	       tg_dispatch_time(tg, bio) == 0) {
 
 		tg_dispatch_one_bio(tg, READ);
 		nr_reads++;
@@ -951,7 +941,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg)
 	}
 
 	while ((bio = throtl_peek_queued(&sq->queued[WRITE])) &&
-	       tg_may_dispatch(tg, bio, NULL)) {
+	       tg_dispatch_time(tg, bio) == 0) {
 
 		tg_dispatch_one_bio(tg, WRITE);
 		nr_writes++;
@@ -1610,7 +1600,7 @@ static bool tg_within_limit(struct throtl_grp *tg, struct bio *bio, bool rw)
 	if (tg->service_queue.nr_queued[rw])
 		return false;
 
-	return tg_may_dispatch(tg, bio, NULL);
+	return tg_dispatch_time(tg, bio) == 0;
 }
 
 bool __blk_throtl_bio(struct bio *bio)

From patchwork Mon Apr 14 13:27:26 2025
From: Zizhi Wo
Subject: [PATCH 2/7] blk-throttle: Refactor tg_dispatch_time by extracting tg_dispatch_bps/iops_time
Date: Mon, 14 Apr 2025 21:27:26 +0800
Message-ID: <20250414132731.167620-3-wozizhi@huawei.com>
In-Reply-To: <20250414132731.167620-1-wozizhi@huawei.com>
References: <20250414132731.167620-1-wozizhi@huawei.com>

tg_dispatch_time() contained both bps and iops throttling logic. Split its
internals into tg_dispatch_bps_time() and tg_dispatch_iops_time() to
improve code consistency and to prepare for the future separation of the
bps and iops queues. Besides, move the time_before() check from the
callers into throtl_extend_slice() to make the code cleaner.
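Both new helpers follow the same shape; a condensed sketch of
tg_dispatch_bps_time() as introduced by this patch (the iops variant is
symmetric, with the iops limit check and tg_within_iops_limit() swapped in):

	static unsigned long tg_dispatch_bps_time(struct throtl_grp *tg, struct bio *bio)
	{
		bool rw = bio_data_dir(bio);
		u64 bps_limit = tg_bps_limit(tg, rw);
		unsigned long bps_wait;

		if (bps_limit == U64_MAX || tg->flags & THROTL_TG_CANCELING ||
		    bio_flagged(bio, BIO_BPS_THROTTLED))
			return 0;

		tg_update_slice(tg, rw);	/* start or renew the slice */
		bps_wait = tg_within_bps_limit(tg, bio, bps_limit);
		/* now a no-op if the slice already reaches jiffies + bps_wait */
		throtl_extend_slice(tg, rw, jiffies + bps_wait);

		return bps_wait;
	}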
Signed-off-by: Zizhi Wo
Reviewed-by: Yu Kuai
---
 block/blk-throttle.c | 98 +++++++++++++++++++++++++-------------------
 1 file changed, 55 insertions(+), 43 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index dc23c961c028..0633ae0cce90 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -520,6 +520,9 @@ static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
 static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
 					unsigned long jiffy_end)
 {
+	if (!time_before(tg->slice_end[rw], jiffy_end))
+		return;
+
 	throtl_set_slice_end(tg, rw, jiffy_end);
 	throtl_log(&tg->service_queue,
 		   "[%c] extend slice start=%lu end=%lu jiffies=%lu",
@@ -682,10 +685,6 @@ static unsigned long tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio
 	int io_allowed;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 
-	if (iops_limit == UINT_MAX) {
-		return 0;
-	}
-
 	jiffy_elapsed = jiffies - tg->slice_start[rw];
 
 	/* Round up to the next throttle slice, wait time must be nonzero */
@@ -711,11 +710,6 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
-	/* no need to throttle if this bio's bytes have been accounted */
-	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_BPS_THROTTLED)) {
-		return 0;
-	}
-
 	jiffy_elapsed = jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
 
 	/* Slice has just started. Consider one slice interval */
@@ -742,6 +736,54 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	return jiffy_wait;
 }
 
+/*
+ * If previous slice expired, start a new one otherwise renew/extend existing
+ * slice to make sure it is at least throtl_slice interval long since now.
+ * New slice is started only for empty throttle group. If there is queued bio,
+ * that means there should be an active slice and it should be extended instead.
+ */
+static void tg_update_slice(struct throtl_grp *tg, bool rw)
+{
+	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
+		throtl_start_new_slice(tg, rw, true);
+	else
+		throtl_extend_slice(tg, rw, jiffies + tg->td->throtl_slice);
+}
+
+static unsigned long tg_dispatch_bps_time(struct throtl_grp *tg, struct bio *bio)
+{
+	bool rw = bio_data_dir(bio);
+	u64 bps_limit = tg_bps_limit(tg, rw);
+	unsigned long bps_wait;
+
+	/* no need to throttle if this bio's bytes have been accounted */
+	if (bps_limit == U64_MAX || tg->flags & THROTL_TG_CANCELING ||
+	    bio_flagged(bio, BIO_BPS_THROTTLED))
+		return 0;
+
+	tg_update_slice(tg, rw);
+	bps_wait = tg_within_bps_limit(tg, bio, bps_limit);
+	throtl_extend_slice(tg, rw, jiffies + bps_wait);
+
+	return bps_wait;
+}
+
+static unsigned long tg_dispatch_iops_time(struct throtl_grp *tg, struct bio *bio)
+{
+	bool rw = bio_data_dir(bio);
+	u32 iops_limit = tg_iops_limit(tg, rw);
+	unsigned long iops_wait;
+
+	if (iops_limit == UINT_MAX || tg->flags & THROTL_TG_CANCELING)
+		return 0;
+
+	tg_update_slice(tg, rw);
+	iops_wait = tg_within_iops_limit(tg, bio, iops_limit);
+	throtl_extend_slice(tg, rw, jiffies + iops_wait);
+
+	return iops_wait;
+}
+
 /*
  * Returns approx number of jiffies to wait before this bio is with-in IO rate
  * and can be dispatched.
@@ -749,9 +791,7 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
  */
 static unsigned long tg_dispatch_time(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
-	unsigned long bps_wait, iops_wait, max_wait;
-	u64 bps_limit = tg_bps_limit(tg, rw);
-	u32 iops_limit = tg_iops_limit(tg, rw);
+	unsigned long bps_wait, iops_wait;
 
 	/*
 	 * Currently whole state machine of group depends on first bio
@@ -762,38 +802,10 @@ static unsigned long tg_dispatch_time(struct throtl_grp *tg, struct bio *bio)
 	BUG_ON(tg->service_queue.nr_queued[rw] &&
 	       bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
 
-	/* If tg->bps = -1, then BW is unlimited */
-	if ((bps_limit == U64_MAX && iops_limit == UINT_MAX) ||
-	    tg->flags & THROTL_TG_CANCELING)
-		return 0;
-
-	/*
-	 * If previous slice expired, start a new one otherwise renew/extend
-	 * existing slice to make sure it is at least throtl_slice interval
-	 * long since now. New slice is started only for empty throttle group.
-	 * If there is queued bio, that means there should be an active
-	 * slice and it should be extended instead.
-	 */
-	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
-		throtl_start_new_slice(tg, rw, true);
-	else {
-		if (time_before(tg->slice_end[rw],
-				jiffies + tg->td->throtl_slice))
-			throtl_extend_slice(tg, rw,
-					jiffies + tg->td->throtl_slice);
-	}
-
-	bps_wait = tg_within_bps_limit(tg, bio, bps_limit);
-	iops_wait = tg_within_iops_limit(tg, bio, iops_limit);
-	if (bps_wait + iops_wait == 0)
-		return 0;
-
-	max_wait = max(bps_wait, iops_wait);
-
-	if (time_before(tg->slice_end[rw], jiffies + max_wait))
-		throtl_extend_slice(tg, rw, jiffies + max_wait);
+	bps_wait = tg_dispatch_bps_time(tg, bio);
+	iops_wait = tg_dispatch_iops_time(tg, bio);
 
-	return max_wait;
+	return max(bps_wait, iops_wait);
 }
 
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)

From patchwork Mon Apr 14 13:27:27 2025
From: Zizhi Wo
Subject: [PATCH 3/7] blk-throttle: Split throtl_charge_bio() into bps and iops functions
Date: Mon, 14 Apr 2025 21:27:27 +0800
Message-ID: <20250414132731.167620-4-wozizhi@huawei.com>
In-Reply-To: <20250414132731.167620-1-wozizhi@huawei.com>
References: <20250414132731.167620-1-wozizhi@huawei.com>

Split throtl_charge_bio() into throtl_charge_bps_bio() and
throtl_charge_iops_bio() to facilitate subsequent patches that will charge
bps and iops separately after the queue separation.
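The asymmetry matters for split bios: bytes must be charged only once per
original bio, while every fragment still costs one IO. A toy standalone
model of that accounting (hypothetical code, not the kernel API):

	#include <stdio.h>
	#include <stdbool.h>

	struct tg { unsigned long long bytes_disp; unsigned long io_disp; };

	static void charge(struct tg *tg, unsigned int size, bool bps_throttled)
	{
		if (!bps_throttled)	/* bytes counted once, like throtl_charge_bps_bio() */
			tg->bytes_disp += size;
		tg->io_disp++;		/* every fragment costs one IO, like throtl_charge_iops_bio() */
	}

	int main(void)
	{
		struct tg tg = {0};

		charge(&tg, 1 << 20, false);	/* original 1 MiB bio */
		for (int i = 1; i < 256; i++)	/* its re-submitted 4 KiB fragments */
			charge(&tg, 4096, true);

		printf("bytes=%llu ios=%lu\n", tg.bytes_disp, tg.io_disp);
		/* prints bytes=1048576 ios=256 */
		return 0;
	}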
Signed-off-by: Zizhi Wo
Reviewed-by: Yu Kuai
---
 block/blk-throttle.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0633ae0cce90..91ee1c502b41 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -736,6 +736,20 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	return jiffy_wait;
 }
 
+static void throtl_charge_bps_bio(struct throtl_grp *tg, struct bio *bio)
+{
+	unsigned int bio_size = throtl_bio_data_size(bio);
+
+	/* Charge the bio to the group */
+	if (!bio_flagged(bio, BIO_BPS_THROTTLED))
+		tg->bytes_disp[bio_data_dir(bio)] += bio_size;
+}
+
+static void throtl_charge_iops_bio(struct throtl_grp *tg, struct bio *bio)
+{
+	tg->io_disp[bio_data_dir(bio)]++;
+}
+
 /*
  * If previous slice expired, start a new one otherwise renew/extend existing
  * slice to make sure it is at least throtl_slice interval long since now.
@@ -808,18 +822,6 @@ static unsigned long tg_dispatch_time(struct throtl_grp *tg, struct bio *bio)
 	return max(bps_wait, iops_wait);
 }
 
-static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
-{
-	bool rw = bio_data_dir(bio);
-	unsigned int bio_size = throtl_bio_data_size(bio);
-
-	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_BPS_THROTTLED))
-		tg->bytes_disp[rw] += bio_size;
-
-	tg->io_disp[rw]++;
-}
-
 /**
  * throtl_add_bio_tg - add a bio to the specified throtl_grp
  * @bio: bio to add
@@ -906,7 +908,8 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 	bio = throtl_pop_queued(&sq->queued[rw], &tg_to_put);
 	sq->nr_queued[rw]--;
 
-	throtl_charge_bio(tg, bio);
+	throtl_charge_bps_bio(tg, bio);
+	throtl_charge_iops_bio(tg, bio);
 
 	/*
 	 * If our parent is another tg, we just need to transfer @bio to
@@ -1633,7 +1636,8 @@ bool __blk_throtl_bio(struct bio *bio)
 	while (true) {
 		if (tg_within_limit(tg, bio, rw)) {
 			/* within limits, let's charge and dispatch directly */
-			throtl_charge_bio(tg, bio);
+			throtl_charge_bps_bio(tg, bio);
+			throtl_charge_iops_bio(tg, bio);
 
 			/*
 			 * We need to trim slice even when bios are not being
@@ -1656,7 +1660,8 @@ bool __blk_throtl_bio(struct bio *bio)
 			 * control algorithm is adaptive, and extra IO bytes
 			 * will be throttled for paying the debt
 			 */
-			throtl_charge_bio(tg, bio);
+			throtl_charge_bps_bio(tg, bio);
+			throtl_charge_iops_bio(tg, bio);
 		} else {
 			/* if above limits, break to queue */
 			break;

From patchwork Mon Apr 14 13:27:28 2025
From: Zizhi Wo
Subject: [PATCH 4/7] blk-throttle: Introduce flag "BIO_TG_BPS_THROTTLED"
Date: Mon, 14 Apr 2025 21:27:28 +0800
Message-ID: <20250414132731.167620-5-wozizhi@huawei.com>
In-Reply-To: <20250414132731.167620-1-wozizhi@huawei.com>
References: <20250414132731.167620-1-wozizhi@huawei.com>

Subsequent patches will split the single queue into separate bps and iops
queues. To prevent an IO that has already passed through the bps queue at
a single tg level from being counted toward bps wait time again, introduce
the "BIO_TG_BPS_THROTTLED" flag. Since throttle and QoS operate at
different levels, reuse the value of "BIO_QOS_THROTTLED". The flag is set
when charging bps and cleared when charging iops, as the bio will then
move to the upper-level tg or be dispatched.

This patch does not involve functional changes.
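A sketch of the intended flag lifecycle at one level of the hierarchy
(pseudo-flow using the helpers from this patch; queueing elided):

	/* child tg: bio clears its bps limit */
	throtl_charge_bps_bio(tg, bio);		/* sets BIO_TG_BPS_THROTTLED and
						 * charges the bytes exactly once */

	/* ...bio may wait in the queue; re-evaluations of the bps wait
	 * time now return 0 because the flag is set... */

	/* bio leaves this tg (dispatched or promoted to the parent tg) */
	throtl_charge_iops_bio(tg, bio);	/* clears BIO_TG_BPS_THROTTLED, so
						 * the parent's bps accounting sees
						 * the bio once at its own level */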
Signed-off-by: Zizhi Wo
Reviewed-by: Yu Kuai
---
 block/blk-throttle.c      | 9 +++++++--
 include/linux/blk_types.h | 5 +++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 91ee1c502b41..caae2e3b7534 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -741,12 +741,16 @@ static void throtl_charge_bps_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_BPS_THROTTLED))
+	if (!bio_flagged(bio, BIO_BPS_THROTTLED) &&
+	    !bio_flagged(bio, BIO_TG_BPS_THROTTLED)) {
+		bio_set_flag(bio, BIO_TG_BPS_THROTTLED);
 		tg->bytes_disp[bio_data_dir(bio)] += bio_size;
+	}
 }
 
 static void throtl_charge_iops_bio(struct throtl_grp *tg, struct bio *bio)
 {
+	bio_clear_flag(bio, BIO_TG_BPS_THROTTLED);
 	tg->io_disp[bio_data_dir(bio)]++;
 }
 
@@ -772,7 +776,8 @@ static unsigned long tg_dispatch_bps_time(struct throtl_grp *tg, struct bio *bio)
 
 	/* no need to throttle if this bio's bytes have been accounted */
 	if (bps_limit == U64_MAX || tg->flags & THROTL_TG_CANCELING ||
-	    bio_flagged(bio, BIO_BPS_THROTTLED))
+	    bio_flagged(bio, BIO_BPS_THROTTLED) ||
+	    bio_flagged(bio, BIO_TG_BPS_THROTTLED))
 		return 0;
 
 	tg_update_slice(tg, rw);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dce7615c35e7..f9d1230e27a7 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -296,6 +296,11 @@ enum {
 				 * of this bio. */
 	BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
 	BIO_QOS_THROTTLED,	/* bio went through rq_qos throttle path */
+	/*
+	 * This bio has undergone rate limiting at the single throtl_grp level bps
+	 * queue. Since throttle and QoS are not at the same level, reused the value.
+	 */
+	BIO_TG_BPS_THROTTLED = BIO_QOS_THROTTLED,
 	BIO_QOS_MERGED,		/* but went through rq_qos merge path */
 	BIO_REMAPPED,
 	BIO_ZONE_WRITE_PLUGGING, /* bio handled through zone write plugging */

From patchwork Mon Apr 14 13:27:29 2025
From: Zizhi Wo
Subject: [PATCH 5/7] blk-throttle: Split the blkthrotl queue
Date: Mon, 14 Apr 2025 21:27:29 +0800
Message-ID: <20250414132731.167620-6-wozizhi@huawei.com>
In-Reply-To: <20250414132731.167620-1-wozizhi@huawei.com>
References: <20250414132731.167620-1-wozizhi@huawei.com>

This patch splits the single queue into separate bps and iops queues. Now,
an IO request must first pass through the bps queue, then the iops queue,
and finally be dispatched. Due to the queue splitting, the throtl
add/peek/pop helpers need to be modified. Additionally, the patch adjusts
the logic of tg_dispatch_time(): if the bio needs to wait for bps, the
function directly returns the bps wait time; otherwise, it charges bps and
returns the iops wait time, so that the bio can be placed directly into
the iops queue afterward.

Note that this may lead to more frequent updates of disptime, but the
overhead is negligible for the slow path.
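Taking from the iops list first is what preserves FIFO order: a bio only
ever moves from the bps queue to the iops queue, so within one qnode every
bio on the iops list is older than every bio on the bps list. A condensed
sketch of the resulting peek order (names from this patch):

	/* oldest first: the iops list holds bios that already cleared bps */
	bio = bio_list_peek(&qn->bios_iops);
	if (!bio)
		bio = bio_list_peek(&qn->bios_bps);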
Signed-off-by: Zizhi Wo
Reviewed-by: Yu Kuai
---
 block/blk-throttle.c | 49 ++++++++++++++++++++++++++++++--------------
 block/blk-throttle.h |  3 ++-
 2 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index caae2e3b7534..542db54f995c 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -143,7 +143,8 @@ static inline unsigned int throtl_bio_data_size(struct bio *bio)
 static void throtl_qnode_init(struct throtl_qnode *qn, struct throtl_grp *tg)
 {
 	INIT_LIST_HEAD(&qn->node);
-	bio_list_init(&qn->bios);
+	bio_list_init(&qn->bios_bps);
+	bio_list_init(&qn->bios_iops);
 	qn->tg = tg;
 }
 
@@ -160,7 +161,11 @@ static void throtl_qnode_init(struct throtl_qnode *qn, struct throtl_grp *tg)
 static void throtl_qnode_add_bio(struct bio *bio, struct throtl_qnode *qn,
 				 struct list_head *queued)
 {
-	bio_list_add(&qn->bios, bio);
+	if (bio_flagged(bio, BIO_TG_BPS_THROTTLED))
+		bio_list_add(&qn->bios_iops, bio);
+	else
+		bio_list_add(&qn->bios_bps, bio);
+
 	if (list_empty(&qn->node)) {
 		list_add_tail(&qn->node, queued);
 		blkg_get(tg_to_blkg(qn->tg));
@@ -170,6 +175,10 @@ static void throtl_qnode_add_bio(struct bio *bio, struct throtl_qnode *qn,
 /**
  * throtl_peek_queued - peek the first bio on a qnode list
  * @queued: the qnode list to peek
+ *
+ * Always take a bio from the head of the iops queue first. If the queue
+ * is empty, we then take it from the bps queue to maintain the overall
+ * idea of fetching bios from the head.
  */
 static struct bio *throtl_peek_queued(struct list_head *queued)
 {
@@ -180,7 +189,9 @@ static struct bio *throtl_peek_queued(struct list_head *queued)
 		return NULL;
 
 	qn = list_first_entry(queued, struct throtl_qnode, node);
-	bio = bio_list_peek(&qn->bios);
+	bio = bio_list_peek(&qn->bios_iops);
+	if (!bio)
+		bio = bio_list_peek(&qn->bios_bps);
 	WARN_ON_ONCE(!bio);
 	return bio;
 }
@@ -190,9 +201,10 @@ static struct bio *throtl_peek_queued(struct list_head *queued)
  * @queued: the qnode list to pop a bio from
  * @tg_to_put: optional out argument for throtl_grp to put
  *
- * Pop the first bio from the qnode list @queued. After popping, the first
- * qnode is removed from @queued if empty or moved to the end of @queued so
- * that the popping order is round-robin.
+ * Pop the first bio from the qnode list @queued. Note that we firstly focus
+ * on the iops list here because bios are ultimately dispatched from it.
+ * After popping, the first qnode is removed from @queued if empty or moved to
+ * the end of @queued so that the popping order is round-robin.
 *
 * When the first qnode is removed, its associated throtl_grp should be put
 * too. If @tg_to_put is NULL, this function automatically puts it;
@@ -209,10 +221,12 @@ static struct bio *throtl_pop_queued(struct list_head *queued,
 		return NULL;
 
 	qn = list_first_entry(queued, struct throtl_qnode, node);
-	bio = bio_list_pop(&qn->bios);
+	bio = bio_list_pop(&qn->bios_iops);
+	if (!bio)
+		bio = bio_list_pop(&qn->bios_bps);
 	WARN_ON_ONCE(!bio);
 
-	if (bio_list_empty(&qn->bios)) {
+	if (bio_list_empty(&qn->bios_bps) && bio_list_empty(&qn->bios_iops)) {
 		list_del_init(&qn->node);
 		if (tg_to_put)
 			*tg_to_put = qn->tg;
@@ -805,12 +819,12 @@ static unsigned long tg_dispatch_iops_time(struct throtl_grp *tg, struct bio *bio)
 
 /*
  * Returns approx number of jiffies to wait before this bio is with-in IO rate
- * and can be dispatched.
+ * and can be moved to other queue or dispatched.
  */
 static unsigned long tg_dispatch_time(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
-	unsigned long bps_wait, iops_wait;
+	unsigned long wait;
 
 	/*
 	 * Currently whole state machine of group depends on first bio
@@ -821,10 +835,17 @@ static unsigned long tg_dispatch_time(struct throtl_grp *tg, struct bio *bio)
 	BUG_ON(tg->service_queue.nr_queued[rw] &&
 	       bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
 
-	bps_wait = tg_dispatch_bps_time(tg, bio);
-	iops_wait = tg_dispatch_iops_time(tg, bio);
+	wait = tg_dispatch_bps_time(tg, bio);
+	if (wait != 0)
+		return wait;
 
-	return max(bps_wait, iops_wait);
+	/*
+	 * Charge bps here because @bio will be directly placed into the
+	 * iops queue afterward.
+	 */
+	throtl_charge_bps_bio(tg, bio);
+
+	return tg_dispatch_iops_time(tg, bio);
 }
 
 /**
@@ -913,7 +934,6 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 	bio = throtl_pop_queued(&sq->queued[rw], &tg_to_put);
 	sq->nr_queued[rw]--;
 
-	throtl_charge_bps_bio(tg, bio);
 	throtl_charge_iops_bio(tg, bio);
 
 	/*
@@ -1641,7 +1661,6 @@ bool __blk_throtl_bio(struct bio *bio)
 	while (true) {
 		if (tg_within_limit(tg, bio, rw)) {
 			/* within limits, let's charge and dispatch directly */
-			throtl_charge_bps_bio(tg, bio);
 			throtl_charge_iops_bio(tg, bio);
 
 			/*
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 7964cc041e06..5257e5c053e6 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -28,7 +28,8 @@
  */
 struct throtl_qnode {
 	struct list_head	node;		/* service_queue->queued[] */
-	struct bio_list		bios;		/* queued bios */
+	struct bio_list		bios_bps;	/* queued bios for bps limit */
+	struct bio_list		bios_iops;	/* queued bios for iops limit */
 	struct throtl_grp	*tg;		/* tg this qnode belongs to */
 };

From patchwork Mon Apr 14 13:27:30 2025
From: Zizhi Wo
Subject: [PATCH 6/7] blk-throttle: Split the service queue
Date: Mon, 14 Apr 2025 21:27:30 +0800
Message-ID: <20250414132731.167620-7-wozizhi@huawei.com>
In-Reply-To: <20250414132731.167620-1-wozizhi@huawei.com>
References: <20250414132731.167620-1-wozizhi@huawei.com>

This patch splits throtl_service_queue->nr_queued[] into "nr_queued_bps"
and "nr_queued_iops", allowing separate accounting of queued bps and iops
bios. This prepares for future changes that need to check whether the bps
or iops queue is empty.

To keep the counters up to date, move the increment logic from
throtl_add_bio_tg() into throtl_qnode_add_bio() and, likewise, the
decrement logic from tg_dispatch_one_bio() into throtl_pop_queued().
Also introduce sq_queued() to calculate the total of both counters.
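throtl_qnode_add_bio() and throtl_pop_queued() only receive the list head
&sq->queued[rw], so the patch recovers the owning service queue with
container_of(). A minimal standalone sketch of the idiom (hypothetical
struct names; note that a variable array index inside offsetof(), as used
here, relies on a GNU C extension):

	#include <stddef.h>

	struct list_head { struct list_head *next, *prev; };

	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	struct service_queue {
		struct list_head queued[2];	/* READ/WRITE */
		unsigned int nr_queued_bps[2];
		unsigned int nr_queued_iops[2];
	};

	/* given &sq->queued[rw], recover sq itself and bump a counter */
	static void account_add(struct list_head *queued, int rw, int iops)
	{
		struct service_queue *sq =
			container_of(queued, struct service_queue, queued[rw]);

		if (iops)
			sq->nr_queued_iops[rw]++;
		else
			sq->nr_queued_bps[rw]++;
	}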
Signed-off-by: Zizhi Wo
---
 block/blk-throttle.c | 56 ++++++++++++++++++++++++++++++--------------
 block/blk-throttle.h |  3 ++-
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 542db54f995c..faae10e2e6e3 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -161,10 +161,17 @@ static void throtl_qnode_init(struct throtl_qnode *qn, struct throtl_grp *tg)
 static void throtl_qnode_add_bio(struct bio *bio, struct throtl_qnode *qn,
 				 struct list_head *queued)
 {
-	if (bio_flagged(bio, BIO_TG_BPS_THROTTLED))
+	bool rw = bio_data_dir(bio);
+	struct throtl_service_queue *sq = container_of(queued,
+			struct throtl_service_queue, queued[rw]);
+
+	if (bio_flagged(bio, BIO_TG_BPS_THROTTLED)) {
 		bio_list_add(&qn->bios_iops, bio);
-	else
+		sq->nr_queued_iops[rw]++;
+	} else {
 		bio_list_add(&qn->bios_bps, bio);
+		sq->nr_queued_bps[rw]++;
+	}
 
 	if (list_empty(&qn->node)) {
 		list_add_tail(&qn->node, queued);
@@ -200,6 +207,7 @@ static struct bio *throtl_peek_queued(struct list_head *queued)
 * throtl_pop_queued - pop the first bio form a qnode list
 * @queued: the qnode list to pop a bio from
 * @tg_to_put: optional out argument for throtl_grp to put
+ * @rw: read/write
 *
 * Pop the first bio from the qnode list @queued. Note that we firstly focus
 * on the iops list here because bios are ultimately dispatched from it.
@@ -212,8 +220,10 @@ static struct bio *throtl_peek_queued(struct list_head *queued)
 * responsible for putting it.
 */
static struct bio *throtl_pop_queued(struct list_head *queued,
-				     struct throtl_grp **tg_to_put)
+				     struct throtl_grp **tg_to_put, bool rw)
 {
+	struct throtl_service_queue *sq = container_of(queued,
+			struct throtl_service_queue, queued[rw]);
 	struct throtl_qnode *qn;
 	struct bio *bio;
 
@@ -222,8 +232,12 @@ static struct bio *throtl_pop_queued(struct list_head *queued,
 
 	qn = list_first_entry(queued, struct throtl_qnode, node);
 	bio = bio_list_pop(&qn->bios_iops);
-	if (!bio)
+	if (!bio) {
 		bio = bio_list_pop(&qn->bios_bps);
+		sq->nr_queued_bps[rw]--;
+	} else {
+		sq->nr_queued_iops[rw]--;
+	}
 	WARN_ON_ONCE(!bio);
 
 	if (bio_list_empty(&qn->bios_bps) && bio_list_empty(&qn->bios_iops)) {
@@ -553,6 +567,11 @@ static bool throtl_slice_used(struct throtl_grp *tg, bool rw)
 	return true;
 }
 
+static unsigned int sq_queued(struct throtl_service_queue *sq, int type)
+{
+	return sq->nr_queued_bps[type] + sq->nr_queued_iops[type];
+}
+
 static unsigned int calculate_io_allowed(u32 iops_limit,
 					 unsigned long jiffy_elapsed)
 {
@@ -682,9 +701,9 @@ static void tg_update_carryover(struct throtl_grp *tg)
 	long long bytes[2] = {0};
 	int ios[2] = {0};
 
-	if (tg->service_queue.nr_queued[READ])
+	if (sq_queued(&tg->service_queue, READ))
 		__tg_update_carryover(tg, READ, &bytes[READ], &ios[READ]);
-	if (tg->service_queue.nr_queued[WRITE])
+	if (sq_queued(&tg->service_queue, WRITE))
 		__tg_update_carryover(tg, WRITE, &bytes[WRITE], &ios[WRITE]);
 
 	/* see comments in struct throtl_grp for meaning of these fields. */
@@ -776,7 +795,8 @@ static void throtl_charge_iops_bio(struct throtl_grp *tg, struct bio *bio)
  */
 static void tg_update_slice(struct throtl_grp *tg, bool rw)
 {
-	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
+	if (throtl_slice_used(tg, rw) &&
+	    sq_queued(&tg->service_queue, rw) == 0)
 		throtl_start_new_slice(tg, rw, true);
 	else
 		throtl_extend_slice(tg, rw, jiffies + tg->td->throtl_slice);
@@ -832,7 +852,7 @@ static unsigned long tg_dispatch_time(struct throtl_grp *tg, struct bio *bio)
 	 * this function with a different bio if there are other bios
 	 * queued.
 	 */
-	BUG_ON(tg->service_queue.nr_queued[rw] &&
+	BUG_ON(sq_queued(&tg->service_queue, rw) &&
 	       bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
 
 	wait = tg_dispatch_bps_time(tg, bio);
@@ -872,12 +892,11 @@ static void throtl_add_bio_tg(struct bio *bio, struct throtl_qnode *qn,
 	 * dispatched. Mark that @tg was empty. This is automatically
 	 * cleared on the next tg_update_disptime().
 	 */
-	if (!sq->nr_queued[rw])
+	if (sq_queued(sq, rw) == 0)
 		tg->flags |= THROTL_TG_WAS_EMPTY;
 
 	throtl_qnode_add_bio(bio, qn, &sq->queued[rw]);
-	sq->nr_queued[rw]++;
 
 	throtl_enqueue_tg(tg);
 }
@@ -931,8 +950,7 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 	 * getting released prematurely. Remember the tg to put and put it
 	 * after @bio is transferred to @parent_sq.
 	 */
-	bio = throtl_pop_queued(&sq->queued[rw], &tg_to_put);
-	sq->nr_queued[rw]--;
+	bio = throtl_pop_queued(&sq->queued[rw], &tg_to_put, rw);
 
 	throtl_charge_iops_bio(tg, bio);
 
@@ -1014,7 +1032,7 @@ static int throtl_select_dispatch(struct throtl_service_queue *parent_sq)
 		nr_disp += throtl_dispatch_tg(tg);
 
 		sq = &tg->service_queue;
-		if (sq->nr_queued[READ] || sq->nr_queued[WRITE])
+		if (sq_queued(sq, READ) || sq_queued(sq, WRITE))
 			tg_update_disptime(tg);
 		else
 			throtl_dequeue_tg(tg);
@@ -1067,9 +1085,11 @@ static void throtl_pending_timer_fn(struct timer_list *t)
 	dispatched = false;
 
 	while (true) {
+		unsigned int bio_cnt_r = sq_queued(sq, READ);
+		unsigned int bio_cnt_w = sq_queued(sq, WRITE);
+
 		throtl_log(sq, "dispatch nr_queued=%u read=%u write=%u",
-			   sq->nr_queued[READ] + sq->nr_queued[WRITE],
-			   sq->nr_queued[READ], sq->nr_queued[WRITE]);
+			   bio_cnt_r + bio_cnt_w, bio_cnt_r, bio_cnt_w);
 
 		ret = throtl_select_dispatch(sq);
 		if (ret) {
@@ -1131,7 +1151,7 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
 
 	spin_lock_irq(&q->queue_lock);
 	for (rw = READ; rw <= WRITE; rw++)
-		while ((bio = throtl_pop_queued(&td_sq->queued[rw], NULL)))
+		while ((bio = throtl_pop_queued(&td_sq->queued[rw], NULL, rw)))
 			bio_list_add(&bio_list_on_stack, bio);
 	spin_unlock_irq(&q->queue_lock);
 
@@ -1637,7 +1657,7 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
 static bool tg_within_limit(struct throtl_grp *tg, struct bio *bio, bool rw)
 {
 	/* throtl is FIFO - if bios are already queued, should queue */
-	if (tg->service_queue.nr_queued[rw])
+	if (sq_queued(&tg->service_queue, rw))
 		return false;
 
 	return tg_dispatch_time(tg, bio) == 0;
@@ -1711,7 +1731,7 @@ bool __blk_throtl_bio(struct bio *bio)
 		   tg->bytes_disp[rw], bio->bi_iter.bi_size,
 		   tg_bps_limit(tg, rw),
 		   tg->io_disp[rw], tg_iops_limit(tg, rw),
-		   sq->nr_queued[READ], sq->nr_queued[WRITE]);
+		   sq_queued(sq, READ), sq_queued(sq, WRITE));
 
 	td->nr_queued[rw]++;
 	throtl_add_bio_tg(bio, qn, tg);
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 5257e5c053e6..04e92cfd0ab1 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -41,7 +41,8 @@ struct throtl_service_queue {
 	 * children throtl_grp's.
 	 */
 	struct list_head	queued[2];	/* throtl_qnode [READ/WRITE] */
-	unsigned int		nr_queued[2];	/* number of queued bios */
+	unsigned int		nr_queued_bps[2];	/* number of queued bps bios */
+	unsigned int		nr_queued_iops[2];	/* number of queued iops bios */
 
 	/*
 	 * RB tree of active children throtl_grp's, which are sorted by

From patchwork Mon Apr 14 13:27:31 2025
From: Zizhi Wo
Subject: [PATCH 7/7] blk-throttle: Prevent bps-restricted IO from entering the bps queue again
Date: Mon, 14 Apr 2025 21:27:31 +0800
Message-ID: <20250414132731.167620-8-wozizhi@huawei.com>
In-Reply-To: <20250414132731.167620-1-wozizhi@huawei.com>
References: <20250414132731.167620-1-wozizhi@huawei.com>

[BUG]
There is an issue of delayed IO dispatch caused by IO splitting. Consider
the following scenario:

1) If we set a BPS limit of 1MB/s and restrict the maximum IO size per
dispatch to 4KB, submitting two 1MB IO requests results in completion
times of 1s and 2s, which is expected.

2) However, if we additionally set an IOPS limit of 1,000,000/s with the
same BPS limit of 1MB/s, submitting two 1MB IO requests again results in
both completing in 2s, even though the IOPS constraint is easily met.

[CAUSE]
This issue arises because bps and iops currently share the same queue in
the blkthrotl mechanism:

1) The issue does not occur when only bps is limited, because the split
IOs return false in blk_should_throtl() and do not go through throttling
again.

2) For split IOs, even if they have been tagged with BIO_BPS_THROTTLED,
they still get queued alternately in the same list due to continuous
splitting and reordering. As a result, the two IO requests are both
completed at the 2-second mark, causing an unintended delay.

3) It is easy to see that, in this scenario, if N 1MB IOs are issued at
once, all of them will complete together at the N-second mark.

[FIX]
With the queue separation introduced in the previous patches, we now have
separate bps and iops queues. IOs that have already passed the bps
limitation do not need to re-enter the bps queue and can be placed
directly into the iops queue.

Since the queues are split, when the iops queue was previously empty and
a new bio is added to the first qnode in the service_queue, the disptime
also needs to be updated. Introduce the "THROTL_TG_IOPS_WAS_EMPTY" flag
to mark this.
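To make the timing in [BUG] concrete, a toy model of the two queueing
strategies (standalone code with the hypothetical numbers from the
scenario above: 1MB/s bps, 4KB fragments, bios A and B submitted together):

	#include <stdio.h>

	int main(void)
	{
		const int frags = 256;			/* 1 MB / 4 KB */
		const double per_frag = 1.0 / frags;	/* seconds per fragment at 1 MB/s */

		/* shared queue: fragments interleave A1 B1 A2 B2 ..., so A's
		 * last fragment is number 2*frags - 1 in dispatch order */
		printf("shared: A=%.2fs B=%.2fs\n",
		       (2 * frags - 1) * per_frag, 2 * frags * per_frag);

		/* split queues: A's tagged fragments keep their head-of-line
		 * position, so A drains first */
		printf("split:  A=%.2fs B=%.2fs\n",
		       frags * per_frag, 2 * frags * per_frag);
		return 0;
	}

This prints "shared: A=2.00s B=2.00s" and "split: A=1.00s B=2.00s",
matching the expected bps-only behaviour.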
Signed-off-by: Zizhi Wo
Reviewed-by: Yu Kuai
---
 block/blk-throttle.c | 53 +++++++++++++++++++++++++++++++++++++-------
 block/blk-throttle.h |  8 ++++---
 2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index faae10e2e6e3..b82ce8e927d3 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -165,7 +165,12 @@ static void throtl_qnode_add_bio(struct bio *bio, struct throtl_qnode *qn,
 	struct throtl_service_queue *sq = container_of(queued,
 			struct throtl_service_queue, queued[rw]);
 
-	if (bio_flagged(bio, BIO_TG_BPS_THROTTLED)) {
+	/*
+	 * Split bios have already been throttled by bps, so they are
+	 * directly queued into the iops path.
+	 */
+	if (bio_flagged(bio, BIO_TG_BPS_THROTTLED) ||
+	    bio_flagged(bio, BIO_BPS_THROTTLED)) {
 		bio_list_add(&qn->bios_iops, bio);
 		sq->nr_queued_iops[rw]++;
 	} else {
@@ -897,6 +902,15 @@ static void throtl_add_bio_tg(struct bio *bio, struct throtl_qnode *qn,
 
 	throtl_qnode_add_bio(bio, qn, &sq->queued[rw]);
 
+	/*
+	 * Since we have split the queues, when the iops queue is
+	 * previously empty and a new @bio is added into the first @qn,
+	 * we also need to update the @tg->disptime.
+	 */
+	if (bio_flagged(bio, BIO_BPS_THROTTLED) &&
+	    bio == throtl_peek_queued(&sq->queued[rw]))
+		tg->flags |= THROTL_TG_IOPS_WAS_EMPTY;
+
 	throtl_enqueue_tg(tg);
 }
 
@@ -924,6 +938,7 @@ static void tg_update_disptime(struct throtl_grp *tg)
 
 	/* see throtl_add_bio_tg() */
 	tg->flags &= ~THROTL_TG_WAS_EMPTY;
+	tg->flags &= ~THROTL_TG_IOPS_WAS_EMPTY;
 }
 
 static void start_parent_slice_with_credit(struct throtl_grp *child_tg,
@@ -1111,7 +1126,8 @@ static void throtl_pending_timer_fn(struct timer_list *t)
 
 	if (parent_sq) {
 		/* @parent_sq is another throl_grp, propagate dispatch */
-		if (tg->flags & THROTL_TG_WAS_EMPTY) {
+		if (tg->flags & THROTL_TG_WAS_EMPTY ||
+		    tg->flags & THROTL_TG_IOPS_WAS_EMPTY) {
 			tg_update_disptime(tg);
 			if (!throtl_schedule_next_dispatch(parent_sq, false)) {
 				/* window is already open, repeat dispatching */
@@ -1656,9 +1672,28 @@ void blk_throtl_cancel_bios(struct gendisk *disk)
 
 static bool tg_within_limit(struct throtl_grp *tg, struct bio *bio, bool rw)
 {
-	/* throtl is FIFO - if bios are already queued, should queue */
-	if (sq_queued(&tg->service_queue, rw))
+	struct throtl_service_queue *sq = &tg->service_queue;
+
+	/*
+	 * For a split bio, we need to specifically distinguish whether the
+	 * iops queue is empty.
+	 */
+	if (bio_flagged(bio, BIO_BPS_THROTTLED))
+		return sq->nr_queued_iops[rw] == 0 &&
+			tg_dispatch_iops_time(tg, bio) == 0;
+
+	/*
+	 * Throtl is FIFO - if bios are already queued, should queue.
+	 * If the bps queue is empty and @bio is within the bps limit, charge
+	 * bps here for direct placement into the iops queue.
+	 */
+	if (sq_queued(&tg->service_queue, rw)) {
+		if (sq->nr_queued_bps[rw] == 0 &&
+		    tg_dispatch_bps_time(tg, bio) == 0)
+			throtl_charge_bps_bio(tg, bio);
+
 		return false;
+	}
 
 	return tg_dispatch_time(tg, bio) == 0;
 }
@@ -1739,11 +1774,13 @@ bool __blk_throtl_bio(struct bio *bio)
 
 	/*
 	 * Update @tg's dispatch time and force schedule dispatch if @tg
-	 * was empty before @bio. The forced scheduling isn't likely to
-	 * cause undue delay as @bio is likely to be dispatched directly if
-	 * its @tg's disptime is not in the future.
+	 * was empty before @bio, or the iops queue is empty and @bio will
+	 * add to. The forced scheduling isn't likely to cause undue
+	 * delay as @bio is likely to be dispatched directly if its @tg's
+	 * disptime is not in the future.
 	 */
-	if (tg->flags & THROTL_TG_WAS_EMPTY) {
+	if (tg->flags & THROTL_TG_WAS_EMPTY ||
+	    tg->flags & THROTL_TG_IOPS_WAS_EMPTY) {
 		tg_update_disptime(tg);
 		throtl_schedule_next_dispatch(tg->service_queue.parent_sq, true);
 	}
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 04e92cfd0ab1..6f11aaabe7e7 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -55,9 +55,11 @@ struct throtl_service_queue {
 };
 
 enum tg_state_flags {
-	THROTL_TG_PENDING	= 1 << 0,	/* on parent's pending tree */
-	THROTL_TG_WAS_EMPTY	= 1 << 1,	/* bio_lists[] became non-empty */
-	THROTL_TG_CANCELING	= 1 << 2,	/* starts to cancel bio */
+	THROTL_TG_PENDING	= 1 << 0,	/* on parent's pending tree */
+	THROTL_TG_WAS_EMPTY	= 1 << 1,	/* bio_lists[] became non-empty */
+	/* iops queue is empty, and a bio is about to be enqueued to the first qnode. */
+	THROTL_TG_IOPS_WAS_EMPTY = 1 << 2,
+	THROTL_TG_CANCELING	= 1 << 3,	/* starts to cancel bio */
 };
 
 struct throtl_grp {