Message ID: 20220802140415.2960284-2-yukuai1@huaweicloud.com (mailing list archive)
State: New, archived
Series: bugfix and cleanup for blk-throttle
On Tue, Aug 02, 2022 at 10:04:07PM +0800, Yu Kuai wrote:
...
> commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> support to count splited bios for iops limit, thus it adds flaged bio
                                                              ^
                                                              flagged
> checking in tg_with_in_bps_limit() so that splited bios will only count
                                             ^
                                             split
> once for bps limit, however, it introduce a new problem that io throttle
> won't work if multiple bios are throttled.
>
> In order to fix the problem, at first, don't skip flaged bio in
> tg_with_in_bps_limit(), however, this will break that splited bios should
> only count once for bps limit. And this patch tries to avoid
> over-accounting by decrementing it first in __blk_throtl_bio(), and
> then counting it again while dispatching it.
>
> Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Please cc stable w/ version tag.

> ---
>  block/blk-throttle.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 9f5fe62afff9..2957e2c643f4 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
>  	unsigned int bio_size = throtl_bio_data_size(bio);
>
>  	/* no need to throttle if this bio's bytes have been accounted */
> -	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
> +	if (bps_limit == U64_MAX) {
>  		if (wait)
>  			*wait = 0;
>  		return true;
> @@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
>  	unsigned int bio_size = throtl_bio_data_size(bio);
>
>  	/* Charge the bio to the group */
> -	if (!bio_flagged(bio, BIO_THROTTLED)) {
> -		tg->bytes_disp[rw] += bio_size;
> -		tg->last_bytes_disp[rw] += bio_size;
> -	}
> -
> +	tg->bytes_disp[rw] += bio_size;
> +	tg->last_bytes_disp[rw] += bio_size;
>  	tg->io_disp[rw]++;
>  	tg->last_io_disp[rw]++;
>
> @@ -2121,6 +2118,23 @@ bool __blk_throtl_bio(struct bio *bio)
>  		tg->last_low_overflow_time[rw] = jiffies;
>  		throtl_downgrade_check(tg);
>  		throtl_upgrade_check(tg);
> +
> +		/*
> +		 * Splited bios can be re-entered because iops limit should be
  		   ^                   ^^^^^^^^^^^^^
  		   Split               re-enter
> +		 * counted again, however, bps limit should not. Since bps limit
> +		 * will be counted again while dispatching it, compensate the
> +		 * over-accounting here. Noted that compensation can fail if
> +		 * new slice is started.

I can't really follow the comment. Please improve the explanation.

> +		 */
> +		if (bio_flagged(bio, BIO_THROTTLED)) {
> +			unsigned int bio_size = throtl_bio_data_size(bio);
> +
> +			if (tg->bytes_disp[rw] >= bio_size)
> +				tg->bytes_disp[rw] -= bio_size;
> +			if (tg->last_bytes_disp[rw] >= bio_size)
> +				tg->last_bytes_disp[rw] -= bio_size;
> +		}

So, as a fix for the immediate problem, I guess this might do but this feels
really fragile. How can we be certain that re-entering only happens because
of splitting? What if future core development changes that? It seems to be
solving the problem in the wrong place. Shouldn't we flag the bio indicating
that it's split when we're splitting the bio so that we only limit them for
iops in the first place?

Thanks.
Hi, Tejun

On 2022/08/17 3:37, Tejun Heo wrote:
> On Tue, Aug 02, 2022 at 10:04:07PM +0800, Yu Kuai wrote:
> ...
>> commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
>> support to count splited bios for iops limit, thus it adds flaged bio
>                                                               ^
>                                                               flagged
>
>> checking in tg_with_in_bps_limit() so that splited bios will only count
>                                              ^
>                                              split
>
>> once for bps limit, however, it introduce a new problem that io throttle
>> won't work if multiple bios are throttled.
>>
>> In order to fix the problem, at first, don't skip flaged bio in
>> tg_with_in_bps_limit(), however, this will break that splited bios should
>> only count once for bps limit. And this patch tries to avoid
>> over-accounting by decrementing it first in __blk_throtl_bio(), and
>> then counting it again while dispatching it.
>>
>> Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>
> Please cc stable w/ version tag.
>
>> ---
>>  block/blk-throttle.c | 26 ++++++++++++++++++++------
>>  1 file changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index 9f5fe62afff9..2957e2c643f4 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
>>  	unsigned int bio_size = throtl_bio_data_size(bio);
>>
>>  	/* no need to throttle if this bio's bytes have been accounted */
>> -	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
>> +	if (bps_limit == U64_MAX) {
>>  		if (wait)
>>  			*wait = 0;
>>  		return true;
>> @@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
>>  	unsigned int bio_size = throtl_bio_data_size(bio);
>>
>>  	/* Charge the bio to the group */
>> -	if (!bio_flagged(bio, BIO_THROTTLED)) {
>> -		tg->bytes_disp[rw] += bio_size;
>> -		tg->last_bytes_disp[rw] += bio_size;
>> -	}
>> -
>> +	tg->bytes_disp[rw] += bio_size;
>> +	tg->last_bytes_disp[rw] += bio_size;
>>  	tg->io_disp[rw]++;
>>  	tg->last_io_disp[rw]++;
>>
>> @@ -2121,6 +2118,23 @@ bool __blk_throtl_bio(struct bio *bio)
>>  		tg->last_low_overflow_time[rw] = jiffies;
>>  		throtl_downgrade_check(tg);
>>  		throtl_upgrade_check(tg);
>> +
>> +		/*
>> +		 * Splited bios can be re-entered because iops limit should be
>  		   ^                   ^^^^^^^^^^^^^
>  		   Split               re-enter
>
>> +		 * counted again, however, bps limit should not. Since bps limit
>> +		 * will be counted again while dispatching it, compensate the
>> +		 * over-accounting here. Noted that compensation can fail if
>> +		 * new slice is started.
>
> I can't really follow the comment. Please improve the explanation.
>
>> +		 */
>> +		if (bio_flagged(bio, BIO_THROTTLED)) {
>> +			unsigned int bio_size = throtl_bio_data_size(bio);
>> +
>> +			if (tg->bytes_disp[rw] >= bio_size)
>> +				tg->bytes_disp[rw] -= bio_size;
>> +			if (tg->last_bytes_disp[rw] >= bio_size)
>> +				tg->last_bytes_disp[rw] -= bio_size;
>> +		}
>
> So, as a fix for the immediate problem, I guess this might do but this feels
> really fragile. How can we be certain that re-entering only happens because
> of splitting? What if future core development changes that? It seems to be
> solving the problem in the wrong place. Shouldn't we flag the bio indicating
> that it's split when we're splitting the bio so that we only limit them for
> iops in the first place?

Splited bio is tracked in __bio_clone:

	if (bio_flagged(bio_src, BIO_THROTTLED))
		bio_set_flag(bio, BIO_THROTTLED);

And currently, the iops limit and bps limit are treated differently,
however there is only one flag 'BIO_THROTTLED' and they can't be
distinguished.

Perhaps I can use two flags, for example BIO_IOPS_THROTTLED and
BIO_BPS_THROTTLED, this way only iops limit can be handled and bps
limit can be skipped for splited bio.

What do you think?

Thanks,
Kuai

> Thanks.
>
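The two-flag idea proposed in this mail can be sketched outside the kernel. The model below is only an illustration of the accounting rule being discussed (bytes charged at most once per original bio, ios charged once per submission); the flag names are the ones suggested above, and the structures and helpers are reduced stand-ins, not the real blk-throttle code.

/*
 * User-space sketch of the proposed BIO_BPS_THROTTLED / BIO_IOPS_THROTTLED
 * split: types and helpers are simplified stand-ins for illustration only.
 */
#include <stdbool.h>
#include <stdio.h>

#define BIO_BPS_THROTTLED  (1u << 0)	/* bytes already charged for bps */
#define BIO_IOPS_THROTTLED (1u << 1)	/* io already charged for iops   */

struct bio {
	unsigned int bi_flags;
	unsigned int bi_size;
};

struct throtl_grp {
	unsigned long long bytes_disp;
	unsigned int io_disp;
};

/* bps side: a bio whose bytes were already charged never has to wait */
static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
				unsigned long long bps_limit)
{
	if (bio->bi_flags & BIO_BPS_THROTTLED)
		return true;
	return tg->bytes_disp + bio->bi_size <= bps_limit;
}

/* iops side: only bios not yet charged for io are held back */
static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
				 unsigned int iops_limit)
{
	if (bio->bi_flags & BIO_IOPS_THROTTLED)
		return true;
	return tg->io_disp + 1 <= iops_limit;
}

static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
{
	/* bytes are charged at most once per original bio */
	if (!(bio->bi_flags & BIO_BPS_THROTTLED)) {
		tg->bytes_disp += bio->bi_size;
		bio->bi_flags |= BIO_BPS_THROTTLED;
	}
	/* ios are charged once per (re)submission */
	if (!(bio->bi_flags & BIO_IOPS_THROTTLED)) {
		tg->io_disp++;
		bio->bi_flags |= BIO_IOPS_THROTTLED;
	}
}

int main(void)
{
	struct throtl_grp tg = { 0 };
	struct bio bio = { .bi_flags = 0, .bi_size = 1024 * 1024 };

	/* first pass through the throttle path: bytes and io are charged */
	if (tg_within_bps_limit(&tg, &bio, 8ULL << 20) &&
	    tg_within_iops_limit(&tg, &bio, 100))
		throtl_charge_bio(&tg, &bio);

	/*
	 * The bio is split and the remainder is resubmitted.  The split
	 * path clears only the iops flag, so the second pass charges io
	 * again but leaves the byte budget alone.
	 */
	bio.bi_flags &= ~BIO_IOPS_THROTTLED;
	if (tg_within_bps_limit(&tg, &bio, 8ULL << 20) &&
	    tg_within_iops_limit(&tg, &bio, 100))
		throtl_charge_bio(&tg, &bio);

	printf("bytes_disp=%llu io_disp=%u\n", tg.bytes_disp, tg.io_disp);
	return 0;
}

Running the model prints bytes_disp=1048576 io_disp=2: the resubmitted remainder is counted again for iops but not for bps, which is the behaviour the proposal is after.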
Hello,

On Wed, Aug 17, 2022 at 09:13:38AM +0800, Yu Kuai wrote:
> > So, as a fix for the immediate problem, I guess this might do but this feels
> > really fragile. How can we be certain that re-entering only happens because
> > of splitting? What if future core development changes that? It seems to be
> > solving the problem in the wrong place. Shouldn't we flag the bio indicating
> > that it's split when we're splitting the bio so that we only limit them for
> > iops in the first place?
>
> Splited bio is tracked in __bio_clone:

As the word is used in commit messages and comments, the past perfect form
of the verb "split" is "split". It looks like "splitted" is used in rare
cases but dictionary says it's an archaic form.

> 	if (bio_flagged(bio_src, BIO_THROTTLED))
> 		bio_set_flag(bio, BIO_THROTTLED);
>
> And currently, the iops limit and bps limit are treated differently,
> however there is only one flag 'BIO_THROTTLED' and they can't be
> distinguished.
>
> Perhaps I can use two flags, for example BIO_IOPS_THROTTLED and
> BIO_BPS_THROTTLED, this way only iops limit can be handled and bps
> limit can be skipped for splited bio.
>
> What do you think?

I think the code would be a lot more intuitive and less fragile if we used
two flags but the bits in the bi_flags field are a scarce resource
unfortunately. Even then, I think the right thing to do here is using two
flags.

Thanks.
Hi, Tejun!

On 2022/08/18 1:50, Tejun Heo wrote:
> Hello,
>
> On Wed, Aug 17, 2022 at 09:13:38AM +0800, Yu Kuai wrote:
>>> So, as a fix for the immediate problem, I guess this might do but this feels
>>> really fragile. How can we be certain that re-entering only happens because
>>> of splitting? What if future core development changes that? It seems to be
>>> solving the problem in the wrong place. Shouldn't we flag the bio indicating
>>> that it's split when we're splitting the bio so that we only limit them for
>>> iops in the first place?
>>
>> Splited bio is tracked in __bio_clone:
>
> As the word is used in commit messages and comments, the past perfect form
> of the verb "split" is "split". It looks like "splitted" is used in rare
> cases but dictionary says it's an archaic form.

Ok, thanks for pointing it out, I'll change that in next iteration.

>
>> 	if (bio_flagged(bio_src, BIO_THROTTLED))
>> 		bio_set_flag(bio, BIO_THROTTLED);
>>
>> And currently, the iops limit and bps limit are treated differently,
>> however there is only one flag 'BIO_THROTTLED' and they can't be
>> distinguished.
>>
>> Perhaps I can use two flags, for example BIO_IOPS_THROTTLED and
>> BIO_BPS_THROTTLED, this way only iops limit can be handled and bps
>> limit can be skipped for splited bio.
>>
>> What do you think?
>
> I think the code would be a lot more intuitive and less fragile if we used
> two flags but the bits in the bi_flags field are a scarce resource
> unfortunately. Even then, I think the right thing to do here is using two
> flags.

Yes, the field 'bio->bi_flags' is unsigned short, and there are only two
bits left. I'll use the new solution, which will require a new bit.

Thanks,
Kuai

> Thanks.
>
Hi,

On 2022/08/18 9:23, Yu Kuai wrote:
> Hi, Tejun!
>
> On 2022/08/18 1:50, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Aug 17, 2022 at 09:13:38AM +0800, Yu Kuai wrote:
>>>> So, as a fix for the immediate problem, I guess this might do but this feels
>>>> really fragile. How can we be certain that re-entering only happens because
>>>> of splitting? What if future core development changes that? It seems to be
>>>> solving the problem in the wrong place. Shouldn't we flag the bio indicating
>>>> that it's split when we're splitting the bio so that we only limit them for
>>>> iops in the first place?
>>>
>>> Splited bio is tracked in __bio_clone:
>>
>> As the word is used in commit messages and comments, the past perfect form
>> of the verb "split" is "split". It looks like "splitted" is used in rare
>> cases but dictionary says it's an archaic form.
>
> Ok, thanks for pointing it out, I'll change that in next iteration.
>>
>>> 	if (bio_flagged(bio_src, BIO_THROTTLED))
>>> 		bio_set_flag(bio, BIO_THROTTLED);

While implementing the new method, I found that there seems to be a
misunderstanding here: the code seems to try to add the flag to the split bio
so that it won't be throttled again for bps limit, however:

1) for blk throttle, split bio is issued directly and will never be
throttled again, while original bio will go through throttle path again.
2) if cloned bio is directed to a new disk, the flag is cleared anyway.

>>>
>>> And currently, the iops limit and bps limit are treated differently,
>>> however there is only one flag 'BIO_THROTTLED' and they can't be
>>> distinguished.
>>>
>>> Perhaps I can use two flags, for example BIO_IOPS_THROTTLED and
>>> BIO_BPS_THROTTLED, this way only iops limit can be handled and bps
>>> limit can be skipped for splited bio.
>>>
>>> What do you think?
>>
>> I think the code would be a lot more intuitive and less fragile if we used
>> two flags but the bits in the bi_flags field are a scarce resource
>> unfortunately. Even then, I think the right thing to do here is using two
>> flags.
>
> Yes, the field 'bio->bi_flags' is unsigned short, and there are only two
> bits left. I'll use the new solution, which will require a new bit.
>
> Thanks,
> Kuai
>>
>> Thanks.
>>
Hello,

On Mon, Aug 22, 2022 at 11:06:44AM +0800, Yu Kuai wrote:
> While implementing the new method, I found that there seems to be a
> misunderstanding here: the code seems to try to add the flag to the split bio
> so that it won't be throttled again for bps limit, however:
>
> 1) for blk throttle, split bio is issued directly and will never be
> throttled again, while original bio will go through throttle path again.
> 2) if cloned bio is directed to a new disk, the flag is cleared anyway.

Doesn't that make the current code correct then? But you were seeing
incorrect behaviors, right?

Thanks.
Hi, Tejun

On 2022/08/22 15:25, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 22, 2022 at 11:06:44AM +0800, Yu Kuai wrote:
>> While implementing the new method, I found that there seems to be a
>> misunderstanding here: the code seems to try to add the flag to the split bio
>> so that it won't be throttled again for bps limit, however:
>>
>> 1) for blk throttle, split bio is issued directly and will never be
>> throttled again, while original bio will go through throttle path again.
>> 2) if cloned bio is directed to a new disk, the flag is cleared anyway.
>
> Doesn't that make the current code correct then? But you were seeing
> incorrect behaviors, right?

According to the commit message in commit 111be8839817 ("block-throttle:
avoid double charge"):

	If the bio is cloned/split, we copy the flag to new bio too to
	avoid a double charge.

Which made me think the split bio would be resubmitted, and after
implementing the new solution, I found that the test result is not as
expected.

After spending some time figuring out what was wrong, I found that the split
bio will be dispatched directly from the caller, while the original bio will
be resubmitted. I guess commit 111be8839817 made a mistake; however, there
should be no problem because the original bio is flagged already, and it's
handled correctly.

Anyway, I removed the code in __bio_clone() and check the flag in
__bio_split_to_limits() in my patch:

--- a/block/bio.c
+++ b/block/bio.c
@@ -760,8 +760,6 @@ EXPORT_SYMBOL(bio_put);
 static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
-	if (bio_flagged(bio_src, BIO_THROTTLED))
-		bio_set_flag(bio, BIO_THROTTLED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_iter = bio_src->bi_iter;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ff04e9290715..10330bb038ca 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -358,6 +358,13 @@ struct bio *__bio_split_to_limits(struct bio *bio, struct queue_limits *lim,
 		blkcg_bio_issue_init(split);
 		bio_chain(split, bio);
 		trace_block_split(split, bio->bi_iter.bi_sector);
+
+		/*
+		 * original bio will be resubmitted and throttled again, clear
+		 * the iops flag so that it can be counted again for iops limit.
+		 */
+		if (bio_flagged(bio, BIO_IOPS_THROTTLED))
+			bio_clear_flag(bio, BIO_IOPS_THROTTLED);
 		submit_bio_noacct(bio);
 		return split;
 	}

>
> Thanks.
>
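The split/resubmit shape that the hunk above relies on can also be modelled in user space. The sketch below is an illustration only: the function names loosely mirror the kernel ones, the types are trimmed, and it is not the real block layer. The split part is handed back to the caller for direct dispatch, while the trimmed original is resubmitted with only the iops flag cleared, so the throttle path counts it again for iops but not for bps.

/*
 * User-space sketch of the split flow described above; names and types
 * are simplified stand-ins, not the real block-layer code.
 */
#include <stdio.h>

#define BIO_BPS_THROTTLED  (1u << 0)
#define BIO_IOPS_THROTTLED (1u << 1)

struct bio {
	unsigned int bi_flags;
	unsigned int bi_size;
};

/* stand-in for resubmission: the remainder re-enters the throttle path */
static void submit_bio_noacct(struct bio *bio)
{
	printf("resubmitted: size=%u flags=%#x\n", bio->bi_size, bio->bi_flags);
}

/* split off at most 'max' bytes; the rest of 'bio' is resubmitted */
static struct bio bio_split_to_limits(struct bio *bio, unsigned int max)
{
	struct bio split = { .bi_flags = bio->bi_flags, .bi_size = max };

	bio->bi_size -= max;

	/*
	 * The original will be throttled again: keep the bps flag (its
	 * bytes were charged already) but clear the iops flag so the
	 * remainder is counted again for the iops limit.
	 */
	bio->bi_flags &= ~BIO_IOPS_THROTTLED;
	submit_bio_noacct(bio);

	return split;	/* dispatched directly by the caller */
}

int main(void)
{
	struct bio bio = {
		.bi_flags = BIO_BPS_THROTTLED | BIO_IOPS_THROTTLED,
		.bi_size = 1024 * 1024,
	};
	struct bio split = bio_split_to_limits(&bio, 512 * 1024);

	printf("dispatched:  size=%u flags=%#x\n", split.bi_size, split.bi_flags);
	return 0;
}

The remainder comes back carrying only BIO_BPS_THROTTLED, which is what lets the bps check skip it while the iops accounting still sees it.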
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 9f5fe62afff9..2957e2c643f4 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -811,7 +811,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* no need to throttle if this bio's bytes have been accounted */
-	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
+	if (bps_limit == U64_MAX) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -921,11 +921,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	if (!bio_flagged(bio, BIO_THROTTLED)) {
-		tg->bytes_disp[rw] += bio_size;
-		tg->last_bytes_disp[rw] += bio_size;
-	}
-
+	tg->bytes_disp[rw] += bio_size;
+	tg->last_bytes_disp[rw] += bio_size;
 	tg->io_disp[rw]++;
 	tg->last_io_disp[rw]++;
 
@@ -2121,6 +2118,23 @@ bool __blk_throtl_bio(struct bio *bio)
 		tg->last_low_overflow_time[rw] = jiffies;
 		throtl_downgrade_check(tg);
 		throtl_upgrade_check(tg);
+
+		/*
+		 * Splited bios can be re-entered because iops limit should be
+		 * counted again, however, bps limit should not. Since bps limit
+		 * will be counted again while dispatching it, compensate the
+		 * over-accounting here. Noted that compensation can fail if
+		 * new slice is started.
+		 */
+		if (bio_flagged(bio, BIO_THROTTLED)) {
+			unsigned int bio_size = throtl_bio_data_size(bio);
+
+			if (tg->bytes_disp[rw] >= bio_size)
+				tg->bytes_disp[rw] -= bio_size;
+			if (tg->last_bytes_disp[rw] >= bio_size)
+				tg->last_bytes_disp[rw] -= bio_size;
+		}
+
 		/* throtl is FIFO - if bios are already queued, should queue */
 		if (sq->nr_queued[rw])
 			break;