
block: throttle: don't add one extra jiffy mistakenly for bps limit

Message ID 20250220111735.1187999-1-ming.lei@redhat.com (mailing list archive)
State New
Series block: throttle: don't add one extra jiffy mistakenly for bps limit

Commit Message

Ming Lei Feb. 20, 2025, 11:17 a.m. UTC
When the current bio needs to be throttled because of the bps limit, the wait
time for the extra bytes may be less than 1 jiffy; in that case
tg_within_bps_limit() adds one extra jiffy.

However, once the roundup time is taken into account, that extra jiffy may no
longer be necessary, and the bps limit becomes inaccurate. This causes the
blktests throtl/001 failure with CONFIG_HZ_100=y.

Fix it by not adding the extra jiffy when the roundup time already covers it.

Cc: Tejun Heo <tj@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-throttle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
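For illustration, the rounding interaction described above can be sketched in
plain userspace C (assuming CONFIG_HZ_100, i.e. HZ=100; wait_old() and
wait_fixed() are hypothetical stand-ins for the before/after
tg_within_bps_limit() math, not kernel code):

```c
#include <stdint.h>
#include <assert.h>

#define SKETCH_HZ 100	/* assumes CONFIG_HZ_100 */

/* Old behavior: clamp to 1 jiffy BEFORE adding the roundup time, so the
 * clamp fires even when the roundup term already provides the wait. */
static uint64_t wait_old(uint64_t extra_bytes, uint64_t bps_limit,
			 uint64_t roundup_jiffies)
{
	uint64_t jiffy_wait = extra_bytes * SKETCH_HZ / bps_limit;

	if (!jiffy_wait)
		jiffy_wait = 1;
	return jiffy_wait + roundup_jiffies;
}

/* Fixed behavior: add the roundup time first, and only then enforce the
 * minimum wait of 1 jiffy. */
static uint64_t wait_fixed(uint64_t extra_bytes, uint64_t bps_limit,
			   uint64_t roundup_jiffies)
{
	uint64_t jiffy_wait = extra_bytes * SKETCH_HZ / bps_limit;

	jiffy_wait += roundup_jiffies;
	if (!jiffy_wait)
		jiffy_wait = 1;
	return jiffy_wait;
}
```

With extra_bytes = 9, bps_limit = 1000 and a 1-jiffy roundup, the old code
waits 2 jiffies where 1 is enough, which is the inaccuracy the patch removes.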

Comments

Yu Kuai Feb. 20, 2025, 1:38 p.m. UTC | #1
Hi,

在 2025/02/20 19:17, Ming Lei 写道:
> When the current bio needs to be throttled because of bps limit, the wait
> time for the extra bytes may be less than 1 jiffy, tg_within_bps_limit()
> adds one extra 1 jiffy.
> 
> However, when taking roundup time into account, the extra 1 jiffy
> may become not necessary, then bps limit becomes not accurate. This way
> causes blktests throtl/001 failure in case of CONFIG_HZ_100=y.
> 
> Fix it by not adding the 1 jiffy in case that the roundup time can
> cover it.
> 
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-throttle.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 8d149aff9fd0..8348972c517b 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -729,14 +729,14 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
>   	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
>   	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
>   
> -	if (!jiffy_wait)
> -		jiffy_wait = 1;
> -
>   	/*
>   	 * This wait time is without taking into consideration the rounding
>   	 * up we did. Add that time also.
>   	 */
>   	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
> +	if (!jiffy_wait)
> +		jiffy_wait = 1;

Just wondering: is waiting (0, 1) jiffies less any better than waiting (0, 1)
jiffies more?

How about following changes?

Thanks,
Kuai

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..f8430baf3544 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
                                 u64 bps_limit)
  {
         bool rw = bio_data_dir(bio);
+       long long carryover_bytes;
         long long bytes_allowed;
         u64 extra_bytes;
         unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
@@ -727,10 +728,11 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,

         /* Calc approx time to dispatch */
         extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
-       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
+       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, carryover_bytes);

+       /* carryover_bytes is dispatched without waiting */
         if (!jiffy_wait)
-               jiffy_wait = 1;
+               tg->carryover_bytes[rw] -= carryover_bytes;

         /*
          * This wait time is without taking into consideration the rounding

> +
>   	return jiffy_wait;
>   }
>   
>
Ming Lei Feb. 21, 2025, 2:55 a.m. UTC | #2
Hi Yukuai,

On Thu, Feb 20, 2025 at 09:38:12PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2025/02/20 19:17, Ming Lei 写道:
> > When the current bio needs to be throttled because of bps limit, the wait
> > time for the extra bytes may be less than 1 jiffy, tg_within_bps_limit()
> > adds one extra 1 jiffy.
> > 
> > However, when taking roundup time into account, the extra 1 jiffy
> > may become not necessary, then bps limit becomes not accurate. This way
> > causes blktests throtl/001 failure in case of CONFIG_HZ_100=y.
> > 
> > Fix it by not adding the 1 jiffy in case that the roundup time can
> > cover it.
> > 
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Yu Kuai <yukuai3@huawei.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   block/blk-throttle.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > index 8d149aff9fd0..8348972c517b 100644
> > --- a/block/blk-throttle.c
> > +++ b/block/blk-throttle.c
> > @@ -729,14 +729,14 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
> >   	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> >   	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> > -	if (!jiffy_wait)
> > -		jiffy_wait = 1;
> > -
> >   	/*
> >   	 * This wait time is without taking into consideration the rounding
> >   	 * up we did. Add that time also.
> >   	 */
> >   	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
> > +	if (!jiffy_wait)
> > +		jiffy_wait = 1;
> 
> Just wonder, will wait (0, 1) less jiffies is better than wait (0, 1)
> more jiffies.
> 
> How about following changes?
> 
> Thanks,
> Kuai
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 8d149aff9fd0..f8430baf3544 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct
> throtl_grp *tg, struct bio *bio,
>                                 u64 bps_limit)
>  {
>         bool rw = bio_data_dir(bio);
> +       long long carryover_bytes;
>         long long bytes_allowed;
>         u64 extra_bytes;
>         unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
> @@ -727,10 +728,11 @@ static unsigned long tg_within_bps_limit(struct
> throtl_grp *tg, struct bio *bio,
> 
>         /* Calc approx time to dispatch */
>         extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> -       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> +       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit,
> carryover_bytes);
> 

&carryover_bytes

> +       /* carryover_bytes is dispatched without waiting */
>         if (!jiffy_wait)
> -               jiffy_wait = 1;
> +               tg->carryover_bytes[rw] -= carryover_bytes;
> 
>         /*
>          * This wait time is without taking into consideration the rounding
> 
> > +
> >   	return jiffy_wait;

Looks like the result is worse with your patch:

throtl/001 (basic functionality)                             [failed]
    runtime  6.488s  ...  28.862s
    --- tests/throtl/001.out	2024-11-21 09:20:47.514353642 +0000
    +++ /root/git/blktests/results/nodev/throtl/001.out.bad	2025-02-21 02:51:36.723754146 +0000
    @@ -1,6 +1,6 @@
     Running throtl/001
    +13
     1
    -1
    -1
    +13
     1
    ...
    (Run 'diff -u tests/throtl/001.out /root/git/blktests/results/nodev/throtl/001.out.bad' to see the entire diff)


thanks,
Ming
Ming Lei Feb. 21, 2025, 3:16 a.m. UTC | #3
On Fri, Feb 21, 2025 at 10:55:23AM +0800, Ming Lei wrote:
> Hi Yukuai,
> 
> On Thu, Feb 20, 2025 at 09:38:12PM +0800, Yu Kuai wrote:
> > Hi,
> > 
> > 在 2025/02/20 19:17, Ming Lei 写道:
> > > When the current bio needs to be throttled because of bps limit, the wait
> > > time for the extra bytes may be less than 1 jiffy, tg_within_bps_limit()
> > > adds one extra 1 jiffy.
> > > 
> > > However, when taking roundup time into account, the extra 1 jiffy
> > > may become not necessary, then bps limit becomes not accurate. This way
> > > causes blktests throtl/001 failure in case of CONFIG_HZ_100=y.
> > > 
> > > Fix it by not adding the 1 jiffy in case that the roundup time can
> > > cover it.
> > > 
> > > Cc: Tejun Heo <tj@kernel.org>
> > > Cc: Yu Kuai <yukuai3@huawei.com>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >   block/blk-throttle.c | 6 +++---
> > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > > index 8d149aff9fd0..8348972c517b 100644
> > > --- a/block/blk-throttle.c
> > > +++ b/block/blk-throttle.c
> > > @@ -729,14 +729,14 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
> > >   	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> > >   	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> > > -	if (!jiffy_wait)
> > > -		jiffy_wait = 1;
> > > -
> > >   	/*
> > >   	 * This wait time is without taking into consideration the rounding
> > >   	 * up we did. Add that time also.
> > >   	 */
> > >   	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
> > > +	if (!jiffy_wait)
> > > +		jiffy_wait = 1;
> > 
> > Just wonder, will wait (0, 1) less jiffies is better than wait (0, 1)
> > more jiffies.
> > 
> > How about following changes?
> > 
> > Thanks,
> > Kuai
> > 
> > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > index 8d149aff9fd0..f8430baf3544 100644
> > --- a/block/blk-throttle.c
> > +++ b/block/blk-throttle.c
> > @@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct
> > throtl_grp *tg, struct bio *bio,
> >                                 u64 bps_limit)
> >  {
> >         bool rw = bio_data_dir(bio);
> > +       long long carryover_bytes;
> >         long long bytes_allowed;
> >         u64 extra_bytes;
> >         unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
> > @@ -727,10 +728,11 @@ static unsigned long tg_within_bps_limit(struct
> > throtl_grp *tg, struct bio *bio,
> > 
> >         /* Calc approx time to dispatch */
> >         extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> > -       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> > +       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit,
> > carryover_bytes);
> > 
> 
> &carryover_bytes
> 
> > +       /* carryover_bytes is dispatched without waiting */
> >         if (!jiffy_wait)
> > -               jiffy_wait = 1;
> > +               tg->carryover_bytes[rw] -= carryover_bytes;

I'm not sure ->carryover_bytes[] can be used here; the comment clearly says
it is only for updating the config.

It is good neither to add one jiffy too many nor one too few; maybe
DIV64_U64_ROUND_CLOSEST() is better?

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..5791612b3543 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -727,16 +727,16 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Calc approx time to dispatch */
 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
-	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
-
-	if (!jiffy_wait)
-		jiffy_wait = 1;
+	jiffy_wait = DIV64_U64_ROUND_CLOSEST(extra_bytes * HZ, bps_limit);
 
 	/*
 	 * This wait time is without taking into consideration the rounding
 	 * up we did. Add that time also.
 	 */
 	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
+	if (!jiffy_wait)
+		jiffy_wait = 1;
+
 	return jiffy_wait;
 }
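For reference, DIV64_U64_ROUND_CLOSEST() rounds to the nearest integer by
biasing the dividend with half the divisor, so the per-bio error is bounded by
half a jiffy in either direction. A userspace stand-in (a sketch, not the
kernel macro itself):

```c
#include <stdint.h>
#include <assert.h>

/* Userspace equivalent of the kernel's DIV64_U64_ROUND_CLOSEST():
 * add half the divisor before dividing to round to nearest. */
static uint64_t div_round_closest(uint64_t dividend, uint64_t divisor)
{
	return (dividend + divisor / 2) / divisor;
}
```

For example, 900 bytes*HZ over a 1000 bps limit rounds up to 1 jiffy, while
400 rounds down to 0, splitting the rounding error instead of always adding
a jiffy.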
 


Thanks,
Ming
Yu Kuai Feb. 21, 2025, 3:39 a.m. UTC | #4
Hi,

在 2025/02/21 10:55, Ming Lei 写道:
> Hi Yukuai,
> 
> On Thu, Feb 20, 2025 at 09:38:12PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2025/02/20 19:17, Ming Lei 写道:
>>> When the current bio needs to be throttled because of bps limit, the wait
>>> time for the extra bytes may be less than 1 jiffy, tg_within_bps_limit()
>>> adds one extra 1 jiffy.
>>>
>>> However, when taking roundup time into account, the extra 1 jiffy
>>> may become not necessary, then bps limit becomes not accurate. This way
>>> causes blktests throtl/001 failure in case of CONFIG_HZ_100=y.
>>>
>>> Fix it by not adding the 1 jiffy in case that the roundup time can
>>> cover it.
>>>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> Cc: Yu Kuai <yukuai3@huawei.com>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>    block/blk-throttle.c | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>>> index 8d149aff9fd0..8348972c517b 100644
>>> --- a/block/blk-throttle.c
>>> +++ b/block/blk-throttle.c
>>> @@ -729,14 +729,14 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
>>>    	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
>>>    	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
>>> -	if (!jiffy_wait)
>>> -		jiffy_wait = 1;
>>> -
>>>    	/*
>>>    	 * This wait time is without taking into consideration the rounding
>>>    	 * up we did. Add that time also.
>>>    	 */
>>>    	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
>>> +	if (!jiffy_wait)
>>> +		jiffy_wait = 1;
>>
>> Just wonder, will wait (0, 1) less jiffies is better than wait (0, 1)
>> more jiffies.
>>
>> How about following changes?
>>
>> Thanks,
>> Kuai
>>
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index 8d149aff9fd0..f8430baf3544 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct
>> throtl_grp *tg, struct bio *bio,
>>                                  u64 bps_limit)
>>   {
>>          bool rw = bio_data_dir(bio);
>> +       long long carryover_bytes;
>>          long long bytes_allowed;
>>          u64 extra_bytes;
>>          unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
>> @@ -727,10 +728,11 @@ static unsigned long tg_within_bps_limit(struct
>> throtl_grp *tg, struct bio *bio,
>>
>>          /* Calc approx time to dispatch */
>>          extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
>> -       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
>> +       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit,
>> carryover_bytes);
Hi, Thanks for the test.

This is a mistake, carryover_bytes is much bigger than expected :(
That's why the result is much worse. My bad.
>>
> 
> &carryover_bytes
> 
>> +       /* carryover_bytes is dispatched without waiting */
>>          if (!jiffy_wait)
The if condition should be removed.
>> -               jiffy_wait = 1;
>> +               tg->carryover_bytes[rw] -= carryover_bytes;
>>
>>          /*
>>           * This wait time is without taking into consideration the rounding
>>
>>> +
>>>    	return jiffy_wait;
> 
> Looks result is worse with your patch:
> 
> throtl/001 (basic functionality)                             [failed]
>      runtime  6.488s  ...  28.862s
>      --- tests/throtl/001.out	2024-11-21 09:20:47.514353642 +0000
>      +++ /root/git/blktests/results/nodev/throtl/001.out.bad	2025-02-21 02:51:36.723754146 +0000
>      @@ -1,6 +1,6 @@
>       Running throtl/001
>      +13
>       1
>      -1
>      -1
>      +13
>       1
>      ...
>      (Run 'diff -u tests/throtl/001.out /root/git/blktests/results/nodev/throtl/001.out.bad' to see the entire diff)

And I realize now that throtl_start_new_slice() will just clear
carryover_bytes. I tested in my VM, and with the following changes
throtl/001 never fails with CONFIG_HZ_100.

Thanks,
Kuai

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..4fc005af82e0 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
                                 u64 bps_limit)
  {
         bool rw = bio_data_dir(bio);
+       long long carryover_bytes;
         long long bytes_allowed;
         u64 extra_bytes;
         unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
@@ -727,10 +728,8 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,

         /* Calc approx time to dispatch */
         extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
-       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
-
-       if (!jiffy_wait)
-               jiffy_wait = 1;
+       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, &carryover_bytes);
+       tg->carryover_bytes[rw] -= div64_u64(carryover_bytes, HZ);

         /*
          * This wait time is without taking into consideration the rounding
@@ -775,10 +774,14 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
          * long since now. New slice is started only for empty throttle group.
          * If there is queued bio, that means there should be an active
          * slice and it should be extended instead.
+        *
+        * If throtl_trim_slice() doesn't clear carryover_bytes, which means
+        * debt is still not paid, don't start new slice in this case.
          */
-       if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
+       if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]) &&
+           tg->carryover_bytes[rw] >= 0) {
                 throtl_start_new_slice(tg, rw, true);
-       else {
+       } else {
                 if (time_before(tg->slice_end[rw],
                     jiffies + tg->td->throtl_slice))
                         throtl_extend_slice(tg, rw,

> 
> 
> thanks,
> Ming
Ming Lei Feb. 21, 2025, 4:18 a.m. UTC | #5
On Fri, Feb 21, 2025 at 11:39:17AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2025/02/21 10:55, Ming Lei 写道:
> > Hi Yukuai,
> > 
> > On Thu, Feb 20, 2025 at 09:38:12PM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > 在 2025/02/20 19:17, Ming Lei 写道:
> > > > When the current bio needs to be throttled because of bps limit, the wait
> > > > time for the extra bytes may be less than 1 jiffy, tg_within_bps_limit()
> > > > adds one extra 1 jiffy.
> > > > 
> > > > However, when taking roundup time into account, the extra 1 jiffy
> > > > may become not necessary, then bps limit becomes not accurate. This way
> > > > causes blktests throtl/001 failure in case of CONFIG_HZ_100=y.
> > > > 
> > > > Fix it by not adding the 1 jiffy in case that the roundup time can
> > > > cover it.
> > > > 
> > > > Cc: Tejun Heo <tj@kernel.org>
> > > > Cc: Yu Kuai <yukuai3@huawei.com>
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >    block/blk-throttle.c | 6 +++---
> > > >    1 file changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > > > index 8d149aff9fd0..8348972c517b 100644
> > > > --- a/block/blk-throttle.c
> > > > +++ b/block/blk-throttle.c
> > > > @@ -729,14 +729,14 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
> > > >    	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> > > >    	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> > > > -	if (!jiffy_wait)
> > > > -		jiffy_wait = 1;
> > > > -
> > > >    	/*
> > > >    	 * This wait time is without taking into consideration the rounding
> > > >    	 * up we did. Add that time also.
> > > >    	 */
> > > >    	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
> > > > +	if (!jiffy_wait)
> > > > +		jiffy_wait = 1;
> > > 
> > > Just wonder, will wait (0, 1) less jiffies is better than wait (0, 1)
> > > more jiffies.
> > > 
> > > How about following changes?
> > > 
> > > Thanks,
> > > Kuai
> > > 
> > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > > index 8d149aff9fd0..f8430baf3544 100644
> > > --- a/block/blk-throttle.c
> > > +++ b/block/blk-throttle.c
> > > @@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct
> > > throtl_grp *tg, struct bio *bio,
> > >                                  u64 bps_limit)
> > >   {
> > >          bool rw = bio_data_dir(bio);
> > > +       long long carryover_bytes;
> > >          long long bytes_allowed;
> > >          u64 extra_bytes;
> > >          unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
> > > @@ -727,10 +728,11 @@ static unsigned long tg_within_bps_limit(struct
> > > throtl_grp *tg, struct bio *bio,
> > > 
> > >          /* Calc approx time to dispatch */
> > >          extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> > > -       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> > > +       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit,
> > > carryover_bytes);
> Hi, Thanks for the test.
> 
> This is a mistake, carryover_bytes is much bigger than expected :(
> That's why the result is much worse. My bad.
> > > 
> > 
> > &carryover_bytes
> > 
> > > +       /* carryover_bytes is dispatched without waiting */
> > >          if (!jiffy_wait)
> The if condition shound be removed.
> > > -               jiffy_wait = 1;
> > > +               tg->carryover_bytes[rw] -= carryover_bytes;
> > > 
> > >          /*
> > >           * This wait time is without taking into consideration the rounding
> > > 
> > > > +
> > > >    	return jiffy_wait;
> > 
> > Looks result is worse with your patch:
> > 
> > throtl/001 (basic functionality)                             [failed]
> >      runtime  6.488s  ...  28.862s
> >      --- tests/throtl/001.out	2024-11-21 09:20:47.514353642 +0000
> >      +++ /root/git/blktests/results/nodev/throtl/001.out.bad	2025-02-21 02:51:36.723754146 +0000
> >      @@ -1,6 +1,6 @@
> >       Running throtl/001
> >      +13
> >       1
> >      -1
> >      -1
> >      +13
> >       1
> >      ...
> >      (Run 'diff -u tests/throtl/001.out /root/git/blktests/results/nodev/throtl/001.out.bad' to see the entire diff)
> 
> And I realize now that throtl_start_new_slice() will just clear
> the carryover_bytes, I tested in my VM and with following changes,
> throtl/001 never fail with CONFIG_HZ_100.

If carryover_bytes can cover this issue, I think that approach is preferred.

> 
> Thanks,
> Kuai
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 8d149aff9fd0..4fc005af82e0 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct
> throtl_grp *tg, struct bio *bio,
>                                 u64 bps_limit)
>  {
>         bool rw = bio_data_dir(bio);
> +       long long carryover_bytes;
>         long long bytes_allowed;
>         u64 extra_bytes;
>         unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
> @@ -727,10 +728,8 @@ static unsigned long tg_within_bps_limit(struct
> throtl_grp *tg, struct bio *bio,
> 
>         /* Calc approx time to dispatch */
>         extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> -       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> -
> -       if (!jiffy_wait)
> -               jiffy_wait = 1;
> +       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit,
> &carryover_bytes);
> +       tg->carryover_bytes[rw] -= div64_u64(carryover_bytes, HZ);

Can you explain a bit why `carryover_bytes/HZ` is subtracted instead of
carryover_bytes?

Also tg_within_bps_limit() may return 0 now, which isn't expected.


Thanks, 
Ming
Yu Kuai Feb. 21, 2025, 6:29 a.m. UTC | #6
Hi,

在 2025/02/21 12:18, Ming Lei 写道:
>> -       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
>> -
>> -       if (!jiffy_wait)
>> -               jiffy_wait = 1;
>> +       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit,
>> &carryover_bytes);
>> +       tg->carryover_bytes[rw] -= div64_u64(carryover_bytes, HZ);
> Can you explain a bit why `carryover_bytes/HZ` is subtracted instead of
> carryover_bytes?

For example, if bps_limit is 1000, extra_bytes is 9, then:

jiffy_wait = (9 * 100) / 1000 = 0;
carryover_bytes = (9 * 100) % 1000 = 900;

Hence we need to divide it by HZ:
tg->carryover_bytes = 0 - 900/100 = -9;

-9 can be considered debt, and for the next IO, bytes_allowed will take the
carryover_bytes into account.
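The arithmetic above can be checked with a small userspace sketch (plain `/`
and `%` stand in for div64_u64_rem(); wait_with_debt() is a hypothetical
helper, not the kernel function):

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of the remainder-as-debt idea: the remainder of
 * extra_bytes * HZ / bps_limit is in units of bytes * HZ, so it must be
 * divided by HZ before being recorded as byte debt against the tg. */
static uint64_t wait_with_debt(uint64_t extra_bytes, uint64_t bps_limit,
			       uint64_t hz, int64_t *carryover_bytes)
{
	uint64_t dividend = extra_bytes * hz;
	uint64_t jiffy_wait = dividend / bps_limit;
	uint64_t rem = dividend % bps_limit;	/* div64_u64_rem() remainder */

	*carryover_bytes -= (int64_t)(rem / hz);
	return jiffy_wait;
}
```

This reproduces the example: with bps_limit = 1000, extra_bytes = 9 and
HZ = 100, the wait is 0 jiffies and a 9-byte debt is carried to the next IO.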
> 
> Also tg_within_bps_limit() may return 0 now, which isn't expected.

I think it's expected: this IO will now be dispatched directly instead
of waiting for one more jiffy, and the debt will be paid by the
following IOs.

And if the tg idles for a long time before dispatching the next IO,
tg_trim_slice() should handle this case and avoid a long slice end. This
is not quite handy; perhaps it's better to add a helper like
tg_in_debt() before throtl_start_new_slice() to handle this case.

BTW, we must update the comment about carryover_bytes/ios, it's already
used as debt.

Thanks,
Kuai
> 
> 
> Thanks,
> Ming
Ming Lei Feb. 21, 2025, 8:59 a.m. UTC | #7
On Fri, Feb 21, 2025 at 02:29:30PM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2025/02/21 12:18, Ming Lei 写道:
> > > -       jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> > > -
> > > -       if (!jiffy_wait)
> > > -               jiffy_wait = 1;
> > > +       jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit,
> > > &carryover_bytes);
> > > +       tg->carryover_bytes[rw] -= div64_u64(carryover_bytes, HZ);
> > Can you explain a bit why `carryover_bytes/HZ` is subtracted instead of
> > carryover_bytes?
> 
> For example, if bps_limit is 1000, extra_bytes is 9, then:
> 
> jiffy_wait = (9 * 100) / 1000 = 0;
> carryover_bytes = (9 * 100) % 1000 = 900;
> 
> Hence we need to divide it by HZ:
> tg->carryover_bytes = 0 - 900/100 = -9;
> 
> -9 can be considered debt, and for the next IO, the bytes_allowed will
> consider the carryover_bytes.

Got it, it is just because the dividend is 'extra_bytes * HZ', so the remainder
needs to be divided by HZ.

> > 
> > Also tg_within_bps_limit() may return 0 now, which isn't expected.
> 
> I think it's expected, this IO will now be dispatched directly instead
> of wait for one more jiffies, and debt will be paid if there are
> following IOs.

OK.

> 
> And if the tg idle for a long time before dispatching the next IO,
> tg_trim_slice() should handle this case and avoid long slice end. This
> is not quite handy, perhaps it's better to add a helper like
> tg_in_debt() before throtl_start_new_slice() to hande this case.
> 
> BTW, we must update the comment about carryover_bytes/ios, it's alredy
> used as debt.

Indeed, the approach is similar with the handling for
bio_issue_as_root_blkg().


Tested-by: Ming Lei <ming.lei@redhat.com>



Thanks,
Ming
Yu Kuai Feb. 22, 2025, 3:01 a.m. UTC | #8
Hi,

在 2025/02/21 16:59, Ming Lei 写道:
>> And if the tg idle for a long time before dispatching the next IO,
>> tg_trim_slice() should handle this case and avoid long slice end. This
>> is not quite handy, perhaps it's better to add a helper like
>> tg_in_debt() before throtl_start_new_slice() to hande this case.
>>
>> BTW, we must update the comment about carryover_bytes/ios, it's alredy
>> used as debt.
> Indeed, the approach is similar with the handling for
> bio_issue_as_root_blkg().
> 
> 
> Tested-by: Ming Lei<ming.lei@redhat.com>

While cooking the patch, I found a better way than calling tg_in_debt() before
throtl_start_new_slice(): make sure the current slice has 1 more jiffy to
repay the debt. I'll send the patch soon; let me know if you don't want me to
keep the Tested-by tag.

Thanks,
Kuai

Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..8348972c517b 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -729,14 +729,14 @@  static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
 	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
 
-	if (!jiffy_wait)
-		jiffy_wait = 1;
-
 	/*
 	 * This wait time is without taking into consideration the rounding
 	 * up we did. Add that time also.
 	 */
 	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
+	if (!jiffy_wait)
+		jiffy_wait = 1;
+
 	return jiffy_wait;
 }