[v2] blk-throttle: fix possible io stall when upgrade to max

Message ID 526810b1-72e3-859f-aeaa-b5192144c589@gmail.com (mailing list archive)
State New, archived

Commit Message

Joseph Qi Sept. 30, 2017, 6:38 a.m. UTC
From: Joseph Qi <qijiang.qj@alibaba-inc.com>

The following case leads to an io stall. Consider this cgroup
hierarchy:
/test1
  |-subtest1
/test2
  |-subtest2
where subtest1 and subtest2 each already have 32 queued bios.

Now upgrade to max. throtl_upgrade_state tries to dispatch bios as
follows:
1) tg=subtest1, do nothing;
2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending
tg is left, so there is no need to schedule the next dispatch;
3) tg=subtest2, do nothing;
4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending
tg is left, so there is no need to schedule the next dispatch;
5) tg=/, transfer 8 queued bios from test1 to /, 8 from test2 to /,
8 more from test1 to /, and 8 more from test2 to /; note that test1 and
test2 each still have 16 queued bios left;
6) tg=/, try to schedule the next dispatch, but since disptime is now
(updated in tg_update_disptime, wait=0), the pending timer is not
actually scheduled;
7) In total, throtl_upgrade_state dispatches 32 queued bios and leaves
32 behind: test1 and test2 each have 16 queued bios;
8) throtl_pending_timer_fn sees the leftover bios but can do nothing,
because throtl_select_dispatch returns 0 and test1/test2 have no
pending tg.
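
The arithmetic in steps 5) and 7) can be checked with a small userspace
model. This is only a sketch: struct model_tg, model_select_dispatch and
the two constants are hypothetical names, and the values 8 and 32 are
taken from the steps above (they are assumed to correspond to the
per-group and total dispatch quanta in blk-throttle).

```c
#include <assert.h>

/* Hypothetical model; 8 and 32 come from the steps in the commit message. */
enum { GRP_QUANTUM = 8, TOTAL_QUANTUM = 32 };

struct model_tg {
	int nr_queued;	/* bios still queued in this group */
};

/*
 * One pass of the root-level dispatch: round-robin over the children,
 * moving at most GRP_QUANTUM bios per visit, until TOTAL_QUANTUM bios
 * have moved or every child is empty. Returns the number of bios moved.
 */
static int model_select_dispatch(struct model_tg *children, int nr_children)
{
	int dispatched = 0;

	while (dispatched < TOTAL_QUANTUM) {
		int moved_this_round = 0;

		for (int i = 0; i < nr_children && dispatched < TOTAL_QUANTUM; i++) {
			int n = children[i].nr_queued;

			if (n > GRP_QUANTUM)
				n = GRP_QUANTUM;
			if (n > TOTAL_QUANTUM - dispatched)
				n = TOTAL_QUANTUM - dispatched;
			children[i].nr_queued -= n;
			dispatched += n;
			moved_this_round += n;
		}
		if (moved_this_round == 0)
			break;	/* nothing left to move */
	}
	assert(dispatched <= TOTAL_QUANTUM);
	return dispatched;
}
```

Starting with two children holding 32 bios each, one pass moves 32 bios
and leaves 16 in each child, which is the state the stall starts from.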

The blktrace shows the following:
8,32   0        0     2.539007641     0  m   N throtl upgrade to max
8,32   0        0     2.539072267     0  m   N throtl /test2 dispatch nr_queued=16 read=0 write=16
8,32   7        0     2.539077142     0  m   N throtl /test1 dispatch nr_queued=16 read=0 write=16

So force the next dispatch to be scheduled if there are pending
children.
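
To illustrate what the force argument changes, here is a simplified
userspace model of the scheduling decision. It is an assumption based on
the description above, not the kernel source: struct model_sq, its
fields and model_schedule_next_dispatch are hypothetical stand-ins, and
the real function takes no "now" parameter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the service queue state that matters here. */
struct model_sq {
	unsigned int nr_pending;		/* children with queued bios */
	unsigned long first_pending_disptime;	/* earliest child disptime */
	bool timer_armed;			/* set when the timer is scheduled */
};

/*
 * Models the decision only: arm the timer if forced, or if the earliest
 * dispatch time is still in the future. Returns false when the caller is
 * expected to keep dispatching by itself.
 */
static bool model_schedule_next_dispatch(struct model_sq *sq,
					 unsigned long now, bool force)
{
	if (!sq->nr_pending)
		return true;	/* nothing pending, nothing to schedule */

	/* signed difference keeps the comparison correct across wraparound */
	if (force || (long)(sq->first_pending_disptime - now) > 0) {
		sq->timer_armed = true;
		return true;
	}
	return false;
}
```

With disptime equal to now and force=false the timer stays unarmed,
which matches step 6); with force=true it is armed whenever there are
pending children.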

Signed-off-by: Joseph Qi <qijiang.qj@alibaba-inc.com>
---
 block/blk-throttle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Shaohua Li Sept. 30, 2017, 6:02 a.m. UTC | #1
Reviewed-by: Shaohua Li <shli@fb.com>

Jens Axboe Oct. 3, 2017, 9:42 p.m. UTC | #2
Applied for 4.14, thanks.
Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0fea76a..17816a0 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1911,11 +1911,11 @@  static void throtl_upgrade_state(struct throtl_data *td)
 
 		tg->disptime = jiffies - 1;
 		throtl_select_dispatch(sq);
-		throtl_schedule_next_dispatch(sq, false);
+		throtl_schedule_next_dispatch(sq, true);
 	}
 	rcu_read_unlock();
 	throtl_select_dispatch(&td->service_queue);
-	throtl_schedule_next_dispatch(&td->service_queue, false);
+	throtl_schedule_next_dispatch(&td->service_queue, true);
 	queue_work(kthrotld_workqueue, &td->dispatch_work);
 }