Message ID | 20220526133554.21079-1-zhouchengming@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-iocost: fix false positive lagging | expand |
Hello, On Thu, May 26, 2022 at 09:35:54PM +0800, Chengming Zhou wrote: > I found many false positive lagging during iocost test. > > Since iocg->vtime will be advanced to (vnow - margins.target) > in hweight_after_donation(), which called throw away excess, > the iocg->done_vtime will also be advanced that much. > > period_at_vtime <--period_vtime--> vnow > | | > ---------------------------------------------------> > |<--->| > margins.target > |-> > vtime, done_vtime All it does is shifting the vtime (and done_vtime) within the current window so that we don't build up budget too lage a budget possibly spanning multiple periods. The lagging detection is supposed to detect IOs which are issued two+ periods ago which didn't finish in the last period. So, I don't think the above sliding up the window affects that detection given that the lagging detection is done before the window sliding. All it's checking is whether there still are in-flight IOs which were issued two+ windows ago, so how the last window has been fast forwarded shouldn't affect the detection, no? Thanks.
Hello, On 2022/5/27 01:43, Tejun Heo wrote: > Hello, > > On Thu, May 26, 2022 at 09:35:54PM +0800, Chengming Zhou wrote: >> I found many false positive lagging during iocost test. >> >> Since iocg->vtime will be advanced to (vnow - margins.target) >> in hweight_after_donation(), which called throw away excess, >> the iocg->done_vtime will also be advanced that much. >> >> period_at_vtime <--period_vtime--> vnow >> | | >> ---------------------------------------------------> >> |<--->| >> margins.target >> |-> >> vtime, done_vtime > > All it does is shifting the vtime (and done_vtime) within the current window > so that we don't build up budget too lage a budget possibly spanning > multiple periods. Yes, this is necessary. Suppose in the last timer, the iocg doesn't have inflights and have excess, then iocg->vtime = iocg->done_vtime = (period_at_vtime - margins.target) > The lagging detection is supposed to detect IOs which are > issued two+ periods ago which didn't finish in the last period. So, I don't Yes, I understand. > think the above sliding up the window affects that detection given that the > lagging detection is done before the window sliding. All it's checking is > whether there still are in-flight IOs which were issued two+ windows ago, so > how the last window has been fast forwarded shouldn't affect the detection, > no? Right, the lagging detection is done before the window sliding in this period timer. The conditions that it checks vtime, done_vtime have been slided in the last timer. time_after64(vtime, vdone) && time_after64(vtime, now.vnow - MAX_LAGGING_PERIODS * period_vtime) && time_before64(vdone, now.vnow - period_vtime) The first condition says it has some inflights, the second condition is always true if vtime has been slided in the last timer, the third condition will be true if the cost of io completed since last timer < ioc->margins.target. So I think it doesn't check correctly whether it has inflights that were issued two+ windows ago. Thanks. > > Thanks. >
On 2022/5/26 21:35, Chengming Zhou wrote: > I found many false positive lagging during iocost test. > > Since iocg->vtime will be advanced to (vnow - margins.target) > in hweight_after_donation(), which called throw away excess, > the iocg->done_vtime will also be advanced that much. > > period_at_vtime <--period_vtime--> vnow > | | > ---------------------------------------------------> > |<--->| > margins.target > |-> > vtime, done_vtime > > If that iocg has some inflight io when vnow, but its done_vtime > is before period_at_vtime, ioc_timer_fn() will think it has > lagging io, even these io maybe issued just before now. > > This patch change the condition to check if vdone is before > (period_at_vtime - margins.target) instead of period_at_vtime. > > But there is another problem that this patch doesn't fix. > Since vtime will be advanced, we can't check if vtime is > after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell > whether this iocg pin lagging for too long. > > Maybe we can add lagging_periods in iocg to record how many > periods this iocg pin lagging, but I don't know when to clean it. Hello tejun, I add lagging_periods in iocg based on the original patch, to record how many periods this iocg pin lagging. So we can use it to avoid letting cmds which take a very long time pin lagging for too long. Thanks. diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 33a11ba971ea..998bb38ffb37 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -541,6 +541,8 @@ struct ioc_gq { u64 indebt_since; u64 indelay_since; + int lagging_periods; + /* this iocg's depth in the hierarchy and ancestors including self */ int level; struct ioc_gq *ancestors[]; @@ -2257,10 +2259,13 @@ static void ioc_timer_fn(struct timer_list *timer) if ((ppm_rthr != MILLION || ppm_wthr != MILLION) && !atomic_read(&iocg_to_blkg(iocg)->use_delay) && time_after64(vtime, vdone) && - time_after64(vtime, now.vnow - - MAX_LAGGING_PERIODS * period_vtime) && - time_before64(vdone, now.vnow - period_vtime)) - nr_lagging++; + time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) { + if (iocg->lagging_periods < MAX_LAGGING_PERIODS) { + nr_lagging++; + iocg->lagging_periods++; + } + } else if (iocg->lagging_periods) + iocg->lagging_periods = 0; /* * Determine absolute usage factoring in in-flight IOs to avoid > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > --- > block/blk-iocost.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/blk-iocost.c b/block/blk-iocost.c > index 33a11ba971ea..42e301b7527b 100644 > --- a/block/blk-iocost.c > +++ b/block/blk-iocost.c > @@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer) > time_after64(vtime, vdone) && > time_after64(vtime, now.vnow - > MAX_LAGGING_PERIODS * period_vtime) && > - time_before64(vdone, now.vnow - period_vtime)) > + time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) > nr_lagging++; > > /*
On 2022/5/28 16:17, Chengming Zhou wrote: > On 2022/5/26 21:35, Chengming Zhou wrote: >> I found many false positive lagging during iocost test. >> >> Since iocg->vtime will be advanced to (vnow - margins.target) >> in hweight_after_donation(), which called throw away excess, >> the iocg->done_vtime will also be advanced that much. >> >> period_at_vtime <--period_vtime--> vnow >> | | >> ---------------------------------------------------> >> |<--->| >> margins.target >> |-> >> vtime, done_vtime >> >> If that iocg has some inflight io when vnow, but its done_vtime >> is before period_at_vtime, ioc_timer_fn() will think it has >> lagging io, even these io maybe issued just before now. >> >> This patch change the condition to check if vdone is before >> (period_at_vtime - margins.target) instead of period_at_vtime. >> >> But there is another problem that this patch doesn't fix. >> Since vtime will be advanced, we can't check if vtime is >> after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell >> whether this iocg pin lagging for too long. >> >> Maybe we can add lagging_periods in iocg to record how many >> periods this iocg pin lagging, but I don't know when to clean it. > > Hello tejun, I add lagging_periods in iocg based on the original patch, > to record how many periods this iocg pin lagging. So we can use it to > avoid letting cmds which take a very long time pin lagging for too long. > > Thanks. > > > diff --git a/block/blk-iocost.c b/block/blk-iocost.c > index 33a11ba971ea..998bb38ffb37 100644 > --- a/block/blk-iocost.c > +++ b/block/blk-iocost.c > @@ -541,6 +541,8 @@ struct ioc_gq { > u64 indebt_since; > u64 indelay_since; > > + int lagging_periods; > + > /* this iocg's depth in the hierarchy and ancestors including self */ > int level; > struct ioc_gq *ancestors[]; > @@ -2257,10 +2259,13 @@ static void ioc_timer_fn(struct timer_list *timer) > if ((ppm_rthr != MILLION || ppm_wthr != MILLION) && > !atomic_read(&iocg_to_blkg(iocg)->use_delay) && > time_after64(vtime, vdone) && > - time_after64(vtime, now.vnow - > - MAX_LAGGING_PERIODS * period_vtime) && > - time_before64(vdone, now.vnow - period_vtime)) > - nr_lagging++; > + time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) { > + if (iocg->lagging_periods < MAX_LAGGING_PERIODS) { > + nr_lagging++; > + iocg->lagging_periods++; > + } > + } else if (iocg->lagging_periods) > + iocg->lagging_periods = 0; > > /* > * Determine absolute usage factoring in in-flight IOs to avoid > Hi, I tested with this version, previous false laggings are gone. So I wonder if I should send v2 for review? Thanks! > >> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> >> --- >> block/blk-iocost.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/block/blk-iocost.c b/block/blk-iocost.c >> index 33a11ba971ea..42e301b7527b 100644 >> --- a/block/blk-iocost.c >> +++ b/block/blk-iocost.c >> @@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer) >> time_after64(vtime, vdone) && >> time_after64(vtime, now.vnow - >> MAX_LAGGING_PERIODS * period_vtime) && >> - time_before64(vdone, now.vnow - period_vtime)) >> + time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) >> nr_lagging++; >> >> /*
diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 33a11ba971ea..42e301b7527b 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -2259,7 +2259,7 @@ static void ioc_timer_fn(struct timer_list *timer) time_after64(vtime, vdone) && time_after64(vtime, now.vnow - MAX_LAGGING_PERIODS * period_vtime) && - time_before64(vdone, now.vnow - period_vtime)) + time_before64(vdone, ioc->period_at_vtime - ioc->margins.target)) nr_lagging++; /*
I found many false positive lagging during iocost test. Since iocg->vtime will be advanced to (vnow - margins.target) in hweight_after_donation(), which called throw away excess, the iocg->done_vtime will also be advanced that much. period_at_vtime <--period_vtime--> vnow | | ---------------------------------------------------> |<--->| margins.target |-> vtime, done_vtime If that iocg has some inflight io when vnow, but its done_vtime is before period_at_vtime, ioc_timer_fn() will think it has lagging io, even these io maybe issued just before now. This patch change the condition to check if vdone is before (period_at_vtime - margins.target) instead of period_at_vtime. But there is another problem that this patch doesn't fix. Since vtime will be advanced, we can't check if vtime is after (vnow - MAX_LAGGING_PERIODS * period_vtime) to tell whether this iocg pin lagging for too long. Maybe we can add lagging_periods in iocg to record how many periods this iocg pin lagging, but I don't know when to clean it. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> --- block/blk-iocost.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)