Message ID | 20220121105503.14069-1-jack@suse.cz (mailing list archive) |
---|---|
Headers | show |
Series | bfq: Avoid use-after-free when moving processes between cgroups | expand |
在 2022/01/21 19:42, yukuai (C) 写道: > 在 2022/01/21 18:56, Jan Kara 写道: >> Hello, >> >> here is the fifth version of my patches to fix use-after-free issues >> in BFQ >> when processes with merged queues get moved to different cgroups. The >> patches >> have survived some beating in my test VM, but so far I fail to >> reproduce the >> original KASAN reports so testing from people who can reproduce them >> is most >> welcome. Kuai, can you please give these patches a run in your setup? >> Thanks >> a lot for your help with fixing this! >> >> Changes since v4: >> * Even more aggressive splitting of merged bfq queues to avoid >> problems with >> long merge chains. >> >> Changes since v3: >> * Changed handling of bfq group move to handle the case when target of >> the >> merge has moved. >> >> Changes since v2: >> * Improved handling of bfq queue splitting on move between cgroups >> * Removed broken change to bfq_put_cooperator() >> >> Changes since v1: >> * Added fix for bfq_put_cooperator() >> * Added fix to handle move between cgroups in bfq_merge_bio() >> >> Honza >> Previous versions: >> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 >> Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2 >> Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3 >> Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4 >> . >> > Hi, Jan > > I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and > this time this BUG_ON() is triggered: > > diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c > index 00184530c644..4b17eb4a29bc 100644 > --- a/block/bfq-cgroup.c > +++ b/block/bfq-cgroup.c > @@ -731,8 +731,12 @@ static struct bfq_group > *__bfq_bic_change_cgroup(struct bfq_data *bfqd, > if (sync_bfqq) { > if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) { > /* We are the only user of this bfqq, just move > it */ > - if (sync_bfqq->entity.sched_data != > &bfqg->sched_data) > + if (sync_bfqq->entity.sched_data != > &bfqg->sched_data) { > + printk("%s: bfqq %px move from %px to > %px\n", > + __func__, sync_bfqq, > + bfqq_group(sync_bfqq), bfqg); > bfq_bfqq_move(bfqd, sync_bfqq, bfqg); > + } > } else { > struct bfq_queue *bfqq; > > @@ -741,10 +745,13 @@ static struct bfq_group > *__bfq_bic_change_cgroup(struct bfq_data *bfqd, > * that the merge chain still belongs to the same > * cgroup. > */ > - for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq) > + for (bfqq = sync_bfqq; bfqq; bfqq = > bfqq->new_bfqq) { > + printk("%s: bfqq %px(%px) bfqg %px\n", > __func__, > + bfqq, bfqq_group(bfqq), bfqg); > if (bfqq->entity.sched_data != > &bfqg->sched_data) > break; > + } > if (bfqq) { > /* > * Some queue changed cgroup so the > merge is > @@ -878,6 +885,8 @@ static void bfq_reparent_leaf_entity(struct bfq_data > *bfqd, > } > > bfqq = bfq_entity_to_bfqq(child_entity); > + printk("%s: bfqq %px move from %px to %px\n", __func__, bfqq, > + bfqq_group(bfqq), bfqd->root_group); > bfq_bfqq_move(bfqd, bfqq, bfqd->root_group); > } > > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > index 07be51bc229b..6d4e243c9a1e 100644 > --- a/block/bfq-iosched.c > +++ b/block/bfq-iosched.c > @@ -2753,6 +2753,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct > bfq_queue *new_bfqq) > while ((__bfqq = new_bfqq->new_bfqq)) { > if (__bfqq == bfqq) > return NULL; > + if (new_bfqq->entity.parent != __bfqq->entity.parent && > + bfqq_group(__bfqq) != __bfqq->bfqd->root_group) { > + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", > __func__, > + new_bfqq, bfqq_group(new_bfqq), __bfqq, > + bfqq_group(__bfqq)); > + BUG_ON(1); > + } > + > new_bfqq = __bfqq; > } > > @@ -2797,6 +2805,8 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct > bfq_queue *new_bfqq) > * are likely to increase the throughput. > */ > bfqq->new_bfqq = new_bfqq; > + printk("%s: set bfqq %px(%px) new_bfqq %px(%px)\n", __func__, > bfqq, > + bfqq_group(bfqq), new_bfqq, bfqq_group(new_bfqq)); > new_bfqq->ref += process_refs; > return new_bfqq; > } > @@ -2963,8 +2973,16 @@ bfq_setup_cooperator(struct bfq_data *bfqd, > struct bfq_queue *bfqq, > if (bfq_too_late_for_merging(bfqq)) > return NULL; > > - if (bfqq->new_bfqq) > + if (bfqq->new_bfqq) { > + if (bfqq->entity.parent != bfqq->new_bfqq->entity.parent && > + bfqq_group(bfqq->new_bfqq) != bfqd->root_group) { > + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", > __func__, > + bfqq, bfqq_group(bfqq), bfqq->new_bfqq, > + bfqq_group(bfqq->new_bfqq)); > + BUG_ON(1); > + } > return bfqq->new_bfqq; > + } > > if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq)) > return NULL; > @@ -6928,6 +6946,9 @@ static void __bfq_put_async_bfqq(struct bfq_data > *bfqd, > > bfq_log(bfqd, "put_async_bfqq: %p", bfqq); > if (bfqq) { > + printk("%s: bfqq %px move from %px to %px\n", __func__, > bfqq, > + bfqq_group(bfqq), bfqd->root_group); > + > bfq_bfqq_move(bfqd, bfqq, bfqd->root_group); > > bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d", > Hi, Jan It seems to me this problem is related to your orignal patch to move all bfqqs to root during bfqg offline: bfqg ffff8881721ee000 offline: [ 51.083018] bfq_reparent_leaf_entity: bfqq ffff8881130898c0 move from ffff8881721ee000 to ffff88817cb0f000 [ 51.093889] bfq_reparent_leaf_entity: bfqq ffff8881222e2940 move from ffff8881721ee000 to ffff88817cb0f000 [ 51.094756] bfq_reparent_leaf_entity: bfqq ffff88810e6a58c0 move from ffff8881721ee000 to ffff88817cb0f000 [ 51.095626] bfq_reparent_leaf_entity: bfqq ffff888136b95600 move from ffff8881721ee000 to ffff88817cb0f000 however parent of ffff88817f8d0dc0 is still ffff8881721ee000 : [ 51.224771] bfq_setup_merge: bfqq ffff8881130898c0(ffff88817cb0f000) new_bfqq ffff88817f8d0dc0(ffff8881721ee000)
On Fri 21-01-22 19:42:11, yukuai (C) wrote: > 在 2022/01/21 18:56, Jan Kara 写道: > > Hello, > > > > here is the fifth version of my patches to fix use-after-free issues in BFQ > > when processes with merged queues get moved to different cgroups. The patches > > have survived some beating in my test VM, but so far I fail to reproduce the > > original KASAN reports so testing from people who can reproduce them is most > > welcome. Kuai, can you please give these patches a run in your setup? Thanks > > a lot for your help with fixing this! > > > > Changes since v4: > > * Even more aggressive splitting of merged bfq queues to avoid problems with > > long merge chains. > > > > Changes since v3: > > * Changed handling of bfq group move to handle the case when target of the > > merge has moved. > > > > Changes since v2: > > * Improved handling of bfq queue splitting on move between cgroups > > * Removed broken change to bfq_put_cooperator() > > > > Changes since v1: > > * Added fix for bfq_put_cooperator() > > * Added fix to handle move between cgroups in bfq_merge_bio() > > > > Honza > > Previous versions: > > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 > > Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2 > > Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3 > > Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4 > > . > > > Hi, Jan > > I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and > this time this BUG_ON() is triggered: Thanks for testing! > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > index 07be51bc229b..6d4e243c9a1e 100644 > --- a/block/bfq-iosched.c > +++ b/block/bfq-iosched.c > @@ -2753,6 +2753,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct > bfq_queue *new_bfqq) > while ((__bfqq = new_bfqq->new_bfqq)) { > if (__bfqq == bfqq) > return NULL; > + if (new_bfqq->entity.parent != __bfqq->entity.parent && > + bfqq_group(__bfqq) != __bfqq->bfqd->root_group) { > + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", > __func__, > + new_bfqq, bfqq_group(new_bfqq), __bfqq, > + bfqq_group(__bfqq)); > + BUG_ON(1); This seems to be too early to check and BUG_ON(). Yes, we can walk through and even end up with a bfqq with a different parent however in that case we refuse to setup merge a few lines below and so there is no problem. Are you still able to reproduce the use-after-free issue with this version of my patches? Honza
[sending once again as I forgot to snip debug log at the end and mail got bounced by vger mail server] On Tue 25-01-22 16:23:28, yukuai (C) wrote: > 在 2022/01/24 22:02, Jan Kara 写道: > > On Fri 21-01-22 19:42:11, yukuai (C) wrote: > > > 在 2022/01/21 18:56, Jan Kara 写道: > > > > Hello, > > > > > > > > here is the fifth version of my patches to fix use-after-free issues in BFQ > > > > when processes with merged queues get moved to different cgroups. The patches > > > > have survived some beating in my test VM, but so far I fail to reproduce the > > > > original KASAN reports so testing from people who can reproduce them is most > > > > welcome. Kuai, can you please give these patches a run in your setup? Thanks > > > > a lot for your help with fixing this! > > > > > > > > Changes since v4: > > > > * Even more aggressive splitting of merged bfq queues to avoid problems with > > > > long merge chains. > > > > > > > > Changes since v3: > > > > * Changed handling of bfq group move to handle the case when target of the > > > > merge has moved. > > > > > > > > Changes since v2: > > > > * Improved handling of bfq queue splitting on move between cgroups > > > > * Removed broken change to bfq_put_cooperator() > > > > > > > > Changes since v1: > > > > * Added fix for bfq_put_cooperator() > > > > * Added fix to handle move between cgroups in bfq_merge_bio() > > > > > > > > Honza > > > > Previous versions: > > > > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 > > > > Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2 > > > > Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3 > > > > Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4 > > > > . > > > > > > > Hi, Jan > > > > > > I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and > > > this time this BUG_ON() is triggered: > > > > Thanks for testing! > > > > > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > > > index 07be51bc229b..6d4e243c9a1e 100644 > > > --- a/block/bfq-iosched.c > > > +++ b/block/bfq-iosched.c > > > @@ -2753,6 +2753,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct > > > bfq_queue *new_bfqq) > > > while ((__bfqq = new_bfqq->new_bfqq)) { > > > if (__bfqq == bfqq) > > > return NULL; > > > + if (new_bfqq->entity.parent != __bfqq->entity.parent && > > > + bfqq_group(__bfqq) != __bfqq->bfqd->root_group) { > > > + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", > > > __func__, > > > + new_bfqq, bfqq_group(new_bfqq), __bfqq, > > > + bfqq_group(__bfqq)); > > > + BUG_ON(1); > > > > This seems to be too early to check and BUG_ON(). Yes, we can walk through > > and even end up with a bfqq with a different parent however in that case we > > refuse to setup merge a few lines below and so there is no problem. > > > > Are you still able to reproduce the use-after-free issue with this version > > of my patches? > > > > Honza > > > Hi, Jan > > I add following additional debug info: > > @ -926,6 +935,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd) > if (!entity) /* root group */ > goto put_async_queues; > > + printk("%s: bfqg %px offlined\n", __func__, bfqg); > /* > * Empty all service_trees belonging to this group before > * deactivating the group itself. > @@ -965,6 +975,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd) > > put_async_queues: > bfq_put_async_queues(bfqd, bfqg); > + pd->plid = BLKCG_MAX_POLS; > > spin_unlock_irqrestore(&bfqd->lock, flags); > /* > > @@ -6039,6 +6050,13 @@ static bool __bfq_insert_request(struct bfq_data > *bfqd, struct request *rq) > *new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true, > RQ_BIC(rq)); > bool waiting, idle_timer_disabled = false; > + if (new_bfqq) { > + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", __func__, > + bfqq, bfqq_group(bfqq), new_bfqq, > bfqq_group(new_bfqq)); > + } else { > + printk("%s: bfqq %px(%px) new_bfqq null \n", __func__, > + bfqq, bfqq_group(bfqq)); > + } > > @@ -1696,6 +1696,11 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct > bfq_queue *bfqq) > > bfq_activate_bfqq(bfqd, bfqq); > > + if (bfqq->entity.parent && bfqq_group(bfqq)->pd.plid >= > BLKCG_MAX_POLS) { > + printk("%s: bfqq %px(%px) with parent offlined\n", __func__, > + bfqq, bfqq_group(bfqq)); > + BUG_ON(1); > + } > > And found that the uaf is triggered when bfq_setup_cooperator return > NULL, that's why the BUG_ON() in bfq_setup_cooperator() is not > triggered: > > [ 51.833290] __bfq_insert_request: bfqq ffff888106913700(ffff888107d67000) > new_bfqq null > [ 51.834762] bfq_add_bfqq_busy: bfqq ffff888106913700(ffff888107d67000) > with parent offlined > > The new_bfqq chain relate to bfqq ffff888106913700: > > t1: ffff8881114e9600 ------> t4: ffff888106913700 ------> t5: > ffff88810719e3c0 > | > t2: ffff888106913440 ---- > | > t3: ffff8881114e98c0 ---- > > I'm still not sure about the root cause, hope these debuginfo > can be helpful Thanks for debugging! I was looking into this but I also do not understand how what your tracing shows can happen. In particular I don't see why there is no __bfq_bic_change_cgroup() call from bfq_insert_request() -> bfq_init_rq() for the problematic __bfq_insert_request() into ffff888106913700. I have two possible explanations. Either bio is submitted to the offlined cgroup ffff888107d67000 or bic->blkcg_serial_nr is pointing to different cgroup than bic_to_bfqq(bic, 1)->entity.parent. So can you extented the debugging a bit like: 1) Add current->pid to all debug messages so that we can distinguish different processes and see which already detached from the bfqq and which not. 2) Print bic->blkcg_serial_nr and __bio_blkcg(bio)->css.serial_nr before crashing in bfq_add_bfqq_busy(). 3) Add BUG_ON to bic_set_bfqq() like: if (bfqq_group(bfqq)->css.serial_nr != bic->blkcg_serial_nr) { printk("%s: bfqq %px(%px) serial %d bic serial %d\n", bfqq, bfqq_group(bfqq), bfqq_group(bfqq)->css.serial_nr, bic->blkcg_serial_nr); BUG_ON(1); } and perhaps this scheds more light on the problem... Thanks! Honza
在 2022/02/03 5:53, Jan Kara 写道: > [sending once again as I forgot to snip debug log at the end and mail got > bounced by vger mail server] > > On Tue 25-01-22 16:23:28, yukuai (C) wrote: >> 在 2022/01/24 22:02, Jan Kara 写道: >>> On Fri 21-01-22 19:42:11, yukuai (C) wrote: >>>> 在 2022/01/21 18:56, Jan Kara 写道: >>>>> Hello, >>>>> >>>>> here is the fifth version of my patches to fix use-after-free issues in BFQ >>>>> when processes with merged queues get moved to different cgroups. The patches >>>>> have survived some beating in my test VM, but so far I fail to reproduce the >>>>> original KASAN reports so testing from people who can reproduce them is most >>>>> welcome. Kuai, can you please give these patches a run in your setup? Thanks >>>>> a lot for your help with fixing this! >>>>> >>>>> Changes since v4: >>>>> * Even more aggressive splitting of merged bfq queues to avoid problems with >>>>> long merge chains. >>>>> >>>>> Changes since v3: >>>>> * Changed handling of bfq group move to handle the case when target of the >>>>> merge has moved. >>>>> >>>>> Changes since v2: >>>>> * Improved handling of bfq queue splitting on move between cgroups >>>>> * Removed broken change to bfq_put_cooperator() >>>>> >>>>> Changes since v1: >>>>> * Added fix for bfq_put_cooperator() >>>>> * Added fix to handle move between cgroups in bfq_merge_bio() >>>>> >>>>> Honza >>>>> Previous versions: >>>>> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1 >>>>> Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2 >>>>> Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3 >>>>> Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4 >>>>> . >>>>> >>>> Hi, Jan >>>> >>>> I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and >>>> this time this BUG_ON() is triggered: >>> >>> Thanks for testing! >>> >>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c >>>> index 07be51bc229b..6d4e243c9a1e 100644 >>>> --- a/block/bfq-iosched.c >>>> +++ b/block/bfq-iosched.c >>>> @@ -2753,6 +2753,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct >>>> bfq_queue *new_bfqq) >>>> while ((__bfqq = new_bfqq->new_bfqq)) { >>>> if (__bfqq == bfqq) >>>> return NULL; >>>> + if (new_bfqq->entity.parent != __bfqq->entity.parent && >>>> + bfqq_group(__bfqq) != __bfqq->bfqd->root_group) { >>>> + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", >>>> __func__, >>>> + new_bfqq, bfqq_group(new_bfqq), __bfqq, >>>> + bfqq_group(__bfqq)); >>>> + BUG_ON(1); >>> >>> This seems to be too early to check and BUG_ON(). Yes, we can walk through >>> and even end up with a bfqq with a different parent however in that case we >>> refuse to setup merge a few lines below and so there is no problem. >>> >>> Are you still able to reproduce the use-after-free issue with this version >>> of my patches? >>> >>> Honza >>> >> Hi, Jan >> >> I add following additional debug info: >> >> @ -926,6 +935,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd) >> if (!entity) /* root group */ >> goto put_async_queues; >> >> + printk("%s: bfqg %px offlined\n", __func__, bfqg); >> /* >> * Empty all service_trees belonging to this group before >> * deactivating the group itself. >> @@ -965,6 +975,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd) >> >> put_async_queues: >> bfq_put_async_queues(bfqd, bfqg); >> + pd->plid = BLKCG_MAX_POLS; >> >> spin_unlock_irqrestore(&bfqd->lock, flags); >> /* >> >> @@ -6039,6 +6050,13 @@ static bool __bfq_insert_request(struct bfq_data >> *bfqd, struct request *rq) >> *new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true, >> RQ_BIC(rq)); >> bool waiting, idle_timer_disabled = false; >> + if (new_bfqq) { >> + printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", __func__, >> + bfqq, bfqq_group(bfqq), new_bfqq, >> bfqq_group(new_bfqq)); >> + } else { >> + printk("%s: bfqq %px(%px) new_bfqq null \n", __func__, >> + bfqq, bfqq_group(bfqq)); >> + } >> >> @@ -1696,6 +1696,11 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct >> bfq_queue *bfqq) >> >> bfq_activate_bfqq(bfqd, bfqq); >> >> + if (bfqq->entity.parent && bfqq_group(bfqq)->pd.plid >= >> BLKCG_MAX_POLS) { >> + printk("%s: bfqq %px(%px) with parent offlined\n", __func__, >> + bfqq, bfqq_group(bfqq)); >> + BUG_ON(1); >> + } >> >> And found that the uaf is triggered when bfq_setup_cooperator return >> NULL, that's why the BUG_ON() in bfq_setup_cooperator() is not >> triggered: >> >> [ 51.833290] __bfq_insert_request: bfqq ffff888106913700(ffff888107d67000) >> new_bfqq null >> [ 51.834762] bfq_add_bfqq_busy: bfqq ffff888106913700(ffff888107d67000) >> with parent offlined >> >> The new_bfqq chain relate to bfqq ffff888106913700: >> >> t1: ffff8881114e9600 ------> t4: ffff888106913700 ------> t5: >> ffff88810719e3c0 >> | >> t2: ffff888106913440 ---- >> | >> t3: ffff8881114e98c0 ---- >> >> I'm still not sure about the root cause, hope these debuginfo >> can be helpful > > Thanks for debugging! I was looking into this but I also do not understand > how what your tracing shows can happen. In particular I don't see why there > is no __bfq_bic_change_cgroup() call from bfq_insert_request() -> > bfq_init_rq() for the problematic __bfq_insert_request() into > ffff888106913700. I have two possible explanations. Either bio is submitted > to the offlined cgroup ffff888107d67000 or bic->blkcg_serial_nr is pointing > to different cgroup than bic_to_bfqq(bic, 1)->entity.parent. > > So can you extented the debugging a bit like: > 1) Add current->pid to all debug messages so that we can distinguish > different processes and see which already detached from the bfqq and which > not. > > 2) Print bic->blkcg_serial_nr and __bio_blkcg(bio)->css.serial_nr before > crashing in bfq_add_bfqq_busy(). > > 3) Add BUG_ON to bic_set_bfqq() like: > if (bfqq_group(bfqq)->css.serial_nr != bic->blkcg_serial_nr) { > printk("%s: bfqq %px(%px) serial %d bic serial %d\n", bfqq, > bfqq_group(bfqq), bfqq_group(bfqq)->css.serial_nr, > bic->blkcg_serial_nr); > BUG_ON(1); > } > > and perhaps this scheds more light on the problem... Thanks! > > Honza > Hi, Jan Sorry about the delay, I'm on vacation for the Spring Festival. I'll try the debugging soon. Thanks, Kuai
On Tue 08-02-22 21:19:14, yukuai (C) wrote: > 在 2022/02/03 5:53, Jan Kara 写道: > > Thanks for debugging! I was looking into this but I also do not understand > > how what your tracing shows can happen. In particular I don't see why there > > is no __bfq_bic_change_cgroup() call from bfq_insert_request() -> > > bfq_init_rq() for the problematic __bfq_insert_request() into > > ffff888106913700. I have two possible explanations. Either bio is submitted > > to the offlined cgroup ffff888107d67000 or bic->blkcg_serial_nr is pointing > > to different cgroup than bic_to_bfqq(bic, 1)->entity.parent. > > > > So can you extented the debugging a bit like: > > 1) Add current->pid to all debug messages so that we can distinguish > > different processes and see which already detached from the bfqq and which > > not. > > > > 2) Print bic->blkcg_serial_nr and __bio_blkcg(bio)->css.serial_nr before > > crashing in bfq_add_bfqq_busy(). > > > > 3) Add BUG_ON to bic_set_bfqq() like: > > if (bfqq_group(bfqq)->css.serial_nr != bic->blkcg_serial_nr) { > > printk("%s: bfqq %px(%px) serial %d bic serial %d\n", bfqq, > > bfqq_group(bfqq), bfqq_group(bfqq)->css.serial_nr, > > bic->blkcg_serial_nr); > > BUG_ON(1); > > } > > > > and perhaps this scheds more light on the problem... Thanks! > > > > Honza > > > > The debuging patch and log is attached. > > bic_set_bfqq found serial mismatch serval times, bfqq_group(bfqq)->css > doesn't exist, is the following debugging what you expected? > > @@ -386,6 +386,13 @@ static void bfq_put_stable_ref(struct bfq_queue *bfqq); > > void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool > is_sync) > { > + if (bfqq && bfqq_group(bfqq)->pd.blkg->blkcg->css.serial_nr != > + bic->blkcg_serial_nr) { > + bfqq_dbg(bfqq, "serial %lld bic serial %lld\n", > + bfqq_group(bfqq)->pd.blkg->blkcg->css.serial_nr, > + bic->blkcg_serial_nr); > + WARN_ON_ONCE(1); > + } Yes, this is what I wanted. Thanks! > > The problematic bfqq ffff8881682581f0 doesn't merge with any bfqq, > which is weird. > > The pid from __bfq_insert_request() is changed for the problematic > bfqq, and each time the pid is changed, there is a bic_set_bfqq() > shows that serial mismatch. I had a look into debug data and now I think I understand both the WARN_ON hit in bic_set_bfqq() as well as the final BUG_ON in bfq_add_bfqq_busy(). The first problem is apparently hit because __bio_blkcg() can change while we are processing the bio. So bfq_bic_update_cgroup() sees different __bio_blkcg() than bfq_get_queue() called from bfq_get_bfqq_handle_split(). This then causes mismatch between what bic & bfqq think about cgroup membership which can lead to interesting inconsistencies down the road. The second problem is hit because clearly __bio_blkcg() can be pointing to a blkcg that has been already offlined. Previously I didn't think this was possible but apparently there is nothing that would prevent this race. So we need to handle this gracefully inside BFQ. I need to think what would be best fixes for these issues since especially the second one is tricky. Honza
在 2022/02/10 1:40, Jan Kara 写道: > > I had a look into debug data and now I think I understand both the WARN_ON > hit in bic_set_bfqq() as well as the final BUG_ON in bfq_add_bfqq_busy(). > > The first problem is apparently hit because __bio_blkcg() can change while > we are processing the bio. So bfq_bic_update_cgroup() sees different > __bio_blkcg() than bfq_get_queue() called from bfq_get_bfqq_handle_split(). > This then causes mismatch between what bic & bfqq think about cgroup > membership which can lead to interesting inconsistencies down the road. > > The second problem is hit because clearly __bio_blkcg() can be pointing to > a blkcg that has been already offlined. Previously I didn't think this was > possible but apparently there is nothing that would prevent this race. So > we need to handle this gracefully inside BFQ. > > I need to think what would be best fixes for these issues since especially > the second one is tricky. Hi, Jan Just to be curiosity, do you have any ideas on how to fix these issues? Thanks, Kuai > > Honza >
On Fri 11-03-22 14:39:10, yukuai (C) wrote: > 在 2022/02/10 1:40, Jan Kara 写道: > > > > I had a look into debug data and now I think I understand both the WARN_ON > > hit in bic_set_bfqq() as well as the final BUG_ON in bfq_add_bfqq_busy(). > > > > The first problem is apparently hit because __bio_blkcg() can change while > > we are processing the bio. So bfq_bic_update_cgroup() sees different > > __bio_blkcg() than bfq_get_queue() called from bfq_get_bfqq_handle_split(). > > This then causes mismatch between what bic & bfqq think about cgroup > > membership which can lead to interesting inconsistencies down the road. > > > > The second problem is hit because clearly __bio_blkcg() can be pointing to > > a blkcg that has been already offlined. Previously I didn't think this was > > possible but apparently there is nothing that would prevent this race. So > > we need to handle this gracefully inside BFQ. > > > > I need to think what would be best fixes for these issues since especially > > the second one is tricky. > > Hi, Jan > > Just to be curiosity, do you have any ideas on how to fix these issues? Sorry for the delay. I was on vacation and then busy with other stuff. I have some version of the fixes ready but I want to clean them up a bit before posting. It shouldn't take me long... Honza