[0/4,v5] bfq: Avoid use-after-free when moving processes between cgroups

Message ID 20220121105503.14069-1-jack@suse.cz (mailing list archive)

Message

Jan Kara Jan. 21, 2022, 10:56 a.m. UTC
Hello,

here is the fifth version of my patches to fix use-after-free issues in BFQ
when processes with merged queues get moved to different cgroups. The patches
have survived some beating in my test VM, but so far I have failed to reproduce
the original KASAN reports, so testing from people who can reproduce them is
most welcome. Kuai, can you please give these patches a run in your setup? Thanks
a lot for your help with fixing this!

Changes since v4:
* Even more aggressive splitting of merged bfq queues to avoid problems with
  long merge chains.

Changes since v3:
* Changed handling of bfq group move to handle the case when target of the
  merge has moved.

Changes since v2:
* Improved handling of bfq queue splitting on move between cgroups
* Removed broken change to bfq_put_cooperator()

Changes since v1:
* Added fix for bfq_put_cooperator()
* Added fix to handle move between cgroups in bfq_merge_bio()

								Honza
Previous versions:
Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3
Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4

Comments

Yu Kuai Jan. 21, 2022, 12:28 p.m. UTC | #1
On 2022/01/21 19:42, yukuai (C) wrote:
> On 2022/01/21 18:56, Jan Kara wrote:
>> [snip cover letter]
>>
> Hi, Jan
> 
> I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and
> this time this BUG_ON() is triggered:
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 00184530c644..4b17eb4a29bc 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -731,8 +731,12 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
>          if (sync_bfqq) {
>                  if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
>                          /* We are the only user of this bfqq, just move it */
> -                       if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
> +                       if (sync_bfqq->entity.sched_data != &bfqg->sched_data) {
> +                               printk("%s: bfqq %px move from %px to %px\n",
> +                                      __func__, sync_bfqq,
> +                                      bfqq_group(sync_bfqq), bfqg);
>                                  bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
> +                       }
>                  } else {
>                          struct bfq_queue *bfqq;
> 
> @@ -741,10 +745,13 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
>                           * that the merge chain still belongs to the same
>                           * cgroup.
>                           */
> -                       for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
> +                       for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq) {
> +                               printk("%s: bfqq %px(%px) bfqg %px\n", __func__,
> +                                       bfqq, bfqq_group(bfqq), bfqg);
>                                  if (bfqq->entity.sched_data !=
>                                      &bfqg->sched_data)
>                                          break;
> +                       }
>                          if (bfqq) {
>                                  /*
>                                   * Some queue changed cgroup so the merge is
> @@ -878,6 +885,8 @@ static void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
>          }
> 
>          bfqq = bfq_entity_to_bfqq(child_entity);
> +       printk("%s: bfqq %px move from %px to %px\n", __func__, bfqq,
> +               bfqq_group(bfqq), bfqd->root_group);
>          bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
>   }
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 07be51bc229b..6d4e243c9a1e 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2753,6 +2753,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
>          while ((__bfqq = new_bfqq->new_bfqq)) {
>                  if (__bfqq == bfqq)
>                          return NULL;
> +               if (new_bfqq->entity.parent != __bfqq->entity.parent &&
> +                   bfqq_group(__bfqq) != __bfqq->bfqd->root_group) {
> +                       printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", __func__,
> +                               new_bfqq, bfqq_group(new_bfqq), __bfqq,
> +                               bfqq_group(__bfqq));
> +                       BUG_ON(1);
> +               }
> +
>                  new_bfqq = __bfqq;
>          }
> 
> @@ -2797,6 +2805,8 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
>           * are likely to increase the throughput.
>           */
>          bfqq->new_bfqq = new_bfqq;
> +       printk("%s: set bfqq %px(%px) new_bfqq %px(%px)\n", __func__, bfqq,
> +               bfqq_group(bfqq), new_bfqq, bfqq_group(new_bfqq));
>          new_bfqq->ref += process_refs;
>          return new_bfqq;
>   }
> @@ -2963,8 +2973,16 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>          if (bfq_too_late_for_merging(bfqq))
>                  return NULL;
> 
> -       if (bfqq->new_bfqq)
> +       if (bfqq->new_bfqq) {
> +               if (bfqq->entity.parent != bfqq->new_bfqq->entity.parent &&
> +                   bfqq_group(bfqq->new_bfqq) != bfqd->root_group) {
> +                       printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", __func__,
> +                               bfqq, bfqq_group(bfqq), bfqq->new_bfqq,
> +                               bfqq_group(bfqq->new_bfqq));
> +                       BUG_ON(1);
> +               }
>                  return bfqq->new_bfqq;
> +       }
> 
>          if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
>                  return NULL;
> @@ -6928,6 +6946,9 @@ static void __bfq_put_async_bfqq(struct bfq_data *bfqd,
> 
>          bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
>          if (bfqq) {
> +               printk("%s: bfqq %px move from %px to %px\n", __func__, bfqq,
> +                       bfqq_group(bfqq), bfqd->root_group);
> +
>                  bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
> 
>                  bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
> 
Hi, Jan

It seems to me this problem is related to your original patch to move all
bfqqs to root during bfqg offline:

bfqg ffff8881721ee000 offline:
[   51.083018] bfq_reparent_leaf_entity: bfqq ffff8881130898c0 move from ffff8881721ee000 to ffff88817cb0f000
[   51.093889] bfq_reparent_leaf_entity: bfqq ffff8881222e2940 move from ffff8881721ee000 to ffff88817cb0f000
[   51.094756] bfq_reparent_leaf_entity: bfqq ffff88810e6a58c0 move from ffff8881721ee000 to ffff88817cb0f000
[   51.095626] bfq_reparent_leaf_entity: bfqq ffff888136b95600 move from ffff8881721ee000 to ffff88817cb0f000

However, the parent of ffff88817f8d0dc0 is still ffff8881721ee000:
[   51.224771] bfq_setup_merge: bfqq ffff8881130898c0(ffff88817cb0f000) new_bfqq ffff88817f8d0dc0(ffff8881721ee000)
Jan Kara Jan. 24, 2022, 2:02 p.m. UTC | #2
On Fri 21-01-22 19:42:11, yukuai (C) wrote:
> On 2022/01/21 18:56, Jan Kara wrote:
> > [snip cover letter]
> > 
> Hi, Jan
> 
> I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and
> this time this BUG_ON() is triggered:

Thanks for testing!

> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 07be51bc229b..6d4e243c9a1e 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2753,6 +2753,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
>         while ((__bfqq = new_bfqq->new_bfqq)) {
>                 if (__bfqq == bfqq)
>                         return NULL;
> +               if (new_bfqq->entity.parent != __bfqq->entity.parent &&
> +                   bfqq_group(__bfqq) != __bfqq->bfqd->root_group) {
> +                       printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", __func__,
> +                               new_bfqq, bfqq_group(new_bfqq), __bfqq,
> +                               bfqq_group(__bfqq));
> +                       BUG_ON(1);

This seems to be too early to check and BUG_ON(). Yes, we can walk through
and even end up with a bfqq with a different parent; however, in that case
we refuse to set up the merge a few lines below, so there is no problem.

Are you still able to reproduce the use-after-free issue with this version
of my patches?

								Honza
Jan Kara Feb. 2, 2022, 9:53 p.m. UTC | #3
[sending once again as I forgot to snip debug log at the end and mail got
bounced by vger mail server]

On Tue 25-01-22 16:23:28, yukuai (C) wrote:
> On 2022/01/24 22:02, Jan Kara wrote:
> > On Fri 21-01-22 19:42:11, yukuai (C) wrote:
> > > On 2022/01/21 18:56, Jan Kara wrote:
> > > > [snip cover letter]
> > > > 
> > > Hi, Jan
> > > 
> > > I add a new BUG_ON() in bfq_setup_merge() while iterating new_bfqq, and
> > > this time this BUG_ON() is triggered:
> > 
> > Thanks for testing!
> > 
> > > [snip debug diff]
> > 
> > This seems to be too early to check and BUG_ON(). Yes, we can walk through
> > and even end up with a bfqq with a different parent however in that case we
> > refuse to setup merge a few lines below and so there is no problem.
> > 
> > Are you still able to reproduce the use-after-free issue with this version
> > of my patches?
> > 
> > 								Honza
> > 
> Hi, Jan
> 
> I added the following additional debug info:
> 
> @@ -926,6 +935,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
>         if (!entity) /* root group */
>                 goto put_async_queues;
> 
> +       printk("%s: bfqg %px offlined\n", __func__, bfqg);
>         /*
>          * Empty all service_trees belonging to this group before
>          * deactivating the group itself.
> @@ -965,6 +975,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
> 
>  put_async_queues:
>         bfq_put_async_queues(bfqd, bfqg);
> +       pd->plid = BLKCG_MAX_POLS;
> 
>         spin_unlock_irqrestore(&bfqd->lock, flags);
>         /*
> 
> @@ -6039,6 +6050,13 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
>                 *new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true,
>                                                  RQ_BIC(rq));
>         bool waiting, idle_timer_disabled = false;
> +       if (new_bfqq) {
> +               printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n", __func__,
> +                       bfqq, bfqq_group(bfqq), new_bfqq, bfqq_group(new_bfqq));
> +       } else {
> +               printk("%s: bfqq %px(%px) new_bfqq null\n", __func__,
> +                       bfqq, bfqq_group(bfqq));
> +       }
> 
> @@ -1696,6 +1696,11 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
> 
>         bfq_activate_bfqq(bfqd, bfqq);
> 
> +       if (bfqq->entity.parent && bfqq_group(bfqq)->pd.plid >= BLKCG_MAX_POLS) {
> +               printk("%s: bfqq %px(%px) with parent offlined\n", __func__,
> +                               bfqq, bfqq_group(bfqq));
> +               BUG_ON(1);
> +       }
> 
> And I found that the UAF is triggered when bfq_setup_cooperator() returns
> NULL, which is why the BUG_ON() in bfq_setup_cooperator() is not
> triggered:
> 
> [   51.833290] __bfq_insert_request: bfqq ffff888106913700(ffff888107d67000) new_bfqq null
> [   51.834762] bfq_add_bfqq_busy: bfqq ffff888106913700(ffff888107d67000) with parent offlined
> 
> The new_bfqq chain related to bfqq ffff888106913700:
> 
> t1: ffff8881114e9600 ------> t4: ffff888106913700 ------> t5: ffff88810719e3c0
>                         |
> t2: ffff888106913440 ----
>                         |
> t3: ffff8881114e98c0 ----
> 
> I'm still not sure about the root cause; hope this debug info is helpful.

Thanks for debugging! I was looking into this but I also do not understand
how the situation your tracing shows can happen. In particular I don't see
why there is no __bfq_bic_change_cgroup() call from bfq_insert_request() ->
bfq_init_rq() for the problematic __bfq_insert_request() into
ffff888106913700. I have two possible explanations: either the bio is
submitted to the offlined cgroup ffff888107d67000, or bic->blkcg_serial_nr
is pointing to a different cgroup than bic_to_bfqq(bic, 1)->entity.parent.

So can you extend the debugging a bit, like this:
1) Add current->pid to all debug messages so that we can distinguish
different processes and see which already detached from the bfqq and which
not.

2) Print bic->blkcg_serial_nr and __bio_blkcg(bio)->css.serial_nr before
crashing in bfq_add_bfqq_busy().

3) Add BUG_ON to bic_set_bfqq() like:
	if (bfqq_group(bfqq)->css.serial_nr != bic->blkcg_serial_nr) {
		printk("%s: bfqq %px(%px) serial %d bic serial %d\n", __func__, bfqq,
			bfqq_group(bfqq), bfqq_group(bfqq)->css.serial_nr,
			bic->blkcg_serial_nr);
		BUG_ON(1);
	}

and perhaps this sheds more light on the problem... Thanks!

								Honza
Yu Kuai Feb. 8, 2022, 3:56 a.m. UTC | #4
On 2022/02/03 5:53, Jan Kara wrote:
> [sending once again as I forgot to snip debug log at the end and mail got
> bounced by vger mail server]
> 
> [snip]
> 
> So can you extend the debugging a bit, like this:
> 1) Add current->pid to all debug messages so that we can distinguish
> different processes and see which already detached from the bfqq and which
> not.
> 
> 2) Print bic->blkcg_serial_nr and __bio_blkcg(bio)->css.serial_nr before
> crashing in bfq_add_bfqq_busy().
> 
> 3) Add BUG_ON to bic_set_bfqq() like:
> 	if (bfqq_group(bfqq)->css.serial_nr != bic->blkcg_serial_nr) {
> 		printk("%s: bfqq %px(%px) serial %d bic serial %d\n", __func__, bfqq,
> 			bfqq_group(bfqq), bfqq_group(bfqq)->css.serial_nr,
> 			bic->blkcg_serial_nr);
> 		BUG_ON(1);
> 	}
> 
> and perhaps this sheds more light on the problem... Thanks!
> 
> 								Honza

Hi, Jan

Sorry about the delay; I'm on vacation for the Spring Festival.

I'll try the debugging soon.

Thanks,
Kuai
Jan Kara Feb. 9, 2022, 5:40 p.m. UTC | #5
On Tue 08-02-22 21:19:14, yukuai (C) wrote:
> On 2022/02/03 5:53, Jan Kara wrote:
> > [snip debugging suggestions]
> 
> The debugging patch and log are attached.
> 
> bic_set_bfqq() found a serial mismatch several times. bfqq_group(bfqq)->css
> doesn't exist; is the following debugging what you expected?
> 
> @@ -386,6 +386,13 @@ static void bfq_put_stable_ref(struct bfq_queue *bfqq);
> 
>  void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync)
>  {
> +       if (bfqq && bfqq_group(bfqq)->pd.blkg->blkcg->css.serial_nr !=
> +                   bic->blkcg_serial_nr) {
> +               bfqq_dbg(bfqq, "serial %lld bic serial %lld\n",
> +                        bfqq_group(bfqq)->pd.blkg->blkcg->css.serial_nr,
> +                        bic->blkcg_serial_nr);
> +               WARN_ON_ONCE(1);
> +       }

Yes, this is what I wanted. Thanks!

> 
> The problematic bfqq ffff8881682581f0 doesn't merge with any bfqq,
> which is weird.
> 
> The pid from __bfq_insert_request() changes for the problematic
> bfqq, and each time the pid changes, there is a bic_set_bfqq() call
> that shows a serial mismatch.

I had a look into the debug data and now I think I understand both the
WARN_ON hit in bic_set_bfqq() as well as the final BUG_ON in
bfq_add_bfqq_busy().

The first problem is apparently hit because __bio_blkcg() can change while
we are processing the bio. So bfq_bic_update_cgroup() sees a different
__bio_blkcg() than bfq_get_queue() called from bfq_get_bfqq_handle_split().
This then causes a mismatch between what bic & bfqq think about cgroup
membership, which can lead to interesting inconsistencies down the road.

The second problem is hit because __bio_blkcg() can clearly point to a
blkcg that has already been offlined. Previously I didn't think this was
possible, but apparently there is nothing that would prevent this race, so
we need to handle it gracefully inside BFQ.

I need to think what would be best fixes for these issues since especially
the second one is tricky.

								Honza
Yu Kuai March 11, 2022, 6:39 a.m. UTC | #6
On 2022/02/10 1:40, Jan Kara wrote:
> [snip]
> 
> I need to think what would be best fixes for these issues since especially
> the second one is tricky.

Hi, Jan

Just out of curiosity, do you have any ideas on how to fix these issues?

Thanks,
Kuai
Jan Kara March 11, 2022, 11:28 a.m. UTC | #7
On Fri 11-03-22 14:39:10, yukuai (C) wrote:
> On 2022/02/10 1:40, Jan Kara wrote:
> > [snip]
> 
> Hi, Jan
> 
> Just out of curiosity, do you have any ideas on how to fix these issues?

Sorry for the delay. I was on vacation and then busy with other stuff. I
have some version of the fixes ready but I want to clean them up a bit
before posting. It shouldn't take me long...

								Honza