Message ID | b7809887-24e6-3ad7-e8bd-4fe7ea0927c0@wdc.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> On 04/11/18 13:00, Alexandru Moise wrote:
> >But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> >an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> >the entry that we fail to remove at __blk_release_queue().
>
> Hello Alex,
>
> Had you considered something like the untested patch below?

But queue init shouldn't fail here, right?

Thanks.

On Wed, 2018-04-11 at 12:57 -0700, tj@kernel.org wrote:
> On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> > On 04/11/18 13:00, Alexandru Moise wrote:
> > > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > > the entry that we fail to remove at __blk_release_queue().
> >
> > Hello Alex,
> >
> > Had you considered something like the untested patch below?
>
> But queue init shouldn't fail here, right?

Hello Tejun,

Your question is not entirely clear to me. Are you referring to the atomic
allocations in blkg_create() or are you perhaps referring to something else?

Bart.

On Wed, Apr 11, 2018 at 08:00:29PM +0000, Bart Van Assche wrote:
> On Wed, 2018-04-11 at 12:57 -0700, tj@kernel.org wrote:
> > On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> > > On 04/11/18 13:00, Alexandru Moise wrote:
> > > > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > > > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > > > the entry that we fail to remove at __blk_release_queue().
> > >
> > > Hello Alex,
> > >
> > > Had you considered something like the untested patch below?
> >
> > But queue init shouldn't fail here, right?
>
> Hello Tejun,
>
> Your question is not entirely clear to me. Are you referring to the atomic
> allocations in blkg_create() or are you perhaps referring to something else?

Hmm.. maybe I'm confused but I thought that the fact that
blkcg_init_queue() fails itself is already a bug, which happens
because a previously destroyed queue left behind blkgs.

Thanks.

On Wed, 2018-04-11 at 13:02 -0700, tj@kernel.org wrote:
> On Wed, Apr 11, 2018 at 08:00:29PM +0000, Bart Van Assche wrote:
> > On Wed, 2018-04-11 at 12:57 -0700, tj@kernel.org wrote:
> > > On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> > > > On 04/11/18 13:00, Alexandru Moise wrote:
> > > > > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > > > > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > > > > the entry that we fail to remove at __blk_release_queue().
> > > >
> > > > Hello Alex,
> > > >
> > > > Had you considered something like the untested patch below?
> > >
> > > But queue init shouldn't fail here, right?
> >
> > Hello Tejun,
> >
> > Your question is not entirely clear to me. Are you referring to the atomic
> > allocations in blkg_create() or are you perhaps referring to something else?
>
> Hmm.. maybe I'm confused but I thought that the fact that
> blkcg_init_queue() fails itself is already a bug, which happens
> because a previously destroyed queue left behind blkgs.

Hello Tejun,

I had missed the start of this thread so I was not aware of which problem
Alex was trying to solve. In the description of v1 of this patch I read that
Alex thinks that he ran into a scenario in which blk_queue_alloc_node()
assigns a q->id that is still in use by another request queue? That's weird.
The following code still occurs in __blk_release_queue():

	ida_simple_remove(&blk_queue_ida, q->id);

It's not clear to me how that remove call could happen *before* q->id is
removed from the blkcg radix tree.

Bart.
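
[Editorial sketch, not part of the thread.] The failure mode being debated above can be modelled in userspace: if a queue id goes back to the id allocator while an entry keyed by that id is still in the tree, the next queue that receives the same id cannot insert its entry. Everything below is a hypothetical stand-in (the arrays, `id_alloc()`, `tree_insert()`, `queue_release()` and `reuse_after_release()` are invented for illustration). It does not reproduce the kernel's IDA or radix tree code, and it does not settle whether this ordering can actually occur in __blk_release_queue().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDS 8

/* Hypothetical stand-in for blk_queue_ida: which ids are handed out. */
static bool id_in_use[MAX_IDS];
/* Hypothetical stand-in for blkcg->blkg_tree: one slot per queue id. */
static const void *tree_slot[MAX_IDS];

/* Hand out the lowest free id, or -ENOSPC if none is left. */
static int id_alloc(void)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (!id_in_use[i]) {
			id_in_use[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

static void id_free(int id)
{
	id_in_use[id] = false;
}

/* Inserting into an already populated slot fails with -EEXIST. */
static int tree_insert(int id, const void *entry)
{
	if (tree_slot[id])
		return -EEXIST;
	tree_slot[id] = entry;
	return 0;
}

static void tree_delete(int id)
{
	tree_slot[id] = NULL;
}

/*
 * Release a queue. With leave_stale_entry set, the id is returned to the
 * allocator while its tree entry is still present (the suspected problem).
 */
static void queue_release(int id, bool leave_stale_entry)
{
	if (!leave_stale_entry)
		tree_delete(id);
	id_free(id);
}

/*
 * Create a queue, release it, create a second one, and return the result
 * of the second tree insertion.
 */
static int reuse_after_release(bool leave_stale_entry)
{
	static const int first_blkg = 1, second_blkg = 2;
	int id, ret;

	/* Start from a clean state so the function can be called repeatedly. */
	for (int i = 0; i < MAX_IDS; i++) {
		id_in_use[i] = false;
		tree_slot[i] = NULL;
	}

	id = id_alloc();
	ret = tree_insert(id, &first_blkg);
	assert(ret == 0);
	queue_release(id, leave_stale_entry);

	id = id_alloc();	/* the lowest free id is handed out again */
	return tree_insert(id, &second_blkg);
}
```

In this model `reuse_after_release(false)` succeeds, while `reuse_after_release(true)` fails with -EEXIST because the reused id still maps to the first queue's entry.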

On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> On 04/11/18 13:00, Alexandru Moise wrote:
> > But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> > an ERR ptr, because it tries to insert into a populated index into blkcg->blkg_tree,
> > the entry that we fail to remove at __blk_release_queue().
>
> Hello Alex,
>
> Had you considered something like the untested patch below?
>
> Thanks,
>
> Bart.
>
>
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 1c16694ae145..f2ced19e74b8 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -1191,14 +1191,17 @@ int blkcg_init_queue(struct request_queue *q)
>  	if (preloaded)
>  		radix_tree_preload_end();
>
> -	if (IS_ERR(blkg))
> -		return PTR_ERR(blkg);
> +	if (IS_ERR(blkg)) {
> +		ret = PTR_ERR(blkg);
> +		goto destroy_all;
> +	}
>
>  	q->root_blkg = blkg;
>  	q->root_rl.blkg = blkg;
>
>  	ret = blk_throtl_init(q);
>  	if (ret) {
> +destroy_all:
>  		spin_lock_irq(q->queue_lock);
>  		blkg_destroy_all(q);
>  		spin_unlock_irq(q->queue_lock);
>

Hi, I tested it, it doesn't solve the problem.
By the time you get here it's already too late, my patch
prevents this from failing in the first place.

I would have liked this more than my solution though.

../Alex

On Wed, 2018-04-11 at 23:23 +0200, Alexandru Moise wrote:
> Hi, I tested it, it doesn't solve the problem.
> By the time you get here it's already too late, my patch
> prevents this from failing in the first place.

Hello Alex,

If you can share the steps to follow to trigger the bug you reported then I
will have a closer look at this.

Thanks,

Bart.

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1c16694ae145..f2ced19e74b8 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1191,14 +1191,17 @@ int blkcg_init_queue(struct request_queue *q)
 	if (preloaded)
 		radix_tree_preload_end();
 
-	if (IS_ERR(blkg))
-		return PTR_ERR(blkg);
+	if (IS_ERR(blkg)) {
+		ret = PTR_ERR(blkg);
+		goto destroy_all;
+	}
 
 	q->root_blkg = blkg;
 	q->root_rl.blkg = blkg;
 
 	ret = blk_throtl_init(q);
 	if (ret) {
+destroy_all:
 		spin_lock_irq(q->queue_lock);
 		blkg_destroy_all(q);
 		spin_unlock_irq(q->queue_lock);
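
[Editorial sketch, not part of the thread.] The patch above relies on two kernel idioms: an error code encoded in a pointer (IS_ERR()/PTR_ERR()) and a shared cleanup path reached with goto. The userspace model below imitates both so the control flow can be followed in isolation. The lower-case helpers `err_ptr()`, `ptr_err()` and `is_err()` are simplified imitations of the kernel macros, and `group_create()`, `destroy_all()` and `init_queue()` are hypothetical stand-ins. What the real function does after the cleanup is not visible in the hunk, so returning the error here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified imitations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	/* Negative errno values map to the topmost MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int root_group;		/* dummy object standing in for a blkg */
static int cleanup_calls;	/* counts how often the cleanup path ran */

/* Hypothetical stand-in for blkg_create(): fails if error is nonzero. */
static void *group_create(int error)
{
	return error ? err_ptr(error) : (void *)&root_group;
}

/* Hypothetical stand-in for the locked blkg_destroy_all() call. */
static void destroy_all(void)
{
	cleanup_calls++;
}

/*
 * Model of the patched flow: both a failed creation and a failed second
 * init step (throttle_error models blk_throtl_init()) reach the same cleanup.
 */
static int init_queue(int create_error, int throttle_error)
{
	void *grp = group_create(create_error);
	int ret;

	if (is_err(grp)) {
		ret = (int)ptr_err(grp);
		goto err_destroy;
	}

	ret = throttle_error;
	if (ret)
		goto err_destroy;

	return 0;

err_destroy:
	destroy_all();
	return ret;	/* assumption: the error is propagated after cleanup */
}
```

The sketch only shows that the patch makes the creation failure share the cleanup path. As Alex notes in his reply, that does not help if the stale entry is what makes the creation fail in the first place.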