Message ID | 20170807193818.GA22737@rage (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Aug 8, 2017 at 3:38 AM, David Jeffery <djeffery@redhat.com> wrote: > There is a race between changing I/O elevator and request_queue removal > which can trigger the warning in kobject_add_internal. A program can > use sysfs to request a change of elevator at the same time another task > is unregistering the request_queue the elevator would be attached to. > The elevator's kobject will then attempt to be connected to the > request_queue in the object tree when the request_queue has just been > removed from sysfs. This triggers the warning in kobject_add_internal > as the request_queue no longer has a sysfs directory: > > kobject_add_internal failed for iosched (error: -2 parent: queue) > ------------[ cut here ]------------ > WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0 > > > To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when > changing the elevator and use the request_queue's sysfs_lock to > serialize between clearing the flag and the elevator testing the flag. I remember I saw this issue too. > > Signed-off-by: David Jeffery <djeffery@redhat.com> > --- > block/blk-sysfs.c | 2 ++ > block/elevator.c | 4 ++++ > 2 files changed, 6 insertions(+) > > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index 27aceab..b8362c0 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -931,7 +931,9 @@ void blk_unregister_queue(struct gendisk *disk) > if (WARN_ON(!q)) > return; > > + mutex_lock(&q->sysfs_lock); > queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); > + mutex_unlock(&q->sysfs_lock); Could you share why the lock of 'q->sysfs_lock' is needed here? > > wbt_exit(q); > > diff --git a/block/elevator.c b/block/elevator.c > index 4bb2f0c..51da592 100644 > --- a/block/elevator.c > +++ b/block/elevator.c > @@ -1055,6 +1055,10 @@ static int __elevator_change(struct request_queue *q, const char *name) > char elevator_name[ELV_NAME_MAX]; > struct elevator_type *e; > > + /* Make sure queue is not in the middle of being removed */ > + if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) > + return -ENOENT; > + I suggest to check 'e->registered' here, which should be more reasonable or straightforward.
On 08/07/2017 07:53 PM, Ming Lei wrote: > On Tue, Aug 8, 2017 at 3:38 AM, David Jeffery <djeffery@redhat.com> wrote: >> >> Signed-off-by: David Jeffery <djeffery@redhat.com> >> --- >> block/blk-sysfs.c | 2 ++ >> block/elevator.c | 4 ++++ >> 2 files changed, 6 insertions(+) >> >> >> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >> index 27aceab..b8362c0 100644 >> --- a/block/blk-sysfs.c >> +++ b/block/blk-sysfs.c >> @@ -931,7 +931,9 @@ void blk_unregister_queue(struct gendisk *disk) >> if (WARN_ON(!q)) >> return; >> >> + mutex_lock(&q->sysfs_lock); >> queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); >> + mutex_unlock(&q->sysfs_lock); > > Could you share why the lock of 'q->sysfs_lock' is needed here? As the elevator change is initiated through a sysfs attr file in the queue directory, the task doing the elevator change already acquires the q->sysfs_lock before it can try and change the elevator. Adding the lock around clearing QUEUE_FLAG_REGISTERED ensures that the queue state will be stable while the elevator is being changed. It prevents a race condition where the bit is checked but then cleared and queue removed from sysfs before the elevator change completes. > >> >> wbt_exit(q); >> >> diff --git a/block/elevator.c b/block/elevator.c >> index 4bb2f0c..51da592 100644 >> --- a/block/elevator.c >> +++ b/block/elevator.c >> @@ -1055,6 +1055,10 @@ static int __elevator_change(struct request_queue *q, const char *name) >> char elevator_name[ELV_NAME_MAX]; >> struct elevator_type *e; >> >> + /* Make sure queue is not in the middle of being removed */ >> + if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) >> + return -ENOENT; >> + > > I suggest to check 'e->registered' here, which should be more > reasonable or straightforward. > e->registered is not the state needing to be checked. We need to know the state of the associated request queue. Before changing the elevator, we need to ensure the request queue is still connected to sysfs. i.e. We need to know that kobject_del has not been called on the request queue. When QUEUE_FLAG_REGISTERED is not set it means the request queue either has had kobject_del called or will have it called soon, so we should fail the elevator change attempt.
Hi David, On Wed, Aug 9, 2017 at 2:13 AM, David Jeffery <djeffery@redhat.com> wrote: > On 08/07/2017 07:53 PM, Ming Lei wrote: >> On Tue, Aug 8, 2017 at 3:38 AM, David Jeffery <djeffery@redhat.com> wrote: > >>> >>> Signed-off-by: David Jeffery <djeffery@redhat.com> >>> --- >>> block/blk-sysfs.c | 2 ++ >>> block/elevator.c | 4 ++++ >>> 2 files changed, 6 insertions(+) >>> >>> >>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >>> index 27aceab..b8362c0 100644 >>> --- a/block/blk-sysfs.c >>> +++ b/block/blk-sysfs.c >>> @@ -931,7 +931,9 @@ void blk_unregister_queue(struct gendisk *disk) >>> if (WARN_ON(!q)) >>> return; >>> >>> + mutex_lock(&q->sysfs_lock); >>> queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); >>> + mutex_unlock(&q->sysfs_lock); >> >> Could you share why the lock of 'q->sysfs_lock' is needed here? > > As the elevator change is initiated through a sysfs attr file in the > queue directory, the task doing the elevator change already acquires the > q->sysfs_lock before it can try and change the elevator. Adding the It is e->sysfs_lock which is held in elv_attr_store(), instead of q->sysfs_lock. > lock around clearing QUEUE_FLAG_REGISTERED ensures that the queue state > will be stable while the elevator is being changed. It prevents a race > condition where the bit is checked but then cleared and queue removed > from sysfs before the elevator change completes. > >> >>> >>> wbt_exit(q); >>> >>> diff --git a/block/elevator.c b/block/elevator.c >>> index 4bb2f0c..51da592 100644 >>> --- a/block/elevator.c >>> +++ b/block/elevator.c >>> @@ -1055,6 +1055,10 @@ static int __elevator_change(struct request_queue *q, const char *name) >>> char elevator_name[ELV_NAME_MAX]; >>> struct elevator_type *e; >>> >>> + /* Make sure queue is not in the middle of being removed */ >>> + if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) >>> + return -ENOENT; >>> + >> >> I suggest to check 'e->registered' here, which should be more >> reasonable or straightforward. >> > > e->registered is not the state needing to be checked. We need to know > the state of the associated request queue. > > Before changing the elevator, we need to ensure the request queue is > still connected to sysfs. i.e. We need to know that kobject_del has not > been called on the request queue. When QUEUE_FLAG_REGISTERED is not set > it means the request queue either has had kobject_del called or will > have it called soon, so we should fail the elevator change attempt. elv_unregister_queue() is called in blk_unregister_queue() too, that is why I suggest to check e->registered.
On Wed, Aug 9, 2017 at 9:44 AM, Ming Lei <tom.leiming@gmail.com> wrote: > Hi David, > > On Wed, Aug 9, 2017 at 2:13 AM, David Jeffery <djeffery@redhat.com> wrote: >> On 08/07/2017 07:53 PM, Ming Lei wrote: >>> On Tue, Aug 8, 2017 at 3:38 AM, David Jeffery <djeffery@redhat.com> wrote: >> >>>> >>>> Signed-off-by: David Jeffery <djeffery@redhat.com> >>>> --- >>>> block/blk-sysfs.c | 2 ++ >>>> block/elevator.c | 4 ++++ >>>> 2 files changed, 6 insertions(+) >>>> >>>> >>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >>>> index 27aceab..b8362c0 100644 >>>> --- a/block/blk-sysfs.c >>>> +++ b/block/blk-sysfs.c >>>> @@ -931,7 +931,9 @@ void blk_unregister_queue(struct gendisk *disk) >>>> if (WARN_ON(!q)) >>>> return; >>>> >>>> + mutex_lock(&q->sysfs_lock); >>>> queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); >>>> + mutex_unlock(&q->sysfs_lock); >>> >>> Could you share why the lock of 'q->sysfs_lock' is needed here? >> >> As the elevator change is initiated through a sysfs attr file in the >> queue directory, the task doing the elevator change already acquires the >> q->sysfs_lock before it can try and change the elevator. Adding the > > It is e->sysfs_lock which is held in elv_attr_store(), instead of q->sysfs_lock. Looks I was wrong, and the store is from queue_attr_store() instead of elv_attr_store(). I can reproduce the issue and this patch can fix the issue in my test on scsi_debug, so: Tested-by: Ming Lei <ming.lei@redhat.com> And it is a typical race between removing queue kobj and adding children of this queue kobj, we can't acquire q->sysfs_lock in blk_unregister_queue() because of sysfs's limit(may cause deadlock), so one state has to be used for this purpose, just like what register/unregister hctx kobjs does, and it should be fine to use QUEUE_FLAG_REGISTERED here. More changes are required if we use e->registered, so this patch looks fine: Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming Lei
On Wed, Aug 23, 2017 at 11:34 AM, Ming Lei <tom.leiming@gmail.com> wrote: > On Wed, Aug 9, 2017 at 9:44 AM, Ming Lei <tom.leiming@gmail.com> wrote: >> Hi David, >> >> On Wed, Aug 9, 2017 at 2:13 AM, David Jeffery <djeffery@redhat.com> wrote: >>> On 08/07/2017 07:53 PM, Ming Lei wrote: >>>> On Tue, Aug 8, 2017 at 3:38 AM, David Jeffery <djeffery@redhat.com> wrote: >>> >>>>> >>>>> Signed-off-by: David Jeffery <djeffery@redhat.com> >>>>> --- >>>>> block/blk-sysfs.c | 2 ++ >>>>> block/elevator.c | 4 ++++ >>>>> 2 files changed, 6 insertions(+) >>>>> >>>>> >>>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >>>>> index 27aceab..b8362c0 100644 >>>>> --- a/block/blk-sysfs.c >>>>> +++ b/block/blk-sysfs.c >>>>> @@ -931,7 +931,9 @@ void blk_unregister_queue(struct gendisk *disk) >>>>> if (WARN_ON(!q)) >>>>> return; >>>>> >>>>> + mutex_lock(&q->sysfs_lock); >>>>> queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); >>>>> + mutex_unlock(&q->sysfs_lock); >>>> >>>> Could you share why the lock of 'q->sysfs_lock' is needed here? >>> >>> As the elevator change is initiated through a sysfs attr file in the >>> queue directory, the task doing the elevator change already acquires the >>> q->sysfs_lock before it can try and change the elevator. Adding the >> >> It is e->sysfs_lock which is held in elv_attr_store(), instead of q->sysfs_lock. > > Looks I was wrong, and the store is from queue_attr_store() instead of > elv_attr_store(). > > I can reproduce the issue and this patch can fix the issue in my test > on scsi_debug, > so: > > Tested-by: Ming Lei <ming.lei@redhat.com> > > And it is a typical race between removing queue kobj and adding children of > this queue kobj, we can't acquire q->sysfs_lock in blk_unregister_queue() > because of sysfs's limit(may cause deadlock), so one state has to be used > for this purpose, just like what register/unregister hctx kobjs does, > and it should > be fine to use QUEUE_FLAG_REGISTERED here. More changes are required if > we use e->registered, so this patch looks fine: > > Reviewed-by: Ming Lei <ming.lei@redhat.com> Hi Jens, Could you consider this patch for v4.13 or v4.14? Thanks, Ming Lei
On 08/27/2017 07:36 PM, Ming Lei wrote: > On Wed, Aug 23, 2017 at 11:34 AM, Ming Lei <tom.leiming@gmail.com> wrote: >> On Wed, Aug 9, 2017 at 9:44 AM, Ming Lei <tom.leiming@gmail.com> wrote: >>> Hi David, >>> >>> On Wed, Aug 9, 2017 at 2:13 AM, David Jeffery <djeffery@redhat.com> wrote: >>>> On 08/07/2017 07:53 PM, Ming Lei wrote: >>>>> On Tue, Aug 8, 2017 at 3:38 AM, David Jeffery <djeffery@redhat.com> wrote: >>>> >>>>>> >>>>>> Signed-off-by: David Jeffery <djeffery@redhat.com> >>>>>> --- >>>>>> block/blk-sysfs.c | 2 ++ >>>>>> block/elevator.c | 4 ++++ >>>>>> 2 files changed, 6 insertions(+) >>>>>> >>>>>> >>>>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >>>>>> index 27aceab..b8362c0 100644 >>>>>> --- a/block/blk-sysfs.c >>>>>> +++ b/block/blk-sysfs.c >>>>>> @@ -931,7 +931,9 @@ void blk_unregister_queue(struct gendisk *disk) >>>>>> if (WARN_ON(!q)) >>>>>> return; >>>>>> >>>>>> + mutex_lock(&q->sysfs_lock); >>>>>> queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); >>>>>> + mutex_unlock(&q->sysfs_lock); >>>>> >>>>> Could you share why the lock of 'q->sysfs_lock' is needed here? >>>> >>>> As the elevator change is initiated through a sysfs attr file in the >>>> queue directory, the task doing the elevator change already acquires the >>>> q->sysfs_lock before it can try and change the elevator. Adding the >>> >>> It is e->sysfs_lock which is held in elv_attr_store(), instead of q->sysfs_lock. >> >> Looks I was wrong, and the store is from queue_attr_store() instead of >> elv_attr_store(). >> >> I can reproduce the issue and this patch can fix the issue in my test >> on scsi_debug, >> so: >> >> Tested-by: Ming Lei <ming.lei@redhat.com> >> >> And it is a typical race between removing queue kobj and adding children of >> this queue kobj, we can't acquire q->sysfs_lock in blk_unregister_queue() >> because of sysfs's limit(may cause deadlock), so one state has to be used >> for this purpose, just like what register/unregister hctx kobjs does, >> and it should >> be fine to use QUEUE_FLAG_REGISTERED here. More changes are required if >> we use e->registered, so this patch looks fine: >> >> Reviewed-by: Ming Lei <ming.lei@redhat.com> > > > Hi Jens, > > Could you consider this patch for v4.13 or v4.14? Yep, added for 4.14, thanks.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 27aceab..b8362c0 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -931,7 +931,9 @@ void blk_unregister_queue(struct gendisk *disk) if (WARN_ON(!q)) return; + mutex_lock(&q->sysfs_lock); queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q); + mutex_unlock(&q->sysfs_lock); wbt_exit(q); diff --git a/block/elevator.c b/block/elevator.c index 4bb2f0c..51da592 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -1055,6 +1055,10 @@ static int __elevator_change(struct request_queue *q, const char *name) char elevator_name[ELV_NAME_MAX]; struct elevator_type *e; + /* Make sure queue is not in the middle of being removed */ + if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) + return -ENOENT; + /* * Special case for mq, turn off scheduling */
There is a race between changing I/O elevator and request_queue removal which can trigger the warning in kobject_add_internal. A program can use sysfs to request a change of elevator at the same time another task is unregistering the request_queue the elevator would be attached to. The elevator's kobject will then attempt to be connected to the request_queue in the object tree when the request_queue has just been removed from sysfs. This triggers the warning in kobject_add_internal as the request_queue no longer has a sysfs directory: kobject_add_internal failed for iosched (error: -2 parent: queue) ------------[ cut here ]------------ WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0 To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when changing the elevator and use the request_queue's sysfs_lock to serialize between clearing the flag and the elevator testing the flag. Signed-off-by: David Jeffery <djeffery@redhat.com> --- block/blk-sysfs.c | 2 ++ block/elevator.c | 4 ++++ 2 files changed, 6 insertions(+)