Message ID | 20200414041902.16769-4-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | blktrace: fix use after free |
On Tue, Apr 14, 2020 at 04:19:00AM +0000, Luis Chamberlain wrote:
> Ensure that the request_queue is refcounted during its full
> ioctl cycle. This avoids possible races against removal, given
> blk_get_queue() also checks to ensure the queue is not dying.
>
> This small race is possible if you defer removal of the request_queue
> and userspace fires off an ioctl for the device in the meantime.

Hmm, where exactly does the race come in so that it can only happen
after where you take the reference, but not before it? I'm probably
missing something, but that just means it needs to be explained a little
better :)
On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 14, 2020 at 04:19:00AM +0000, Luis Chamberlain wrote:
> > Ensure that the request_queue is refcounted during its full
> > ioctl cycle. This avoids possible races against removal, given
> > blk_get_queue() also checks to ensure the queue is not dying.
> >
> > This small race is possible if you defer removal of the request_queue
> > and userspace fires off an ioctl for the device in the meantime.
>
> Hmm, where exactly does the race come in so that it can only happen
> after where you take the reference, but not before it? I'm probably
> missing something, but that just means it needs to be explained a little
> better :)

From the trace on patch 2/5:

BLKTRACE_SETUP(loop0) #2
[   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
[   13.936758] === do_blk_trace_setup(2) start
[   13.938944] === do_blk_trace_setup(2) creating directory
[   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave

---> From LOOP_CTL_DEL(loop0) #2
[   13.971046] === blk_trace_cleanup(7) end
[   13.973175] == __blk_trace_remove(7) end
[   13.975352] == blk_trace_shutdown(7) end
[   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
[   13.980645] ==== blk_mq_debugfs_unregister(7) begin
[   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
[   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
[   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
[   13.993155] = __blk_release_queue(7) end

---> From BLKTRACE_SETUP(loop0) #2
[   13.995928] === do_blk_trace_setup(2) end with ret: 0
[   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end

The BLKTRACESETUP above works on a request_queue which LOOP_CTL_DEL
later races on and sweeps the debugfs dir out from underneath us.
This commit alone does not fix that race, though, because of both the
remaining debugfs_lookup() use and the fact that we're still using
asynchronous removal at this point.

Refcounting just ensures the request_queue is not freed from
underneath us.

Should I just add this to the commit log?

  Luis
On Wed, Apr 15, 2020 at 06:16:49AM +0000, Luis Chamberlain wrote:
> The BLKTRACESETUP above works on a request_queue which LOOP_CTL_DEL
> later races on and sweeps the debugfs dir out from underneath us.
> This commit alone does not fix that race, though, because of both the
> remaining debugfs_lookup() use and the fact that we're still using
> asynchronous removal at this point.
>
> Refcounting just ensures the request_queue is not freed from
> underneath us.
>
> Should I just add this to the commit log?

That sounds much more useful than the trace.

Btw, isn't blk_get_queue() racy as well? Shouldn't we check
blk_queue_dying() after getting the reference and undo it if the queue
is indeed dying?
On Wed, Apr 15, 2020 at 12:14:25AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 15, 2020 at 06:16:49AM +0000, Luis Chamberlain wrote:
> > The BLKTRACESETUP above works on a request_queue which LOOP_CTL_DEL
> > later races on and sweeps the debugfs dir out from underneath us.
> > This commit alone does not fix that race, though, because of both the
> > remaining debugfs_lookup() use and the fact that we're still using
> > asynchronous removal at this point.
> >
> > Refcounting just ensures the request_queue is not freed from
> > underneath us.
> >
> > Should I just add this to the commit log?
>
> That sounds much more useful than the trace.
>
> Btw, isn't blk_get_queue() racy as well? Shouldn't we check
> blk_queue_dying() after getting the reference and undo it if the queue
> is indeed dying?

Yes, that race should be possible:

bool blk_get_queue(struct request_queue *q)
{
	if (likely(!blk_queue_dying(q))) {
----------> we can get the queue to go dying here <---------
		__blk_get_queue(q);
		return true;
	}

	return false;
}
EXPORT_SYMBOL(blk_get_queue);

I'll pile up a fix. I've also considered doing a full review of callers
outside of the core block layer using it, and maybe just unexporting
this. It was originally exported by commit d86e0e83b ("block: export
blk_{get,put}_queue()") to fix a SCSI bug, but I can't find the
respective fix. I suspect that using bdgrab()/bdput() is more likely
what drivers should be using. That would allow us to keep this
functionality internal.

Think that's worth a review?

  Luis
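As an aside, a minimal sketch of the re-check Christoph suggests might
look as follows. This is a hypothetical illustration, not the fix that
was eventually posted, and as Bart notes below it still would not stop
the queue from transitioning to dying right after the second check:

bool blk_get_queue(struct request_queue *q)
{
	if (blk_queue_dying(q))
		return false;

	__blk_get_queue(q);

	/*
	 * Re-check after taking the reference: the queue may have
	 * transitioned to dying in the window flagged above. Undo the
	 * reference and fail in that case.
	 */
	if (unlikely(blk_queue_dying(q))) {
		blk_put_queue(q);
		return false;
	}

	return true;
}

The re-check only narrows the window; the reference itself stays valid
either way, since it pins the kobject against freeing, not against the
dying transition.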
On Wed, Apr 15, 2020 at 12:34:34PM +0000, Luis Chamberlain wrote:
> I'll pile up a fix. I've also considered doing a full review of callers
> outside of the core block layer using it, and maybe just unexporting
> this. It was originally exported by commit d86e0e83b ("block: export
> blk_{get,put}_queue()") to fix a SCSI bug, but I can't find the
> respective fix. I suspect that using bdgrab()/bdput() is more likely
> what drivers should be using. That would allow us to keep this
> functionality internal.
>
> Think that's worth a review?

Probably. I did in fact very quickly look into that, but then gave
up due to the fair amount of modular users.
On Wed, Apr 15, 2020 at 05:39:25AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 15, 2020 at 12:34:34PM +0000, Luis Chamberlain wrote:
> > I'll pile up a fix. I've also considered doing a full review of callers
> > outside of the core block layer using it, and maybe just unexporting
> > this. It was originally exported by commit d86e0e83b ("block: export
> > blk_{get,put}_queue()") to fix a SCSI bug, but I can't find the
> > respective fix. I suspect that using bdgrab()/bdput() is more likely
> > what drivers should be using. That would allow us to keep this
> > functionality internal.
> >
> > Think that's worth a review?
>
> Probably. I did in fact very quickly look into that, but then gave
> up due to the fair amount of modular users.

Alright, then we might as well verify whether the existing practice of
bdgrab()/bdput() is indeed valid logic, as otherwise we'd be putting
the atomic context / sleep concern onto bdput(). As noted earlier, I
can easily confirm that bdgrab() can be called in atomic context;
however, I cannot yet easily vet *why* this was a safe assumption for
bdput().

  Luis
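For context, the asymmetry being discussed follows from how the two
helpers were defined in fs/block_dev.c of that era; quoted roughly from
memory, so treat the exact bodies as an assumption:

struct block_device *bdgrab(struct block_device *bdev)
{
	ihold(bdev->bd_inode);	/* plain atomic increment, safe in atomic context */
	return bdev;
}

void bdput(struct block_device *bdev)
{
	iput(bdev->bd_inode);	/* may drop the last inode reference and sleep */
}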
On 2020-04-15 05:34, Luis Chamberlain wrote:
> On Wed, Apr 15, 2020 at 12:14:25AM -0700, Christoph Hellwig wrote:
>> Btw, isn't blk_get_queue() racy as well? Shouldn't we check
>> blk_queue_dying() after getting the reference and undo it if the queue
>> is indeed dying?
>
> Yes, that race should be possible:
>
> bool blk_get_queue(struct request_queue *q)
> {
> 	if (likely(!blk_queue_dying(q))) {
> ----------> we can get the queue to go dying here <---------
> 		__blk_get_queue(q);
> 		return true;
> 	}
>
> 	return false;
> }
> EXPORT_SYMBOL(blk_get_queue);
>
> I'll pile up a fix. I've also considered doing a full review of callers
> outside of the core block layer using it, and maybe just unexporting
> this. It was originally exported by commit d86e0e83b ("block: export
> blk_{get,put}_queue()") to fix a SCSI bug, but I can't find the
> respective fix. I suspect that using bdgrab()/bdput() is more likely
> what drivers should be using. That would allow us to keep this
> functionality internal.

blk_get_queue() prevents concurrent freeing of struct request_queue but
does not prevent concurrent blk_cleanup_queue() calls. Callers of
blk_get_queue() may encounter a change of the queue state from normal
into dying at any time during the blk_get_queue() call or after
blk_get_queue() has finished. Maybe I'm overlooking something, but I
doubt that modifying blk_get_queue() will help.

Bart.
On 2020-04-14 23:16, Luis Chamberlain wrote:
> On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
>> Hmm, where exactly does the race come in so that it can only happen
>> after where you take the reference, but not before it? I'm probably
>> missing something, but that just means it needs to be explained a little
>> better :)
>
> From the trace on patch 2/5:
>
> BLKTRACE_SETUP(loop0) #2
> [   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
> [   13.936758] === do_blk_trace_setup(2) start
> [   13.938944] === do_blk_trace_setup(2) creating directory
> [   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave
>
> ---> From LOOP_CTL_DEL(loop0) #2
> [   13.971046] === blk_trace_cleanup(7) end
> [   13.973175] == __blk_trace_remove(7) end
> [   13.975352] == blk_trace_shutdown(7) end
> [   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
> [   13.980645] ==== blk_mq_debugfs_unregister(7) begin
> [   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
> [   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
> [   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
> [   13.993155] = __blk_release_queue(7) end
>
> ---> From BLKTRACE_SETUP(loop0) #2
> [   13.995928] === do_blk_trace_setup(2) end with ret: 0
> [   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
>
> The BLKTRACESETUP above works on a request_queue which LOOP_CTL_DEL
> later races on and sweeps the debugfs dir out from underneath us.
> This commit alone does not fix that race, though, because of both the
> remaining debugfs_lookup() use and the fact that we're still using
> asynchronous removal at this point.
>
> Refcounting just ensures the request_queue is not freed from
> underneath us.

I think the above trace reveals a bug in the loop driver. The loop
driver shouldn't allow the associated request queue to disappear while
the loop device is open. One may want to have a look at sd_open() in
the sd driver. The scsi_disk_get() call in that function not only
increases the reference count of the SCSI disk but also of the
underlying SCSI device.

Thanks,

Bart.
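For context, the sd pattern Bart points at is built on scsi_disk_get();
a simplified paraphrase of the era's drivers/scsi/sd.c, not the
verbatim code:

static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
{
	struct scsi_disk *sdkp;

	mutex_lock(&sd_ref_mutex);
	sdkp = scsi_disk(disk);
	if (sdkp) {
		/*
		 * Take a reference on the underlying SCSI device
		 * first, then pin the SCSI disk itself. Both are held
		 * for as long as the disk stays open.
		 */
		if (scsi_device_get(sdkp->device) == 0)
			get_device(&sdkp->dev);
		else
			sdkp = NULL;
	}
	mutex_unlock(&sd_ref_mutex);
	return sdkp;
}

Because the scsi_device reference is held across the open, the request
queue hanging off it cannot be torn down while the device node is in
use.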
On Wed, Apr 15, 2020 at 07:18:22AM -0700, Bart Van Assche wrote:
> On 2020-04-15 05:34, Luis Chamberlain wrote:
> > On Wed, Apr 15, 2020 at 12:14:25AM -0700, Christoph Hellwig wrote:
> >> Btw, isn't blk_get_queue() racy as well? Shouldn't we check
> >> blk_queue_dying() after getting the reference and undo it if the queue
> >> is indeed dying?
> >
> > Yes, that race should be possible:
> >
> > bool blk_get_queue(struct request_queue *q)
> > {
> > 	if (likely(!blk_queue_dying(q))) {
> > ----------> we can get the queue to go dying here <---------
> > 		__blk_get_queue(q);
> > 		return true;
> > 	}
> >
> > 	return false;
> > }
> > EXPORT_SYMBOL(blk_get_queue);
> >
> > I'll pile up a fix. I've also considered doing a full review of callers
> > outside of the core block layer using it, and maybe just unexporting
> > this. It was originally exported by commit d86e0e83b ("block: export
> > blk_{get,put}_queue()") to fix a SCSI bug, but I can't find the
> > respective fix. I suspect that using bdgrab()/bdput() is more likely
> > what drivers should be using. That would allow us to keep this
> > functionality internal.
>
> blk_get_queue() prevents concurrent freeing of struct request_queue but
> does not prevent concurrent blk_cleanup_queue() calls.

Wouldn't concurrent blk_cleanup_queue() calls be a bug? If so, should
I make it clear that it would be, or simply prevent it?

> Callers of
> blk_get_queue() may encounter a change of the queue state from normal
> into dying at any time during the blk_get_queue() call or after
> blk_get_queue() has finished. Maybe I'm overlooking something, but I
> doubt that modifying blk_get_queue() will help.

Good point. To fix the race described by Christoph, we'd have to take
the request_queue refcount into consideration and prevent queues from
changing state to dying while the refcount is > 1; however, that would
also mean never allowing the request_queue to die. One way we could
resolve this could be to keep track of a quiesce/dying request, at that
point prevent blk_get_queue() from allowing increments, and once the
refcount is down to 1, flip the switch to dying.

  Luis
On Wed, Apr 15, 2020 at 07:45:18AM -0700, Bart Van Assche wrote:
> On 2020-04-14 23:16, Luis Chamberlain wrote:
> > On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
> >> Hmm, where exactly does the race come in so that it can only happen
> >> after where you take the reference, but not before it? I'm probably
> >> missing something, but that just means it needs to be explained a little
> >> better :)
> >
> > From the trace on patch 2/5:
> >
> > BLKTRACE_SETUP(loop0) #2
> > [   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
> > [   13.936758] === do_blk_trace_setup(2) start
> > [   13.938944] === do_blk_trace_setup(2) creating directory
> > [   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave
> >
> > ---> From LOOP_CTL_DEL(loop0) #2
> > [   13.971046] === blk_trace_cleanup(7) end
> > [   13.973175] == __blk_trace_remove(7) end
> > [   13.975352] == blk_trace_shutdown(7) end
> > [   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
> > [   13.980645] ==== blk_mq_debugfs_unregister(7) begin
> > [   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
> > [   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
> > [   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
> > [   13.993155] = __blk_release_queue(7) end
> >
> > ---> From BLKTRACE_SETUP(loop0) #2
> > [   13.995928] === do_blk_trace_setup(2) end with ret: 0
> > [   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
> >
> > The BLKTRACESETUP above works on a request_queue which LOOP_CTL_DEL
> > later races on and sweeps the debugfs dir out from underneath us.
> > This commit alone does not fix that race, though, because of both the
> > remaining debugfs_lookup() use and the fact that we're still using
> > asynchronous removal at this point.
> >
> > Refcounting just ensures the request_queue is not freed from
> > underneath us.
>
> I think the above trace reveals a bug in the loop driver. The loop
> driver shouldn't allow the associated request queue to disappear while
> the loop device is open.

The bug was *not* in the driver; the bug was that the deferral of
removal was allowed to be asynchronous, so from a userspace perspective
the removal *finishes*, but it is not actually done. Back when removal
was synchronous, the loop driver waited on cleanup and didn't return to
userspace until the device was really removed. This is why I annotated
that the move to async removal actually turns out to be a userspace API
regression.

> One may want to have a look at sd_open() in
> the sd driver. The scsi_disk_get() call in that function not only
> increases the reference count of the SCSI disk but also of the
> underlying SCSI device.

Are you saying to use this as a template for what a driver should do,
or do you suspect there is a bug there? Not sure what you mean here.

  Luis
On Tue, Apr 14, 2020 at 04:19:00AM +0000, Luis Chamberlain wrote:
> Ensure that the request_queue is refcounted during its full
> ioctl cycle. This avoids possible races against removal, given
> blk_get_queue() also checks to ensure the queue is not dying.
>
> This small race is possible if you defer removal of the request_queue
> and userspace fires off an ioctl for the device in the meantime.
>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Nicolai Stange <nstange@suse.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: yu kuai <yukuai3@huawei.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  kernel/trace/blktrace.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index 15086227592f..17e144d15779 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -701,6 +701,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
>  	if (!q)
>  		return -ENXIO;
>
> +	if (!blk_get_queue(q))
> +		return -ENXIO;
> +
>  	mutex_lock(&q->blk_trace_mutex);
>
>  	switch (cmd) {
> @@ -729,6 +732,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
>  	}
>
>  	mutex_unlock(&q->blk_trace_mutex);
> +
> +	blk_put_queue(q);
> +
>  	return ret;
>  }

Actually when the bdev is opened, one extra refcount is held on the
gendisk, so the gendisk won't go away. And __device_add_disk() grabs
one extra refcount on the request queue, so the request queue shouldn't
go away while the ioctl is running.

Can you describe a bit more what issue this patch addresses?

Thanks,
Ming
On 2020-04-15 18:12, Luis Chamberlain wrote:
> On Wed, Apr 15, 2020 at 07:18:22AM -0700, Bart Van Assche wrote:
>> blk_get_queue() prevents concurrent freeing of struct request_queue but
>> does not prevent concurrent blk_cleanup_queue() calls.
>
> Wouldn't concurrent blk_cleanup_queue() calls be a bug? If so, should
> I make it clear that it would be, or simply prevent it?

I think calling blk_cleanup_queue() while the queue refcount is > 0 is
well-established behavior. At least the SCSI core has triggered that
behavior for a very long time. I prefer not to change that behavior.

Regarding patch 3/5: how about dropping that patch? If the queue
refcount can drop to zero while blk_trace_ioctl() is in progress, I
think that should be fixed in the block_device_operations.open callback
instead of in blk_trace_ioctl().

Thanks,

Bart.
On Wed, Apr 15, 2020 at 08:43:32PM -0700, Bart Van Assche wrote:
> On 2020-04-15 18:12, Luis Chamberlain wrote:
> > On Wed, Apr 15, 2020 at 07:18:22AM -0700, Bart Van Assche wrote:
> >> blk_get_queue() prevents concurrent freeing of struct request_queue but
> >> does not prevent concurrent blk_cleanup_queue() calls.
> >
> > Wouldn't concurrent blk_cleanup_queue() calls be a bug? If so, should
> > I make it clear that it would be, or simply prevent it?
>
> I think calling blk_cleanup_queue() while the queue refcount is > 0 is
> well-established behavior. At least the SCSI core has triggered that
> behavior for a very long time. I prefer not to change that behavior.

I see. An alternative is to simply check whether we are already
cleaning up and, if so, abort early in blk_cleanup_queue(). That would
allow re-entrant calls and just make the additional calls a no-op. Or
is being re-entrant, with two attempts to really do all the work,
already expected functionality of blk_cleanup_queue()?

> Regarding patch 3/5: how about dropping that patch? If the queue
> refcount can drop to zero while blk_trace_ioctl() is in progress, I
> think that should be fixed in the block_device_operations.open callback
> instead of in blk_trace_ioctl().

I'll take a look, thanks!

  Luis
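As an illustration of the early-abort idea floated above, a sketch
could look like the following; QUEUE_FLAG_CLEANUP is a hypothetical
flag invented for this sketch, not an existing queue flag, and this is
not how blk_cleanup_queue() actually guards itself:

void blk_cleanup_queue(struct request_queue *q)
{
	/*
	 * Make concurrent calls a no-op: only the first caller that
	 * atomically sets the (hypothetical) flag proceeds with the
	 * actual teardown; later callers return early.
	 */
	if (blk_queue_flag_test_and_set(QUEUE_FLAG_CLEANUP, q))
		return;

	/* ... existing teardown work goes here ... */
}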
On Thu, Apr 16, 2020 at 10:31:22AM +0800, Ming Lei wrote:
> On Tue, Apr 14, 2020 at 04:19:00AM +0000, Luis Chamberlain wrote:
> > Ensure that the request_queue is refcounted during its full
> > ioctl cycle. This avoids possible races against removal, given
> > blk_get_queue() also checks to ensure the queue is not dying.
> >
> > This small race is possible if you defer removal of the request_queue
> > and userspace fires off an ioctl for the device in the meantime.
> >
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Cc: Omar Sandoval <osandov@fb.com>
> > Cc: Hannes Reinecke <hare@suse.com>
> > Cc: Nicolai Stange <nstange@suse.de>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: yu kuai <yukuai3@huawei.com>
> > Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  kernel/trace/blktrace.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> > index 15086227592f..17e144d15779 100644
> > --- a/kernel/trace/blktrace.c
> > +++ b/kernel/trace/blktrace.c
> > @@ -701,6 +701,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
> >  	if (!q)
> >  		return -ENXIO;
> >
> > +	if (!blk_get_queue(q))
> > +		return -ENXIO;
> > +
> >  	mutex_lock(&q->blk_trace_mutex);
> >
> >  	switch (cmd) {
> > @@ -729,6 +732,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
> >  	}
> >
> >  	mutex_unlock(&q->blk_trace_mutex);
> > +
> > +	blk_put_queue(q);
> > +
> >  	return ret;
> >  }
>
> Actually when the bdev is opened, one extra refcount is held on the
> gendisk, so the gendisk won't go away. And __device_add_disk() grabs
> one extra refcount on the request queue, so the request queue shouldn't
> go away while the ioctl is running.

Alright, then yes, this should not be needed.

  Luis
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 15086227592f..17e144d15779 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -701,6 +701,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 	if (!q)
 		return -ENXIO;
 
+	if (!blk_get_queue(q))
+		return -ENXIO;
+
 	mutex_lock(&q->blk_trace_mutex);
 
 	switch (cmd) {
@@ -729,6 +732,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 	}
 
 	mutex_unlock(&q->blk_trace_mutex);
+
+	blk_put_queue(q);
+
 	return ret;
 }