
[block,regression] kernel oops triggered by removing scsi device during IO

Message ID f7dd2dceb3767a0f1fad571b57f5f8e09afb3c3e.camel@wdc.com (mailing list archive)
State New, archived

Commit Message

Bart Van Assche April 9, 2018, 10:54 p.m. UTC
On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> The oops happens during generic_make_request_checks(), in
> blk_throtl_bio() exactly.
> So if we want to bypass dying queue, we have to check this before
> generic_make_request_checks(), I think.

How about something like the patch below?

Thanks,

Bart.

Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
 removal triggers a crash

Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
it is no longer safe to access cgroup information during or after the
blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
call with a blk_queue_enter() / blk_queue_exit() pair.

---
 block/blk-core.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

-- 
2.16.2
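The commit message relies on one invariant: cgroup state may only be touched while the submitter holds a usage reference obtained with blk_queue_enter(), and blk_cleanup_queue() tears that state down only after marking the queue dying and waiting for all such references to drain. The sketch below is a hypothetical, single-threaded userspace model of that invariant, not kernel code. toy_queue, toy_queue_enter(), toy_queue_exit(), toy_cleanup_queue(), toy_checks() and toy_submit() are illustrative names, a plain int stands in for the kernel's usage counter, and "retry the teardown later" stands in for the wait that blk_cleanup_queue() actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a request queue; not kernel code. */
struct toy_queue {
	bool dying;         /* set once removal starts */
	int usage;          /* submitters currently between enter and exit */
	int *cgroup_data;   /* stands in for blkcg state; NULL once torn down */
};

/* Like blk_queue_enter(): refuse new users once the queue is dying. */
int toy_queue_enter(struct toy_queue *q)
{
	if (q->dying)
		return -1;      /* the kernel returns a negative error code */
	q->usage++;
	return 0;
}

/* Like blk_queue_exit(): drop the usage reference. */
void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* Like blk_cleanup_queue(): mark the queue dying, then tear down cgroup
 * state only once no submitter is inside. Returns false when teardown has
 * to be retried; the kernel waits for the usage count to drain instead. */
bool toy_cleanup_queue(struct toy_queue *q)
{
	q->dying = true;
	if (q->usage > 0)
		return false;
	q->cgroup_data = NULL;
	return true;
}

/* Like generic_make_request_checks(): reads cgroup state. */
bool toy_checks(const struct toy_queue *q)
{
	return q->cgroup_data != NULL;
}

/* Mirrors the patched prologue: enter, run the checks, exit. */
int toy_submit(struct toy_queue *q)
{
	bool ok;

	if (toy_queue_enter(q) < 0)
		return -1;      /* the patch fails the bio with bio_io_error() */
	ok = toy_checks(q); /* teardown is refused while usage > 0 */
	toy_queue_exit(q);
	return ok ? 0 : -2;
}
```

In this model, a cleanup attempted while a submitter is inside leaves the cgroup state intact, and once the queue is dying, later submissions fail at the enter step instead of reaching the checks.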

Comments

Jens Axboe April 9, 2018, 10:58 p.m. UTC | #1
On 4/9/18 4:54 PM, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
>> The oops happens during generic_make_request_checks(), in
>> blk_throtl_bio() exactly.
>> So if we want to bypass dying queue, we have to check this before
>> generic_make_request_checks(), I think.
> 
> How about something like the patch below?
> 
> Thanks,
> 
> Bart.
> 
> Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
>  removal triggers a crash
> 
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with a blk_queue_enter() / blk_queue_exit() pair.
> 
> ---
>  block/blk-core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d69888ff52f0..0c48bef8490f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	 * yet.
>  	 */
>  	struct bio_list bio_list_on_stack[2];
> +	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> +		BLK_MQ_REQ_NOWAIT : 0;
> +	struct request_queue *q = bio->bi_disk->queue;
> +	bool check_result;
>  	blk_qc_t ret = BLK_QC_T_NONE;
>  
> -	if (!generic_make_request_checks(bio))
> +	if (blk_queue_enter(q, flags) < 0) {
> +		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
> +			bio_wouldblock_error(bio);
> +		else
> +			bio_io_error(bio);
> +		return ret;
> +	}
> +
> +	check_result = generic_make_request_checks(bio);
> +	blk_queue_exit(q);

This ends up being nutty in the generic_make_request() case, where we
do the exact same enter/exit logic right after. That needs to get unified.
Maybe move the queue enter into generic_make_request_checks(), and exit
in the caller?
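Jens's suggestion can be pictured as follows: the checking helper takes the usage reference itself and, on success, returns with it still held, so the caller performs the actual submission under that same reference and drops it exactly once. This is a hypothetical single-threaded model, not kernel code. toy_checks_and_enter() and toy_make_request() are illustrative names, bio_ok stands in for the outcome of the real sanity checks, and the submitted counter stands in for calling the queue's make_request_fn.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; not kernel code. */
struct toy_queue {
	bool dying;
	int usage;    /* references currently held */
	int enters;   /* successful enters, to show there is one per bio */
};

int toy_queue_enter(struct toy_queue *q)
{
	if (q->dying)
		return -1;
	q->usage++;
	q->enters++;
	return 0;
}

void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* On success, returns true with a usage reference still held; the caller
 * must drop it with toy_queue_exit(). On failure, no reference is held. */
bool toy_checks_and_enter(struct toy_queue *q, bool bio_ok)
{
	if (toy_queue_enter(q) < 0)
		return false;
	if (!bio_ok) {          /* stands in for a failed sanity check */
		toy_queue_exit(q);
		return false;
	}
	return true;
}

/* The caller reuses the reference for the submission instead of entering again. */
int toy_make_request(struct toy_queue *q, bool bio_ok, int *submitted)
{
	if (!toy_checks_and_enter(q, bio_ok))
		return -1;
	(*submitted)++;         /* stands in for q->make_request_fn(q, bio) */
	toy_queue_exit(q);
	return 0;
}
```

The design choice here is that a failed check never leaves a reference behind, so the caller only has to pair the exit with a successful return.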
Bart Van Assche April 9, 2018, 11:07 p.m. UTC | #2
On Mon, 2018-04-09 at 16:58 -0600, Jens Axboe wrote:
> This ends up being nutty in the generic_make_request() case, where we
> do the exact same enter/exit logic right after. That needs to get unified.
> Maybe move the queue enter into generic_make_request_checks(), and exit
> in the caller?

Hello Jens,

There is a challenge: generic_make_request() supports bio chains in which
different bios apply to different request queues, and it also supports bio
chains in which some bios have the REQ_NOWAIT flag set and others do not. Is
it safe to drop that support?

Thanks,

Bart.
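Bart's objection is that the usage reference cannot simply be taken once per call: each bio in a chain may target a different queue and may or may not carry REQ_NOWAIT, so both the queue and the nowait behavior have to be derived per bio. The hypothetical sketch below models that with an array standing in for the bio list. toy_bio, toy_submit_one(), toy_submit_chain() and the TOY_* result codes are illustrative names, and the busy flag is an assumption standing in for a frozen queue: a REQ_NOWAIT caller gives up, while a blocking caller is assumed (not modeled) to wait and then succeed. The error mapping follows the patch: a failed enter for a REQ_NOWAIT bio on a queue that is not dying is reported as "would block", everything else as an I/O error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; none of these names exist in the kernel. */
struct toy_queue {
	bool dying;   /* like blk_queue_dying() */
	bool busy;    /* assumption: stands in for a frozen queue */
	int usage;
};

struct toy_bio {
	struct toy_queue *q;   /* each bio names its own queue */
	bool nowait;           /* stands in for REQ_NOWAIT in bi_opf */
};

enum toy_result { TOY_OK = 0, TOY_EIO = -1, TOY_EAGAIN = -2 };

int toy_queue_enter(struct toy_queue *q, bool nowait)
{
	if (q->dying)
		return -1;
	/* A blocking caller on a busy queue is assumed to wait until it is
	 * available again; only the nowait caller gives up here. */
	if (q->busy && nowait)
		return -1;
	q->usage++;
	return 0;
}

void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* Enter this bio's own queue with this bio's own nowait flag. */
enum toy_result toy_submit_one(const struct toy_bio *bio)
{
	struct toy_queue *q = bio->q;

	if (toy_queue_enter(q, bio->nowait) < 0)
		return (!q->dying && bio->nowait) ? TOY_EAGAIN : TOY_EIO;
	/* checks and submission would happen here */
	toy_queue_exit(q);
	return TOY_OK;
}

/* Walk a chain of bios, recording one result per bio. */
void toy_submit_chain(const struct toy_bio *bios, size_t n,
		      enum toy_result *results)
{
	for (size_t i = 0; i < n; i++)
		results[i] = toy_submit_one(&bios[i]);
}
```

A single enter at the top of the call, using the first bio's queue and flags, would give the wrong answer for the second and third bios of a chain like { queue A, blocking }, { queue B, nowait }, { queue B, blocking }.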
Ming Lei April 10, 2018, 1:30 a.m. UTC | #3
On Mon, Apr 09, 2018 at 10:54:57PM +0000, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> > The oops happens during generic_make_request_checks(), in
> > blk_throtl_bio() exactly.
> > So if we want to bypass dying queue, we have to check this before
> > generic_make_request_checks(), I think.
> 
> How about something like the patch below?
> 
> Thanks,
> 
> Bart.
> 
> Subject: [PATCH] blk-mq: Avoid that submitting a bio concurrently with device
>  removal triggers a crash
> 
> Because blkcg_exit_queue() is now called from inside blk_cleanup_queue()
> it is no longer safe to access cgroup information during or after the
> blk_cleanup_queue() call. Hence protect the generic_make_request_checks()
> call with a blk_queue_enter() / blk_queue_exit() pair.
> 
> ---
>  block/blk-core.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d69888ff52f0..0c48bef8490f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	 * yet.
>  	 */
>  	struct bio_list bio_list_on_stack[2];
> +	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
> +		BLK_MQ_REQ_NOWAIT : 0;
> +	struct request_queue *q = bio->bi_disk->queue;
> +	bool check_result;
>  	blk_qc_t ret = BLK_QC_T_NONE;
>  
> -	if (!generic_make_request_checks(bio))
> +	if (blk_queue_enter(q, flags) < 0) {

The queue pointer needs to be checked for NULL before calling
blk_queue_enter(), since that check is currently done in
generic_make_request_checks(), which now runs afterwards.

Also, is it possible to see the queue already freed here?
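Ming's first point is about ordering: the patch passes q to blk_queue_enter() before generic_make_request_checks() has had a chance to reject a bio whose disk has no queue. A minimal sketch of the safer ordering, as a hypothetical userspace model: toy_disk, toy_bio and toy_submit() are illustrative names, and returning TOY_EIO stands in for failing the bio with bio_io_error(). The pointer is tested first, and only then is the usage reference taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; not kernel code. */
struct toy_queue { bool dying; int usage; };
struct toy_disk { struct toy_queue *queue; };   /* queue may be NULL */
struct toy_bio { struct toy_disk *disk; };

enum { TOY_OK = 0, TOY_EIO = -1 };

int toy_queue_enter(struct toy_queue *q)
{
	if (q->dying)   /* dereferences q: would crash if q were NULL */
		return -1;
	q->usage++;
	return 0;
}

void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

int toy_submit(struct toy_bio *bio)
{
	struct toy_queue *q = bio->disk->queue;

	/* Reject a missing queue before anything dereferences it. */
	if (q == NULL)
		return TOY_EIO;
	if (toy_queue_enter(q) < 0)
		return TOY_EIO;
	toy_queue_exit(q);
	return TOY_OK;
}
```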
Bart Van Assche April 10, 2018, 1:34 a.m. UTC | #4
On Tue, 2018-04-10 at 09:30 +0800, Ming Lei wrote:
> Also is it possible to see queue freed here?

I think the caller should keep a reference on the request queue. Otherwise
we have a much bigger problem than a race between submitting a bio and
removing a request queue from the cgroup controller in blk_cleanup_queue().

Bart.
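Bart's answer rests on a lifetime rule that is separate from the per-I/O usage reference: whoever submits a bio is expected to already hold a reference that keeps the request_queue structure itself allocated, loosely like the kernel's blk_get_queue()/blk_put_queue() pair. The sketch below is a hypothetical userspace model of that rule. toy_alloc_queue(), toy_get_queue(), toy_put_queue() and the toy_queues_freed counter are illustrative names, and plain ints replace the kernel's kobject reference counting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of queue lifetime; not kernel code. */
struct toy_queue {
	int refcount;   /* lifetime references; memory is freed at zero */
	int usage;      /* per-I/O references, as taken by enter/exit */
};

int toy_queues_freed;   /* counts releases, for demonstration only */

/* The creator (the driver) starts out holding one reference. */
struct toy_queue *toy_alloc_queue(void)
{
	struct toy_queue *q = calloc(1, sizeof(*q));

	if (q)
		q->refcount = 1;
	return q;
}

void toy_get_queue(struct toy_queue *q)
{
	q->refcount++;
}

/* Drop a reference and free the queue when the last one goes away. */
void toy_put_queue(struct toy_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0) {
		free(q);
		toy_queues_freed++;
	}
}
```

In this model, device removal only drops the driver's reference; the memory goes away when the last holder, here the submitter, drops theirs, so a submitter that pinned the queue can never observe it freed.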

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index d69888ff52f0..0c48bef8490f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2388,9 +2388,24 @@ blk_qc_t generic_make_request(struct bio *bio)
 	 * yet.
 	 */
 	struct bio_list bio_list_on_stack[2];
+	blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
+		BLK_MQ_REQ_NOWAIT : 0;
+	struct request_queue *q = bio->bi_disk->queue;
+	bool check_result;
 	blk_qc_t ret = BLK_QC_T_NONE;
 
-	if (!generic_make_request_checks(bio))
+	if (blk_queue_enter(q, flags) < 0) {
+		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
+			bio_wouldblock_error(bio);
+		else
+			bio_io_error(bio);
+		return ret;
+	}
+
+	check_result = generic_make_request_checks(bio);
+	blk_queue_exit(q);
+
+	if (!check_result)
 		goto out;
 
 	/*