Message ID | f2f17f46-ff3a-01c4-bfd4-8dec836ec343@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Series | block: don't dereference request after flush insertion |
On Sat, Oct 16, 2021 at 07:35:39PM -0600, Jens Axboe wrote:
> We could have a race here, where the request gets freed before we call
> into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
> of the request.
>
> Grab the hardware context before inserting the flush.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2197cfbf081f..22b30a89bf3a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2468,9 +2468,10 @@ void blk_mq_submit_bio(struct bio *bio)
>  	}
>
>  	if (unlikely(is_flush_fua)) {
> +		struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  		/* Bypass scheduler for flush requests */
>  		blk_insert_flush(rq);
> -		blk_mq_run_hw_queue(rq->mq_hctx, true);
> +		blk_mq_run_hw_queue(hctx, true);

If the request is freed before running queue, the request queue could
be released and the hctx may be freed.

Thanks,
Ming
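For readers outside blk-mq, the hazard under discussion is the general pattern of reading a field from an object after ownership has been handed off. A minimal userspace sketch of that pattern and of the pointer-caching fix in the patch above (the types and helpers below are illustrative stand-ins, not the kernel code):

```c
#include <stdio.h>
#include <stdlib.h>

struct hw_ctx { int id; };

struct request {
	struct hw_ctx *mq_hctx;
};

/* Models blk_insert_flush(): once called, the request may already have
 * been completed and freed by another context. */
static void insert_flush(struct request *rq)
{
	free(rq);
}

static void run_hw_queue(struct hw_ctx *hctx)
{
	printf("running hctx %d\n", hctx->id);
}

int main(void)
{
	struct hw_ctx ctx = { .id = 0 };
	struct request *rq = malloc(sizeof(*rq));

	if (!rq)
		return 1;
	rq->mq_hctx = &ctx;

	/* The fix: read the field while we still own rq ... */
	struct hw_ctx *hctx = rq->mq_hctx;

	insert_flush(rq);	/* rq must not be touched after this */
	run_hw_queue(hctx);	/* ... so the freed rq is never dereferenced */
	return 0;
}
```

Ming's objection is that this only protects the rq dereference: if the cached hctx's owning queue can itself be torn down in the same window, caching the pointer is not enough.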
On 10/17/21 7:49 PM, Ming Lei wrote:
> On Sat, Oct 16, 2021 at 07:35:39PM -0600, Jens Axboe wrote:
>> We could have a race here, where the request gets freed before we call
>> into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
>> of the request.
>>
>> Grab the hardware context before inserting the flush.
[...]
>>  	if (unlikely(is_flush_fua)) {
>> +		struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>>  		/* Bypass scheduler for flush requests */
>>  		blk_insert_flush(rq);
>> -		blk_mq_run_hw_queue(rq->mq_hctx, true);
>> +		blk_mq_run_hw_queue(hctx, true);
>
> If the request is freed before running queue, the request queue could
> be released and the hctx may be freed.

No, we still hold a queue enter ref.
On Sun, Oct 17, 2021 at 07:50:24PM -0600, Jens Axboe wrote:
> On 10/17/21 7:49 PM, Ming Lei wrote:
> > On Sat, Oct 16, 2021 at 07:35:39PM -0600, Jens Axboe wrote:
> >> We could have a race here, where the request gets freed before we call
> >> into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
> >> of the request.
[...]
> >>  	if (unlikely(is_flush_fua)) {
> >> +		struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
> >>  		/* Bypass scheduler for flush requests */
> >>  		blk_insert_flush(rq);
> >> -		blk_mq_run_hw_queue(rq->mq_hctx, true);
> >> +		blk_mq_run_hw_queue(hctx, true);
> >
> > If the request is freed before running queue, the request queue could
> > be released and the hctx may be freed.
>
> No, we still hold a queue enter ref.

But that one is released when rq is freed since ac7c5675fa45 ("blk-mq:
allow blk_mq_make_request to consume the q_usage_counter reference"),
isn't it?

Thanks,
Ming
On 10/17/21 8:02 PM, Ming Lei wrote:
> On Sun, Oct 17, 2021 at 07:50:24PM -0600, Jens Axboe wrote:
>> On 10/17/21 7:49 PM, Ming Lei wrote:
>>> On Sat, Oct 16, 2021 at 07:35:39PM -0600, Jens Axboe wrote:
>>>> We could have a race here, where the request gets freed before we call
>>>> into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
>>>> of the request.
[...]
>>> If the request is freed before running queue, the request queue could
>>> be released and the hctx may be freed.
>>
>> No, we still hold a queue enter ref.
>
> But that one is released when rq is freed since ac7c5675fa45 ("blk-mq:
> allow blk_mq_make_request to consume the q_usage_counter reference"),
> isn't it?

Yes I think you're right, we need to grab an extra ref in the flush case
as we're using it after it may potentially have completed. We could
probably make it smarter, but there's little point for just handling a
flush.

commit ea0f672e7cc66e7ec12468ff907de6064656b6e7
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sat Oct 16 07:34:49 2021 -0600

    block: grab extra reference for flush insertion

    We could have a race here, where the request gets freed before we call
    into blk_mq_run_hw_queue(). If this happens, we cannot rely on the
    state of the request, nor can we rely on the queue still being alive.

    Grab an extra queue reference before inserting the flush and then
    running the queue, to ensure that it is still valid.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 87dc2debedfb..d28423ccfe2b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2284,9 +2284,18 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	}

 	if (unlikely(is_flush_fua)) {
+		struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
+
+		/*
+		 * Our queue ref may disappear as soon as the flush is
+		 * inserted, grab an extra one.
+		 */
+		percpu_ref_tryget_live(&q->q_usage_counter);
+
 		/* Bypass scheduler for flush requests */
 		blk_insert_flush(rq);
-		blk_mq_run_hw_queue(rq->mq_hctx, true);
+		blk_mq_run_hw_queue(hctx, true);
+		blk_queue_exit(q);
 	} else if (plug && (q->nr_hw_queues == 1 ||
 		   blk_mq_is_shared_tags(rq->mq_hctx->flags) ||
 		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
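The shape of this v2 fix is a reference bracket: take one extra reference on the queue before the handoff that may consume the submitter's original one, and drop it only after the last use. A minimal userspace sketch, with a plain atomic counter standing in for q->q_usage_counter (the kernel uses a percpu_ref; all names below are illustrative, not the kernel API):

```c
#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for q->q_usage_counter, starting with the submitter's
 * queue-enter reference. */
static atomic_int q_usage_counter = 1;

/* Models blk_queue_exit(): drop one reference; the last one releases
 * the queue. */
static void queue_exit(void)
{
	if (atomic_fetch_sub(&q_usage_counter, 1) == 1)
		printf("queue released\n");
}

/* Models the ac7c5675fa45 behaviour Ming points out: freeing the
 * request consumes the submitter's original enter reference. */
static void insert_flush_and_free_rq(void)
{
	queue_exit();
}

int main(void)
{
	/* Models percpu_ref_tryget_live(): pin the queue up front. */
	atomic_fetch_add(&q_usage_counter, 1);

	insert_flush_and_free_rq();	/* original ref may be gone now */

	/* The queue is still pinned here, so running it is safe ... */
	printf("refs still held: %d\n", atomic_load(&q_usage_counter));

	queue_exit();			/* ... then drop the extra ref */
	return 0;
}
```

Without the extra reference up front, the counter could hit zero inside insert_flush_and_free_rq(), and the subsequent queue run would touch released state — the UAF Ming describes.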
On Mon, Oct 18, 2021 at 10:02:32AM +0800, Ming Lei wrote:
> On Sun, Oct 17, 2021 at 07:50:24PM -0600, Jens Axboe wrote:
> > On 10/17/21 7:49 PM, Ming Lei wrote:
> > > On Sat, Oct 16, 2021 at 07:35:39PM -0600, Jens Axboe wrote:
> > >> We could have a race here, where the request gets freed before we call
> > >> into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
> > >> of the request.
[...]
> > > If the request is freed before running queue, the request queue could
> > > be released and the hctx may be freed.
> >
> > No, we still hold a queue enter ref.
>
> But that one is released when rq is freed since ac7c5675fa45 ("blk-mq:
> allow blk_mq_make_request to consume the q_usage_counter reference"),
> isn't it?

With commit ac7c5675fa45, any reference to hctx after queueing the
request could lead to a UAF in the code path of blk_mq_submit_bio().
Maybe we need to grab two refs on queue enter, and release one after
the bio is submitted.

Thanks,
Ming
On 10/17/21 8:11 PM, Ming Lei wrote:
> On Mon, Oct 18, 2021 at 10:02:32AM +0800, Ming Lei wrote:
[...]
> With commit ac7c5675fa45, any reference to hctx after queueing the
> request could lead to a UAF in the code path of blk_mq_submit_bio().
> Maybe we need to grab two refs on queue enter, and release one after
> the bio is submitted.

I'd rather audit and see if there are any, because an extra get+put
isn't exactly free.
On Sun, Oct 17, 2021 at 08:16:25PM -0600, Jens Axboe wrote:
> On 10/17/21 8:11 PM, Ming Lei wrote:
> > On Mon, Oct 18, 2021 at 10:02:32AM +0800, Ming Lei wrote:
[...]
> > With commit ac7c5675fa45, any reference to hctx after queueing the
> > request could lead to a UAF in the code path of blk_mq_submit_bio().
> > Maybe we need to grab two refs on queue enter, and release one after
> > the bio is submitted.
>
> I'd rather audit and see if there are any, because an extra get+put
> isn't exactly free.

Only direct issue doesn't need that; it looks like the other paths do
need to grab one extra ref if the request has to be queued somewhere
before dispatch.

Thanks,
Ming
On Sat, Oct 16, 2021 at 07:35:39PM -0600, Jens Axboe wrote:
> We could have a race here, where the request gets freed before we call
> into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
> of the request.
>
> Grab the hardware context before inserting the flush.
[...]
>  	if (unlikely(is_flush_fua)) {
> +		struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  		/* Bypass scheduler for flush requests */
>  		blk_insert_flush(rq);
> -		blk_mq_run_hw_queue(rq->mq_hctx, true);
> +		blk_mq_run_hw_queue(hctx, true);
>  	} else if (plug && (q->nr_hw_queues == 1 ||
>  		   blk_mq_is_shared_tags(rq->mq_hctx->flags) ||
>  		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {

From the report in [1], no device close & queue release is involved,
and request freeing could be much easier to trigger than queue release,
so this looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Fixes: f328476e373a ("blk-mq: cleanup blk_mq_submit_bio")

[1] https://lore.kernel.org/linux-block/23531d29-9d96-6744-bab9-797e65379037@kernel.dk/T/#t

thanks,
Ming
On 10/17/21 8:42 PM, Ming Lei wrote:
> On Sat, Oct 16, 2021 at 07:35:39PM -0600, Jens Axboe wrote:
[...]
> From the report in [1], no device close & queue release is involved,
> and request freeing could be much easier to trigger than queue release,
> so this looks fine:
>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>
> Fixes: f328476e373a ("blk-mq: cleanup blk_mq_submit_bio")

The Fixes tag wasn't in the one I sent out, but I do have it as well.
Thanks, I'll add your reviewed-by.
I think this can be done much more simply. The only place in
blk_insert_flush that actually needs to run the queue is the case where
no flushes are needed, as all the others are handled via the flush state
machine and the requeue list. So something like this should work:

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4201728bf3a5a..1fce6d16e6d3a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -421,7 +421,7 @@ void blk_insert_flush(struct request *rq)
 	 */
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
-		blk_mq_request_bypass_insert(rq, false, false);
+		blk_mq_request_bypass_insert(rq, false, true);
 		return;
 	}

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f296edff47246..89a142b61f456 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2314,7 +2314,6 @@ void blk_mq_submit_bio(struct bio *bio)
 	if (unlikely(is_flush_fua)) {
 		/* Bypass scheduler for flush requests */
 		blk_insert_flush(rq);
-		blk_mq_run_hw_queue(rq->mq_hctx, true);
 	} else if (plug && (q->nr_hw_queues == 1 ||
 		   blk_mq_is_shared_tags(rq->mq_hctx->flags) ||
 		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
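The appeal of this variant is ownership rather than refcounting: with run_queue set to true, the insert path kicks the queue itself while it still holds a valid request pointer, so the submitter never touches rq or its hctx after the handoff. A small userspace sketch of that shape (hypothetical names and a deliberately pessimistic free() to model early completion, not the kernel implementation):

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct hw_ctx { int id; };
struct request { struct hw_ctx *mq_hctx; };

static void run_hw_queue(struct hw_ctx *hctx)
{
	printf("running hctx %d\n", hctx->id);
}

/* Models the blk_mq_request_bypass_insert(rq, at_head, run_queue)
 * shape: the helper caches what it needs and kicks the queue itself,
 * while the request is still valid. */
static void bypass_insert(struct request *rq, bool at_head, bool run_queue)
{
	struct hw_ctx *hctx = rq->mq_hctx;	/* rq still owned here */

	(void)at_head;
	free(rq);		/* models handoff and early completion */
	if (run_queue)
		run_hw_queue(hctx);
}

int main(void)
{
	struct hw_ctx ctx = { .id = 0 };
	struct request *rq = malloc(sizeof(*rq));

	if (!rq)
		return 1;
	rq->mq_hctx = &ctx;

	bypass_insert(rq, false, true);	/* caller never touches rq again */
	return 0;
}
```

Because the submitter's last use of rq is the insertion call itself, both hazards raised earlier in the thread (the freed request and the post-insert queue run) disappear without any extra get+put.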
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2197cfbf081f..22b30a89bf3a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2468,9 +2468,10 @@ void blk_mq_submit_bio(struct bio *bio)
 	}

 	if (unlikely(is_flush_fua)) {
+		struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 		/* Bypass scheduler for flush requests */
 		blk_insert_flush(rq);
-		blk_mq_run_hw_queue(rq->mq_hctx, true);
+		blk_mq_run_hw_queue(hctx, true);
 	} else if (plug && (q->nr_hw_queues == 1 ||
 		   blk_mq_is_shared_tags(rq->mq_hctx->flags) ||
 		   q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
We could have a race here, where the request gets freed before we call
into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
of the request.

Grab the hardware context before inserting the flush.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

---