diff mbox series

[v8,7/7] nbd: fix uaf in nbd_handle_reply()

Message ID 20210916093350.1410403-8-yukuai3@huawei.com (mailing list archive)
State New, archived
Headers show
Series handle unexpected message from server | expand

Commit Message

Yu Kuai Sept. 16, 2021, 9:33 a.m. UTC
There is a problem where nbd_handle_reply() might access a freed request:

1) At first, a normal io is submitted and completed with an io scheduler:

internal_tag = blk_mq_get_tag -> get tag from sched_tags
 blk_mq_rq_ctx_init
  sched_tags->rq[internal_tag] = sched_tags->static_rq[internal_tag]
...
blk_mq_get_driver_tag
 __blk_mq_get_driver_tag -> get tag from tags
 tags->rq[tag] = sched_tags->static_rq[internal_tag]

So both tags->rq[tag] and sched_tags->rq[internal_tag] point to the same
request, sched_tags->static_rq[internal_tag], even after the io has
finished.

2) The nbd server sends a reply with a random tag directly:

recv_work
 nbd_handle_reply
  blk_mq_tag_to_rq(tags, tag)
   rq = tags->rq[tag]

3) If sched_tags->static_rq is freed concurrently:

blk_mq_sched_free_requests
 blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
  -> step 2) accesses rq before the rq mapping is cleared
  blk_mq_clear_rq_mapping(set, tags, hctx_idx);
  __free_pages() -> rq is freed here

4) Then, nbd continues to use the freed request in nbd_handle_reply().

Fix the problem by grabbing 'q_usage_counter' before blk_mq_tag_to_rq();
the request is then guaranteed not to be freed, since the pool cannot be
torn down while 'q_usage_counter' is non-zero.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/nbd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
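
Why holding 'q_usage_counter' prevents step 3: the static request pool is
only torn down after the queue has been frozen, and freezing waits for
'q_usage_counter' to drain. A minimal sketch of the freeing side (assuming
the elevator-exit path named in step 3; this is not part of the patch):

	/*
	 * blk_mq_freeze_queue() waits until q_usage_counter drops to zero,
	 * so a reference taken with percpu_ref_tryget() keeps
	 * sched_tags->static_rq alive until percpu_ref_put().
	 */
	blk_mq_freeze_queue(q);		/* waits for q_usage_counter to drain */
	blk_mq_sched_free_requests(q);	/* frees sched_tags->static_rq pages  */
	blk_mq_unfreeze_queue(q);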

Comments

Ming Lei Sept. 16, 2021, 12:58 p.m. UTC | #1
On Thu, Sep 16, 2021 at 05:33:50PM +0800, Yu Kuai wrote:
> [... commit message snipped; quoted in full above ...]
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 69dc5eac9ad3..b3a47fc6237f 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -825,6 +825,7 @@ static void recv_work(struct work_struct *work)
>  						     work);
>  	struct nbd_device *nbd = args->nbd;
>  	struct nbd_config *config = nbd->config;
> +	struct request_queue *q = nbd->disk->queue;
>  	struct nbd_sock *nsock;
>  	struct nbd_cmd *cmd;
>  	struct request *rq;
> @@ -835,7 +836,20 @@ static void recv_work(struct work_struct *work)
>  		if (nbd_read_reply(nbd, args->index, &reply))
>  			break;
>  
> +		/*
> +		 * Grab .q_usage_counter so the request pool won't go away; then no
> +		 * request use-after-free is possible during nbd_handle_reply().
> +		 * If the queue is frozen, there won't be any inflight requests, so
> +		 * we needn't handle the incoming garbage message.
> +		 */
> +		if (!percpu_ref_tryget(&q->q_usage_counter)) {
> +			dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
> +				__func__);
> +			break;
> +		}
> +
>  		cmd = nbd_handle_reply(nbd, args->index, &reply);
> +		percpu_ref_put(&q->q_usage_counter);
>  		if (IS_ERR(cmd))
>  			break;

The refcount needs to be held until the request has been completed,
because the request may be completed from another code path; that code
path could then free the request pool while the request is still being
referenced here.


Thanks,
Ming
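
The ordering Ming is asking for would look roughly like this in
recv_work() (a sketch, assuming the rest of the loop stays as in this
patch; the put simply moves to after the completion):

	if (!percpu_ref_tryget(&q->q_usage_counter)) {
		dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
			__func__);
		break;
	}

	cmd = nbd_handle_reply(nbd, args->index, &reply);
	if (IS_ERR(cmd)) {
		percpu_ref_put(&q->q_usage_counter);
		break;
	}

	rq = blk_mq_rq_from_pdu(cmd);
	if (likely(!blk_should_fake_timeout(rq->q)))
		blk_mq_complete_request(rq);
	/* Drop the queue reference only after the request is completed. */
	percpu_ref_put(&q->q_usage_counter);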
Yu Kuai Sept. 16, 2021, 1:10 p.m. UTC | #2
On 2021/09/16 20:58, Ming Lei wrote:
> On Thu, Sep 16, 2021 at 05:33:50PM +0800, Yu Kuai wrote:
>> [... full patch quoted above snipped ...]
> 
> The refcount needs to be held until the request has been completed,
> because the request may be completed from another code path; that code
> path could then free the request pool while the request is still being
> referenced here.

Hi,

The request can't be completed concurrently, thus putting the ref here
is safe.

There used to be a comment here where I tried to explain this... It's
fine with me to move the put after the completion anyway.

Thanks,
Kuai
Ming Lei Sept. 16, 2021, 1:55 p.m. UTC | #3
On Thu, Sep 16, 2021 at 09:10:37PM +0800, yukuai (C) wrote:
> On 2021/09/16 20:58, Ming Lei wrote:
> > On Thu, Sep 16, 2021 at 05:33:50PM +0800, Yu Kuai wrote:
> > > [... full patch quoted above snipped ...]
> > 
> > The refcount needs to be held until the request has been completed,
> > because the request may be completed from another code path; that code
> > path could then free the request pool while the request is still being
> > referenced here.
> 
> Hi,
> 
> The request can't be completed concurrently, thus putting the ref here
> is safe.
> 
> There used to be a comment here where I tried to explain this... It's
> fine with me to move the put after the completion anyway.

I've never seen such a comment. cmd->lock isn't held here, so I believe
concurrent completion is possible.


Thanks,
Ming
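
Concretely, the interleaving Ming is worried about would be something
like this (a hypothetical trace; the other completion path could be the
nbd timeout or disconnect handling):

recv_work                                other completion path
---------                                ---------------------
percpu_ref_tryget(&q->q_usage_counter)
cmd = nbd_handle_reply(...)
percpu_ref_put(&q->q_usage_counter)
                                         blk_mq_complete_request(rq)
                                         queue frozen, elevator exits
                                         -> sched_tags->static_rq freed
blk_mq_complete_request(rq)              <- use-after-free on rq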
Yu Kuai Sept. 16, 2021, 2:05 p.m. UTC | #4
On 2021/09/16 21:55, Ming Lei wrote:
> On Thu, Sep 16, 2021 at 09:10:37PM +0800, yukuai (C) wrote:
>> On 2021/09/16 20:58, Ming Lei wrote:
>>> On Thu, Sep 16, 2021 at 05:33:50PM +0800, Yu Kuai wrote:
>>>> [... full patch quoted above snipped ...]
>>>
>>> The refcount needs to be held until the request has been completed,
>>> because the request may be completed from another code path; that code
>>> path could then free the request pool while the request is still being
>>> referenced here.
>>
>> Hi,
>>
>> The request can't be completed concurrently, thus putting the ref here
>> is safe.
>>
>> There used to be a comment here where I tried to explain this... It's
>> fine with me to move the put after the completion anyway.
> 
> I've never seen such a comment. cmd->lock isn't held here, so I believe
> concurrent completion is possible.
> 

After patch 2, __test_and_clear_bit(NBD_CMD_INFLIGHT) must succeed while
cmd->lock is held before a request can be completed, thus request
completions can't run concurrently...

Thanks,
Kuai
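
For reference, the NBD_CMD_INFLIGHT guard from patch 2 that Kuai refers
to works roughly like this in the completion paths (a sketch, not the
exact code of that patch):

	mutex_lock(&cmd->lock);
	if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
		/* Another path already owns the completion; bail out. */
		mutex_unlock(&cmd->lock);
		return ERR_PTR(-ENOENT);
	}
	/* ... validate the reply and transfer data under cmd->lock ... */
	mutex_unlock(&cmd->lock);

Only the path that atomically clears NBD_CMD_INFLIGHT under cmd->lock
may complete the request, so the completion issued from recv_work()
cannot race with the timeout or disconnect paths.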
diff mbox series

Patch

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 69dc5eac9ad3..b3a47fc6237f 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -825,6 +825,7 @@  static void recv_work(struct work_struct *work)
 						     work);
 	struct nbd_device *nbd = args->nbd;
 	struct nbd_config *config = nbd->config;
+	struct request_queue *q = nbd->disk->queue;
 	struct nbd_sock *nsock;
 	struct nbd_cmd *cmd;
 	struct request *rq;
@@ -835,7 +836,20 @@  static void recv_work(struct work_struct *work)
 		if (nbd_read_reply(nbd, args->index, &reply))
 			break;
 
+		/*
+		 * Grab .q_usage_counter so the request pool won't go away; then no
+		 * request use-after-free is possible during nbd_handle_reply().
+		 * If the queue is frozen, there won't be any inflight requests, so
+		 * we needn't handle the incoming garbage message.
+		 */
+		if (!percpu_ref_tryget(&q->q_usage_counter)) {
+			dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n",
+				__func__);
+			break;
+		}
+
 		cmd = nbd_handle_reply(nbd, args->index, &reply);
+		percpu_ref_put(&q->q_usage_counter);
 		if (IS_ERR(cmd))
 			break;