diff mbox series

[RFC] tcmu: wrong input from userspace can confuse tcmu

Message ID 20200430151041.31053-1-bstroesser@ts.fujitsu.com (mailing list archive)
State New, archived
Headers show
Series [RFC] tcmu: wrong input from userspace can confuse tcmu | expand

Commit Message

Bodo Stroesser April 30, 2020, 3:10 p.m. UTC
Hi,

When tcmu queues a new command - no matter whether in command
ring or in qfull_queue - a cmd_id from IDR udev->commands is
assigned to the command.

If userspaces sends a wrong command completion containing the
cmd_id of a command on the qfull_queue, tcmu_handle_completions()
finds the command in the IDR and calls tcmu_handle_completion()
for it. This might do some nasty things, because commands in
qfull_queue do not have a valid dbi list.

Of course this is easy to fix. E.g.:



To avoid the second idr_*() call in main data path, it would also
be possible to not replace the idr_remove() by idr_find(), but
in case cmd was found without TCMU_CMD_BIT_INFLIGHT being set,
during error handling re-add the cmd to the idr with the same id
as before by calling idr_alloc(...,cmd_id, crmdd_id+1, ...).


OTOH, I'm wondering whether the better solution would be to
remove idr_alloc() from tcmu_setup_cmd_timer() and add it
to queue_cmd_ring() immediately before or after it calls
tcmu_setup_cmd_timer().
Doing so would mean that commands in qfull_queue no longer have
a cmd_id (which is not necessary AFAICS) and therefore the problem
cannot happen, because tcmu_handle_completions() cannot find them
in the IDR.

Of course, this change would cause a number of further small
changes in target_core_user.c, but the effort is not too high,
it seems.

What do you think is the best solution?

Thank you,
Bodo

Comments

Mike Christie April 30, 2020, 3:59 p.m. UTC | #1
On 4/30/20 10:10 AM, Bodo Stroesser wrote:
> Hi,
> 
> When tcmu queues a new command - no matter whether in command
> ring or in qfull_queue - a cmd_id from IDR udev->commands is
> assigned to the command.
> 
> If userspaces sends a wrong command completion containing the
> cmd_id of a command on the qfull_queue, tcmu_handle_completions()
> finds the command in the IDR and calls tcmu_handle_completion()
> for it. This might do some nasty things, because commands in
> qfull_queue do not have a valid dbi list.
> 
> Of course this is easy to fix. E.g.:
> 
> 
> --- a/drivers/target/target_core_user.c	2020-04-30 14:15:36.177184668 +0200
> +++ b/drivers/target/target_core_user.c	2020-04-30 14:29:43.331882066 +0200
> @@ -1242,13 +1242,14 @@ static unsigned int tcmu_handle_completi
>  		}
>  		WARN_ON(tcmu_hdr_get_op(entry->hdr.len_op) != TCMU_OP_CMD);
>  
> -		cmd = idr_remove(&udev->commands, entry->hdr.cmd_id);
> -		if (!cmd) {
> +		cmd = idr_find(&udev->commands, entry->hdr.cmd_id);
> +		if (!cmd || !test_bit(TCMU_CMD_BIT_INFLIGHT, &cmd->flags)) {
>  			pr_err("cmd_id %u not found, ring is broken\n",
>  			       entry->hdr.cmd_id);
>  			set_bit(TCMU_DEV_BIT_BROKEN, &udev->flags);
>  			break;
>  		}
> +		idr_remove(&udev->commands, entry->hdr.cmd_id);
>  
>  		tcmu_handle_completion(cmd, entry);
>  
> 
> To avoid the second idr_*() call in main data path, it would also
> be possible to not replace the idr_remove() by idr_find(), but
> in case cmd was found without TCMU_CMD_BIT_INFLIGHT being set,
> during error handling re-add the cmd to the idr with the same id
> as before by calling idr_alloc(...,cmd_id, crmdd_id+1, ...).
> 
> 
> OTOH, I'm wondering whether the better solution would be to
> remove idr_alloc() from tcmu_setup_cmd_timer() and add it
> to queue_cmd_ring() immediately before or after it calls
> tcmu_setup_cmd_timer().
> Doing so would mean that commands in qfull_queue no longer have
> a cmd_id (which is not necessary AFAICS) and therefore the problem

It's done for logging, but its only debug logging so we could just print
the pointer value.

> cannot happen, because tcmu_handle_completions() cannot find them
> in the IDR.
> 

This is probably best if it's not a lot of trouble.

If you go this route then in tcmu_reset_ring I think I made a mistake
and we just continue the loop if TCMU_CMD_BIT_INFLIGHT is not set. I
think we want to do:

for cmd in qfull_list
    target_complete_cmd()

for cmd in idr
     target_complete_cmd()
Bodo Stroesser April 30, 2020, 4:40 p.m. UTC | #2
On 04/30/20 17:59, Mike Christie wrote:
> On 4/30/20 10:10 AM, Bodo Stroesser wrote:
>> Hi,
>>
>> When tcmu queues a new command - no matter whether in command
>> ring or in qfull_queue - a cmd_id from IDR udev->commands is
>> assigned to the command.
>>
>> If userspaces sends a wrong command completion containing the
>> cmd_id of a command on the qfull_queue, tcmu_handle_completions()
>> finds the command in the IDR and calls tcmu_handle_completion()
>> for it. This might do some nasty things, because commands in
>> qfull_queue do not have a valid dbi list.
>>
>> Of course this is easy to fix. E.g.:
>>
>>
>> --- a/drivers/target/target_core_user.c	2020-04-30 14:15:36.177184668 +0200
>> +++ b/drivers/target/target_core_user.c	2020-04-30 14:29:43.331882066 +0200
>> @@ -1242,13 +1242,14 @@ static unsigned int tcmu_handle_completi
>>   		}
>>   		WARN_ON(tcmu_hdr_get_op(entry->hdr.len_op) != TCMU_OP_CMD);
>>   
>> -		cmd = idr_remove(&udev->commands, entry->hdr.cmd_id);
>> -		if (!cmd) {
>> +		cmd = idr_find(&udev->commands, entry->hdr.cmd_id);
>> +		if (!cmd || !test_bit(TCMU_CMD_BIT_INFLIGHT, &cmd->flags)) {
>>   			pr_err("cmd_id %u not found, ring is broken\n",
>>   			       entry->hdr.cmd_id);
>>   			set_bit(TCMU_DEV_BIT_BROKEN, &udev->flags);
>>   			break;
>>   		}
>> +		idr_remove(&udev->commands, entry->hdr.cmd_id);
>>   
>>   		tcmu_handle_completion(cmd, entry);
>>   
>>
>> To avoid the second idr_*() call in main data path, it would also
>> be possible to not replace the idr_remove() by idr_find(), but
>> in case cmd was found without TCMU_CMD_BIT_INFLIGHT being set,
>> during error handling re-add the cmd to the idr with the same id
>> as before by calling idr_alloc(...,cmd_id, crmdd_id+1, ...).
>>
>>
>> OTOH, I'm wondering whether the better solution would be to
>> remove idr_alloc() from tcmu_setup_cmd_timer() and add it
>> to queue_cmd_ring() immediately before or after it calls
>> tcmu_setup_cmd_timer().
>> Doing so would mean that commands in qfull_queue no longer have
>> a cmd_id (which is not necessary AFAICS) and therefore the problem
> 
> It's done for logging, but its only debug logging so we could just print
> the pointer value.
> 

Yes, and then add the pointer value to the pr_debug that prints the
assigned cmd_id after idr_alloc()

>> cannot happen, because tcmu_handle_completions() cannot find them
>> in the IDR.
>>
> 
> This is probably best if it's not a lot of trouble.

I already spent some time to check the necessary changes. I think
there is no trouble, just some small changes.
So, yes, the bigger fix is probably better.

I'll wait till next week for further comments and then send a patch.

> 
> If you go this route then in tcmu_reset_ring I think I made a mistake

I'm not sure, but I don't think there is a mistake. If you cleanup
the ring, you don't need to cancel the commands in qfull_queue.

> and we just continue the loop if TCMU_CMD_BIT_INFLIGHT is not set. I
> think we want to do:
> 
> for cmd in qfull_list
>      target_complete_cmd()
So I think, this is not really necessary, but after
     clear_bit(TCMU_DEV_BIT_BROKEN, &udev->flags);

we could do
     run_qfull_queue(udev, false);

right?

> 
> for cmd in idr
>       target_complete_cmd()
>
Mike Christie April 30, 2020, 5:29 p.m. UTC | #3
On 4/30/20 11:40 AM, Bodo Stroesser wrote:
> 
> 
> On 04/30/20 17:59, Mike Christie wrote:
>> On 4/30/20 10:10 AM, Bodo Stroesser wrote:
>>> Hi,
>>>
>>> When tcmu queues a new command - no matter whether in command
>>> ring or in qfull_queue - a cmd_id from IDR udev->commands is
>>> assigned to the command.
>>>
>>> If userspaces sends a wrong command completion containing the
>>> cmd_id of a command on the qfull_queue, tcmu_handle_completions()
>>> finds the command in the IDR and calls tcmu_handle_completion()
>>> for it. This might do some nasty things, because commands in
>>> qfull_queue do not have a valid dbi list.
>>>
>>> Of course this is easy to fix. E.g.:
>>>
>>>
>>> --- a/drivers/target/target_core_user.c    2020-04-30
>>> 14:15:36.177184668 +0200
>>> +++ b/drivers/target/target_core_user.c    2020-04-30
>>> 14:29:43.331882066 +0200
>>> @@ -1242,13 +1242,14 @@ static unsigned int tcmu_handle_completi
>>>           }
>>>           WARN_ON(tcmu_hdr_get_op(entry->hdr.len_op) != TCMU_OP_CMD);
>>>   -        cmd = idr_remove(&udev->commands, entry->hdr.cmd_id);
>>> -        if (!cmd) {
>>> +        cmd = idr_find(&udev->commands, entry->hdr.cmd_id);
>>> +        if (!cmd || !test_bit(TCMU_CMD_BIT_INFLIGHT, &cmd->flags)) {
>>>               pr_err("cmd_id %u not found, ring is broken\n",
>>>                      entry->hdr.cmd_id);
>>>               set_bit(TCMU_DEV_BIT_BROKEN, &udev->flags);
>>>               break;
>>>           }
>>> +        idr_remove(&udev->commands, entry->hdr.cmd_id);
>>>             tcmu_handle_completion(cmd, entry);
>>>  
>>> To avoid the second idr_*() call in main data path, it would also
>>> be possible to not replace the idr_remove() by idr_find(), but
>>> in case cmd was found without TCMU_CMD_BIT_INFLIGHT being set,
>>> during error handling re-add the cmd to the idr with the same id
>>> as before by calling idr_alloc(...,cmd_id, crmdd_id+1, ...).
>>>
>>>
>>> OTOH, I'm wondering whether the better solution would be to
>>> remove idr_alloc() from tcmu_setup_cmd_timer() and add it
>>> to queue_cmd_ring() immediately before or after it calls
>>> tcmu_setup_cmd_timer().
>>> Doing so would mean that commands in qfull_queue no longer have
>>> a cmd_id (which is not necessary AFAICS) and therefore the problem
>>
>> It's done for logging, but its only debug logging so we could just print
>> the pointer value.
>>
> 
> Yes, and then add the pointer value to the pr_debug that prints the
> assigned cmd_id after idr_alloc()
> 
>>> cannot happen, because tcmu_handle_completions() cannot find them
>>> in the IDR.
>>>
>>
>> This is probably best if it's not a lot of trouble.
> 
> I already spent some time to check the necessary changes. I think
> there is no trouble, just some small changes.
> So, yes, the bigger fix is probably better.
> 
> I'll wait till next week for further comments and then send a patch.
> 
>>
>> If you go this route then in tcmu_reset_ring I think I made a mistake
> 
> I'm not sure, but I don't think there is a mistake. If you cleanup
> the ring, you don't need to cancel the commands in qfull_queue.

I was thinking about your userspace restart PGR question in another
mail. Between the time userspace stopped and restarted the session->id
to I_T nexus mapping could have changed, so id now refers to a different
I_T nexus.

Code path wise, it would be easier if we restarted the inflight and
non-inflight commands the same way.


> 
>> and we just continue the loop if TCMU_CMD_BIT_INFLIGHT is not set. I
>> think we want to do:
>>
>> for cmd in qfull_list
>>      target_complete_cmd()
> So I think, this is not really necessary, but after
>     clear_bit(TCMU_DEV_BIT_BROKEN, &udev->flags);
> 
> we could do
>     run_qfull_queue(udev, false);
> 
> right?
> 
>>
>> for cmd in idr
>>       target_complete_cmd()
>>
>
diff mbox series

Patch

--- a/drivers/target/target_core_user.c	2020-04-30 14:15:36.177184668 +0200
+++ b/drivers/target/target_core_user.c	2020-04-30 14:29:43.331882066 +0200
@@ -1242,13 +1242,14 @@  static unsigned int tcmu_handle_completi
 		}
 		WARN_ON(tcmu_hdr_get_op(entry->hdr.len_op) != TCMU_OP_CMD);
 
-		cmd = idr_remove(&udev->commands, entry->hdr.cmd_id);
-		if (!cmd) {
+		cmd = idr_find(&udev->commands, entry->hdr.cmd_id);
+		if (!cmd || !test_bit(TCMU_CMD_BIT_INFLIGHT, &cmd->flags)) {
 			pr_err("cmd_id %u not found, ring is broken\n",
 			       entry->hdr.cmd_id);
 			set_bit(TCMU_DEV_BIT_BROKEN, &udev->flags);
 			break;
 		}
+		idr_remove(&udev->commands, entry->hdr.cmd_id);
 
 		tcmu_handle_completion(cmd, entry);