Message ID | 20210416092146.3201-1-d.bogdanov@yadro.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [v2] target: core: remove from tmr_list at lun unlink | expand |
On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: > Currently TMF commands are removed from de_device.dev_tmf_list at > the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd > up on a command status (response) is queued in transport layer. > It means that LUN and backend device can be deleted meantime and at > the moment of repsonse completion a panic is occured: > > target_tmr_work() > cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire > transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun > - // - // - // - > <<<--- lun remove > <<<--- core backend device remove > - // - // - // - > qlt_handle_abts_completion() > tfo->free_mcmd() > transport_generic_free_cmd() > target_put_sess_cmd() > core_tmr_release_req() { > if (dev) { // backend device, can not be null > spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH > Sorry about this. I was thinking about this patch some more while reviewing this version. Is there another race possible? 1. We have a cmd running in lio. 2. The initiator times it out and sends tmf. 3. cmd completes. 4. target_submit_tmr has called transport_lookup_tmr_lun. 5. transport_lookup_tmr_lun has set se_dev and tmr_dev. 6. You now remove the lun and device. 7. Now you crash on the dev references. I think we need to do a percpu_ref_tryget_live in transport_lookup_tmr_lun like is done in transport_lookup_cmd_lun. If successful set se_cmd->lun_ref_active = true; If we get the ref, we drop it in transport_lun_remove_cmd like with the non tmr case. Combined with this patch then I think we are ok for all races.
On 4/16/21 4:38 PM, Mike Christie wrote: > On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: >> Currently TMF commands are removed from de_device.dev_tmf_list at >> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd >> up on a command status (response) is queued in transport layer. >> It means that LUN and backend device can be deleted meantime and at >> the moment of repsonse completion a panic is occured: >> >> target_tmr_work() >> cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire >> transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun >> - // - // - // - >> <<<--- lun remove >> <<<--- core backend device remove >> - // - // - // - >> qlt_handle_abts_completion() >> tfo->free_mcmd() >> transport_generic_free_cmd() >> target_put_sess_cmd() >> core_tmr_release_req() { >> if (dev) { // backend device, can not be null >> spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH >> > > Sorry about this. I was thinking about this patch some more while > reviewing this version. > > Is there another race possible? > > 1. We have a cmd running in lio. > 2. The initiator times it out and sends tmf. > 3. cmd completes. > 4. target_submit_tmr has called transport_lookup_tmr_lun. > 5. transport_lookup_tmr_lun has set se_dev and tmr_dev. > 6. You now remove the lun and device. > 7. Now you crash on the dev references. > > I think we need to do a percpu_ref_tryget_live in transport_lookup_tmr_lun > like is done in transport_lookup_cmd_lun. If successful set > > se_cmd->lun_ref_active = true; > > If we get the ref, we drop it in transport_lun_remove_cmd like with > the non tmr case. > Ignore this. I see we do a percpu_ref_tryget_live in there already.
On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: > Currently TMF commands are removed from de_device.dev_tmf_list at > the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd > up on a command status (response) is queued in transport layer. > It means that LUN and backend device can be deleted meantime and at > the moment of repsonse completion a panic is occured: > > target_tmr_work() > cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire > transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun > - // - // - // - > <<<--- lun remove > <<<--- core backend device remove > - // - // - // - > qlt_handle_abts_completion() > tfo->free_mcmd() > transport_generic_free_cmd() > target_put_sess_cmd() > core_tmr_release_req() { > if (dev) { // backend device, can not be null > spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH > > Call Trace: > NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0 > LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod] > Call Trace: > (unreliable) > 0x0 > target_put_sess_cmd+0x2a0/0x370 [target_core_mod] > transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod] > tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx] > process_one_work+0x2c4/0x5c0 > worker_thread+0x88/0x690 > > For FC protocol it is a race condition, but for iSCSI protocol it is > easyly reproduced by manual sending iSCSI commands: > - Send some SCSI sommand > - Send Abort of that command over iSCSI > - Remove LUN on target > - Send next iSCSI command to acknowledge the Abort_Response > - target panics > > There is no sense to keep the command in tmr_list until response > completion, so move the removal from tmr_list from the response > completion to the response queueing when lun is unlinked. > Move the removal from state list too as it is a subject to the same > race condition. > > Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") > Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> > Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> > > --- > v2: > fix cmd stuck in tmr list in error case in iscsi > move clearing cmd->se_lun to transport_lun_remove_cmd > > The issue exists from the very begining. > I uploaded a scapy script that helps to reproduce the issue at > https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!MjANCRIp5ZrtKYonxEKclROOwOe7s-adKHLiUd2Njis6m1774RMpLGNkHyapFf68BwFr$ > --- > drivers/target/iscsi/iscsi_target.c | 2 +- > drivers/target/target_core_tmr.c | 9 -------- > drivers/target/target_core_transport.c | 29 +++++++++++++++++++------- > 3 files changed, 23 insertions(+), 17 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c > index d0e7ed8f28cc..3311b031a812 100644 > --- a/drivers/target/iscsi/iscsi_target.c > +++ b/drivers/target/iscsi/iscsi_target.c > @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, > * TMR TASK_REASSIGN. > */ > iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state); > - target_put_sess_cmd(&cmd->se_cmd); > + transport_generic_free_cmd(&cmd->se_cmd, false); > return 0; > } Doh. I see how I got all confused. I guess this path leaks the lun_ref taken by transport_lookup_tmr_lun. It looks like an old issue and nothing to do with your patch. I'm not sure if we are supposed to be calling transport_generic_free_cmd twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL" in transport_lun_remove_cmd, so we won't do a double list deletion. It feels dirty though. I can feel Bart saying, "Mike did you see the comment at the top of the function". :) Maybe it's best to more cleanly unwind what was setup in the rror path. I think we can fix lun_ref leak too. So instead of doing transport_generic_free_cmd above do transport_lun_remove_cmd to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd?
On 4/16/21 5:39 PM, Mike Christie wrote: > On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: >> Currently TMF commands are removed from de_device.dev_tmf_list at >> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd >> up on a command status (response) is queued in transport layer. >> It means that LUN and backend device can be deleted meantime and at >> the moment of repsonse completion a panic is occured: >> >> target_tmr_work() >> cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire >> transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun >> - // - // - // - >> <<<--- lun remove >> <<<--- core backend device remove >> - // - // - // - >> qlt_handle_abts_completion() >> tfo->free_mcmd() >> transport_generic_free_cmd() >> target_put_sess_cmd() >> core_tmr_release_req() { >> if (dev) { // backend device, can not be null >> spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH >> >> Call Trace: >> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0 >> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod] >> Call Trace: >> (unreliable) >> 0x0 >> target_put_sess_cmd+0x2a0/0x370 [target_core_mod] >> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod] >> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx] >> process_one_work+0x2c4/0x5c0 >> worker_thread+0x88/0x690 >> >> For FC protocol it is a race condition, but for iSCSI protocol it is >> easyly reproduced by manual sending iSCSI commands: >> - Send some SCSI sommand >> - Send Abort of that command over iSCSI >> - Remove LUN on target >> - Send next iSCSI command to acknowledge the Abort_Response >> - target panics >> >> There is no sense to keep the command in tmr_list until response >> completion, so move the removal from tmr_list from the response >> completion to the response queueing when lun is unlinked. >> Move the removal from state list too as it is a subject to the same >> race condition. >> >> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") >> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> >> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> >> >> --- >> v2: >> fix cmd stuck in tmr list in error case in iscsi >> move clearing cmd->se_lun to transport_lun_remove_cmd >> >> The issue exists from the very begining. >> I uploaded a scapy script that helps to reproduce the issue at >> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!MjANCRIp5ZrtKYonxEKclROOwOe7s-adKHLiUd2Njis6m1774RMpLGNkHyapFf68BwFr$ >> --- >> drivers/target/iscsi/iscsi_target.c | 2 +- >> drivers/target/target_core_tmr.c | 9 -------- >> drivers/target/target_core_transport.c | 29 +++++++++++++++++++------- >> 3 files changed, 23 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c >> index d0e7ed8f28cc..3311b031a812 100644 >> --- a/drivers/target/iscsi/iscsi_target.c >> +++ b/drivers/target/iscsi/iscsi_target.c >> @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, >> * TMR TASK_REASSIGN. >> */ >> iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state); >> - target_put_sess_cmd(&cmd->se_cmd); >> + transport_generic_free_cmd(&cmd->se_cmd, false); >> return 0; >> } > > Doh. I see how I got all confused. I guess this path leaks the lun_ref > taken by transport_lookup_tmr_lun. It looks like an old issue and nothing > to do with your patch. > > I'm not sure if we are supposed to be calling transport_generic_free_cmd > twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL" > in transport_lun_remove_cmd, so we won't do a double list deletion. > It feels dirty though. I can feel Bart saying, "Mike did you see the comment > at the top of the function". :) > > Maybe it's best to more cleanly unwind what was setup in the rror path. I think we > can fix lun_ref leak too. > > So instead of doing transport_generic_free_cmd above do transport_lun_remove_cmd > to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd? Shoot. I'm all over the place. I think the root issue is my original comment on the v1 patch was wrong. On a failure we would still do: iscsit_free_cmd -> transport_generic_free_cmd -> transport_lun_remove_cmd right? So we don't need any change in the iscsi target. It should all just work.
On Fri, Apr 16, 2021 at 12:21:46PM +0300, Dmitry Bogdanov wrote: > @@ -719,8 +726,16 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd) > if (!lun) > return; > > + target_remove_from_state_list(cmd); > + target_remove_from_tmr_list(cmd); > + > if (cmpxchg(&cmd->lun_ref_active, true, false)) > percpu_ref_put(&lun->lun_ref); > + > + /* > + * Clear struct se_cmd->se_lun before the handoff to FE. > + */ > + cmd->se_lun = NULL; > } Sadly we just found out that this code is racing with core_tmr_drain_tmr_list(). If LUN RESET comes in while there are still some outstanding ABORT TASK functions left, the following sequence is possible: 1. During LUN RESET processing core_tmr_drain_tmr_list() is called 2. During ABORT TASK processing transport_lun_remove_cmd() is called at the sime time 3. core_tmr_drain_tmr_list() acquires &dev->se_tmr_lock lock and moves TMRs to the on-stack drain_tmr_list 4. core_tmr_drain_tmr_list() releases &dev->se_tmr_lock and starts working on the drain_tmr_list 5. At the same moment target_remove_from_tmr_list() is called 6. It acquires &dev->se_tmr_lock, removes TMR from the list by list_del_init() and releases &dev->se_tmr_lock What happens next is this: [ 391.438244] LUN_RESET: releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11 [ 391.438246] LUN_RESET: releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11 The same TMR is being pulled out twice out of the drain_tmr_list. This happens because there are no locks that prevent the list traversal in core_tmr_drain_tmr_list() and the list element removal in target_remove_from_tmr_list() from being executed concurrently. So list_del_init() in target_remove_from_tmr_list() calls INIT_LIST_HEAD() and tmr_p->next now points to tmr_p. Hence the following warnings: [ 391.438300] WARNING: CPU: 12 PID: 20064 at ../drivers/target/target_core_transport.c:2785 ... [ 391.438448] WARNING: CPU: 12 PID: 20064 at ../lib/refcount.c:28 refcount_warn_saturate+0x224/0x240 This issue also prevents other TMRs from being released, resulting in a stuck session. Not always, since sometimes drain_tmr_list only contains one element, but still possible.
Hi Mike, >>> --- a/drivers/target/iscsi/iscsi_target.c >>> +++ b/drivers/target/iscsi/iscsi_target.c >>> @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, >>> * TMR TASK_REASSIGN. >>> */ >>> iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state); >>> - target_put_sess_cmd(&cmd->se_cmd); >>> + transport_generic_free_cmd(&cmd->se_cmd, false); >>> return 0; >>> } >> >> Doh. I see how I got all confused. I guess this path leaks the lun_ref >> taken by transport_lookup_tmr_lun. It looks like an old issue and >> nothing to do with your patch. >> I'm not sure if we are supposed to be calling >> transport_generic_free_cmd twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL" >> in transport_lun_remove_cmd, so we won't do a double list deletion. >> It feels dirty though. I can feel Bart saying, "Mike did you see the >> comment at the top of the function". :) >> >> Maybe it's best to more cleanly unwind what was setup in the rror >> path. I think we can fix lun_ref leak too. >> >> So instead of doing transport_generic_free_cmd above do >> transport_lun_remove_cmd to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd? > >Shoot. I'm all over the place. I think the root issue is my original comment on the v1 patch was wrong. >On a failure we would still do: >iscsit_free_cmd -> transport_generic_free_cmd -> transport_lun_remove_cmd >right? So we don't need any change in the iscsi target. It should all just work. iscsit_free_cmd will be called only at response completion(next incoming iscsi command): iscsit_ack_from_expstatsn -> iscsit_free_cmd. That produces some unacceptable delay of lun unlinking from cmd. There is a bug report to the similar behavior: http://lkml.iu.edu/hypermail/linux/kernel/2002.0/05272.html Because of that complain, the commit 83f85b8ec305, that solves the same crash as I am fixing, was reverted. So, this piece of patch has some indirect relation :) I will extract it to a separate patch in the coming patchset on TMF handling. BR, Dmitry
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index d0e7ed8f28cc..3311b031a812 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, * TMR TASK_REASSIGN. */ iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state); - target_put_sess_cmd(&cmd->se_cmd); + transport_generic_free_cmd(&cmd->se_cmd, false); return 0; } EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd); diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index 7347285471fa..323a173010c1 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -50,15 +50,6 @@ EXPORT_SYMBOL(core_tmr_alloc_req); void core_tmr_release_req(struct se_tmr_req *tmr) { - struct se_device *dev = tmr->tmr_dev; - unsigned long flags; - - if (dev) { - spin_lock_irqsave(&dev->se_tmr_lock, flags); - list_del_init(&tmr->tmr_list); - spin_unlock_irqrestore(&dev->se_tmr_lock, flags); - } - kfree(tmr); } diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 5ecb9f18a53d..34b20c0646ab 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -667,6 +667,20 @@ static void target_remove_from_state_list(struct se_cmd *cmd) spin_unlock_irqrestore(&dev->queues[cmd->cpuid].lock, flags); } +static void target_remove_from_tmr_list(struct se_cmd *cmd) +{ + struct se_device *dev = NULL; + unsigned long flags; + + if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) + dev = cmd->se_tmr_req->tmr_dev; + + if (dev) { + spin_lock_irqsave(&dev->se_tmr_lock, flags); + list_del_init(&cmd->se_tmr_req->tmr_list); + spin_unlock_irqrestore(&dev->se_tmr_lock, flags); + } +} /* * This function is called by the target core after the target core has * finished processing a SCSI command or SCSI TMF. Both the regular command @@ -678,13 +692,6 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd) { unsigned long flags; - target_remove_from_state_list(cmd); - - /* - * Clear struct se_cmd->se_lun before the handoff to FE. - */ - cmd->se_lun = NULL; - spin_lock_irqsave(&cmd->t_state_lock, flags); /* * Determine if frontend context caller is requesting the stopping of @@ -719,8 +726,16 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd) if (!lun) return; + target_remove_from_state_list(cmd); + target_remove_from_tmr_list(cmd); + if (cmpxchg(&cmd->lun_ref_active, true, false)) percpu_ref_put(&lun->lun_ref); + + /* + * Clear struct se_cmd->se_lun before the handoff to FE. + */ + cmd->se_lun = NULL; } static void target_complete_failure_work(struct work_struct *work)