Message ID | 20170202005853.23456-20-bart.vanassche@sandisk.com (mailing list archive)
---|---
State | Superseded
On Wed, 2017-02-01 at 16:58 -0800, Bart Van Assche wrote:
> Instead of invoking target driver callback functions from the
> context that handles an abort or LUN RESET task management function,
> only set the abort flag from that context and perform the actual
> abort handling from the context of the regular command processing
> flow. This approach has the following advantages:
> - The task management function code becomes much easier to read and
>   to verify since the number of potential race conditions against
>   the command processing flow is strongly reduced.
> - It is no longer needed to store the command state into the command
>   itself since that information is no longer needed from the context
>   where a task management function is processed.
>
> Notes:
> - With this patch applied the CMD_T_ABORTED flag is checked at two points
>   by the target core: just before local execution of a command starts
>   (see also target_execute_cmd()) and also just before the response is
>   sent (see also target_complete_cmd()).
> - Two .release_cmd() calls have been changed into transport_put_cmd()
>   calls to ensure that the 'finished' completion is set before
>   .release_cmd() is called.

And here is the meat of the series. Before diving into the specific issues with the patch, I'd like to give some context with respect to how the existing code works today in v4.10, and highlight the first order and second order issues involved.

The first order issue is in the target_core_tmr.c code: ABORT_TASK and LUN_RESET obtain a se_cmd->cmd_kref and block on transport_wait_for_tasks() -> wait_for_completion(&se_cmd->t_transport_stop_comp), waiting to quiesce an outstanding in-flight se_cmd descriptor.
This quiesce of an in-flight se_cmd can happen in three ways:

- early in submission, in target_execute_cmd(), to catch CMD_T_STOP before the se_cmd is submitted to the backend, or
- by waiting for outstanding backend I/O to complete via target_complete_cmd() to catch CMD_T_STOP (the typical case when a backend device is not responding), or
- by waiting for the transport_cmd_check_stop() hand-off of the se_cmd from target-core to the fabric driver response path to catch CMD_T_STOP.

As you know, in the target_core_tmr.c code this quiesce happens for a se_cmd with ABORT_TASK in core_tmr_abort_task(); with LUN_RESET it happens for TMRs in core_tmr_drain_tmr_list() and for se_cmd descriptors in core_tmr_drain_state_list() as part of se_device->state_list. Once CMD_T_STOP is cleared, target_core_tmr.c will drop both the outstanding reference to se_cmd->cmd_kref, as well as its own reference obtained by __target_check_io_state().

The second order issue, which is what complicates things with the existing code, is that any of the above first order cases can occur and block waiting for CMD_T_STOP to be cleared at the _same time_ that the associated fabric driver se_session is being shut down and sets CMD_T_FABRIC_STOP on all of the se_session's se_cmd descriptors. The session shutdown can happen explicitly by the fabric driver, or by a session reinstatement event (with iscsi-target, for example) when a host doesn't get a TMR response because backend I/O has not completed for an extended amount of time.

The dynamic interaction of the first and second order issues here is what has taken us a long time to get right in upstream code, and which has been stable for end-users the past year since the advent of commit 0f4a94316.

So with that additional context about how/why things work as they do today in mainline code, my comments are inline below. As mentioned earlier, please respond in-line to the specific comments.
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> > Cc: Hannes Reinecke <hare@suse.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Andy Grover <agrover@redhat.com> > Cc: David Disseldorp <ddiss@suse.de> > --- > drivers/target/target_core_internal.h | 2 - > drivers/target/target_core_tmr.c | 73 ++++++++-------- > drivers/target/target_core_transport.c | 153 +++++++++++++-------------------- > include/target/target_core_base.h | 2 + > 4 files changed, 95 insertions(+), 135 deletions(-) > > diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h > index 9ab7090f7c83..6f6cae81ab8e 100644 > --- a/drivers/target/target_core_internal.h > +++ b/drivers/target/target_core_internal.h > @@ -136,7 +136,6 @@ int init_se_kmem_caches(void); > void release_se_kmem_caches(void); > u32 scsi_get_new_index(scsi_index_t); > void transport_subsystem_check_init(void); > -void transport_cmd_finish_abort(struct se_cmd *, int); > unsigned char *transport_dump_cmd_direction(struct se_cmd *); > void transport_dump_dev_state(struct se_device *, char *, int *); > void transport_dump_dev_info(struct se_device *, struct se_lun *, > @@ -146,7 +145,6 @@ int transport_dump_vpd_assoc(struct t10_vpd *, unsigned char *, int); > int transport_dump_vpd_ident_type(struct t10_vpd *, unsigned char *, int); > int transport_dump_vpd_ident(struct t10_vpd *, unsigned char *, int); > void transport_clear_lun_ref(struct se_lun *); > -void transport_send_task_abort(struct se_cmd *); > sense_reason_t target_cmd_size_check(struct se_cmd *cmd, unsigned int size); > void target_qf_do_work(struct work_struct *work); > bool target_check_wce(struct se_device *dev); > diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c > index 367799b4dde1..bc07e059fb22 100644 > --- a/drivers/target/target_core_tmr.c > +++ b/drivers/target/target_core_tmr.c > @@ -75,25 +75,6 @@ void core_tmr_release_req(struct se_tmr_req *tmr) > kfree(tmr); > } > > -static void 
core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas)
> -{
> -	unsigned long flags;
> -	bool remove = true, send_tas;
> -	/*
> -	 * TASK ABORTED status (TAS) bit support
> -	 */
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	send_tas = (cmd->transport_state & CMD_T_TAS);
> -	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -	if (send_tas) {
> -		remove = false;
> -		transport_send_task_abort(cmd);
> -	}
> -
> -	transport_cmd_finish_abort(cmd, remove);
> -}
> -
>  static int target_check_cdb_and_preempt(struct list_head *list,
>  		struct se_cmd *cmd)
>  {
> @@ -192,13 +173,13 @@ void core_tmr_abort_task(
>  			continue;
>  		}
>
> -		list_del_init(&se_cmd->se_cmd_list);
> +		se_cmd->send_abort_response = false;
>  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
>
> -		cancel_work_sync(&se_cmd->work);
> -		transport_wait_for_tasks(se_cmd);
> +		while (!wait_for_completion_timeout(&se_cmd->finished, 180 * HZ))
> +			pr_debug("ABORT TASK: still waiting for ITT %#llx\n",
> +				 ref_tag);
>
> -		transport_cmd_finish_abort(se_cmd, true);
>  		target_put_sess_cmd(se_cmd);

So instead of using the common transport_wait_for_tasks() across all of the quiesce cases, this introduces a new se_cmd->finished completion to separate out normal CMD_T_STOP shutdown vs. TMR ABORT_TASK. Not a bad idea.

However, there is a basic fatal flaw in this approach with this patch as-is that results in se_cmd->finished never being able to complete for any se_cmd with CMD_T_ABORTED.

For the above, core_tmr_abort_task() calls __target_check_io_state() to obtain a se_cmd->cmd_kref if the descriptor is able to be aborted, and sets CMD_T_ABORTED. Then, wait_for_completion_timeout() sits in a while loop waiting for se_cmd->finished to be completed.

However, the complete_all(&se_cmd->finished) you've added below is only invoked from the final se_cmd->cmd_kref callback in target_release_cmd_kref().
Which means core_tmr_abort_task() will always be stuck indefinitely on se_cmd->finished above, because the subsequent target_put_sess_cmd() after the wait_for_completion_timeout() is the caller responsible for doing the final kref_put(&se_cmd->cmd_kref, target_release_cmd_kref) to perform complete_all(&se_cmd->finished). > printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for" > @@ -295,14 +276,31 @@ static void core_tmr_drain_tmr_list( > (preempt_and_abort_list) ? "Preempt" : "", tmr_p, > tmr_p->function, tmr_p->response, cmd->t_state); > > - cancel_work_sync(&cmd->work); > - transport_wait_for_tasks(cmd); > - > - transport_cmd_finish_abort(cmd, 1); > + while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ)) > + pr_debug("LUN RESET: still waiting for ITT %#llx\n", > + cmd->tag); > target_put_sess_cmd(cmd); > } > } > Same thing here for TMR abort. There is no way for target_release_cmd_kref() to ever invoke complete_all(&se_cmd->finished) to make forward progress beyond the wait_for_completion_timeout() here until the subsequent target_put_sess_cmd() is called to drop se_cmd->cmd_kref to zero. > +/** > + * core_tmr_drain_state_list() - Abort SCSI commands associated with a device. > + * > + * @dev: Device for which to abort outstanding SCSI commands. > + * @prout_cmd: Pointer to the SCSI PREEMPT AND ABORT if this function is called > + * to realize the PREEMPT AND ABORT functionality. > + * @tmr_sess: Session through which the LUN RESET has been received. > + * @tas: Task Aborted Status (TAS) bit from the SCSI control mode page. > + * A quote from SPC-4, paragraph "7.5.10 Control mode page": > + * "A task aborted status (TAS) bit set to zero specifies that > + * aborted commands shall be terminated by the device server > + * without any response to the application client. 
A TAS bit set > + * to one specifies that commands aborted by the actions of an I_T > + * nexus other than the I_T nexus on which the command was > + * received shall be completed with TASK ABORTED status." > + * @preempt_and_abort_list: For the PREEMPT AND ABORT functionality, a list > + * with registrations that will be preempted. > + */ > static void core_tmr_drain_state_list( > struct se_device *dev, > struct se_cmd *prout_cmd, > @@ -311,6 +309,7 @@ static void core_tmr_drain_state_list( > struct list_head *preempt_and_abort_list) > { > LIST_HEAD(drain_task_list); > + struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL; > struct se_session *sess; > struct se_cmd *cmd, *next; > unsigned long flags; > @@ -340,6 +339,10 @@ static void core_tmr_drain_state_list( > */ > spin_lock_irqsave(&dev->execute_task_lock, flags); > list_for_each_entry_safe(cmd, next, &dev->state_list, state_list) { > + /* Skip task management functions. */ > + if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) > + continue; > + TMRs are never added to se_device->state_list, so this extra check is unnecessary. > /* > * For PREEMPT_AND_ABORT usage, only process commands > * with a matching reservation key. > @@ -365,6 +368,8 @@ static void core_tmr_drain_state_list( > > list_move_tail(&cmd->state_list, &drain_task_list); > cmd->state_active = false; > + if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas) > + cmd->send_abort_response = true; > } > spin_unlock_irqrestore(&dev->execute_task_lock, flags); > > @@ -387,17 +392,9 @@ static void core_tmr_drain_state_list( > (cmd->transport_state & CMD_T_STOP) != 0, > (cmd->transport_state & CMD_T_SENT) != 0); > > - /* > - * If the command may be queued onto a workqueue cancel it now. > - * > - * This is equivalent to removal from the execute queue in the > - * loop above, but we do it down here given that > - * cancel_work_sync may block. 
> - */ > - cancel_work_sync(&cmd->work); > - transport_wait_for_tasks(cmd); > - > - core_tmr_handle_tas_abort(cmd, tas); > + while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ)) > + pr_debug("LUN RESET: still waiting for cmd with ITT %#llx\n", > + cmd->tag); > target_put_sess_cmd(cmd); > } > } Same bug as above for aborting se_cmd from se_device->state_list. There is no way for target_release_cmd_kref() to ever invoke complete_all(&se_cmd->finished) to make forward progress beyond the wait_for_completion_timeout() here until the subsequent target_put_sess_cmd() is called to drop se_cmd->cmd_kref to zero. > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index 40dff4799984..8506ecc0d6c0 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -650,25 +650,6 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd) > percpu_ref_put(&lun->lun_ref); > } > > -void transport_cmd_finish_abort(struct se_cmd *cmd, int remove) > -{ > - bool ack_kref = (cmd->se_cmd_flags & SCF_ACK_KREF); > - > - if (cmd->se_cmd_flags & SCF_SE_LUN_CMD) > - transport_lun_remove_cmd(cmd); > - /* > - * Allow the fabric driver to unmap any resources before > - * releasing the descriptor via TFO->release_cmd() > - */ > - if (remove) > - cmd->se_tfo->aborted_task(cmd); > - > - if (transport_cmd_check_stop_to_fabric(cmd)) > - return; > - if (remove && ack_kref) > - transport_put_cmd(cmd); > -} > - > static void target_complete_failure_work(struct work_struct *work) > { > struct se_cmd *cmd = container_of(work, struct se_cmd, work); > @@ -700,14 +681,58 @@ static unsigned char *transport_get_sense_buffer(struct se_cmd *cmd) > return cmd->sense_buffer; > } > > +static void transport_handle_abort(struct se_cmd *cmd) > +{ > + bool ack_kref = cmd->se_cmd_flags & SCF_ACK_KREF; > + > + transport_lun_remove_cmd(cmd); > + > + if (cmd->send_abort_response) { > + cmd->scsi_status = SAM_STAT_TASK_ABORTED; 
> + pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n", > + cmd->t_task_cdb[0], cmd->tag); > + trace_target_cmd_complete(cmd); > + cmd->se_tfo->queue_status(cmd); > + transport_cmd_check_stop_to_fabric(cmd); > + } else { > + /* > + * Allow the fabric driver to unmap any resources before > + * releasing the descriptor via TFO->release_cmd() > + */ > + cmd->se_tfo->aborted_task(cmd); > + /* > + * To do: establish a unit attention condition on the I_T > + * nexus associated with cmd. See also the paragraph "Aborting > + * commands" in SAM. > + */ > + if (transport_cmd_check_stop_to_fabric(cmd) == 0 && ack_kref) > + transport_put_cmd(cmd); > + } > +} > + > +static void target_abort_work(struct work_struct *work) > +{ > + struct se_cmd *cmd = container_of(work, struct se_cmd, work); > + > + transport_handle_abort(cmd); > +} > + > +/* May be called from interrupt context so must not sleep. */ > void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status) > { > struct se_device *dev = cmd->se_dev; > int success = scsi_status == GOOD; > unsigned long flags; > > - cmd->scsi_status = scsi_status; > + if (cmd->transport_state & CMD_T_ABORTED) { > + INIT_WORK(&cmd->work, target_abort_work); > + goto queue_work; > + } else if (cmd->transport_state & CMD_T_STOP) { > + complete_all(&cmd->t_transport_stop_comp); > + return; > + } > > + cmd->scsi_status = scsi_status; > > spin_lock_irqsave(&cmd->t_state_lock, flags); > cmd->transport_state &= ~CMD_T_BUSY; > @@ -720,16 +745,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status) > success = 1; > } > > - /* > - * Check for case where an explicit ABORT_TASK has been received > - * and transport_wait_for_tasks() will be waiting for completion.. 
> - */ > - if (cmd->transport_state & CMD_T_ABORTED || > - cmd->transport_state & CMD_T_STOP) { > - spin_unlock_irqrestore(&cmd->t_state_lock, flags); > - complete_all(&cmd->t_transport_stop_comp); > - return; > - } else if (!success) { > + if (!success) { > INIT_WORK(&cmd->work, target_complete_failure_work); > } else { > INIT_WORK(&cmd->work, target_complete_ok_work); > @@ -739,6 +755,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status) > cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE); > spin_unlock_irqrestore(&cmd->t_state_lock, flags); > > +queue_work: > if (cmd->se_cmd_flags & SCF_USE_CPUID) > queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work); > else > @@ -1221,6 +1238,7 @@ void transport_init_se_cmd( > INIT_LIST_HEAD(&cmd->state_list); > init_completion(&cmd->t_transport_stop_comp); > init_completion(&cmd->cmd_wait_comp); > + init_completion(&cmd->finished); > spin_lock_init(&cmd->t_state_lock); > kref_init(&cmd->cmd_kref); > cmd->transport_state = CMD_T_DEV_ACTIVE; > @@ -1883,16 +1901,18 @@ static int __transport_check_aborted_status(struct se_cmd *, int); > void target_execute_cmd(struct se_cmd *cmd) > { > /* > + * If the received CDB has aleady been aborted stop processing it here. > + */ > + if (transport_check_aborted_status(cmd, 1) != 0) > + return; > + > + /* > * Determine if frontend context caller is requesting the stopping of > * this command for frontend exceptions. > * > * If the received CDB has aleady been aborted stop processing it here. > */ > spin_lock_irq(&cmd->t_state_lock); > - if (__transport_check_aborted_status(cmd, 1)) { > - spin_unlock_irq(&cmd->t_state_lock); > - return; > - } > if (cmd->transport_state & CMD_T_STOP) { > pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n", > __func__, __LINE__, cmd->tag); The use of transport_check_aborted_status(), instead of using __transport_check_aborted_status() means se_cmd->t_state_lock is taken here twice for every I/O. 
There is no reason to take this lock twice, so to avoid additional fast-path overhead this part should be avoided. > @@ -2499,15 +2519,11 @@ static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas) > > int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) > { > - int ret = 0; > bool aborted = false, tas = false; > > if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) { > if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)) > target_wait_free_cmd(cmd, &aborted, &tas); > - > - if (!aborted || tas) > - ret = transport_put_cmd(cmd); > } else { > if (wait_for_tasks) > target_wait_free_cmd(cmd, &aborted, &tas); > @@ -2521,9 +2537,6 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) > > if (cmd->se_lun) > transport_lun_remove_cmd(cmd); > - > - if (!aborted || tas) > - ret = transport_put_cmd(cmd); > } > /* > * If the task has been internally aborted due to TMR ABORT_TASK > @@ -2534,10 +2547,8 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) > if (aborted) { > pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag); > wait_for_completion(&cmd->cmd_wait_comp); > - cmd->se_tfo->release_cmd(cmd); > - ret = 1; > } > - return ret; > + return transport_put_cmd(cmd); > } This code snippet is specific to the second order issue mentioned above involving concurrent session shutdown (CMD_T_FABRIC_STOP) and CMD_T_ABORTED. transport_generic_free_cmd() is used by iscsi-target + iser-target during session shutdown, and for the second order case existing code expects I/O to be quiesced and blocked on ->cmd_wait_comp waiting for se_cmd->cmd_kref to drop to zero, before invoking se_tfo->release_cmd() directly. Specifically, existing iscsi-target + iser-target code expects when transport_generic_free_cmd() returns the command must no longer be in flight, and for the CMD_T_ABORTED + CMD_T_FABRIC_STOP case, se_cmd descriptor memory must be released directly. 
> EXPORT_SYMBOL(transport_generic_free_cmd);
>
> @@ -2596,6 +2607,8 @@ static void target_release_cmd_kref(struct kref *kref)
>  	unsigned long flags;
>  	bool fabric_stop;
>
> +	complete_all(&se_cmd->finished);
> +

As mentioned in the three cases above for CMD_T_ABORTED, this is the fatal flaw of the patch. When CMD_T_ABORTED descriptors take a se_cmd->cmd_kref and then block on se_cmd->finished, there is no way for them to make forward progress, because this complete_all() is only ever called when se_cmd->cmd_kref reaches zero.

So as-is, any se_cmd descriptor that reaches the CMD_T_ABORTED logic will always result in a hung task, no matter what.

Btw, I did check the final outcome of this series just to be sure it wasn't fixed in subsequent patches, and the same flaw exists.

>  	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
>
>  	spin_lock(&se_cmd->t_state_lock);
> @@ -2697,7 +2710,7 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
>  			" fabric state: %d\n", se_cmd, se_cmd->t_state,
>  			se_cmd->se_tfo->get_cmd_state(se_cmd));
>
> -		se_cmd->se_tfo->release_cmd(se_cmd);
> +		target_put_sess_cmd(se_cmd);
>

This looks like a similar forward-progress issue here in target_wait_for_sess_cmds(). There is a wait_for_completion(&se_cmd->cmd_wait_comp) right above the original se_tfo->release_cmd(), so this change to do target_put_sess_cmd() instead means se_cmd->cmd_wait_comp will be blocked forever waiting for se_cmd->cmd_kref to reach zero via target_put_sess_cmd().
> spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); > @@ -3009,16 +3022,7 @@ static int __transport_check_aborted_status(struct se_cmd *cmd, int send_status) > return 1; > } > > - pr_debug("Sending delayed SAM_STAT_TASK_ABORTED status for CDB:" > - " 0x%02x ITT: 0x%08llx\n", cmd->t_task_cdb[0], cmd->tag); > - > cmd->se_cmd_flags &= ~SCF_SEND_DELAYED_TAS; > - cmd->scsi_status = SAM_STAT_TASK_ABORTED; > - trace_target_cmd_complete(cmd); > - > - spin_unlock_irq(&cmd->t_state_lock); > - cmd->se_tfo->queue_status(cmd); > - spin_lock_irq(&cmd->t_state_lock); > > return 1; > } > @@ -3035,47 +3039,6 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status) > } > EXPORT_SYMBOL(transport_check_aborted_status); > > -void transport_send_task_abort(struct se_cmd *cmd) > -{ > - unsigned long flags; > - > - spin_lock_irqsave(&cmd->t_state_lock, flags); > - if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) { > - spin_unlock_irqrestore(&cmd->t_state_lock, flags); > - return; > - } > - spin_unlock_irqrestore(&cmd->t_state_lock, flags); > - > - /* > - * If there are still expected incoming fabric WRITEs, we wait > - * until until they have completed before sending a TASK_ABORTED > - * response. This response with TASK_ABORTED status will be > - * queued back to fabric module by transport_check_aborted_status(). > - */ > - if (cmd->data_direction == DMA_TO_DEVICE) { > - if (cmd->se_tfo->write_pending_status(cmd) != 0) { > - spin_lock_irqsave(&cmd->t_state_lock, flags); > - if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) { > - spin_unlock_irqrestore(&cmd->t_state_lock, flags); > - goto send_abort; > - } > - cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS; > - spin_unlock_irqrestore(&cmd->t_state_lock, flags); > - return; > - } > - } Not 100% sure, but I think this also breaks iscsi-target + iser-target. The fabric driver needs to allow WRITE transfers to complete once CMD_T_ABORTED is set, but then prevent the submission into the backend. 
AFAICT the removal of this means the se_cmd is aborted right away, and iscsi-target + iser-target aren't allowed to complete their WRITE payloads. > -send_abort: > - cmd->scsi_status = SAM_STAT_TASK_ABORTED; > - > - transport_lun_remove_cmd(cmd); > - > - pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n", > - cmd->t_task_cdb[0], cmd->tag); > - > - trace_target_cmd_complete(cmd); > - cmd->se_tfo->queue_status(cmd); > -} > - > static void target_tmr_work(struct work_struct *work) > { > struct se_cmd *cmd = container_of(work, struct se_cmd, work); > diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h > index fb87f0e5d0d6..e690ce36e592 100644 > --- a/include/target/target_core_base.h > +++ b/include/target/target_core_base.h > @@ -444,6 +444,7 @@ struct se_cmd { > unsigned cmd_wait_set:1; > unsigned unknown_data_length:1; > bool state_active:1; > + bool send_abort_response:1; > u64 tag; /* SAM command identifier aka task tag */ > /* Delay for ALUA Active/NonOptimized state access in milliseconds */ > int alua_nonop_delay; > @@ -496,6 +497,7 @@ struct se_cmd { > spinlock_t t_state_lock; > struct kref cmd_kref; > struct completion t_transport_stop_comp; > + struct completion finished; > > struct work_struct work; > -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-02-07 at 07:40 -0800, Nicholas A. Bellinger wrote: > However, there is a basic fatal flaw in this approach > with this patch as-is that results in se_cmd->finished never being able > to complete for any se_cmd with CMD_T_ABORTED . > > For the above, core_tmr_abort_task() calls __target_check_io_state() to > obtain a se_cmd->cmd_kref if the descriptor is able to be aborted and > sets CMD_T_ABORTED. Then, wait_for_completion_timeout() sits in a while > looping waiting for se_cmd->finished to be completed. > > However, the complete_all(&se_cmd->finished) you've added below is only > invoked from the final se_cmd->cmd_kref callback in > target_release_cmd_kref(). > > Which means core_tmr_abort_task() will always be stuck indefinitely on > se_cmd->finished above, because the subsequent target_put_sess_cmd() > after the wait_for_completion_timeout() is the caller responsible for > doing the final kref_put(&se_cmd->cmd_kref, target_release_cmd_kref) > to perform complete_all(&se_cmd->finished). A fix for this is in test. I will post a new patch as soon as my tests have finished. > > printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for" > > @@ -295,14 +276,31 @@ static void core_tmr_drain_tmr_list( > > (preempt_and_abort_list) ? "Preempt" : "", tmr_p, > > tmr_p->function, tmr_p->response, cmd->t_state); > > > > - cancel_work_sync(&cmd->work); > > - transport_wait_for_tasks(cmd); > > - > > - transport_cmd_finish_abort(cmd, 1); > > + while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ)) > > + pr_debug("LUN RESET: still waiting for ITT %#llx\n", > > + cmd->tag); > > target_put_sess_cmd(cmd); > > } > > } > > > > Same thing here for TMR abort. > > There is no way for target_release_cmd_kref() to ever invoke > complete_all(&se_cmd->finished) to make forward progress beyond the > wait_for_completion_timeout() here until the subsequent > target_put_sess_cmd() is called to drop se_cmd->cmd_kref to zero. 
> > > @@ -340,6 +339,10 @@ static void core_tmr_drain_state_list( > > */ > > spin_lock_irqsave(&dev->execute_task_lock, flags); > > list_for_each_entry_safe(cmd, next, &dev->state_list, state_list) { > > + /* Skip task management functions. */ > > + if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) > > + continue; > > + > > TMRs are never added to se_device->state_list, so this extra check is > unnecessary. OK, I will leave this out. > > void target_execute_cmd(struct se_cmd *cmd) > > { > > /* > > + * If the received CDB has aleady been aborted stop processing it here. > > + */ > > + if (transport_check_aborted_status(cmd, 1) != 0) > > + return; > > + > > + /* > > * Determine if frontend context caller is requesting the stopping of > > * this command for frontend exceptions. > > * > > * If the received CDB has aleady been aborted stop processing it here. > > */ > > spin_lock_irq(&cmd->t_state_lock); > > - if (__transport_check_aborted_status(cmd, 1)) { > > - spin_unlock_irq(&cmd->t_state_lock); > > - return; > > - } > > if (cmd->transport_state & CMD_T_STOP) { > > pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n", > > __func__, __LINE__, cmd->tag); > > The use of transport_check_aborted_status(), instead of using > __transport_check_aborted_status() means se_cmd->t_state_lock is taken > here twice for every I/O. > > There is no reason to take this lock twice, so to avoid additional > fast-path overhead this part should be avoided. A later patch makes transport_check_aborted_status() lockless. 
> > @@ -2534,10 +2547,8 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
> >  	if (aborted) {
> >  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
> >  		wait_for_completion(&cmd->cmd_wait_comp);
> > -		cmd->se_tfo->release_cmd(cmd);
> > -		ret = 1;
> >  	}
> > -	return ret;
> > +	return transport_put_cmd(cmd);
> > }
>
> This code snippet is specific to the second order issue mentioned above
> involving concurrent session shutdown (CMD_T_FABRIC_STOP) and
> CMD_T_ABORTED.
>
> transport_generic_free_cmd() is used by iscsi-target + iser-target
> during session shutdown, and for the second order case existing code
> expects I/O to be quiesced and blocked on ->cmd_wait_comp waiting for
> se_cmd->cmd_kref to drop to zero, before invoking se_tfo->release_cmd()
> directly.
>
> Specifically, existing iscsi-target + iser-target code expects when
> transport_generic_free_cmd() returns the command must no longer be in
> flight, and for the CMD_T_ABORTED + CMD_T_FABRIC_STOP case, se_cmd
> descriptor memory must be released directly.

I will make sure that all target driver work has completed before transport_generic_free_cmd() returns.

Bart.
On Tue, 2017-02-07 at 22:27 +0000, Bart Van Assche wrote:
> On Tue, 2017-02-07 at 07:40 -0800, Nicholas A. Bellinger wrote:
> > However, there is a basic fatal flaw in this approach
> > with this patch as-is that results in se_cmd->finished never being able
> > to complete for any se_cmd with CMD_T_ABORTED.
> >
> > For the above, core_tmr_abort_task() calls __target_check_io_state() to
> > obtain a se_cmd->cmd_kref if the descriptor is able to be aborted and
> > sets CMD_T_ABORTED. Then, wait_for_completion_timeout() sits in a while
> > loop waiting for se_cmd->finished to be completed.
> >
> > However, the complete_all(&se_cmd->finished) you've added below is only
> > invoked from the final se_cmd->cmd_kref callback in
> > target_release_cmd_kref().
> >
> > Which means core_tmr_abort_task() will always be stuck indefinitely on
> > se_cmd->finished above, because the subsequent target_put_sess_cmd()
> > after the wait_for_completion_timeout() is the caller responsible for
> > doing the final kref_put(&se_cmd->cmd_kref, target_release_cmd_kref)
> > to perform complete_all(&se_cmd->finished).
>
> A fix for this is in test. I will post a new patch as soon as my tests have
> finished.

Can you elaborate a bit on how exactly you've been testing these changes for the first order (handle ABORT_TASKs and LUN_RESET with outstanding backend I/O) and second order (handle session shutdown while order one is active) issues mentioned earlier..?

My main concern is that your tests didn't catch the basic first order issue from the earlier review, or at least they didn't check for hung tasks or proper target shutdown after the test completed.

The nightly automated runs that we (Datera) run for v4.1.y and v3.14.y production include the following:

Provision 500 target endpoints + luns per node in a 2500 volume cluster.
Run a mixed 4k random workload, and drive up the backend I/O latency to the point where open-iscsi starts to send ABORT_TASK (15 second default), LUN_RESET and target reset / session reinstatement (30 second default).

This typically generates 1000's of ABORT_TASKs and session resets, and while that is going on we also perform explicit target migration via configfs (e.g. the third order issue), which performs target configfs shutdown while the first and second order issues are occurring.

ESX is also a really good test for TMRs, because it generates ABORT_TASK after 5 seconds of backend I/O latency, and is very aggressive in attempting to bring the device back online via repeated TMRs.

So I don't think you need 100s of LUNs to verify these changes, but I'd at least like you to verify with a few dozen targets + LUNs, with a mixed random small block workload with a sufficiently deep iodepth (say 16), and ensure at least a few hundred ABORT_TASK, LUN_RESET and session shutdowns are really occurring during your test.

Once the test is completed, perform an explicit fabric logout and target shutdown. This is to ensure there are no reference leaks, kernel hung tasks in un-interruptible sleep, or other memory leaks that have been introduced by the series.
On Tue, 2017-02-07 at 19:11 -0800, Nicholas A. Bellinger wrote: > Can you elaborate a bit on how exactly you've been testing these changes > for the first order (handle ABORT_TASKs and LUN_RESET with outstanding > backend I/O) and second order (handle session shutdown while order one > is active) issues mentioned earlier..? Previous tests were run against a fileio backend. That's probably why my tests passed despite the circular wait in the TMF code (that has already been solved BTW). Anyway, the tests I run to verify the TMF code are: 1. The libiscsi unit tests. 2. Running fio with a high queue depth and data verification enabled in one shell and the following code in a second shell: while true; do sg_reset -d /dev/sd...; sleep .1; echo -n .; done > Once the test is completed, perform an explicit fabric logout and target > shutdown. This is to ensure there are no reference leaks, kernel hung > tasks in un-interruptible sleep, or other memory leaks that have been > introduced by the series. I run all my tests against a kernel with kmemleak enabled. That's a very effective way of detecting memory leaks. Bart.
On Wed, 2017-02-08 at 18:16 +0000, Bart Van Assche wrote: > On Tue, 2017-02-07 at 19:11 -0800, Nicholas A. Bellinger wrote: > > Can you elaborate a bit on how exactly you've been testing these changes > > for the first order (handle ABORT_TASKs and LUN_RESET with outstanding > > backend I/O) and second order (handle session shutdown while order one > > is active) issues mentioned earlier..? > > Previous tests were run against a fileio backend. That would explain why the basic first order functionality didn't work, because the code-paths that were changed were never actually tested. > That's probably why my > tests passed despite the circular wait in the TMF code (that has already > been solved BTW). Yeah, already looked at your change. It's a hack. Adding a second atomic_t emulating a kref on top of the existing se_cmd->cmd_kref, and then doing a complete_all() for the special TMR path in the normal fast-path is not going to be acceptable. Like I said, I'm all for improvements in the TMR area but you'll need to be a lot more methodical about how you're proposing the changes, and how your QA team (manual or automated) is verifying the changes. > Anyway, the tests I run to verify the TMF code are: > 1. The libiscsi unit tests. > 2. Running fio with a high queue depth and data verification enabled in > one shell and the following code in a second shell: > > while true; do sg_reset -d /dev/sd...; sleep .1; echo -n .; done > How many individual occurrences of the first order issue (ABORT_TASK + LUN_RESET), second order issue (session reinstatement while first order is active), and third order issue (configfs shutdown while second order is active) have you tested..? How many total volumes and nodes are you using..? > > Once the test is completed, perform an explicit fabric logout and target > > shutdown. This is to ensure there are no reference leaks, kernel hung > > tasks in un-interruptible sleep, or other memory leaks that have been > > introduced by the series. 
> > I run all my tests against a kernel with kmemleak enabled. That's a very > effective way for detecting memory leaks. Can you give an idea of how many dedicated QA resources Sandisk is putting on upstream target development..? Do you have production customers using it..? 
On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote: > Adding a second atomic_t emulating a kref on top of the existing > se_cmd->cmd_kref, and then doing a complete_all() for the special TMR > path in the normal fast-path is not going to be acceptable. I don't see why that approach would not be acceptable. My patches remove more atomic operations than the number of new atomic operations that are introduced in the hot path, so these patches will improve performance. Bart.
On Wed, 2017-02-08 at 20:14 +0000, Bart Van Assche wrote: > On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote: > > Adding a second atomic_t emulating a kref on top of the existing > > se_cmd->cmd_kref, and then doing a complete_all() for the special TMR > > path in the normal fast-path is not going to be acceptable. > > I don't see why that approach would not be acceptable. My patches remove > more atomic operations than the number of new atomic operations that are > introduced in the hot path so these patches will improve performance. You've got less than 24 hours of soak time after I highlighted a rather fundamental flaw in the v2 approach to the first order issue of how ABORT_TASK and LUN_RESET function when backend I/O is taking an extended amount of time to complete. This tells me your entire manual testing of these code-paths has thus far been rather useless. Have you tested the second and third order issues at scale yet to ensure these changes don't introduce regressions, vs. what we know is stable in mainline..? If so, how many LUNs, nodes, and how many occurrences of each order issue..? The point is you need to remove the target patches from linux-next that don't have reviews, and to understand and answer the feedback I've given you to think about, considering it's taken years to get this particular area of the code stable across the drivers in mainline. Beyond that, if you want to make larger changes you need to really consider getting dedicated QA resources and/or automation setup from your employer to test them for extended periods of time, in stressful conditions. Trying to manually test these large changes at small scale in your spare time, effectively avoiding the interesting code-paths until the obvious flaws were pointed out on the list, is unacceptable. 
On Wed, 2017-02-08 at 14:49 -0800, Nicholas A. Bellinger wrote: > On Wed, 2017-02-08 at 20:14 +0000, Bart Van Assche wrote: > > On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote: > > > Adding a second atomic_t emulating a kref on top of the existing > > > se_cmd->cmd_kref, and then doing a complete_all() for the special TMR > > > path in the normal fast-path is not going to be acceptable. > > > > I don't see why that approach would not be acceptable. My patches remove > > more atomic operations than the number of new atomic operations that are > > introduced in the hot path so these patches will improve performance. > > You've got less than 24 hours of soak time [ ... ] You are irritating me, and it's not very smart to do that. Anyway, a good kernel developer develops tests to reproduce race conditions in much less time than 24 hours. Bart.
On Wed, 2017-02-08 at 23:04 +0000, Bart Van Assche wrote: > On Wed, 2017-02-08 at 14:49 -0800, Nicholas A. Bellinger wrote: > > On Wed, 2017-02-08 at 20:14 +0000, Bart Van Assche wrote: > > > On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote: > > > > Adding a second atomic_t emulating a kref on top of the existing > > > > se_cmd->cmd_kref, and then doing a complete_all() for the special TMR > > > > path in the normal fast-path is not going to be acceptable. > > > > > > I don't see why that approach would not be acceptable. My patches remove > > > more atomic operations than the number of new atomic operations that are > > > introduced in the hot path so these patches will improve performance. > > > > You've got less than 24 hours of soak time [ ... ] > > You are irritating me, and it's not very smart to do that. Anyway, a good > kernel developer develops tests to reproduce race conditions in much less time > than 24 hours. Listen, my primary concern is the integrity of the code-paths that have taken us years to get stable. Based upon your actions, it looks like your primary concern is getting large untested and unreviewed changes that affect complex code-paths into linux-next, and into mainline. This, for a codebase for which you have no dedicated QA resources or automation assigned from your employer, nor production customers to answer to. So if you want to be a LIO maintainer, you'll need to start answering to the LIO community for your actions. 
On Wed, 2017-02-08 at 15:09 -0800, Nicholas A. Bellinger wrote: > Listen, my primary concern is the integrity of the code-paths that have > taken us years to get stable. If that were true you would have taken this patch series upstream two years ago, when I posted this patch series for the first time. This patch series makes the SCSI target core much more reliable and much easier to maintain. However, instead of helping this patch series make progress two years ago you chose to obstruct this work and to increase the complexity of the target core further. You still have a chance to show that reliability and maintainability of the target core are your primary concern by posting review comments for v4 of the patch series I posted. Bart.
On Wed, 2017-02-08 at 23:15 +0000, Bart Van Assche wrote: > On Wed, 2017-02-08 at 15:09 -0800, Nicholas A. Bellinger wrote: > > Listen, my primary concern is the integrity of the code-paths that have > > taken us years to get stable. > > If that were true you would have taken this patch series upstream two > years ago, when I posted this patch series for the first time. Actually, I responded with an explanation of why it was the wrong approach back then, and different issues to consider when touching this particular area of code: http://www.spinics.net/lists/target-devel/msg11057.html And this was your response: http://www.spinics.net/lists/target-devel/msg11542.html Your response disregarded the feedback, and focused on a trivial comment and ignored all others of substance. This is why I didn't merge your patch. So I did the same thing again yesterday. I went through the patch line by line, explaining the first, second and third order issues to consider when touching this complex code-path. The only difference was yesterday it was shown the patch could have never worked due to a fundamental flaw in the approach, and that the interesting code-paths in question were never actually hit with your manual testing. > This patch > series makes the SCSI target core much more reliable and much easier > to maintain. Making it 'much more readable' at the cost of stability for something that you've been unable to verify with QA regression or automation testing is not acceptable. Like I said, you'll need to start being a lot more methodical about incremental patches to complex code-paths, and the testing approach, if you want to make larger changes to this area in mainline. You've got like 5 patches for spelling fixes, and only 1 for the large TMR change..? Really..? > However, instead of helping to make this patch series make > progress two years ago you chose to obstruct this work and to increase the > complexity of the target core further. 
You still have a chance to show that > reliability and maintainability of the target core are your primary concern > by posting review comments for v4 of the patch series I posted. > If you continue to ignore feedback and are unwilling or unable to thoroughly QA regression test changes in complex areas that have taken years to get working across the tree, don't expect a different outcome this time around either. 
On Wed, 2017-02-08 at 15:43 -0800, Nicholas A. Bellinger wrote:
> [ ... ] complex areas that have taken years to get working across the tree [ ... ]
This comment by itself shows that the target core needs to be simplified.
There is only one reason why it took so long to improve stability of the
SCSI target core, namely that it is too complex.
Bart.
On Wed, 2017-02-08 at 23:48 +0000, Bart Van Assche wrote: > On Wed, 2017-02-08 at 15:43 -0800, Nicholas A. Bellinger wrote: > > [ ... ] complex areas that have taken years to get working across the tree [ ... ] > > This comment by itself shows that the target core needs to be simplified. > There is only one reason why it took so long to improve stability of the > SCSI target core, namely that it is too complex. Ignore and snip all you want, but it doesn't change the fact you've been putting un-tested and un-reviewed junk into linux-next. That will need to stop immediately. If your employer is willing to make a commitment for QA and/or automation resources to regression test (at scale) these types of complex target changes, then I'd be much less concerned about things. If that is not the case, and you've got no production customers for LIO to answer to, and are only doing small-scale manual regression testing in your spare time, then the risk of regressions for your 'cleanups' in complex areas will continue to be an issue. 
diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h index 9ab7090f7c83..6f6cae81ab8e 100644 --- a/drivers/target/target_core_internal.h +++ b/drivers/target/target_core_internal.h @@ -136,7 +136,6 @@ int init_se_kmem_caches(void); void release_se_kmem_caches(void); u32 scsi_get_new_index(scsi_index_t); void transport_subsystem_check_init(void); -void transport_cmd_finish_abort(struct se_cmd *, int); unsigned char *transport_dump_cmd_direction(struct se_cmd *); void transport_dump_dev_state(struct se_device *, char *, int *); void transport_dump_dev_info(struct se_device *, struct se_lun *, @@ -146,7 +145,6 @@ int transport_dump_vpd_assoc(struct t10_vpd *, unsigned char *, int); int transport_dump_vpd_ident_type(struct t10_vpd *, unsigned char *, int); int transport_dump_vpd_ident(struct t10_vpd *, unsigned char *, int); void transport_clear_lun_ref(struct se_lun *); -void transport_send_task_abort(struct se_cmd *); sense_reason_t target_cmd_size_check(struct se_cmd *cmd, unsigned int size); void target_qf_do_work(struct work_struct *work); bool target_check_wce(struct se_device *dev); diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index 367799b4dde1..bc07e059fb22 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -75,25 +75,6 @@ void core_tmr_release_req(struct se_tmr_req *tmr) kfree(tmr); } -static void core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas) -{ - unsigned long flags; - bool remove = true, send_tas; - /* - * TASK ABORTED status (TAS) bit support - */ - spin_lock_irqsave(&cmd->t_state_lock, flags); - send_tas = (cmd->transport_state & CMD_T_TAS); - spin_unlock_irqrestore(&cmd->t_state_lock, flags); - - if (send_tas) { - remove = false; - transport_send_task_abort(cmd); - } - - transport_cmd_finish_abort(cmd, remove); -} - static int target_check_cdb_and_preempt(struct list_head *list, struct se_cmd *cmd) { @@ -192,13 +173,13 @@ void 
core_tmr_abort_task( continue; } - list_del_init(&se_cmd->se_cmd_list); + se_cmd->send_abort_response = false; spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); - cancel_work_sync(&se_cmd->work); - transport_wait_for_tasks(se_cmd); + while (!wait_for_completion_timeout(&se_cmd->finished, 180 * HZ)) + pr_debug("ABORT TASK: still waiting for ITT %#llx\n", + ref_tag); - transport_cmd_finish_abort(se_cmd, true); target_put_sess_cmd(se_cmd); printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for" @@ -295,14 +276,31 @@ static void core_tmr_drain_tmr_list( (preempt_and_abort_list) ? "Preempt" : "", tmr_p, tmr_p->function, tmr_p->response, cmd->t_state); - cancel_work_sync(&cmd->work); - transport_wait_for_tasks(cmd); - - transport_cmd_finish_abort(cmd, 1); + while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ)) + pr_debug("LUN RESET: still waiting for ITT %#llx\n", + cmd->tag); target_put_sess_cmd(cmd); } } +/** + * core_tmr_drain_state_list() - Abort SCSI commands associated with a device. + * + * @dev: Device for which to abort outstanding SCSI commands. + * @prout_cmd: Pointer to the SCSI PREEMPT AND ABORT if this function is called + * to realize the PREEMPT AND ABORT functionality. + * @tmr_sess: Session through which the LUN RESET has been received. + * @tas: Task Aborted Status (TAS) bit from the SCSI control mode page. + * A quote from SPC-4, paragraph "7.5.10 Control mode page": + * "A task aborted status (TAS) bit set to zero specifies that + * aborted commands shall be terminated by the device server + * without any response to the application client. A TAS bit set + * to one specifies that commands aborted by the actions of an I_T + * nexus other than the I_T nexus on which the command was + * received shall be completed with TASK ABORTED status." + * @preempt_and_abort_list: For the PREEMPT AND ABORT functionality, a list + * with registrations that will be preempted. 
+ */ static void core_tmr_drain_state_list( struct se_device *dev, struct se_cmd *prout_cmd, @@ -311,6 +309,7 @@ static void core_tmr_drain_state_list( struct list_head *preempt_and_abort_list) { LIST_HEAD(drain_task_list); + struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL; struct se_session *sess; struct se_cmd *cmd, *next; unsigned long flags; @@ -340,6 +339,10 @@ static void core_tmr_drain_state_list( */ spin_lock_irqsave(&dev->execute_task_lock, flags); list_for_each_entry_safe(cmd, next, &dev->state_list, state_list) { + /* Skip task management functions. */ + if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) + continue; + /* * For PREEMPT_AND_ABORT usage, only process commands * with a matching reservation key. @@ -365,6 +368,8 @@ static void core_tmr_drain_state_list( list_move_tail(&cmd->state_list, &drain_task_list); cmd->state_active = false; + if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas) + cmd->send_abort_response = true; } spin_unlock_irqrestore(&dev->execute_task_lock, flags); @@ -387,17 +392,9 @@ static void core_tmr_drain_state_list( (cmd->transport_state & CMD_T_STOP) != 0, (cmd->transport_state & CMD_T_SENT) != 0); - /* - * If the command may be queued onto a workqueue cancel it now. - * - * This is equivalent to removal from the execute queue in the - * loop above, but we do it down here given that - * cancel_work_sync may block. 
- */ - cancel_work_sync(&cmd->work); - transport_wait_for_tasks(cmd); - - core_tmr_handle_tas_abort(cmd, tas); + while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ)) + pr_debug("LUN RESET: still waiting for cmd with ITT %#llx\n", + cmd->tag); target_put_sess_cmd(cmd); } } diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 40dff4799984..8506ecc0d6c0 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -650,25 +650,6 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd) percpu_ref_put(&lun->lun_ref); } -void transport_cmd_finish_abort(struct se_cmd *cmd, int remove) -{ - bool ack_kref = (cmd->se_cmd_flags & SCF_ACK_KREF); - - if (cmd->se_cmd_flags & SCF_SE_LUN_CMD) - transport_lun_remove_cmd(cmd); - /* - * Allow the fabric driver to unmap any resources before - * releasing the descriptor via TFO->release_cmd() - */ - if (remove) - cmd->se_tfo->aborted_task(cmd); - - if (transport_cmd_check_stop_to_fabric(cmd)) - return; - if (remove && ack_kref) - transport_put_cmd(cmd); -} - static void target_complete_failure_work(struct work_struct *work) { struct se_cmd *cmd = container_of(work, struct se_cmd, work); @@ -700,14 +681,58 @@ static unsigned char *transport_get_sense_buffer(struct se_cmd *cmd) return cmd->sense_buffer; } +static void transport_handle_abort(struct se_cmd *cmd) +{ + bool ack_kref = cmd->se_cmd_flags & SCF_ACK_KREF; + + transport_lun_remove_cmd(cmd); + + if (cmd->send_abort_response) { + cmd->scsi_status = SAM_STAT_TASK_ABORTED; + pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n", + cmd->t_task_cdb[0], cmd->tag); + trace_target_cmd_complete(cmd); + cmd->se_tfo->queue_status(cmd); + transport_cmd_check_stop_to_fabric(cmd); + } else { + /* + * Allow the fabric driver to unmap any resources before + * releasing the descriptor via TFO->release_cmd() + */ + cmd->se_tfo->aborted_task(cmd); + /* + * To do: establish a 
unit attention condition on the I_T + * nexus associated with cmd. See also the paragraph "Aborting + * commands" in SAM. + */ + if (transport_cmd_check_stop_to_fabric(cmd) == 0 && ack_kref) + transport_put_cmd(cmd); + } +} + +static void target_abort_work(struct work_struct *work) +{ + struct se_cmd *cmd = container_of(work, struct se_cmd, work); + + transport_handle_abort(cmd); +} + +/* May be called from interrupt context so must not sleep. */ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status) { struct se_device *dev = cmd->se_dev; int success = scsi_status == GOOD; unsigned long flags; - cmd->scsi_status = scsi_status; + if (cmd->transport_state & CMD_T_ABORTED) { + INIT_WORK(&cmd->work, target_abort_work); + goto queue_work; + } else if (cmd->transport_state & CMD_T_STOP) { + complete_all(&cmd->t_transport_stop_comp); + return; + } + cmd->scsi_status = scsi_status; spin_lock_irqsave(&cmd->t_state_lock, flags); cmd->transport_state &= ~CMD_T_BUSY; @@ -720,16 +745,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status) success = 1; } - /* - * Check for case where an explicit ABORT_TASK has been received - * and transport_wait_for_tasks() will be waiting for completion.. 
- */ - if (cmd->transport_state & CMD_T_ABORTED || - cmd->transport_state & CMD_T_STOP) { - spin_unlock_irqrestore(&cmd->t_state_lock, flags); - complete_all(&cmd->t_transport_stop_comp); - return; - } else if (!success) { + if (!success) { INIT_WORK(&cmd->work, target_complete_failure_work); } else { INIT_WORK(&cmd->work, target_complete_ok_work); @@ -739,6 +755,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status) cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE); spin_unlock_irqrestore(&cmd->t_state_lock, flags); +queue_work: if (cmd->se_cmd_flags & SCF_USE_CPUID) queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work); else @@ -1221,6 +1238,7 @@ void transport_init_se_cmd( INIT_LIST_HEAD(&cmd->state_list); init_completion(&cmd->t_transport_stop_comp); init_completion(&cmd->cmd_wait_comp); + init_completion(&cmd->finished); spin_lock_init(&cmd->t_state_lock); kref_init(&cmd->cmd_kref); cmd->transport_state = CMD_T_DEV_ACTIVE; @@ -1883,16 +1901,18 @@ static int __transport_check_aborted_status(struct se_cmd *, int); void target_execute_cmd(struct se_cmd *cmd) { /* + * If the received CDB has aleady been aborted stop processing it here. + */ + if (transport_check_aborted_status(cmd, 1) != 0) + return; + + /* * Determine if frontend context caller is requesting the stopping of * this command for frontend exceptions. * * If the received CDB has aleady been aborted stop processing it here. 
*/ spin_lock_irq(&cmd->t_state_lock); - if (__transport_check_aborted_status(cmd, 1)) { - spin_unlock_irq(&cmd->t_state_lock); - return; - } if (cmd->transport_state & CMD_T_STOP) { pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n", __func__, __LINE__, cmd->tag); @@ -2499,15 +2519,11 @@ static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas) int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) { - int ret = 0; bool aborted = false, tas = false; if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) { if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)) target_wait_free_cmd(cmd, &aborted, &tas); - - if (!aborted || tas) - ret = transport_put_cmd(cmd); } else { if (wait_for_tasks) target_wait_free_cmd(cmd, &aborted, &tas); @@ -2521,9 +2537,6 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) if (cmd->se_lun) transport_lun_remove_cmd(cmd); - - if (!aborted || tas) - ret = transport_put_cmd(cmd); } /* * If the task has been internally aborted due to TMR ABORT_TASK @@ -2534,10 +2547,8 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) if (aborted) { pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag); wait_for_completion(&cmd->cmd_wait_comp); - cmd->se_tfo->release_cmd(cmd); - ret = 1; } - return ret; + return transport_put_cmd(cmd); } EXPORT_SYMBOL(transport_generic_free_cmd); @@ -2596,6 +2607,8 @@ static void target_release_cmd_kref(struct kref *kref) unsigned long flags; bool fabric_stop; + complete_all(&se_cmd->finished); + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); spin_lock(&se_cmd->t_state_lock); @@ -2697,7 +2710,7 @@ void target_wait_for_sess_cmds(struct se_session *se_sess) " fabric state: %d\n", se_cmd, se_cmd->t_state, se_cmd->se_tfo->get_cmd_state(se_cmd)); - se_cmd->se_tfo->release_cmd(se_cmd); + target_put_sess_cmd(se_cmd); } spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); @@ -3009,16 +3022,7 @@ static int __transport_check_aborted_status(struct se_cmd 
*cmd, int send_status) return 1; } - pr_debug("Sending delayed SAM_STAT_TASK_ABORTED status for CDB:" - " 0x%02x ITT: 0x%08llx\n", cmd->t_task_cdb[0], cmd->tag); - cmd->se_cmd_flags &= ~SCF_SEND_DELAYED_TAS; - cmd->scsi_status = SAM_STAT_TASK_ABORTED; - trace_target_cmd_complete(cmd); - - spin_unlock_irq(&cmd->t_state_lock); - cmd->se_tfo->queue_status(cmd); - spin_lock_irq(&cmd->t_state_lock); return 1; } @@ -3035,47 +3039,6 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status) } EXPORT_SYMBOL(transport_check_aborted_status); -void transport_send_task_abort(struct se_cmd *cmd) -{ - unsigned long flags; - - spin_lock_irqsave(&cmd->t_state_lock, flags); - if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) { - spin_unlock_irqrestore(&cmd->t_state_lock, flags); - return; - } - spin_unlock_irqrestore(&cmd->t_state_lock, flags); - - /* - * If there are still expected incoming fabric WRITEs, we wait - * until until they have completed before sending a TASK_ABORTED - * response. This response with TASK_ABORTED status will be - * queued back to fabric module by transport_check_aborted_status(). 
- */ - if (cmd->data_direction == DMA_TO_DEVICE) { - if (cmd->se_tfo->write_pending_status(cmd) != 0) { - spin_lock_irqsave(&cmd->t_state_lock, flags); - if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) { - spin_unlock_irqrestore(&cmd->t_state_lock, flags); - goto send_abort; - } - cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS; - spin_unlock_irqrestore(&cmd->t_state_lock, flags); - return; - } - } -send_abort: - cmd->scsi_status = SAM_STAT_TASK_ABORTED; - - transport_lun_remove_cmd(cmd); - - pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n", - cmd->t_task_cdb[0], cmd->tag); - - trace_target_cmd_complete(cmd); - cmd->se_tfo->queue_status(cmd); -} - static void target_tmr_work(struct work_struct *work) { struct se_cmd *cmd = container_of(work, struct se_cmd, work); diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index fb87f0e5d0d6..e690ce36e592 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -444,6 +444,7 @@ struct se_cmd { unsigned cmd_wait_set:1; unsigned unknown_data_length:1; bool state_active:1; + bool send_abort_response:1; u64 tag; /* SAM command identifier aka task tag */ /* Delay for ALUA Active/NonOptimized state access in milliseconds */ int alua_nonop_delay; @@ -496,6 +497,7 @@ struct se_cmd { spinlock_t t_state_lock; struct kref cmd_kref; struct completion t_transport_stop_comp; + struct completion finished; struct work_struct work;
Instead of invoking target driver callback functions from the context that handles an abort or LUN RESET task management function, only set the abort flag from that context and perform the actual abort handling from the context of the regular command processing flow. This approach has the following advantages: - The task management function code becomes much easier to read and to verify since the number of potential race conditions against the command processing flow is strongly reduced. - It is no longer needed to store the command state into the command itself since that information is no longer needed from the context where a task management function is processed. Notes: - With this patch applied the CMD_T_ABORTED flag is checked at two points by the target core: just before local execution of a command starts (see also target_execute_cmd()) and also just before the response is sent (see also target_complete_cmd()). - Two .release_cmd() calls have been changed into transport_put_cmd() calls to ensure that the 'finished' completion is set before .release_cmd() is called. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: David Disseldorp <ddiss@suse.de> --- drivers/target/target_core_internal.h | 2 - drivers/target/target_core_tmr.c | 73 ++++++++-------- drivers/target/target_core_transport.c | 153 +++++++++++++-------------------- include/target/target_core_base.h | 2 + 4 files changed, 95 insertions(+), 135 deletions(-)