Message ID | 20241011081807.65027-1-hawkxiang.cpp@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | scsi: libiscsi: Set expecting_cc_ua flag when stop_conn |
CC'ing the fibre channel experts because they might have the same issue.

On 10/11/24 3:18 AM, Xiang Zhang wrote:
> Initiator need to recover session and reconnect to target, after calling
> stop_conn. And target will rebuild new session info, and mark
> ASC_POWERON_RESET ua sense for scsi devices belong to the target(device
> reset). After recovery, first scsi command(scmd) request to target will
> get ASC_POWERON_RESET(ua sense) + SAM_STAT_CHECK_CONDITION(status) in
> response.
>
> According to scsi code: "scsi_done --> scsi_complete -->
> scsi_decide_disposition --> scsi_check_sense", if expecting_cc_ua = 0, scmd
> response with ASC_POWERON_RESET(ua sense) will ignore "cmd->retries <=
> cmd->allowed", fail directly. It will cause SCSI return io_error to upper
> layer without retry.

Just want to make sure I understand the problem. Does the failure only happen
with tape or passthrough or if removable is set?

For commands coming from sd, then scsi_io_completion will end up calling
scsi_io_completion_action and seeing the UNIT_ATTENTION and will retry. I'm not
saying we shouldn't do a fix like you did below. Just want to make sure I
understand the case you describe above.

> If we set expecting_cc_ua=1 in fail_scsi_tasks, SISC will retry the scmd
> which is response with ASC_POWERON_RESET. The scmd second request to target
> can successful, because target will clear ASC_POWERON_RESET in device
> pending ua_sense_list after first scmd request.

What does "SISC" stand for?
>
> Signed-off-by: Xiang Zhang <hawkxiang.cpp@gmail.com>
> ---
>  drivers/scsi/libiscsi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 0fda8905eabd..317e57be32b3 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -629,9 +629,10 @@ static void __fail_scsi_task(struct iscsi_task *task, int err)
>  		conn->session->queued_cmdsn--;
>  		/* it was never sent so just complete like normal */
>  		state = ISCSI_TASK_COMPLETED;
> -	} else if (err == DID_TRANSPORT_DISRUPTED)
> +	} else if (err == DID_TRANSPORT_DISRUPTED) {
>  		state = ISCSI_TASK_ABRT_SESS_RECOV;
> -	else
> +		sc->device->expecting_cc_ua = 1;

The failure case can happen with other transports like fibre channel right? If
it's common I think we want this in the core scsi code.

For iscsi, we want to set expecting_cc_ua whenever we call scsi_block_targets()
or whenever we return DID_TRANSPORT_DISRUPTED or DID_TRANSPORT_FAILFAST.

FC developers, I'm not sure if that's the case for you. For example if your
driver called fc_remote_port_delete -> scsi_block_targets but then the issue is
resolved quickly, like for a quick cable pull, and you called
fc_remote_port_add, could there be cases where you did not get a I_T Nexus
loss/reset type of issue? Or is it the case where anytime a fc driver calls
fc_remote_port_delete then you will expect a UA after calling
fc_remote_port_add again?
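As an illustration of the rule suggested above (set expecting_cc_ua whenever a command is completed with DID_TRANSPORT_DISRUPTED or DID_TRANSPORT_FAILFAST), here is a minimal user-space sketch. It is not kernel code: `model_device`, `model_host_byte` and `model_note_result` are hypothetical names, and the `MODEL_DID_*` values are assumed to mirror the kernel's host-byte constants. It relies only on the host byte sitting in bits 16..23 of the result, as in `sc->result = err << 16`.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's host-byte values; illustrative only. */
#define MODEL_DID_TRANSPORT_DISRUPTED 0x0e
#define MODEL_DID_TRANSPORT_FAILFAST  0x0f

/* Hypothetical stand-in for struct scsi_device; only the flag we need. */
struct model_device {
	bool expecting_cc_ua;
};

/* The host byte is stored in bits 16..23 of the result word. */
static int model_host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/*
 * Hypothetical helper: mark the device as expecting a unit attention
 * when a command was completed with one of the transport error codes.
 */
static void model_note_result(struct model_device *dev, int result)
{
	switch (model_host_byte(result)) {
	case MODEL_DID_TRANSPORT_DISRUPTED:
	case MODEL_DID_TRANSPORT_FAILFAST:
		dev->expecting_cc_ua = true;
		break;
	default:
		break;
	}
}
```

Whether a check like this belongs in libiscsi or in the core SCSI code is exactly the open question in this reply; the sketch only shows the condition, not where it should live.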
Hi Xiang,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mkp-scsi/for-next]
[also build test WARNING on jejb-scsi/for-next linus/master v6.12-rc2 next-20241011]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiang-Zhang/scsi-libiscsi-Set-expecting_cc_ua-flag-when-stop_conn/20241011-161915
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
patch link:    https://lore.kernel.org/r/20241011081807.65027-1-hawkxiang.cpp%40gmail.com
patch subject: [PATCH] scsi: libiscsi: Set expecting_cc_ua flag when stop_conn
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241012/202410122213.bq19EI34-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241012/202410122213.bq19EI34-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410122213.bq19EI34-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/scsi/libiscsi.c:634:3: warning: variable 'sc' is uninitialized when used here [-Wuninitialized]
     634 |                 sc->device->expecting_cc_ua = 1;
         |                 ^~
   drivers/scsi/libiscsi.c:618:22: note: initialize the variable 'sc' to silence this warning
     618 |         struct scsi_cmnd *sc;
         |                             ^
         |                              = NULL
   1 warning generated.

vim +/sc +634 drivers/scsi/libiscsi.c

   610	
   611	/*
   612	 * session back and frwd lock must be held and if not called for a task that
   613	 * is still pending or from the xmit thread, then xmit thread must be suspended
   614	 */
   615	static void __fail_scsi_task(struct iscsi_task *task, int err)
   616	{
   617		struct iscsi_conn *conn = task->conn;
   618		struct scsi_cmnd *sc;
   619		int state;
   620	
   621		if (cleanup_queued_task(task))
   622			return;
   623	
   624		if (task->state == ISCSI_TASK_PENDING) {
   625			/*
   626			 * cmd never made it to the xmit thread, so we should not count
   627			 * the cmd in the sequencing
   628			 */
   629			conn->session->queued_cmdsn--;
   630			/* it was never sent so just complete like normal */
   631			state = ISCSI_TASK_COMPLETED;
   632		} else if (err == DID_TRANSPORT_DISRUPTED) {
   633			state = ISCSI_TASK_ABRT_SESS_RECOV;
 > 634			sc->device->expecting_cc_ua = 1;
   635		} else
   636			state = ISCSI_TASK_ABRT_TMF;
   637	
   638		sc = task->sc;
   639		sc->result = err << 16;
   640		scsi_set_resid(sc, scsi_bufflen(sc));
   641		iscsi_complete_task(task, state);
   642	}
   643	
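The warning arises because `sc` is dereferenced at line 634 before it is assigned at line 638. One way a respin could avoid this is to assign `sc` from `task->sc` before the branch. The following is a minimal user-space model of that reordering, not the kernel function: all `model_*` types and constants are hypothetical stand-ins, the cleanup_queued_task() check, the queued_cmdsn accounting and scsi_set_resid() are omitted, and storing the state in the task stands in for iscsi_complete_task().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct model_device {
	bool expecting_cc_ua;
};

struct model_cmnd {
	struct model_device *device;
	int result;
};

enum model_task_state {
	MODEL_TASK_PENDING,
	MODEL_TASK_RUNNING,
	MODEL_TASK_COMPLETED,
	MODEL_TASK_ABRT_TMF,
	MODEL_TASK_ABRT_SESS_RECOV,
};

struct model_task {
	struct model_cmnd *sc;
	enum model_task_state state;
};

/* Assumed to mirror the kernel's host-byte value; illustrative only. */
#define MODEL_DID_TRANSPORT_DISRUPTED 0x0e

static void model_fail_scsi_task(struct model_task *task, int err)
{
	/* Assign sc up front so every branch below may dereference it. */
	struct model_cmnd *sc = task->sc;
	enum model_task_state state;

	if (task->state == MODEL_TASK_PENDING) {
		/* never sent, so complete like normal */
		state = MODEL_TASK_COMPLETED;
	} else if (err == MODEL_DID_TRANSPORT_DISRUPTED) {
		state = MODEL_TASK_ABRT_SESS_RECOV;
		/* sc is initialized here, unlike in the posted patch */
		sc->device->expecting_cc_ua = true;
	} else {
		state = MODEL_TASK_ABRT_TMF;
	}

	sc->result = err << 16;	/* host byte lives in bits 16..23 */
	task->state = state;	/* stands in for iscsi_complete_task() */
}
```

Initializing `sc` to NULL, as the compiler note suggests, would silence the warning but leave a NULL dereference at the same line, so moving the assignment is the safer shape.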
On 10/12/24 2:55 AM, 张翔 wrote:
>> For commands coming from sd, then scsi_io_completion will end up calling
>> scsi_io_completion_action and seeing the UNIT_ATTENTION and will retry.
>> I'm not saying we shouldn't do a fix like you did below. Just want to
>> make sure I understand the case you describe above.
>
> For commands coming from sd, then scsi_complete calling
> scsi_decide_disposition to get "enum scsi_disposition",
> scsi_decide_disposition seeing the SAM_STAT_CHECK_CONDITION and calling
> scsi_check_sense function, then scsi_check_sense seeing UNIT_ATTENTION. If
> expecting_cc_ua == 1, scsi_check_sense return NEEDS_RETRY and scsi_complete
> will retry.

For sd, scsi_decide_disposition will return SUCCESS. scsi_complete will call
scsi_finish_command. In there we call the upper layer done callback, sd_done,
and it will return 0 as there are no good bytes. scsi_io_completion will
initially complete 0 bytes. If there are retries left then we call
scsi_io_completion_action which sees the UA and will retry.
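The two retry paths discussed in this exchange can be summarized in a small user-space model. This is a sketch of the flow as described in the thread, not the kernel implementation: every `model_*` name is hypothetical, clearing the flag after one use is an assumption about the flag being one-shot, and the tape, passthrough and removable cases raised earlier are not modeled.

```c
#include <stdbool.h>

enum model_disposition {
	MODEL_NEEDS_RETRY,
	MODEL_SUCCESS,
};

enum model_outcome {
	MODEL_RETRIED_BY_MIDLAYER,	/* scsi_complete retries directly */
	MODEL_RETRIED_BY_UPPER_LAYER,	/* via scsi_io_completion_action */
	MODEL_FAILED,
};

/* Hypothetical stand-in for struct scsi_device. */
struct model_sdev {
	bool expecting_cc_ua;
};

/* Models the UNIT_ATTENTION handling attributed to scsi_check_sense. */
static enum model_disposition model_check_sense_ua(struct model_sdev *sdev)
{
	if (sdev->expecting_cc_ua) {
		/* Assumption: the flag is consumed by the first expected UA. */
		sdev->expecting_cc_ua = false;
		return MODEL_NEEDS_RETRY;
	}
	/* Otherwise the UA is passed up to the completion path. */
	return MODEL_SUCCESS;
}

/* Models what happens to an sd command that completes with a UA. */
static enum model_outcome model_complete_ua(struct model_sdev *sdev,
					    bool retries_left)
{
	if (model_check_sense_ua(sdev) == MODEL_NEEDS_RETRY)
		return MODEL_RETRIED_BY_MIDLAYER;

	/*
	 * SUCCESS path: sd_done reports no good bytes, 0 bytes are
	 * completed, and the completion action retries if it still can.
	 */
	if (retries_left)
		return MODEL_RETRIED_BY_UPPER_LAYER;

	return MODEL_FAILED;
}
```

In this model an sd command is retried on either path as long as retries remain, which matches the reviewer's point that the reported failure may be limited to other command sources.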
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0fda8905eabd..317e57be32b3 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -629,9 +629,10 @@ static void __fail_scsi_task(struct iscsi_task *task, int err)
 		conn->session->queued_cmdsn--;
 		/* it was never sent so just complete like normal */
 		state = ISCSI_TASK_COMPLETED;
-	} else if (err == DID_TRANSPORT_DISRUPTED)
+	} else if (err == DID_TRANSPORT_DISRUPTED) {
 		state = ISCSI_TASK_ABRT_SESS_RECOV;
-	else
+		sc->device->expecting_cc_ua = 1;
+	} else
 		state = ISCSI_TASK_ABRT_TMF;
 
 	sc = task->sc;

After stop_conn is called, the initiator needs to recover the session and
reconnect to the target. The target then rebuilds its session info and marks an
ASC_POWERON_RESET unit attention (UA) sense for the SCSI devices that belong to
the target (device reset). After recovery, the first SCSI command (scmd) sent
to the target gets ASC_POWERON_RESET (UA sense) + SAM_STAT_CHECK_CONDITION
(status) in the response.

According to the SCSI code path "scsi_done --> scsi_complete -->
scsi_decide_disposition --> scsi_check_sense", if expecting_cc_ua = 0, an scmd
response with ASC_POWERON_RESET (UA sense) ignores the "cmd->retries <=
cmd->allowed" check and fails directly. This causes SCSI to return io_error to
the upper layer without a retry.

If we set expecting_cc_ua=1 in fail_scsi_tasks, SISC will retry the scmd that
was answered with ASC_POWERON_RESET. The second request for the scmd can
succeed, because the target clears ASC_POWERON_RESET from the device's pending
ua_sense_list after the first scmd request.

Signed-off-by: Xiang Zhang <hawkxiang.cpp@gmail.com>
---
 drivers/scsi/libiscsi.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)