Message ID | 1493751261-81647-1-git-send-email-bryantly@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Tue, 2017-05-02 at 13:54 -0500, Bryant G. Ly wrote: > The driver is sending a response to the actual scsi op that was > aborted by an abort task TM, while LIO is sending a response to > the abort task TM. > > ibmvscsis_tgt does not send the response to the client until > release_cmd time. The reason for this was because if we did it > at queue_status time, then the client would be free to reuse the > tag for that command, but we're still using the tag until the > command is released at release_cmd time, so we chose to delay > sending the response until then. That then caused this issue, because > release_cmd is always called, even if queue_status is not. > > SCSI spec says that the initiator that sends the abort task > TM NEVER gets a response to the aborted op and with the current > code it will send a response. Thus this fix will remove that response > if the TAS bit is set. > > Cc: <stable@vger.kernel.org> # v4.8+ > Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> > Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> > --- > drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 66 ++++++++++++++++++++++---------- > 1 file changed, 45 insertions(+), 21 deletions(-) Applied, with a small update to the last sentence of the commit log wrt to 'if ABORTED && !TAS bit is set'. Thanks Bryant + Tyrel.
Hi Nick, On 5/2/17 10:43 PM, Nicholas A. Bellinger wrote: > On Tue, 2017-05-02 at 13:54 -0500, Bryant G. Ly wrote: >> The driver is sending a response to the actual scsi op that was >> aborted by an abort task TM, while LIO is sending a response to >> the abort task TM. >> >> ibmvscsis_tgt does not send the response to the client until >> release_cmd time. The reason for this was because if we did it >> at queue_status time, then the client would be free to reuse the >> tag for that command, but we're still using the tag until the >> command is released at release_cmd time, so we chose to delay >> sending the response until then. That then caused this issue, because >> release_cmd is always called, even if queue_status is not. >> >> SCSI spec says that the initiator that sends the abort task >> TM NEVER gets a response to the aborted op and with the current >> code it will send a response. Thus this fix will remove that response >> if the TAS bit is set. >> >> Cc: <stable@vger.kernel.org> # v4.8+ >> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> >> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> >> --- >> drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 66 ++++++++++++++++++++++---------- >> 1 file changed, 45 insertions(+), 21 deletions(-) > Applied, with a small update to the last sentence of the commit log wrt > to 'if ABORTED && !TAS bit is set'. > > Thanks Bryant + Tyrel. > So I have been running tests on this patch extensively and it seems like after running 2 hrs of consecutive aborts it fails again with the same problem of driver sending a response to the actual scsi op. It looks like when LIO processes the abort and if (!__target_check_io_state(se_cmd, se_sess, 0)) then rsp gets set as TMR_TASK_DOES_NOT_EXIST. With that response the CMD_T_ABORTED state is not set, so then the ibmvscsis driver still sends that duplicate response. I think the best solution would be in driver when queue_tm_rsp gets called and when the driver builds the response to check for the TMR_TASK_DOES_NOT_EXIST and set our own flag. Then we would also check for that flag, so it would be: if ((cmd->se_cmd.transport_state & CMD_T_ABORTED && !(cmd->se_cmd.transport_state & CMD_T_TAS)) || cmd->flags & CMD_ABORTED) { This will ensure in the CMD_T_ABORTED w/o CMD_T_TAS scenarios are handled and the ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST case. -Bryant
On 5/3/17 11:38 AM, Bryant G. Ly wrote: > Hi Nick, > > On 5/2/17 10:43 PM, Nicholas A. Bellinger wrote: > >> On Tue, 2017-05-02 at 13:54 -0500, Bryant G. Ly wrote: >>> The driver is sending a response to the actual scsi op that was >>> aborted by an abort task TM, while LIO is sending a response to >>> the abort task TM. >>> >>> ibmvscsis_tgt does not send the response to the client until >>> release_cmd time. The reason for this was because if we did it >>> at queue_status time, then the client would be free to reuse the >>> tag for that command, but we're still using the tag until the >>> command is released at release_cmd time, so we chose to delay >>> sending the response until then. That then caused this issue, because >>> release_cmd is always called, even if queue_status is not. >>> >>> SCSI spec says that the initiator that sends the abort task >>> TM NEVER gets a response to the aborted op and with the current >>> code it will send a response. Thus this fix will remove that response >>> if the TAS bit is set. >>> >>> Cc: <stable@vger.kernel.org> # v4.8+ >>> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> >>> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> >>> --- >>> drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 66 >>> ++++++++++++++++++++++---------- >>> 1 file changed, 45 insertions(+), 21 deletions(-) >> Applied, with a small update to the last sentence of the commit log wrt >> to 'if ABORTED && !TAS bit is set'. >> >> Thanks Bryant + Tyrel. >> > So I have been running tests on this patch extensively and it seems > like after running 2 hrs of consecutive aborts it fails again with the > same problem of driver sending a response to the actual scsi op. > > It looks like when LIO processes the abort and if > (!__target_check_io_state(se_cmd, se_sess, 0)) then rsp gets set as > TMR_TASK_DOES_NOT_EXIST. > > With that response the CMD_T_ABORTED state is not set, so then the > ibmvscsis driver still sends that duplicate response. > > I think the best solution would be in driver when queue_tm_rsp gets > called and when the driver builds the response to check for the > TMR_TASK_DOES_NOT_EXIST and set our own flag. > Then we would also check for that flag, so it would be: > > if ((cmd->se_cmd.transport_state & CMD_T_ABORTED && > !(cmd->se_cmd.transport_state & CMD_T_TAS)) || cmd->flags & > CMD_ABORTED) { > > This will ensure in the CMD_T_ABORTED w/o CMD_T_TAS scenarios are > handled and the ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST case. > > -Bryant > > Hi Nick, You can ignore the email about the patch being broken. The script that was used for testing isn't right. So the patch is working as intended now, where on an abort the client does not receive an extra response to the actual scsi op. Thanks, Bryant Ly
On 5/3/17 1:55 PM, Bryant G. Ly wrote: > On 5/3/17 11:38 AM, Bryant G. Ly wrote: > >> Hi Nick, >> >> On 5/2/17 10:43 PM, Nicholas A. Bellinger wrote: >> >>> On Tue, 2017-05-02 at 13:54 -0500, Bryant G. Ly wrote: >>>> The driver is sending a response to the actual scsi op that was >>>> aborted by an abort task TM, while LIO is sending a response to >>>> the abort task TM. >>>> >>>> ibmvscsis_tgt does not send the response to the client until >>>> release_cmd time. The reason for this was because if we did it >>>> at queue_status time, then the client would be free to reuse the >>>> tag for that command, but we're still using the tag until the >>>> command is released at release_cmd time, so we chose to delay >>>> sending the response until then. That then caused this issue, because >>>> release_cmd is always called, even if queue_status is not. >>>> >>>> SCSI spec says that the initiator that sends the abort task >>>> TM NEVER gets a response to the aborted op and with the current >>>> code it will send a response. Thus this fix will remove that response >>>> if the TAS bit is set. >>>> >>>> Cc: <stable@vger.kernel.org> # v4.8+ >>>> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> >>>> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> >>>> --- >>>> drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 66 >>>> ++++++++++++++++++++++---------- >>>> 1 file changed, 45 insertions(+), 21 deletions(-) >>> Applied, with a small update to the last sentence of the commit log wrt >>> to 'if ABORTED && !TAS bit is set'. >>> >>> Thanks Bryant + Tyrel. >>> >> So I have been running tests on this patch extensively and it seems >> like after running 2 hrs of consecutive aborts it fails again with >> the same problem of driver sending a response to the actual scsi op. >> >> It looks like when LIO processes the abort and if >> (!__target_check_io_state(se_cmd, se_sess, 0)) then rsp gets set as >> TMR_TASK_DOES_NOT_EXIST. >> >> With that response the CMD_T_ABORTED state is not set, so then the >> ibmvscsis driver still sends that duplicate response. >> >> I think the best solution would be in driver when queue_tm_rsp gets >> called and when the driver builds the response to check for the >> TMR_TASK_DOES_NOT_EXIST and set our own flag. >> Then we would also check for that flag, so it would be: >> >> if ((cmd->se_cmd.transport_state & CMD_T_ABORTED && >> !(cmd->se_cmd.transport_state & CMD_T_TAS)) || cmd->flags & >> CMD_ABORTED) { >> >> This will ensure in the CMD_T_ABORTED w/o CMD_T_TAS scenarios are >> handled and the ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST case. >> >> -Bryant >> >> > Hi Nick, > > You can ignore the email about the patch being broken. The script that > was used for testing isn't right. > > So the patch is working as intended now, where on an abort the client > does not receive an extra response to the actual scsi op. > > Thanks, > > Bryant Ly > Sorry Nick, but can you hold off on this patch. We are still getting extra responses but in a small timing window. The majority of the aborts work but if we get an abort between LIO calling queue_state and LIO calling release_cmd, we get an extra response. Working on a solution, and will still have to do the tests where we basically send aborts over and over on the client btwn random timing widows. -Bryant
On Thu, 2017-05-04 at 18:13 -0500, Bryant G. Ly wrote: > > > Sorry Nick, but can you hold off on this patch. We are still getting extra responses but in a small timing window. > > The majority of the aborts work but if we get an abort between LIO calling queue_state and LIO calling release_cmd, we get an extra response. > > > Working on a solution, and will still have to do the tests where we basically send aborts over and over on the client btwn random timing widows. > Sure, dropping this patch for now.
diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c index 4bb5635..d6e62ce 100644 --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c @@ -1758,33 +1758,46 @@ static void ibmvscsis_send_messages(struct scsi_info *vscsi) if (!(vscsi->flags & RESPONSE_Q_DOWN)) { list_for_each_entry_safe(cmd, nxt, &vscsi->waiting_rsp, list) { - iue = cmd->iue; - - crq->valid = VALID_CMD_RESP_EL; - crq->format = cmd->rsp.format; + /* + * If Task Abort Status Bit is set, then dont send a + * response. + */ + if (cmd->se_cmd.transport_state & CMD_T_ABORTED && + !(cmd->se_cmd.transport_state & CMD_T_TAS)) { + list_del(&cmd->list); + ibmvscsis_free_cmd_resources(vscsi, cmd); + } else { + iue = cmd->iue; - if (cmd->flags & CMD_FAST_FAIL) - crq->status = VIOSRP_ADAPTER_FAIL; + crq->valid = VALID_CMD_RESP_EL; + crq->format = cmd->rsp.format; - crq->IU_length = cpu_to_be16(cmd->rsp.len); + if (cmd->flags & CMD_FAST_FAIL) + crq->status = VIOSRP_ADAPTER_FAIL; - rc = h_send_crq(vscsi->dma_dev->unit_address, - be64_to_cpu(msg_hi), - be64_to_cpu(cmd->rsp.tag)); + crq->IU_length = cpu_to_be16(cmd->rsp.len); - pr_debug("send_messages: cmd %p, tag 0x%llx, rc %ld\n", - cmd, be64_to_cpu(cmd->rsp.tag), rc); + rc = h_send_crq(vscsi->dma_dev->unit_address, + be64_to_cpu(msg_hi), + be64_to_cpu(cmd->rsp.tag)); - /* if all ok free up the command element resources */ - if (rc == H_SUCCESS) { - /* some movement has occurred */ - vscsi->rsp_q_timer.timer_pops = 0; - list_del(&cmd->list); + pr_debug("send_messages: cmd %p, tag 0x%llx, rc %ld\n", + cmd, be64_to_cpu(cmd->rsp.tag), rc); - ibmvscsis_free_cmd_resources(vscsi, cmd); - } else { - srp_snd_msg_failed(vscsi, rc); - break; + /* if all ok free up the command element + * resources + */ + if (rc == H_SUCCESS) { + /* some movement has occurred */ + vscsi->rsp_q_timer.timer_pops = 0; + list_del(&cmd->list); + + ibmvscsis_free_cmd_resources(vscsi, + cmd); + } else { + srp_snd_msg_failed(vscsi, rc); + break; + } } } @@ -3581,9 +3594,20 @@ static int ibmvscsis_write_pending(struct se_cmd *se_cmd) { struct ibmvscsis_cmd *cmd = container_of(se_cmd, struct ibmvscsis_cmd, se_cmd); + struct scsi_info *vscsi = cmd->adapter; struct iu_entry *iue = cmd->iue; int rc; + /* + * If CLIENT_FAILED OR RESPONSE_Q_DOWN, then just return success + * since LIO can't do anything about it, and we dont want to + * attempt an srp_transfer_data. + */ + if ((vscsi->flags & (CLIENT_FAILED | RESPONSE_Q_DOWN))) { + pr_err("write_pending failed since: %d\n", vscsi->flags); + return 0; + } + rc = srp_transfer_data(cmd, &vio_iu(iue)->srp.cmd, ibmvscsis_rdma, 1, 1); if (rc) {