[v2,4/4] qla2xxx: Handle incorrect entry_type entries

Message ID 20200831161854.70879-5-dwagner@suse.de (mailing list archive)
State Superseded
Series qla2xxx: A couple crash fixes

Commit Message

Daniel Wagner Aug. 31, 2020, 4:18 p.m. UTC
It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
which is invalid and should not be possible.

A careful review of the code and the crash dump didn't reveal any
shortcomings. Reading the entry_type from the crash dump shows the
expected value of STATUS_TYPE/0x03, but the call trace shows that
qla24xx_mbx_iocb_entry() is used.

One possible explanation is that pkt->entry_type doesn't contain the
correct information at the time it is read. That would mean the driver
observes a data race with the firmware.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/scsi/qla2xxx/qla_isr.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Comments

Arun Easi Sept. 8, 2020, 6:47 a.m. UTC | #1
On Mon, 31 Aug 2020, 9:18am, Daniel Wagner wrote:
>
> It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
> pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
> which is invalid and should not be possible.
> 
> A careful review of the code and the crash dump didn't reveal any
> shortcomings. Reading the entry_type from the crash dump shows the
> expected value of STATUS_TYPE/0x03, but the call trace shows that
> qla24xx_mbx_iocb_entry() is used.
> 
> One possible explanation is that pkt->entry_type doesn't contain the
> correct information at the time it is read. That would mean the driver
> observes a data race with the firmware.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  drivers/scsi/qla2xxx/qla_isr.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index b787643f5031..22aa4c0b901d 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3392,6 +3392,33 @@ void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
>  	sp->done(sp, comp_status);
>  }
>  
> +static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
> +	struct rsp_que *rsp, struct sts_entry_24xx *pkt)
> +{
> +	srb_t *sp;
> +
> +	sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
> +	if (!sp)
> +		return;
> +
> +	if (sp->type == SRB_SCSI_CMD ||
> +	    sp->type == SRB_NVME_CMD ||
> +	    sp->type == SRB_TM_CMD) {
> +		/* Some firmware versions don't update the entry_type
> +		 * correctly.  It was observed that entry_type contained
> +		 * MBX_IOCB_TYPE instead of the expected STATUS_TYPE
> +		 * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
> +		 * SRB_TM_CMD.
> +		 */

Could you drop the above comment about firmware, as it is speculation at
this point?


> +		ql_log(ql_log_warn, vha, 0x509d,
> +		       "Firmware didn't update entry_type correctly\n");
> +		qla2x00_status_entry(vha, rsp, pkt);
> +		return;

It'd be best to take a chip reset path, rather than assuming the
packet is good and having the appropriate handler called (hacky).
An approach similar to the one done at the beginning of
qla2x00_get_sp_from_handle() is what I had in mind.

> +	}
> +
> +	qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
> +}
> +
>  /**
>   * qla24xx_process_response_queue() - Process response queue entries.
>   * @vha: SCSI driver HA context
> @@ -3499,8 +3526,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>  			    (struct abort_entry_24xx *)pkt);
>  			break;
>  		case MBX_IOCB_TYPE:
> -			qla24xx_mbx_iocb_entry(vha, rsp->req,
> -			    (struct mbx_24xx_entry *)pkt);
> +			qla24xx_process_mbx_iocb_response(vha, rsp, pkt);

I'd have preferred a common approach across the different IOCB types
as an attempt to harden the code, but that will be a little more
involved work. This looks ok.

Regards,
-Arun
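
The reset-path suggestion above could be sketched roughly as follows. This is a user-space model, not driver code: `struct host_model`, its fields, and `process_mbx_iocb_response()` are hypothetical stand-ins, the enum values are illustrative, and it assumes the reset would be requested by setting a flag (modelled on a flag such as ISP_ABORT_NEEDED) rather than by forwarding the packet to the status handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's srb types; values are illustrative. */
enum srb_type { SRB_SCSI_CMD, SRB_NVME_CMD, SRB_TM_CMD, SRB_MB_IOCB };

/* Hypothetical model of the host state relevant to this sketch. */
struct host_model {
	bool isp_abort_needed;	/* models a "chip reset requested" flag */
	int mbx_handled;	/* entries passed on to the mailbox handler */
};

/*
 * Model of handling an entry whose entry_type claims to be a mailbox IOCB.
 * Returns true if the entry was passed to the mailbox handler, false if the
 * sp type contradicts the entry_type and a reset was requested instead.
 */
static bool process_mbx_iocb_response(struct host_model *h, enum srb_type type)
{
	assert(h != NULL);

	if (type == SRB_SCSI_CMD || type == SRB_NVME_CMD || type == SRB_TM_CMD) {
		/* Do not trust the packet: log and request a reset. */
		fprintf(stderr, "entry_type does not match sp->type, requesting reset\n");
		h->isp_abort_needed = true;
		return false;
	}

	h->mbx_handled++;	/* stands in for qla24xx_mbx_iocb_entry() */
	return true;
}
```

In this model a mismatching entry is never interpreted as a status entry; it only marks the host for a reset, which is the difference from the v2 patch.
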
Daniel Wagner Sept. 8, 2020, 7:57 a.m. UTC | #2
Hi Arun,

On Mon, Sep 07, 2020 at 11:47:48PM -0700, Arun Easi wrote:
> Could you drop the above comment about firmware, as it is speculation at
> this point?

Sure, no problem.

> It'd be best to take a chip reset path, rather than assuming the
> packet is good and having the appropriate handler called (hacky).
> An approach similar to the one done at the beginning of
> qla2x00_get_sp_from_handle() is what I had in mind.

Ok, agreed a reset is probably the safest choice.

> I'd have preferred a common approach across the different IOCB types
> as an attempt to harden the code, but that will be a little more
> involved work. This looks ok.

Yes, I was pondering it, but I don't know enough to really come up
with something reasonable. Currently our customers report only this
hiccup. So this is really only a partial workaround.

Thanks,
Daniel
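
For illustration only, the "common approach across the different IOCB types" discussed in this thread might take the shape of a table-driven plausibility check. This is a sketch under assumptions, not something proposed in the patch: `struct type_rule`, `rules`, and `entry_type_is_plausible()` are hypothetical names, the enum values are illustrative, and the pairing of SRB_MB_IOCB with MBX_IOCB_TYPE is assumed. Only the STATUS_TYPE/0x03 and MBX_IOCB_TYPE/0x39 values and the three STATUS_TYPE pairings come from the commit message and code comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_TYPE	0x03	/* value quoted in the commit message */
#define MBX_IOCB_TYPE	0x39	/* value quoted in the commit message */

/* Hypothetical stand-ins for the driver's srb types; values are illustrative. */
enum srb_type {
	SRB_SCSI_CMD, SRB_NVME_CMD, SRB_TM_CMD, SRB_MB_IOCB, SRB_LOGIN_CMD
};

/* One rule: which entry_type a given sp type is expected to complete with. */
struct type_rule {
	enum srb_type sp_type;
	uint8_t expected_entry_type;
};

static const struct type_rule rules[] = {
	{ SRB_SCSI_CMD, STATUS_TYPE },
	{ SRB_NVME_CMD, STATUS_TYPE },
	{ SRB_TM_CMD,   STATUS_TYPE },
	{ SRB_MB_IOCB,  MBX_IOCB_TYPE },	/* assumed pairing */
};

/* Returns false only when a listed sp type arrives with another entry_type. */
static bool entry_type_is_plausible(enum srb_type sp_type, uint8_t entry_type)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if (rules[i].sp_type == sp_type)
			return rules[i].expected_entry_type == entry_type;
	}
	return true;	/* sp types without a rule are not checked here */
}
```

A caller could run such a check once per response entry before dispatching on entry_type; how complete the table would have to be for the real driver is exactly the "more involved work" mentioned in the review.
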

Patch

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index b787643f5031..22aa4c0b901d 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -3392,6 +3392,33 @@  void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
 	sp->done(sp, comp_status);
 }
 
+static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
+	struct rsp_que *rsp, struct sts_entry_24xx *pkt)
+{
+	srb_t *sp;
+
+	sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
+	if (!sp)
+		return;
+
+	if (sp->type == SRB_SCSI_CMD ||
+	    sp->type == SRB_NVME_CMD ||
+	    sp->type == SRB_TM_CMD) {
+		/* Some firmware versions don't update the entry_type
+		 * correctly.  It was observed that entry_type contained
+		 * MBX_IOCB_TYPE instead of the expected STATUS_TYPE
+		 * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
+		 * SRB_TM_CMD.
+		 */
+		ql_log(ql_log_warn, vha, 0x509d,
+		       "Firmware didn't update entry_type correctly\n");
+		qla2x00_status_entry(vha, rsp, pkt);
+		return;
+	}
+
+	qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
+}
+
 /**
  * qla24xx_process_response_queue() - Process response queue entries.
  * @vha: SCSI driver HA context
@@ -3499,8 +3526,7 @@  void qla24xx_process_response_queue(struct scsi_qla_host *vha,
 			    (struct abort_entry_24xx *)pkt);
 			break;
 		case MBX_IOCB_TYPE:
-			qla24xx_mbx_iocb_entry(vha, rsp->req,
-			    (struct mbx_24xx_entry *)pkt);
+			qla24xx_process_mbx_iocb_response(vha, rsp, pkt);
 			break;
 		case VP_CTRL_IOCB_TYPE:
 			qla_ctrlvp_completed(vha, rsp->req,