Message ID | 20200831161854.70879-5-dwagner@suse.de (mailing list archive) |
---|---|
State | Superseded |
Series | qla2xxx: A couple crash fixes |
On Mon, 31 Aug 2020, 9:18am, Daniel Wagner wrote:

> It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
> pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
> which is invalid and should not be possible.
>
> A careful code review of the crash dump didn't reveal any
> shortcomings. Reading the entry_type from the crash dump shows the
> expected value of STATUS_TYPE/0x03 but the call trace shows that
> qla24xx_mbx_iocb_entry() is used.
>
> One possible explanation is when pkt->entry_type is read it doesn't
> contain the correct information. That means the driver observes a data
> race by the firmware.
>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  drivers/scsi/qla2xxx/qla_isr.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index b787643f5031..22aa4c0b901d 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3392,6 +3392,33 @@ void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
>  	sp->done(sp, comp_status);
>  }
>
> +static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
> +	struct rsp_que *rsp, struct sts_entry_24xx *pkt)
> +{
> +	srb_t *sp;
> +
> +	sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
> +	if (!sp)
> +		return;
> +
> +	if (sp->type == SRB_SCSI_CMD ||
> +	    sp->type == SRB_NVME_CMD ||
> +	    sp->type == SRB_TM_CMD) {
> +		/* Some firmware version don't update the entry_type
> +		 * correctly. It was observed entry_type contained
> +		 * MBCX_IOCB_TYPE instead of the expected STATUS_TYPE
> +		 * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
> +		 * SRB_TM_CMD.
> +		 */

Could you drop the above comment about firmware, as it is speculation at
this point?

> +		ql_log(ql_log_warn, vha, 0x509d,
> +		    "Firmware didn't update entry_type correctly\n");
> +		qla2x00_status_entry(vha, rsp, pkt);
> +		return;

It'd be best to take a chip reset path, rather than assuming the packet is
good and having the appropriate handler called (hacky). An approach
similar to the one done at the beginning of qla2x00_get_sp_from_handle()
is what I had in mind.

> +	}
> +
> +	qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
> +}
> +
>  /**
>   * qla24xx_process_response_queue() - Process response queue entries.
>   * @vha: SCSI driver HA context
> @@ -3499,8 +3526,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>  				    (struct abort_entry_24xx *)pkt);
>  			break;
>  		case MBX_IOCB_TYPE:
> -			qla24xx_mbx_iocb_entry(vha, rsp->req,
> -			    (struct mbx_24xx_entry *)pkt);
> +			qla24xx_process_mbx_iocb_response(vha, rsp, pkt);

I'd have preferred a common approach across the different IOCB types as an
attempt to harden the code, but that will be a little more involved work.
This looks ok.

Regards,
-Arun
Hi Arun,

On Mon, Sep 07, 2020 at 11:47:48PM -0700, Arun Easi wrote:
> Could you drop the above comment about firmware, as it is speculation at
> this point?

Sure, no problem.

> It'd be best to take a chip reset path, rather than assuming the
> packet is good and having the appropriate handler called (hacky).
> An approach similar to the one done at the beginning of
> qla2x00_get_sp_from_handle() is what I had in mind.

Ok, agreed a reset is probably the safest choice.

> I'd have preferred a common approach across the different IOCB types
> as an attempt to harden the code, but that will be a little more
> involved work. This looks ok.

Yes, I was pondering on it but I don't know enough to really come up with
something reasonable. Currently our customers report only this hiccup.
So this is really only a partial workaround.

Thanks,
Daniel
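To make the difference between the two options in this exchange concrete, here is a minimal userspace sketch of the reset-path idea: when an entry arrives as MBX_IOCB_TYPE but the outstanding request is a SCSI, NVMe, or task-management command, the entry is not processed and a reset is flagged instead. This is not driver code. Every name prefixed with model_ or MODEL_ is hypothetical, the reset_needed field only stands in for whatever mechanism the driver uses to request a chip reset, and only the three command type names and the value 0x39 are taken from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Entry type value quoted in the commit message. */
#define MODEL_MBX_IOCB_TYPE 0x39

/* Request types named in the patch, plus one hypothetical mailbox type. */
enum model_srb_type {
	MODEL_SRB_SCSI_CMD,
	MODEL_SRB_NVME_CMD,
	MODEL_SRB_TM_CMD,
	MODEL_SRB_MAILBOX,	/* hypothetical: a request that expects a mailbox entry */
};

/* Hypothetical host state: a reset flag and a counter for handled entries. */
struct model_host {
	bool reset_needed;	/* stands in for a "chip reset requested" flag */
	int mbx_handled;	/* number of entries passed to the mailbox handler */
};

enum model_result {
	MODEL_HANDLED_MBX,
	MODEL_RESET_REQUESTED,
};

/* True for the request types that should never complete with a mailbox entry. */
static bool model_is_io_request(enum model_srb_type type)
{
	return type == MODEL_SRB_SCSI_CMD ||
	       type == MODEL_SRB_NVME_CMD ||
	       type == MODEL_SRB_TM_CMD;
}

/*
 * Called for an entry whose entry_type is MODEL_MBX_IOCB_TYPE.
 * On a mismatch the packet is not trusted: nothing is processed and a
 * reset is requested, instead of redirecting to another handler.
 */
static enum model_result model_process_mbx_entry(struct model_host *host,
						 enum model_srb_type req_type)
{
	if (model_is_io_request(req_type)) {
		host->reset_needed = true;
		return MODEL_RESET_REQUESTED;
	}

	host->mbx_handled++;
	return MODEL_HANDLED_MBX;
}
```

The patch as posted would call the status handler in the mismatch branch; the sketch only shows where the reviewer's suggestion changes that branch.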
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index b787643f5031..22aa4c0b901d 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -3392,6 +3392,33 @@ void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
 	sp->done(sp, comp_status);
 }
 
+static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
+	struct rsp_que *rsp, struct sts_entry_24xx *pkt)
+{
+	srb_t *sp;
+
+	sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
+	if (!sp)
+		return;
+
+	if (sp->type == SRB_SCSI_CMD ||
+	    sp->type == SRB_NVME_CMD ||
+	    sp->type == SRB_TM_CMD) {
+		/* Some firmware version don't update the entry_type
+		 * correctly. It was observed entry_type contained
+		 * MBCX_IOCB_TYPE instead of the expected STATUS_TYPE
+		 * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
+		 * SRB_TM_CMD.
+		 */
+		ql_log(ql_log_warn, vha, 0x509d,
+		    "Firmware didn't update entry_type correctly\n");
+		qla2x00_status_entry(vha, rsp, pkt);
+		return;
+	}
+
+	qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
+}
+
 /**
  * qla24xx_process_response_queue() - Process response queue entries.
  * @vha: SCSI driver HA context
@@ -3499,8 +3526,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
 				    (struct abort_entry_24xx *)pkt);
 			break;
 		case MBX_IOCB_TYPE:
-			qla24xx_mbx_iocb_entry(vha, rsp->req,
-			    (struct mbx_24xx_entry *)pkt);
+			qla24xx_process_mbx_iocb_response(vha, rsp, pkt);
 			break;
 		case VP_CTRL_IOCB_TYPE:
 			qla_ctrlvp_completed(vha, rsp->req,
It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
which is invalid and should not be possible.

A careful code review of the crash dump didn't reveal any
shortcomings. Reading the entry_type from the crash dump shows the
expected value of STATUS_TYPE/0x03 but the call trace shows that
qla24xx_mbx_iocb_entry() is used.

One possible explanation is when pkt->entry_type is read it doesn't
contain the correct information. That means the driver observes a data
race by the firmware.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/scsi/qla2xxx/qla_isr.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)
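The two observations in this description (a call trace through the mailbox handler, and a crash dump showing STATUS_TYPE) can both be true if the entry changed after the driver read it. The following toy model illustrates only that timing; it is not driver code and does not claim to reproduce the actual firmware behaviour. The struct and both function names are hypothetical, and only the two type values come from the description above.

```c
#include <stdint.h>

/* Entry type values quoted in the commit message. */
#define STATUS_TYPE   0x03
#define MBX_IOCB_TYPE 0x39

/* Hypothetical stand-in for a response ring slot that a device fills in. */
struct model_ring_entry {
	volatile uint8_t entry_type;
};

/* Hypothetical: the single read a dispatcher would base its switch on. */
static uint8_t model_read_entry_type(const struct model_ring_entry *entry)
{
	return entry->entry_type;
}

/* Hypothetical: simulates the device updating the slot after that read. */
static void model_device_finishes_write(struct model_ring_entry *entry)
{
	entry->entry_type = STATUS_TYPE;
}
```

If the read happens before the simulated device write, the dispatcher acts on 0x39 while any later inspection of the same memory, such as a crash dump, shows 0x03.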