Message ID | 20200827095829.63871-5-dwagner@suse.de (mailing list archive) |
---|---|
State | Superseded |
Series | qla2xxx: A couple crash fixes |
On Thu, 2020-08-27 at 11:58 +0200, Daniel Wagner wrote:
> It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
> pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
> which is invalid and should not be possible.
>
> A careful code review of the crash dump didn't reveal any
> shortcomings. Reading the entry_type from the crash dump shows the
> expected value of STATUS_TYPE/0x03 but the call trace shows that
> qla24xx_mbx_iocb_entry() is used.
>
> One possible explanation is when pkt->entry_type is read it doesn't
> contain the correct information. That means the driver observes a
> data race by the firmware.
>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  drivers/scsi/qla2xxx/qla_isr.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index b787643f5031..0c324e88b189 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3392,6 +3392,31 @@ void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
>  	sp->done(sp, comp_status);
>  }
>  
> +static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
> +	struct rsp_que *rsp, struct sts_entry_24xx *pkt)
> +{
> +	srb_t *sp;
> +
> +	sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
> +	if (!sp)
> +		return;
> +
> +	if (sp->type == SRB_SCSI_CMD ||
> +	    sp->type == SRB_NVME_CMD ||
> +	    sp->type == SRB_TM_CMD) {
> +		/* Some firmware versions don't update the entry_type
> +		 * correctly. It was observed entry_type contained
> +		 * MBX_IOCB_TYPE instead of the expected STATUS_TYPE
> +		 * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
> +		 * SRB_TM_CMD.
> +		 */
> +		qla2x00_status_entry(vha, rsp, pkt);
> +		return;
> +	}
> +
> +	qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
> +}
> +
>  /**
>   * qla24xx_process_response_queue() - Process response queue entries.
>   * @vha: SCSI driver HA context
> @@ -3499,8 +3524,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>  				(struct abort_entry_24xx *)pkt);
>  			break;
>  		case MBX_IOCB_TYPE:
> -			qla24xx_mbx_iocb_entry(vha, rsp->req,
> -			    (struct mbx_24xx_entry *)pkt);
> +			qla24xx_process_mbx_iocb_response(vha, rsp, pkt);
>  			break;
>  		case VP_CTRL_IOCB_TYPE:
>  			qla_ctrlvp_completed(vha, rsp->req,

Should we perhaps log an error message when we detect a mismatch
between sp->type and entry_type?

Regards,
Martin

On Thu, Aug 27, 2020 at 12:17:13PM +0200, Martin Wilck wrote:
> Should we perhaps log an error message when we detect a mismatch
> between sp->type and entry_type?

Sure can do, but does it really help? Not much we can do in the
driver. I hope the firmware gets fixed eventually. I am not against it,
just not sure if the log entry really is helping except saying 'you are
using a firmware with a known issue'.

On Thu, 2020-08-27 at 13:46 +0200, Daniel Wagner wrote:
> On Thu, Aug 27, 2020 at 12:17:13PM +0200, Martin Wilck wrote:
> > Should we perhaps log an error message when we detect a mismatch
> > between sp->type and entry_type?
>
> Sure can do, but does it really help? Not much we can do in the
> driver. I hope the firmware gets fixed eventually. I am not against it,
> just not sure if the log entry really is helping except saying 'you are
> using a firmware with a known issue'.

... which might prove insightful, to users as well as perhaps
developers (by observing under which conditions this problem occurs).
I'd hope so, at least. But you know this issue much better than me.

Regards,
Martin

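For readers following the logging question, here is a minimal userspace
sketch of what "log on mismatch" could look like. It is not driver code
and not part of the posted patch. The enum values, the handler names and
route_mbx_iocb() are hypothetical stand-ins; only the two entry type
values (0x03 and 0x39) come from the patch description. The real driver
would presumably use its own logging helpers and rate limiting rather
than fprintf().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Entry type values as quoted in the patch description. */
#define STATUS_TYPE   0x03
#define MBX_IOCB_TYPE 0x39

/* Hypothetical stand-ins for the driver's sp->type values. */
enum srb_type { SRB_SCSI_CMD, SRB_NVME_CMD, SRB_TM_CMD, SRB_OTHER };

/* Hypothetical labels for the two handlers the patch chooses between. */
enum handler { HANDLER_STATUS_ENTRY, HANDLER_MBX_IOCB_ENTRY };

/* True for the sp types the patch reroutes to the status-entry path. */
int is_io_srb(enum srb_type t)
{
	return t == SRB_SCSI_CMD || t == SRB_NVME_CMD || t == SRB_TM_CMD;
}

/*
 * Model of the MBX_IOCB_TYPE case: if the sp type says this is an I/O
 * command, the entry_type is considered wrong, the mismatch is logged
 * and counted, and the status-entry handler is selected instead.
 */
enum handler route_mbx_iocb(uint8_t entry_type, enum srb_type sp_type,
			    unsigned int *mismatches)
{
	assert(mismatches != NULL);

	if (is_io_srb(sp_type)) {
		fprintf(stderr,
			"entry_type 0x%02x does not match sp->type %d, "
			"handling as status entry\n",
			(unsigned int)entry_type, (int)sp_type);
		(*mismatches)++;
		return HANDLER_STATUS_ENTRY;
	}

	return HANDLER_MBX_IOCB_ENTRY;
}
```

Calling route_mbx_iocb(MBX_IOCB_TYPE, SRB_SCSI_CMD, &count) selects the
status-entry handler and bumps the counter, which is the kind of
information Martin suggests could show under which conditions the
problem occurs. A call with SRB_OTHER falls through to the mailbox
handler without logging.
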
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index b787643f5031..0c324e88b189 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -3392,6 +3392,31 @@ void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
 	sp->done(sp, comp_status);
 }
 
+static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
+	struct rsp_que *rsp, struct sts_entry_24xx *pkt)
+{
+	srb_t *sp;
+
+	sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
+	if (!sp)
+		return;
+
+	if (sp->type == SRB_SCSI_CMD ||
+	    sp->type == SRB_NVME_CMD ||
+	    sp->type == SRB_TM_CMD) {
+		/* Some firmware versions don't update the entry_type
+		 * correctly. It was observed entry_type contained
+		 * MBX_IOCB_TYPE instead of the expected STATUS_TYPE
+		 * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
+		 * SRB_TM_CMD.
+		 */
+		qla2x00_status_entry(vha, rsp, pkt);
+		return;
+	}
+
+	qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
+}
+
 /**
  * qla24xx_process_response_queue() - Process response queue entries.
  * @vha: SCSI driver HA context
@@ -3499,8 +3524,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
 				(struct abort_entry_24xx *)pkt);
 			break;
 		case MBX_IOCB_TYPE:
-			qla24xx_mbx_iocb_entry(vha, rsp->req,
-			    (struct mbx_24xx_entry *)pkt);
+			qla24xx_process_mbx_iocb_response(vha, rsp, pkt);
 			break;
 		case VP_CTRL_IOCB_TYPE:
 			qla_ctrlvp_completed(vha, rsp->req,

It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
which is invalid and should not be possible.

A careful code review of the crash dump didn't reveal any
shortcomings. Reading the entry_type from the crash dump shows the
expected value of STATUS_TYPE/0x03 but the call trace shows that
qla24xx_mbx_iocb_entry() is used.

One possible explanation is that when pkt->entry_type is read it
doesn't contain the correct information. That means the driver observes
a data race by the firmware.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/scsi/qla2xxx/qla_isr.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)