diff mbox

qla2xxx: shoud ensure no *io done* after qla2xxx_eh_abort returning SUCCESS to avoid race between *io timeout* and *io done*.

Message ID OF4D71CDEF.67EA2227-ON48257E94.00172CC0-48257E94.001795EA@zte.com.cn (mailing list archive)
State New, archived
Headers show

Commit Message

Jiang Biao Aug. 1, 2015, 4:17 a.m. UTC
qla2xxx: shoud ensure no *io done* after qla2xxx_eh_abort returning
SUCCESS to avoid race between *io timeout* and *io done*.

LLDD driver should ensure that there is no *io done* after eh_abort
(*qla2xxx_eh_abort*) returning SUCCESS to avoid the race between
*io timeout* and *io done*. But qla2xxx driver can not guarantee that.

If *io done* occurs when qla2xxx_eh_abort is in process, it may
return SUCCESS without actually aborting the cmd. In that case, the
race between *io timeout* and *io done* appears. Although there is
REQ_ATOM_COMPLETE flag trying to avoid the race, there is also race
possibility that may result in crash.

The following is one of the race cases,
         CPU 1                             CPU2
scsi_try_to_abort_cmd()->
   qla2xxx_eh_abort()
                                   *io done*->qla2x00_sp_compl()->
                                      qla2x00_sp_free_dma()
      return SUCCESS->
   *retry*->scsi_queue_insert()->
      blk_clear_rq_complete()
                                   scsi_done()->
                                      blk_complete_request()

In this race case, because the REQ_ATOM_COMPLETE flag has been
clear in blk_clear_rq_complete, the *io done* will try to complete
the IO request that has already been requeued. When the requeued IO
completes again, another *io done* of the same IO request will start
again. That means the same IO request will be finished twice, which
will result in double releasing the resources of the request, then
lead to crash.

So when *io done* happens during qla2xxx_eh_abort, we should wait
there till *io done* finishs, ensuring no *io done* after
qla2xxx_eh_abort returning SUCCESS, which avoids race between 
*io timeout* and *io done*.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>

Signed-off-by: Tan Hu <tan.hu@zte.com.cn>

Reviewed-by: Cai Qu <cai.qu@zte.com.cn>
diff mbox

Patch

diff -uprN scsi/qla_os.c scsi_new/qla_os.c
--- scsi/qla_os.c       2015-07-31 19:39:06.000000000 +0800
+++ scsi_new/qla_os.c   2015-07-31 19:46:20.000000000 +0800
@@ -939,8 +939,15 @@  qla2xxx_eh_abort(struct scsi_cmnd *cmd)
        int rval, wait = 0;
        struct qla_hw_data *ha = vha->hw;

-       if (!CMD_SP(cmd))
+       if (!CMD_SP(cmd)) {
+               /*
+                * Wait until io done finishs to avoid race between 
+                * io timeout and io done.
+                */
+               spin_lock_irqsave(&ha->hardware_lock, flags);
+               spin_unlock_irqrestore(&ha->hardware_lock, flags);
                return SUCCESS;
+       }

        ret = fc_block_scsi_eh(cmd);
        if (ret != 0)
@@ -996,8 +1003,15 @@  qla2xxx_eh_abort(struct scsi_cmnd *cmd)
        spin_unlock_irqrestore(&ha->hardware_lock, flags);

        /* Did the command return during mailbox execution? */
-       if (ret == FAILED && !CMD_SP(cmd))
+       if (ret == FAILED && !CMD_SP(cmd)) {
+               /*
+                * Wait until io done finishs to avoid race between 
+                * io timeout and io done.
+                */
+               spin_lock_irqsave(&ha->hardware_lock, flags);
+               spin_unlock_irqrestore(&ha->hardware_lock, flags);
                ret = SUCCESS;
+       }

        /* Wait for the command to be returned. */
        if (wait) {