X-Patchwork-Submitter: Jiang Biao
X-Patchwork-Id: 6921841
To: qla2xxx-upstream@qlogic.com, linux-scsi@vger.kernel.org
Cc: tan.hu@zte.com.cn, cai.qu@zte.com.cn
Subject: [Patch] qla2xxx: should ensure no *io done* after qla2xxx_eh_abort returns SUCCESS to avoid race between *io timeout* and *io done*.
From: jiang.biao2@zte.com.cn
Date: Sat, 1 Aug 2015 12:17:17 +0800

A low-level driver (LLDD) should ensure that no *io done* happens after its eh_abort handler (here qla2xxx_eh_abort) returns SUCCESS. Otherwise *io timeout* and *io done* can race. The qla2xxx driver does not guarantee this: if *io done* occurs while qla2xxx_eh_abort is in progress, the handler may return SUCCESS without actually aborting the command, and the race between *io timeout* and *io done* appears.

The REQ_ATOM_COMPLETE flag is meant to prevent this race, but some interleavings still get through it and can crash the system. One such case:

    CPU1                               CPU2
    scsi_try_to_abort_cmd()->
      qla2xxx_eh_abort()
                                       *io done*->qla2x00_sp_compl()->
                                         qla2x00_sp_free_dma()
      return SUCCESS->
    *retry*->scsi_queue_insert()->
      blk_clear_rq_complete()
                                       scsi_done()->
                                         blk_complete_request()

Here qla2x00_sp_free_dma() on CPU2 has already cleared CMD_SP(cmd), so qla2xxx_eh_abort() on CPU1 returns SUCCESS even though CPU2 has not yet called scsi_done(). Because blk_clear_rq_complete() has cleared the REQ_ATOM_COMPLETE flag by the time CPU2 reaches blk_complete_request(), that *io done* completes an IO request that has already been requeued. When the requeued IO later completes, a second *io done* runs for the same request.
As a result, the same IO request is finished twice, its resources are released twice, and the kernel crashes.

So when *io done* happens during qla2xxx_eh_abort, the handler should wait until *io done* finishes. This ensures that no *io done* occurs after qla2xxx_eh_abort returns SUCCESS, which avoids the race between *io timeout* and *io done*. The patch implements the wait by acquiring and immediately releasing ha->hardware_lock in both places where the handler finds CMD_SP(cmd) already cleared.

Signed-off-by: Jiang Biao
Signed-off-by: Tan Hu
Reviewed-by: Cai Qu

diff -uprN scsi/qla_os.c scsi_new/qla_os.c
--- scsi/qla_os.c	2015-07-31 19:39:06.000000000 +0800
+++ scsi_new/qla_os.c	2015-07-31 19:46:20.000000000 +0800
@@ -939,8 +939,15 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
 	int rval, wait = 0;
 	struct qla_hw_data *ha = vha->hw;
 
-	if (!CMD_SP(cmd))
+	if (!CMD_SP(cmd)) {
+		/*
+		 * Wait until io done finishes to avoid race between
+		 * io timeout and io done.
+		 */
+		spin_lock_irqsave(&ha->hardware_lock, flags);
+		spin_unlock_irqrestore(&ha->hardware_lock, flags);
 		return SUCCESS;
+	}
 
 	ret = fc_block_scsi_eh(cmd);
 	if (ret != 0)
@@ -996,8 +1003,15 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
 	/* Did the command return during mailbox execution? */
-	if (ret == FAILED && !CMD_SP(cmd))
+	if (ret == FAILED && !CMD_SP(cmd)) {
+		/*
+		 * Wait until io done finishes to avoid race between
+		 * io timeout and io done.
+		 */
+		spin_lock_irqsave(&ha->hardware_lock, flags);
+		spin_unlock_irqrestore(&ha->hardware_lock, flags);
 		ret = SUCCESS;
+	}
 
 	/* Wait for the command to be returned. */
 	if (wait) {