From patchwork Tue Sep 5 12:54:24 2017
X-Patchwork-Submitter: Pavel Tikhomirov
X-Patchwork-Id: 9938919
From: Pavel Tikhomirov
To: "James E. J. Bottomley", "Martin K. Petersen"
Cc: Christoph Hellwig, linux-scsi@vger.kernel.org,
    linux-kernel@vger.kernel.org, Konstantin Khorenko, Pavel Tikhomirov
Subject: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
Date: Tue, 5 Sep 2017 15:54:24 +0300
Message-Id: <20170905125424.15412-1-ptikhomirov@virtuozzo.com>

We have a problem with SCSI error handling (EH) on several of our
nodes. Imagine the following interleaving of two threads:

CPU1: scsi_eh_scmd_add              CPU2: scsi_host_queue_ready

                                    /* shost->host_busy == 1 initially */

                                    if (shost->shost_state == SHOST_RECOVERY)
                                            /* does not get here */
                                            return 0;

lock(shost->host_lock);
shost->shost_state = SHOST_RECOVERY;

                                    busy = shost->host_busy++;
                                    /* shost->can_queue == 1 initially,
                                     * busy == 1, so go to the
                                     * starved label */
                                    lock(shost->host_lock); /* waits */

shost->host_failed++;
/* shost->host_busy == 2,
 * shost->host_failed == 1 */

call scsi_eh_wakeup(shost) {
        if (host_busy == host_failed) {
                /* does not get here */
                wake_up_process(shost->ehandler);
        }
}
unlock(shost->host_lock);

                                    /* acquires the lock */
                                    shost->host_busy--;

In the end nobody wakes up scsi_error_handler, and every command that
arrives afterwards hangs: we are stuck in a never-ending recovery
state with no one left to wake the handler. So the SCSI disks on this
host become unresponsive and all bios on the node hang. (We trigger
this problem when SCSI commands to a DVD drive time out.)

The main idea of the fix is to attempt the wakeup every time we
decrement host_busy or increment host_failed (the latter path is
already correct); a sketch of the resulting pattern follows.
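To make the pattern concrete, here is a minimal illustrative sketch of
the check the patch repeats at each host_busy decrement site. The
helper name scsi_eh_wakeup_if_needed() is hypothetical (the patch
below open-codes this at every site); scsi_eh_wakeup() is the existing
helper, private to drivers/scsi (declared in scsi_priv.h):

	#include <linux/spinlock.h>
	#include <scsi/scsi_host.h>

	#include "scsi_priv.h"	/* scsi_eh_wakeup() */

	/*
	 * Hypothetical helper: re-check the EH wakeup condition under
	 * host_lock after a host_busy decrement, so that the last busy
	 * thread cannot miss a concurrently scheduled error handler.
	 */
	static void scsi_eh_wakeup_if_needed(struct Scsi_Host *shost)
	{
		unsigned long flags;

		spin_lock_irqsave(shost->host_lock, flags);
		if (unlikely(scsi_host_in_recovery(shost) &&
			     (shost->host_failed || shost->host_eh_scheduled)))
			scsi_eh_wakeup(shost);
		spin_unlock_irqrestore(shost->host_lock, flags);
	}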
Now the very *last* of the busy threads to take host_lock after
decrementing host_busy will see all writes to the host's shost_state,
host_busy and host_failed completed, thanks to the memory barriers
implied by spin_lock/spin_unlock. So at the moment busy == failed we
trigger the wakeup in at least one thread. (That is why the recovery
and failed checks are placed under the lock.)

Signed-off-by: Pavel Tikhomirov
Tested-by: Stuart Hayes
---
 drivers/scsi/scsi_lib.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..6c99221d60aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	if (starget->can_queue > 0)
 		atomic_dec(&starget->target_busy);
 
+	spin_lock_irqsave(shost->host_lock, flags);
 	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled))) {
-		spin_lock_irqsave(shost->host_lock, flags);
+		     (shost->host_failed || shost->host_eh_scheduled)))
 		scsi_eh_wakeup(shost);
-		spin_unlock_irqrestore(shost->host_lock, flags);
-	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	atomic_dec(&sdev->device_busy);
 }
@@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 	spin_unlock_irq(shost->host_lock);
 out_dec:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 	return 0;
 }
@@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out_dec_host_busy:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 out_dec_target_busy:
 	if (scsi_target(sdev)->can_queue > 0)
 		atomic_dec(&scsi_target(sdev)->target_busy);
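To spell out the ordering argument from the changelog, here is an
illustrative-only sketch (not part of the patch) of the two sides of
the race. The function names eh_side()/busy_side() are hypothetical,
and the direct shost_state assignment is a simplification of what
scsi_host_set_state() does under the lock:

	#include <linux/spinlock.h>
	#include <scsi/scsi_host.h>

	#include "scsi_priv.h"	/* scsi_eh_wakeup() */

	/* Writer side, cf. scsi_eh_scmd_add(): */
	static void eh_side(struct Scsi_Host *shost)
	{
		unsigned long flags;

		spin_lock_irqsave(shost->host_lock, flags);
		shost->shost_state = SHOST_RECOVERY; /* simplified */
		shost->host_failed++;
		/* release semantics: publishes both stores */
		spin_unlock_irqrestore(shost->host_lock, flags);
	}

	/* Reader side, cf. the out_dec paths in the patch: */
	static void busy_side(struct Scsi_Host *shost)
	{
		unsigned long flags;

		atomic_dec(&shost->host_busy);
		/* acquire semantics: sees the EH side's stores */
		spin_lock_irqsave(shost->host_lock, flags);
		if (unlikely(scsi_host_in_recovery(shost) &&
			     (shost->host_failed || shost->host_eh_scheduled)))
			scsi_eh_wakeup(shost);
		spin_unlock_irqrestore(shost->host_lock, flags);
	}

Whichever busy thread takes host_lock last after its decrement
therefore evaluates the condition against fully published values, so
the wakeup cannot be lost.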