From: Kevin Groeneveld
Subject: [PATCH] scsi: fix hang in scsi error handling
Date: Wed, 15 Jul 2015 08:47:29 -0400
Message-ID: <1436964449-31447-1-git-send-email-kgroeneveld@lenbrook.com>
X-Patchwork-Id: 6798261
X-Mailing-List: linux-scsi@vger.kernel.org

With the following setup/steps I can consistently trigger a scsi host hang
that requires a reboot:

1. iMX6Q processor with built-in AHCI-compatible SATA host
2. SATA port multiplier in CBS mode connected to iMX6Q
3. HDD connected to port multiplier
4. CDROM connected to port multiplier
5. trigger continuous I/O to HDD
6. repeatedly execute CDROM_DRIVE_STATUS ioctl on CDROM with no disc in
   drive

I don't think this issue is iMX6-specific, but that is the only platform I
have reproduced the hang on.

To trigger the issue, at least two CPU cores must be enabled and the HDD
access and CDROM ioctls must be happening concurrently. If I only enable
one CPU core the hang does not occur.

The following C program can be used to trigger the CDROM ioctl:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/cdrom.h>

int main(int argc, char* argv[])
{
	int fd;

	/* O_NONBLOCK lets the open succeed with no disc in the drive */
	fd = open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
	if(fd < 0)
	{
		perror("cannot open /dev/cdrom");
		return fd;
	}

	/* poll the drive status every 100 ms until killed */
	for(;;)
	{
		ioctl(fd, CDROM_DRIVE_STATUS, 0);
		usleep(100 * 1000);
	}
}

When the hang occurs, shost->host_busy == 2 and shost->host_failed == 1 in
the scsi_eh_wakeup function. However, this function only wakes the error
handler if host_busy == host_failed, so the handler is never woken.

The patch changes the condition to test if host_busy >= host_failed and
updates the corresponding condition in scsi_error_handler.

Without the patch I can trigger the hang within seconds. With the patch I
have not reproduced the hang after hours of testing.
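
For illustration only (this is not part of the patch), the two conditions can be modelled in user space. The sketch below assumes a hypothetical struct eh_counters that stands in for the three Scsi_Host fields involved. The helper names are made up for this example, and the kernel's atomic_read() and locking are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Scsi_Host fields used by the checks. */
struct eh_counters {
	int host_busy;         /* commands outstanding on the host */
	int host_failed;       /* failed commands awaiting error handling */
	int host_eh_scheduled; /* non-zero if error handling was requested */
};

/* scsi_eh_wakeup() test before the patch: wake only on exact equality. */
static bool eh_should_wake_old(const struct eh_counters *c)
{
	return c->host_busy == c->host_failed;
}

/* scsi_eh_wakeup() test after the patch. */
static bool eh_should_wake_new(const struct eh_counters *c)
{
	return c->host_busy >= c->host_failed;
}

/* scsi_error_handler() sleep test before the patch. */
static bool eh_should_sleep_old(const struct eh_counters *c)
{
	return (c->host_failed == 0 && c->host_eh_scheduled == 0) ||
	       c->host_failed != c->host_busy;
}

/* scsi_error_handler() sleep test after the patch. */
static bool eh_should_sleep_new(const struct eh_counters *c)
{
	return (c->host_failed == 0 && c->host_eh_scheduled == 0) ||
	       c->host_failed > c->host_busy;
}
```

With the values observed in the hang (host_busy == 2, host_failed == 1), the old wake test is false and the old sleep test is true, so the error handler stays asleep. The patched tests give the opposite result for the same values.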

Signed-off-by: Kevin Groeneveld
---
 drivers/scsi/scsi_error.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 106884a..853964b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -61,7 +61,7 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
-	if (atomic_read(&shost->host_busy) == shost->host_failed) {
+	if (atomic_read(&shost->host_busy) >= shost->host_failed) {
 		trace_scsi_eh_wakeup(shost);
 		wake_up_process(shost->ehandler);
 		SCSI_LOG_ERROR_RECOVERY(5, shost_printk(KERN_INFO, shost,
@@ -2173,7 +2173,7 @@ int scsi_error_handler(void *data)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
-		    shost->host_failed != atomic_read(&shost->host_busy)) {
+		    shost->host_failed > atomic_read(&shost->host_busy)) {
 			SCSI_LOG_ERROR_RECOVERY(1,
 				shost_printk(KERN_INFO, shost,
 					     "scsi_eh_%d: sleeping\n",