From patchwork Thu Aug 27 18:16:37 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michal Hocko X-Patchwork-Id: 7086701 Return-Path: X-Original-To: patchwork-linux-scsi@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 2217BBEEC1 for ; Thu, 27 Aug 2015 18:17:29 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 1ABAF209DB for ; Thu, 27 Aug 2015 18:17:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3414820817 for ; Thu, 27 Aug 2015 18:17:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753892AbbH0SRN (ORCPT ); Thu, 27 Aug 2015 14:17:13 -0400 Received: from mail-wi0-f171.google.com ([209.85.212.171]:37033 "EHLO mail-wi0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750945AbbH0SRM (ORCPT ); Thu, 27 Aug 2015 14:17:12 -0400 Received: by widdq5 with SMTP id dq5so210484wid.0; Thu, 27 Aug 2015 11:17:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=from:to:cc:subject:date:message-id; bh=MWiAA0ni/w8VhiS70h9+U9/aJxHd0g4KzTk8Hinr5JM=; b=CjEqV57sUPeapyfOskgBYYK7MUBGF67OKL8AyQB5e8OzHFV3H0sS+KhWk2o30sD1/v r+vZCK+LFQDPHaWysdTlD4e9+qvmVUuP4jY87RETC+GQrPpYOcPe1ECE1Dh8m3zUkg+5 BIP8cRDcU8dVPqndsDslTSQq8KM0pXMehBcodw89tE+TTAL46hTc0wcHnLfaXHCyTayK vmyegAnYSE70D2eMjDqwnA6OS4oK00FWu9wts00BM+DSsgaTNQz4pMIbVHceQ8+9U+2G c/C7no+YlnCi8ExqhdbcdIrXM0B/Jy2mdpi5GiSQFAzlxMoC4ol91JDdy27hdlggLNhg cqfg== X-Received: by 10.180.88.98 with SMTP id bf2mr11845204wib.90.1440699431499; Thu, 27 Aug 2015 11:17:11 -0700 (PDT) Received: from tiehlicka.suse.cz (bband-dyn181.95-103-48.t-com.sk. [95.103.48.181]) by smtp.gmail.com with ESMTPSA id wo2sm4244217wjb.2.2015.08.27.11.17.10 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 27 Aug 2015 11:17:10 -0700 (PDT) From: mhocko@kernel.org To: "James E.J. Bottomley" Cc: Dan Williams , linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, Michal Hocko Subject: [PATCH] scsi: fix scsi_error_handler vs. scsi_host_dev_release race Date: Thu, 27 Aug 2015 20:16:37 +0200 Message-Id: <1440699420-30499-1-git-send-email-mhocko@kernel.org> X-Mailer: git-send-email 2.5.0 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Michal Hocko b9d5c6b7ef57 ("[SCSI] cleanup setting task state in scsi_error_handler()") has introduced a race between scsi_error_handler and scsi_host_dev_release resulting in the hang when the device goes away because scsi_error_handler might miss a wake up: CPU0 CPU1 scsi_error_handler scsi_host_dev_release kthread_stop() kthread_should_stop() test_bit(KTHREAD_SHOULD_STOP) set_bit(KTHREAD_SHOULD_STOP) wake_up_process() wait_for_completion() set_current_state(TASK_INTERRUPTIBLE) schedule() The most straightforward solution seems to be to invert the ordering of the set_current_state and kthread_should_stop. The issue has been noticed during reboot test on a 3.0 based kernel but the current code seems to be affected in the same way. Cc: stable # 3.6+ Reported-and-Debugged-by: Mike Mayer Signed-off-by: Michal Hocko Acked-by: Dan Williams Reviewed-by: Hannes Reinecke --- drivers/scsi/scsi_error.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 6457a8a0db9c..2c0a817d5dbe 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -2169,8 +2169,11 @@ int scsi_error_handler(void *data) * We never actually get interrupted because kthread_run * disables signal delivery for the created thread. */ - while (!kthread_should_stop()) { + while (true) { set_current_state(TASK_INTERRUPTIBLE); + if (kthread_should_stop()) + break; + if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) || shost->host_failed != atomic_read(&shost->host_busy)) { SCSI_LOG_ERROR_RECOVERY(1,