From patchwork Sun Jan 15 09:19:25 2017
X-Patchwork-Submitter: Ingo Molnar
X-Patchwork-Id: 9517327
Date: Sun, 15 Jan 2017 10:19:25 +0100
From: Ingo Molnar
To: James Bottomley
Cc: Andrew Morton, Linus Torvalds, Sathya Prakash, Chaitra P B,
 Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
 linux-scsi, linux-kernel, Thomas Gleixner
Subject: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"
Message-ID: <20170115091925.GA26656@gmail.com>
References: <1484319727.2527.8.camel@HansenPartnership.com>
In-Reply-To: <1484319727.2527.8.camel@HansenPartnership.com>
X-Mailing-List: linux-scsi@vger.kernel.org

So there's a new mpt3sas SCSI driver boot regression, introduced in this
merge window, which made one of my servers unbootable.

The kernel, starting at upstream commit a829a8445f09, would hang thusly:

  [    6.230363] Linux agpgart interface v0.103
  [    6.245029] brd: module loaded
  [    6.253233] loop: module loaded
  [    6.256695] mpt3sas version 14.101.00.00 loaded
  [    6.261890] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (65950628 kB)
  [    6.326222] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 32, max_msix_vectors: -1
  [    6.334953] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 24
  [    6.340237] mpt2sas_cm0: iomem(0x00000000dff3c000), mapped(0xffffc90007414000), size(16384)
  [    6.349002] mpt2sas_cm0: ioport(0x000000000000e000), size(256)
  [    6.410830] mpt2sas_cm0: sending message unit reset !!
  [    6.417739] mpt2sas_cm0: message unit reset: SUCCESS
  [    6.463486] mpt2sas_cm0: Allocated physical memory: size(8199 kB)
  [    6.469820] mpt2sas_cm0: Current Controller Queue Depth(3640),Max Controller Queue Depth(3712)
  [    6.478840] mpt2sas_cm0: Scatter Gather Elements per IO(128)
  [    6.530653] mpt2sas_cm0: LSISAS2008: FWVersion(12.00.00.00), ChipRevision(0x03), BiosVersion(07.23.01.00)
  [    6.540621] mpt2sas_cm0: Protocol=(
  [    6.540622] Initiator
  [    6.544346] ,Target
  [    6.546844] ),
  [    6.549168] Capabilities=(
  [    6.551165] TLR
  [    6.554098] ,EEDP
  [    6.556095] ,Snapshot Buffer
  [    6.558249] ,Diag Trace Buffer
  [    6.561359] ,Task Set Full
  [    6.564666] ,NCQ
  [    6.567594] )
  [    6.571517] scsi host0: Fusion MPT SAS Host
  [    6.576539] mpt2sas_cm0: sending port enable !!
  [    6.576699] ahci 0000:00:11.0: version 3.0
  [    6.577285] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
  [    6.577290] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ccc
  [    6.579218] scsi host1: ahci
  [    6.579685] scsi host2: ahci
  [    6.5800[   39.972084] sd 0:0:0:0: attempting task abort! scmd(ffff881014cb9500)
  [   39.978809] sd 0:0:0:0: [sda] tag#0 CDB: ATA command pass through(12)/Blank a1 08 2e 00 01 00 00 00 00 ec 00 00
  [   39.989346] scsi target0:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
  [   39.997584] scsi target0:0:0: enclosure_logical_id(0x5003048003e10c00), slot(31)
  [   40.005425] sd 0:0:0:0: task abort: SUCCESS scmd(ffff881014cb9500)
  udevd[472]: timeout 'ata_id --export /dev/sda'
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
  udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]

  [ this would continue ad infinitum. ]

The correct bootup sequence would be:

  [    6.252918] loop: module loaded
  [    6.256390] mpt3sas version 14.101.00.00 loaded
  [    6.261554] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (65950628 kB)
  [    6.325894] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 32, max_msix_vectors: -1
  [    6.334640] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 24
  [    6.339925] mpt2sas_cm0: iomem(0x00000000dff3c000), mapped(0xffffc900073f4000), size(16384)
  [    6.348672] mpt2sas_cm0: ioport(0x000000000000e000), size(256)
  [    6.410508] mpt2sas_cm0: sending message unit reset !!
  [    6.417437] mpt2sas_cm0: message unit reset: SUCCESS
  [    6.463275] mpt2sas_cm0: Allocated physical memory: size(8199 kB)
  [    6.469627] mpt2sas_cm0: Current Controller Queue Depth(3640),Max Controller Queue Depth(3712)
  [    6.478635] mpt2sas_cm0: Scatter Gather Elements per IO(128)
  [    6.530433] mpt2sas_cm0: LSISAS2008: FWVersion(12.00.00.00), ChipRevision(0x03), BiosVersion(07.23.01.00)
  [    6.540424] mpt2sas_cm0: Protocol=(
  [    6.540425] Initiator
  [    6.544150] ,Target
  [    6.546644] ),
  [    6.548968] Capabilities=(
  [    6.550943] TLR
  [    6.553901] ,EEDP
  [    6.555898] ,Snapshot Buffer
  [    6.558050] ,Diag Trace Buffer
  [    6.561159] ,Task Set Full
  [    6.564462] ,NCQ
  [    6.567395] )
  [    6.571316] scsi host0: Fusion MPT SAS Host
  [    6.576344] mpt2sas_cm0: sending port enable !!
  [    6.576495] ahci 0000:00:11.0: version 3.0
  [    6.577100] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
  [    6.577105] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ccc
  [    6.579016] scsi host1: ahci
  [    6.579387] scsi host2: ahci
  [    6.[  OK  ] Started Journal Service.
  ...

(BTW., note the various broken printk lines - which is an unrelated bug.)

I bisected the regression back to this upstream merge commit done by Linus:

  commit a829a8445f09036404060f4d6489cb13433f4304
  Merge: 84b607913442 f5b893c94715
  Author: Linus Torvalds
  Date:   Wed Dec 14 10:49:33 2016 -0800

      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

... which is a head-scratcher, so I double checked the key bisection points,
but the bisection result is robust. I also re-created Linus's merge and double
checked the conflict resolution - which looks fine as well.

After (much) more testing it turns out that the bug is some sort of
combination bug, in that scsi-next didn't have all the SCSI fixes that
upstream already had, in particular it didn't have these commits:

  7ff723ad0f87 scsi: mpt3sas: Unblock device after controller reset
  18f6084a989b scsi: mpt3sas: Fix secure erase premature termination
  6d3a56ed0985 scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk

When Linus pulled in scsi-next-minus-fixes these two sets of commits combined
and produced the regression - and made the bisection lead to the merge commit.

So I manually rebased those 3 fixes on top of the scsi-next tree
(f5b893c94715) and indeed one of them broke my box:

  18f6084a989b scsi: mpt3sas: Fix secure erase premature termination

I reverted it from latest upstream (with a minor conflict resolution), and
that makes my box boot fine again.

I have no idea which scsi-next commit this change interacted with, and it's
not easy to find out so I'm not volunteering! It must be one of these 256
commits:

  e3a00f68e426..f5b893c94715

Note that reverting the first commit alone does not help:

  7ff723ad0f87 scsi: mpt3sas: Unblock device after controller reset

So it's reverting 18f6084a989b (while keeping ata_12_16_cmd() around to
enable the 7ff723ad0f87 fix) that does the trick.

Thanks,

	Ingo

====================>
From 0734e6d2a7f757172d6b7750d8fcf602909300e6 Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Sun, 15 Jan 2017 09:59:39 +0100
Subject: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

This reverts commit 18f6084a989ba1b38702f9af37a2e4049a924be6.

Conflicts:
	drivers/scsi/mpt3sas/mpt3sas_scsih.c

Signed-off-by: Ingo Molnar
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b5c966e319d3..3573daa2cce8 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4063,13 +4063,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 	if (ioc->logging_level & MPT_DEBUG_SCSI)
 		scsi_print_command(scmd);
 
-	/*
-	 * Lock the device for any subsequent command until command is
-	 * done.
-	 */
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_block(scmd->device);
-
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
 		scmd->result = DID_NO_CONNECT << 16;
@@ -4650,9 +4643,6 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
-
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
 	if (mpi_reply == NULL) {