From patchwork Fri Oct 13 13:40:07 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Maier X-Patchwork-Id: 10004767 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id B88C460216 for ; Fri, 13 Oct 2017 13:40:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A958B29083 for ; Fri, 13 Oct 2017 13:40:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A8481290A3; Fri, 13 Oct 2017 13:40:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DBA3B29083 for ; Fri, 13 Oct 2017 13:40:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753720AbdJMNkh (ORCPT ); Fri, 13 Oct 2017 09:40:37 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:36528 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753460AbdJMNkg (ORCPT ); Fri, 13 Oct 2017 09:40:36 -0400 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id v9DDdvAS077359 for ; Fri, 13 Oct 2017 09:40:36 -0400 Received: from e06smtp14.uk.ibm.com (e06smtp14.uk.ibm.com [195.75.94.110]) by mx0a-001b2d01.pphosted.com with ESMTP id 2djvv7npee-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Fri, 13 Oct 2017 09:40:35 -0400 Received: from localhost by e06smtp14.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 13 Oct 2017 14:40:32 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp14.uk.ibm.com (192.168.101.144) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Fri, 13 Oct 2017 14:40:29 +0100 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id v9DDeTka25428120; Fri, 13 Oct 2017 13:40:29 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3FC424C040; Fri, 13 Oct 2017 14:36:23 +0100 (BST) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 012984C046; Fri, 13 Oct 2017 14:36:23 +0100 (BST) Received: from tuxmaker.boeblingen.de.ibm.com (unknown [9.152.85.9]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Fri, 13 Oct 2017 14:36:22 +0100 (BST) From: Steffen Maier To: "James E . J . Bottomley" , "Martin K . Petersen" Cc: linux-scsi@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Heiko Carstens , Steffen Maier , stable@vger.kernel.org Subject: [PATCH] zfcp: fix erp_action use-before-initialize in REC action trace Date: Fri, 13 Oct 2017 15:40:07 +0200 X-Mailer: git-send-email 2.13.5 X-TM-AS-GCONF: 00 x-cbid: 17101313-0016-0000-0000-000004F587DE X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17101313-0017-0000-0000-00002830A7A6 Message-Id: <20171013134007.54534-1-maier@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-10-13_04:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1710130190 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") extended accessing parent pointer fields of struct zfcp_erp_action for tracing. If an erp_action has never been enqueued before, these parent pointer fields are uninitialized and NULL. Examples are zfcp objects freshly added to the parent object's children list, before enqueueing their first recovery subsequently. In zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action fields can cause a NULL pointer dereference. Since the kernel can read from lowcore on s390, it does not immediately cause a kernel page fault. Instead it can cause hangs on trying to acquire the wrong erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl() ^bogus^ while holding already other locks with IRQs disabled. Real life example from attaching lots of LUNs in parallel on many CPUs: crash> bt 17723 PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800" LOWCORE INFO: -psw : 0x0404300180000000 0x000000000038e424 -function : _raw_spin_lock_wait_flags at 38e424 ... #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp] #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp] #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp] #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp] #4 [fdde8fe60] kthread at 173550 #5 [fdde8feb8] kernel_thread_starter at 10add2 zfcp_adapter zfcp_port zfcp_unit
, 0x404040d600000000 scsi_device NULL, returning early! zfcp_scsi_dev.status = 0x40000000 0x40000000 ZFCP_STATUS_COMMON_RUNNING crash> zfcp_unit
struct zfcp_unit { erp_action = { adapter = 0x0, port = 0x0, unit = 0x0, }, } zfcp_erp_action is always fully embedded into its container object. Such container object is never moved in its object tree (only add or delete). Hence, erp_action parent pointers can never change. To fix the issue, initialize the erp_action parent pointers before adding the erp_action container to any list and thus before it becomes accessible from outside of its initializing function. In order to also close the time window between zfcp_erp_setup_act() memsetting the entire erp_action to zero and setting the parent pointers again, drop the memset and instead explicitly initialize individually all erp_action fields except for parent pointers. To be extra careful not to introduce any other unintended side effect, even keep zeroing the erp_action fields for list and timer. Also double-check with WARN_ON_ONCE that erp_action parent pointers never change, so we get to know when we would deviate from previous behavior. Signed-off-by: Steffen Maier Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Cc: #2.6.32+ Reviewed-by: Benjamin Block --- James, Martin, it's an important bugfix cut against James' scsi.git fixes branch, and would be nice if it could make it into 4.14 via rc. drivers/s390/scsi/zfcp_aux.c | 5 +++++ drivers/s390/scsi/zfcp_erp.c | 18 +++++++++++------- drivers/s390/scsi/zfcp_scsi.c | 5 +++++ 3 files changed, 21 insertions(+), 7 deletions(-) --- a/drivers/s390/scsi/zfcp_aux.c +++ b/drivers/s390/scsi/zfcp_aux.c @@ -357,6 +357,8 @@ struct zfcp_adapter *zfcp_adapter_enqueu adapter->next_port_scan = jiffies; + adapter->erp_action.adapter = adapter; + if (zfcp_qdio_setup(adapter)) goto failed; @@ -513,6 +515,9 @@ struct zfcp_port *zfcp_port_enqueue(stru port->dev.groups = zfcp_port_attr_groups; port->dev.release = zfcp_port_release; + port->erp_action.adapter = adapter; + port->erp_action.port = port; + if (dev_set_name(&port->dev, "0x%016llx", (unsigned long long)wwpn)) { kfree(port); goto err_out; --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -193,9 +193,8 @@ static struct zfcp_erp_action *zfcp_erp_ atomic_or(ZFCP_STATUS_COMMON_ERP_INUSE, &zfcp_sdev->status); erp_action = &zfcp_sdev->erp_action; - memset(erp_action, 0, sizeof(struct zfcp_erp_action)); - erp_action->port = port; - erp_action->sdev = sdev; + WARN_ON_ONCE(erp_action->port != port); + WARN_ON_ONCE(erp_action->sdev != sdev); if (!(atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_RUNNING)) act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY; @@ -208,8 +207,8 @@ static struct zfcp_erp_action *zfcp_erp_ zfcp_erp_action_dismiss_port(port); atomic_or(ZFCP_STATUS_COMMON_ERP_INUSE, &port->status); erp_action = &port->erp_action; - memset(erp_action, 0, sizeof(struct zfcp_erp_action)); - erp_action->port = port; + WARN_ON_ONCE(erp_action->port != port); + WARN_ON_ONCE(erp_action->sdev != NULL); if (!(atomic_read(&port->status) & ZFCP_STATUS_COMMON_RUNNING)) act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY; break; @@ -219,7 +218,8 @@ static struct zfcp_erp_action *zfcp_erp_ zfcp_erp_action_dismiss_adapter(adapter); atomic_or(ZFCP_STATUS_COMMON_ERP_INUSE, &adapter->status); erp_action = &adapter->erp_action; - memset(erp_action, 0, sizeof(struct zfcp_erp_action)); + WARN_ON_ONCE(erp_action->port != NULL); + WARN_ON_ONCE(erp_action->sdev != NULL); if (!(atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_RUNNING)) act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY; @@ -229,7 +229,11 @@ static struct zfcp_erp_action *zfcp_erp_ return NULL; } - erp_action->adapter = adapter; + WARN_ON_ONCE(erp_action->adapter != adapter); + memset(&erp_action->list, 0, sizeof(erp_action->list)); + memset(&erp_action->timer, 0, sizeof(erp_action->timer)); + erp_action->step = ZFCP_ERP_STEP_UNINITIALIZED; + erp_action->fsf_req_id = 0; erp_action->action = need; erp_action->status = act_status; --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -115,10 +115,15 @@ static int zfcp_scsi_slave_alloc(struct struct zfcp_unit *unit; int npiv = adapter->connection_features & FSF_FEATURE_NPIV_MODE; + zfcp_sdev->erp_action.adapter = adapter; + zfcp_sdev->erp_action.sdev = sdev; + port = zfcp_get_port_by_wwpn(adapter, rport->port_name); if (!port) return -ENXIO; + zfcp_sdev->erp_action.port = port; + unit = zfcp_unit_find(port, zfcp_scsi_dev_lun(sdev)); if (unit) put_device(&unit->dev);