From patchwork Mon Jun 3 21:04:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tariq Toukan X-Patchwork-Id: 13684356 X-Patchwork-Delegate: kuba@kernel.org Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2060.outbound.protection.outlook.com [40.107.93.60]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 97C7813B58C for ; Mon, 3 Jun 2024 21:06:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.93.60 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717448775; cv=fail; b=EK+H3hHtOm4DkR21qJbARv+gpEUVmAg4euO4WTrYWcnCartpSfpI9INBkiKx5Jd4OC6JVfz3E5QH7dhr2W3Uk/gkxx6OA+k+ZNUCcjYLgSx6vxFMnYIk0DC5JNHnkmGlKH7e/eTMlB1VUzrwzmLToIfxDhdtbJYUCONwvxs/5Pg= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717448775; c=relaxed/simple; bh=Kw2qqrdXzsXargIlFnI2YJamvoACcVgRzuzZ00C0yNQ=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QHt0qJPKJ/KJpKoncPHQf7GDnBQ6tqOYmIP5P5xbgNa26942GOnl3Wj3GftXMoVRU9bWEKzaJ9L0jBh4WHWI0CWwGv5wRErmgO3gQiq0HyLls/wfWTFHH9p61vhBqA8yNC2MxsHsdTZKAk1ijZV3vT+EknV/+CRF8ffqhFhRtT8= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=fWil/NNg; arc=fail smtp.client-ip=40.107.93.60 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="fWil/NNg" ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TjMSKOTtHF+tchoyTuyA3mm7s5QpjFbMN6SQ1BPl+RuI05R7dDHw97sxsRFjUAW1Hg2fK6lx+6DJO7vJy49y3J9SSNL0YdJDJqauX/v/Za2aM6kJnVQpZhnJbjX6+idgJ0bRafWA6gCoKz4zt0bLzjAYHzQVSbpHWoA+q5l1ELs1O5X4hsaoKSjUU10ZfnXeyFNDZ0TRr2EmdzG2Ugu/qAFzf7fytI3k5cf7Afevkjtd8R/d4XwtueWv/YqyPTWHbLN2Ob+BwNz5ZzxFjZHNrGDajqld27/SrlOiMSJaiDWq4tXxOfVHU6yEToTPocKhVgKDHMGtAc1CTugGbHRdQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=7wrXuEJWt4aW4ahbSUl6o09kWbZfUYg9qTQ54x/NqmI=; b=OHM4TaGvWD6IxjdYdPKvZjyGQOp8uoEdDJ55ojxbeskDPPbX2hS7dlE+VDisgSBja2mMp5EL3HE08EgutvKJzY7xYUqThTwz5Nj71bALLcaMUNUP3e5n/EI4gSvjipkxAbVF6y12sjQT1zc1pu7o/X4wO3UV3rWWiEXOXC3GjS/lnQkQUzNKnwOzP65SWK+KB4rFX4LQ6ro67bWivdQ88SO7zU+RudkfKx93ViJ3VzgfHevvOwW0+6++013LJ3fSrZWqGUhnT1SK1y50zRHXyp0Db2xoTJbvOQJ3XAAS95wuTLojpLeglV7quaWlFSlFhpw9D89Lmqqw6YwOGjcGWQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.232) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=7wrXuEJWt4aW4ahbSUl6o09kWbZfUYg9qTQ54x/NqmI=; b=fWil/NNg1ob9jPkK6y0s2eKOFbyZM3V3VVFgm0xLACiKqu1mJTisGHfSro4/FnTqzo5CnhcUh0i9qL/1UmTo1uL2c8qJaUk7BDjYv419kXC8WUxydGZQhRbGpWR5ErDHrNLT/ZmNnVhVJ2sSZlZTj++BD7r7QURYbXQk06RHpWwGpfJ4MQkQaMvM/AKL0Ad7nBT53zY4pEyW2gUqZKPxJAFoefiJdYB08352UkSuNScRqbohA+hE8q1IJGlLXIa9Jv6ZvjyReqgZo2QjJwWEbcTU7DZD+nSnO/XZmNk6NY2P3QMl72yVs5CUhDORtVZX+VN+Ki+zYwjV4qUEu6K9pg== Received: from BL0PR02CA0050.namprd02.prod.outlook.com (2603:10b6:207:3d::27) by IA1PR12MB7541.namprd12.prod.outlook.com (2603:10b6:208:42f::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7633.22; Mon, 3 Jun 2024 21:06:05 +0000 Received: from BL02EPF0001A0FA.namprd03.prod.outlook.com (2603:10b6:207:3d:cafe::d9) by BL0PR02CA0050.outlook.office365.com (2603:10b6:207:3d::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7633.30 via Frontend Transport; Mon, 3 Jun 2024 21:06:05 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.232) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.232 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.232; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.232) by BL02EPF0001A0FA.mail.protection.outlook.com (10.167.242.101) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7633.15 via Frontend Transport; Mon, 3 Jun 2024 21:06:04 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by mail.nvidia.com (10.127.129.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Mon, 3 Jun 2024 14:05:51 -0700 Received: from drhqmail203.nvidia.com (10.126.190.182) by drhqmail203.nvidia.com (10.126.190.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Mon, 3 Jun 2024 14:05:51 -0700 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.182) with Microsoft SMTP Server id 15.2.1544.4 via Frontend Transport; Mon, 3 Jun 2024 14:05:48 -0700 From: Tariq Toukan To: "David S. Miller" , Jakub Kicinski , Paolo Abeni , Eric Dumazet CC: , Saeed Mahameed , Gal Pressman , Leon Romanovsky , Moshe Shemesh , Shay Drori , Tariq Toukan Subject: [PATCH net 1/2] net/mlx5: Stop waiting for PCI if pci channel is offline Date: Tue, 4 Jun 2024 00:04:42 +0300 Message-ID: <20240603210443.980518-2-tariqt@nvidia.com> X-Mailer: git-send-email 2.44.0 In-Reply-To: <20240603210443.980518-1-tariqt@nvidia.com> References: <20240603210443.980518-1-tariqt@nvidia.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-NV-OnPremToCloud: ExternallySecured X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BL02EPF0001A0FA:EE_|IA1PR12MB7541:EE_ X-MS-Office365-Filtering-Correlation-Id: e8e39cd6-974d-4e4f-62ba-08dc8410fb47 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230031|82310400017|376005|36860700004|1800799015; X-Microsoft-Antispam-Message-Info: FPWicJmmdmPR/04+fGIhRjWcX1NXgXaZtUMPhfdFgkTbYAVZSOmDWOM1bbjN6G3La8tFnFE3vkCjat7uGW0ewgRIRjOIDdjVbw8Tl9l/N3lyN374waR9O7AzqFplkiu47rzqNjCbLkYU4AyhXuOYVIqPRB206Wwk++9ZZe9F0yz7ZX6iPNEjnSNPgTvXRSVcgtVSvjAte6aq+zvmpWlz0QJbkGqQrRmrCNR+mf5bXzhbcEyiOzwcwGNgLpC0OMgBCItQlEFD3q6epCmGcTBiBuGIqNt+QL9CdsuoQA3LTOvahPmvdFmMtsxbd2HOQKGkSi614mEva4OkMWLUCmTysHquwZtfh6H3w+PQjWmOhrdEzSxKDbkiTfrAFuf3TqL7w9pBDWkXBknjlXm102phZmNqJJAPAorbTVwuHuz4zRhUnTzGbi054Js0sVjyKT6aSHFmDo2bIFEFYnN3iYgGNLte9AKFbS+nguKr6h8zzgm88o00V4vxPtCa5+hZjsUR5tDc2oPI9vdby6R6A47zHtwLYqvBrDhJXeW5bTBy9Dz/aAB9et009zFPqx9l9NzD/P/3TGRg6oeG3NH+v/SWipq7QY+IhHlBBSycAFE4+wWVoOzyWlPcLyEmvpdb2rsuw9vAico0cQIZGQvAmHGzuUf4Pv4/D+l5tOItXkDNVo0+xt1+F25G5/Z2Yt9cpL83zk8dGHc5PScdyrqRyqQZWSgdBUqBWglL3zRIbPsW0Fv2YJbYt/y01w7bGcJAeQ3SC1ZJ8gRG0IvpY2zAUXBpYcbo2ZZFSGWsiVQeCpwAZSkAX9iaMf4OokBVYpaFR/wwmEKyfWwPwmgspuVQWndX4sFSfVanynGo0ewJmpXWzGM1EqWQXSIlQ0iKzs4Nt/X0TmcDQKHjYi+WkJq/Rm5deGQDSwvj4rG9i7BblKklehVpYicKeZDkpNFQPH5/AiPMWfjIav4cJYsJ83grESbx0EusBFAnyt0z6TqkAIxtz6HMwK9B+S0XCm6mUWZ3nzGchpO7m7iLK7cATdgwi/ijmumB+FMZa8Nyi8C+rCCilH5Qoh0/4/vBC+R6UT2Zveki6Z9/55f5K6EeAHq30eH3hmsWhBYpHz8RRfdTlTR3w2OWH+L5f+7I8GoyoY2uVvhwF8ixJyMHvRJPe1T4p/bVVGRe5MYt9KooLog922oQapVEsW5UJwbDfVpMJCuMnNArMx39XqQYO0Wpf/1ikm52ICwZ4K3blL60fzhi1Ygw0P5W8wk8+x0GovgbLjAUGqp7ezmwsTfJRjadNJNWPvuhSYToXncoMhWBRmR9PsYi0c6wDAwF0dJxZtuzhGU6Va/FRXrTLZxQgTbUTn+5C6gX2ZadKpIjd9WyKKoDkdMQXbU= X-Forefront-Antispam-Report: CIP:216.228.118.232;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc7edge1.nvidia.com;CAT:NONE;SFS:(13230031)(82310400017)(376005)(36860700004)(1800799015);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 03 Jun 2024 21:06:04.7995 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e8e39cd6-974d-4e4f-62ba-08dc8410fb47 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.118.232];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BL02EPF0001A0FA.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: IA1PR12MB7541 X-Patchwork-Delegate: kuba@kernel.org From: Moshe Shemesh In case pci channel becomes offline the driver should not wait for PCI reads during health dump and recovery flow. The driver has timeout for each of these loops trying to read PCI, so it would fail anyway. However, in case of recovery waiting till timeout may cause the pci error_detected() callback fail to meet pci_dpc_recovered() wait timeout. Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drori Signed-off-by: Tariq Toukan --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 4 ++++ drivers/net/ethernet/mellanox/mlx5/core/health.c | 8 ++++++++ drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 4 ++++ 3 files changed, 16 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index 2d95a9b7b44e..b61b7d966114 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -373,6 +373,10 @@ int mlx5_cmd_fast_teardown_hca(struct mlx5_core_dev *dev) do { if (mlx5_get_nic_state(dev) == MLX5_INITIAL_SEG_NIC_INTERFACE_DISABLED) break; + if (pci_channel_offline(dev->pdev)) { + mlx5_core_err(dev, "PCI channel offline, stop waiting for NIC IFC\n"); + return -EACCES; + } cond_resched(); } while (!time_after(jiffies, end)); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index ad38e31822df..a6329ca2d9bf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -248,6 +248,10 @@ void mlx5_error_sw_reset(struct mlx5_core_dev *dev) do { if (mlx5_get_nic_state(dev) == MLX5_INITIAL_SEG_NIC_INTERFACE_DISABLED) break; + if (pci_channel_offline(dev->pdev)) { + mlx5_core_err(dev, "PCI channel offline, stop waiting for NIC IFC\n"); + goto unlock; + } msleep(20); } while (!time_after(jiffies, end)); @@ -317,6 +321,10 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev) mlx5_core_warn(dev, "device is being removed, stop waiting for PCI\n"); return -ENODEV; } + if (pci_channel_offline(dev->pdev)) { + mlx5_core_err(dev, "PCI channel offline, stop waiting for PCI\n"); + return -EACCES; + } msleep(100); } return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c index 6b774e0c2766..d0b595ba6110 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c @@ -74,6 +74,10 @@ int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev) ret = -EBUSY; goto pci_unlock; } + if (pci_channel_offline(dev->pdev)) { + ret = -EACCES; + goto pci_unlock; + } /* Check if semaphore is already locked */ ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val); From patchwork Mon Jun 3 21:04:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tariq Toukan X-Patchwork-Id: 13684355 X-Patchwork-Delegate: kuba@kernel.org Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2054.outbound.protection.outlook.com [40.107.93.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E5D513B59C for ; Mon, 3 Jun 2024 21:06:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.93.54 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717448768; cv=fail; b=gQhX7LUarVCp68lFtEhaUWX93Bm4iOTNHc3W1UmIBwrmekNilE/m1gwZPiiJGhdJVr1wizRfLuUkjWKcv09WXDXkqxByKt9RhGtiWy9290hOzXChrag3rSajKnxt6KVWOnYoXSXosxpbYhNqOP5A4xi9xxpPoPWl/0RExvwaiwA= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717448768; c=relaxed/simple; bh=tihl7qiS3tmFyFsItgYYvbl9cAshPwkXAGO+ivGa5Dk=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NIlvsplJ/CK0lmq3hqB26ohm+be5F0pMplwPNSwU+xqFXOB3d5WYGG8F9h2ipuyYmuf+UgjVMogQhQZRweifa/LCBK7JMwoRs3RhwCrbY5mGPZHbggzGSmRkxUHvHS2Bib7MmYjgDV0Hgi8lvntLbQpxcHNg5JyL8XhtwoZ5Vvk= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=tluUCR8l; arc=fail smtp.client-ip=40.107.93.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="tluUCR8l" ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OOZYdfsxqNB7DHTm8NyLA7AXY0x/Xalms1f/wnaTat3T1djwyxO5T8kwp3pyetzwUHhTpj3LCF+HBpfsN7b29wbFtt296qBC6LWknlKqNi3ml0fCMCJYVHvHpeuESFqZBdngbEp8ZonEiJkZxlxeggWfMq40uIcNe26eTtNeBNVBB2OtdkHcDtMO9T5E3DaNOUFdziYd9pNu+hQ0DnclbGjeD03NkOqMvbO4Z56+bgXsrUxCOysW0rxiHtM7kc+bIbi5I5lGywQDgMLszzJfIML2Ip6OqTuwiGryK/ppna7k4tWmO3fI6qrEOT2w69UgUDjm7+50TQGbxm6zlZUiwQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=NtDNz0GhE0AfkeXVTrSKca+j3qL8Ii/kvWSCiDt5aMI=; b=Fglow3rmAskVuHSLqPu6PBAQGCtiTpWYZX3WjGN9VSbb/a1squmjZ/bEaSPM4yTkNhyu5zDS1Q0Bbxxr2Sj7LFQryy+BfwTQqMBNnHR8tcvPdShnmwMeE/V9jOMWGE0792Zlz1L87dbUkVgzjWQ4IQBoiC+IrfLewmsEDqGVUAWSmR/6kvbN6SWBuQXce2ByzuUTDvmZKwhu1R7DKFiBRuCPvJArLJPfMJUyWToXKdTNMhbJgbdtU+xmc7IZmrh4HNjT8LKUt5s40eOGcM3Dq7zwL8XF4KKQ3BqYfT0Z1N0v69oB/1zZynm9ERAofdO8Iy7Xeai04DfTDCGPnDV+3A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.233) smtp.rcpttodomain=davemloft.net smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=NtDNz0GhE0AfkeXVTrSKca+j3qL8Ii/kvWSCiDt5aMI=; b=tluUCR8lYQmw7ExLMCi61Z63AZqC4OGjFhBF1COQb78HGg6d6SgHaGBcwbtahD84dbeHQm5/o4PyHsaaFfZTV6TKIt52VDFV4dPnKpCMFbHUz9wwWMIaEKvO6V6x4dIJHaX0YJBeZkA1p1JIurPxYRSzByLyMizhkVorPiXH5olw2PGttTxvBk0SLdGK+1lsFTvDAjd7v3fdmNQYq2jseWc8eoujeq/HGjztibLSb/EH5/sMR/qt5SHRJcs3rAswoyh2CxrTX8AFRXjKc5rUkGtbOUZQZmtdYMG6mu0BzC+5dqMJRtb7NTdTLhwuaTj6sFkyYkHa26i+RZyZKRrj2g== Received: from DS7PR03CA0359.namprd03.prod.outlook.com (2603:10b6:8:55::35) by MW3PR12MB4380.namprd12.prod.outlook.com (2603:10b6:303:5a::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7633.25; Mon, 3 Jun 2024 21:06:02 +0000 Received: from DS3PEPF0000C37B.namprd04.prod.outlook.com (2603:10b6:8:55:cafe::a9) by DS7PR03CA0359.outlook.office365.com (2603:10b6:8:55::35) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7633.30 via Frontend Transport; Mon, 3 Jun 2024 21:06:02 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.233) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.233 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.233; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.233) by DS3PEPF0000C37B.mail.protection.outlook.com (10.167.23.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7633.15 via Frontend Transport; Mon, 3 Jun 2024 21:06:02 +0000 Received: from drhqmail203.nvidia.com (10.126.190.182) by mail.nvidia.com (10.127.129.6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Mon, 3 Jun 2024 14:05:54 -0700 Received: from drhqmail203.nvidia.com (10.126.190.182) by drhqmail203.nvidia.com (10.126.190.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Mon, 3 Jun 2024 14:05:54 -0700 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.182) with Microsoft SMTP Server id 15.2.1544.4 via Frontend Transport; Mon, 3 Jun 2024 14:05:51 -0700 From: Tariq Toukan To: "David S. Miller" , Jakub Kicinski , Paolo Abeni , Eric Dumazet CC: , Saeed Mahameed , Gal Pressman , Leon Romanovsky , Shay Drory , Moshe Shemesh , Tariq Toukan Subject: [PATCH net 2/2] net/mlx5: Always stop health timer during driver removal Date: Tue, 4 Jun 2024 00:04:43 +0300 Message-ID: <20240603210443.980518-3-tariqt@nvidia.com> X-Mailer: git-send-email 2.44.0 In-Reply-To: <20240603210443.980518-1-tariqt@nvidia.com> References: <20240603210443.980518-1-tariqt@nvidia.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-NV-OnPremToCloud: ExternallySecured X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DS3PEPF0000C37B:EE_|MW3PR12MB4380:EE_ X-MS-Office365-Filtering-Correlation-Id: 33bcf5f3-741e-4697-21ce-08dc8410f9a3 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230031|376005|36860700004|1800799015|82310400017; X-Microsoft-Antispam-Message-Info: 8+5IgdGhCewP3/xa6ZzBm6vreVyzKHF94OWjj3Br1iOp+19jFRuN45Bbihd1S4sqlPBZatjsg6L3ewcy8XjcQtKUUDeymXnuckzcogwcIN/bVfaxKl+HWRDFRF/xwxU6taGFtugYk9qqx70lBIGvlM0nq9BOLUxWDYdP6kKDAzkfe+nG1UeRYZDH9S+RIR6RoJTU7nZbYN6FAFZ3xGWEEtx9KYsxsHnegauLoX7dWlct08v3huMDNHLNUEkhfRCPvcfrWoM7v4vSoMZOabnPV/PEC0YhXH4BYHJ+uZKAFgTkot6gzjytySlNUP0HJpficNXRoUv3EscTW+MUu7in6P+w+OqJ2mqyMOxZMClGtK1kc8Wb4dICPNFzKO7DWRqIlXl6LnfOpxVpA8l79hlUdAs/8PgmbpSWNaS10uPBDU/8/9YR5G3+GTVF5mQ8jNW3mGFSRB30pm4vl0YLKKHLfcDO1HrkGX7pzOPkZqiWMZ31baoLYUa3xj94TNJn9XcQ49POzPyJKtfLkxzX8tq/xFmBgxTXjGCyDC+zgv6aEA4j1tzPTOgnOkJrerT3tQeJN7wHYu6F8M41k7l5YbgZ1FHt7/iXIqDU53PQNLY+j/ogeTEcTsBOz7PL7rKxZFD4vbU2B33YQzML3Lp6OOW6rYAAhLreGnPgHxh1TfpjCsuMQq3TsTmRidq6VGxGC1D8fjit0iKAOkrXow3gpn/caCTiLrAWWJ5Kg8oGeQJ2QElN1bIQbbyHN014b9dmQcDfdqQGBr+NEDTYnJMGk6krODKTFrE7ueD+agIH0OTGfmAt0qvIMXZ/hGKHMv3x+cDS26bC1a/5AJGIn2XXap3naWgHskUuuptDP4s9WdGsFRkLxLECb9FM9v4VvrF86d7gP3rSUV8jdyf5vOBSgjB6xd5bkzbvczKBh+uPI7TM5GXrLtZVA/SCibI7QqZC6GdGnZ/gDDxrpaYyjiYqoIeUNfwylQJkuVOh/Plqx81lmvQQ1nCcfjlS6nTmeOjL0xI/4O/u+L7PpV9qG+Ff8mtYPujXKdV5PhvTiBr8WFpBBGzqPoOaMrl5T1Uaa2bwc7Vxej4LTT08A2ofJLvuuiyW2xV+I9Mvp++wRvBZ0agh/St87jwEE0lx0URw20c64rWEtVT4uzTIx8DZMPqFEzHHpg5hVdly9xGbEIlmJrP0ijzc+E7SW76kfnLs74FZgM22bQwEz99cl8s7iQ4rv3f/eqGshA0k1QVk9/PyISFMmLQWns9vS2NPtrPezsl3mjNuobCRBOEYgEUDWH8i8aVh6cicw9nq6sjXb8/BheDlRtqaW4TBOr7pRmeoOgi6cY4sgMNsmnQ4RK9O21r1nw7+6zfZUA0vvgNIpg935ODa5tlCqRe9w0VcLIuy9wNYw8kQ X-Forefront-Antispam-Report: CIP:216.228.118.233;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc7edge2.nvidia.com;CAT:NONE;SFS:(13230031)(376005)(36860700004)(1800799015)(82310400017);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 03 Jun 2024 21:06:02.0922 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 33bcf5f3-741e-4697-21ce-08dc8410f9a3 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.118.233];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DS3PEPF0000C37B.namprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW3PR12MB4380 X-Patchwork-Delegate: kuba@kernel.org From: Shay Drory Currently, if teardown_hca fails to execute during driver removal, mlx5 does not stop the health timer. Afterwards, mlx5 continue with driver teardown. This may lead to a UAF bug, which results in page fault Oops[1], since the health timer invokes after resources were freed. Hence, stop the health monitor even if teardown_hca fails. [1] mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: cleanup mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup BUG: unable to handle page fault for address: ffffa26487064230 PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------- --- 6.7.0-68.fc38.x86_64 #1 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 RIP: 0010:ioread32be+0x34/0x60 RSP: 0018:ffffa26480003e58 EFLAGS: 00010292 RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0 RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230 RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8 R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0 R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0 FS: 0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? exc_page_fault+0x175/0x180 ? asm_exc_page_fault+0x26/0x30 ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? ioread32be+0x34/0x60 mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] poll_health+0x42/0x230 [mlx5_core] ? __next_timer_interrupt+0xbc/0x110 ? __pfx_poll_health+0x10/0x10 [mlx5_core] call_timer_fn+0x21/0x130 ? __pfx_poll_health+0x10/0x10 [mlx5_core] __run_timers+0x222/0x2c0 run_timer_softirq+0x1d/0x40 __do_softirq+0xc9/0x2c8 __irq_exit_rcu+0xa6/0xc0 sysvec_apic_timer_interrupt+0x72/0x90 asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:cpuidle_enter_state+0xcc/0x440 ? cpuidle_enter_state+0xbd/0x440 cpuidle_enter+0x2d/0x40 do_idle+0x20d/0x270 cpu_startup_entry+0x2a/0x30 rest_init+0xd0/0xd0 arch_call_rest_init+0xe/0x30 start_kernel+0x709/0xa90 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0x96/0xa0 secondary_startup_64_no_verify+0x18f/0x19b ---[ end trace 0000000000000000 ]--- Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load") Signed-off-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Tariq Toukan --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 6574c145dc1e..459a836a5d9c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1298,6 +1298,9 @@ static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot) if (!err) mlx5_function_disable(dev, boot); + else + mlx5_stop_health_poll(dev, boot); + return err; }