From patchwork Mon Aug 29 09:02:27 2022
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 12957569
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Maher Sanalla, Chris Mi, linux-rdma@vger.kernel.org, Maor Gottlieb,
    netdev@vger.kernel.org, Saeed Mahameed, Shay Drory, Michael Guralnik
Subject: [PATCH rdma-rc 1/3] RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile
Date: Mon, 29 Aug 2022 12:02:27 +0300
X-Mailer: git-send-email 2.37.2
X-Mailing-List: linux-rdma@vger.kernel.org

From: Maher Sanalla

When the RDMA auxiliary driver probes, it sets its profile based on the
devlink driverinit value. The latter might not be in sync with FW yet
(in case devlink reload was not performed), causing a mismatch between
the RDMA driver and FW. This results in the following FW syndrome when
the RDMA driver tries to adjust the RoCE state, which fails the probe:

"0xC1F678 | modify_nic_vport_context: roce_en set on a vport that
doesn't support roce"

To prevent this, select the PF profile based on the FW RoCE capability
instead of relying on the devlink driverinit value.

To preserve backward compatibility of the RoCE disable feature on older
FW, where roce_rw is not set (the FW RoCE capability is read-only), keep
the current behavior, i.e. rely on the devlink driverinit value.
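In short, the selection order becomes the following (a minimal sketch
with a hypothetical helper name, not the kernel code itself; the real
change is the mlx5_get_roce_state() hunk in include/linux/mlx5/driver.h
below):

/* Sketch only: mirrors the rule described above. */
static bool roce_state_sketch(struct mlx5_core_dev *dev)
{
	/* Writable FW RoCE cap: FW is the authoritative source. */
	if (MLX5_CAP_GEN(dev, roce_rw_supported))
		return MLX5_CAP_GEN(dev, roce);

	/* Read-only FW RoCE cap (older FW): fall back to the devlink
	 * driverinit value so RoCE enable/disable keeps working.
	 */
	return mlx5_is_roce_on(dev);
}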
Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level")
Reviewed-by: Shay Drory
Reviewed-by: Michael Guralnik
Reviewed-by: Saeed Mahameed
Signed-off-by: Maher Sanalla
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/main.c          |  2 +-
 .../net/ethernet/mellanox/mlx5/core/main.c | 23 +++++++++++++++++--
 include/linux/mlx5/driver.h                | 19 +++++++--------
 3 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index fc94a1b25485..883d7c60143e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4336,7 +4336,7 @@ static int mlx5r_probe(struct auxiliary_device *adev,
 	dev->mdev = mdev;
 	dev->num_ports = num_ports;
 
-	if (ll == IB_LINK_LAYER_ETHERNET && !mlx5_is_roce_init_enabled(mdev))
+	if (ll == IB_LINK_LAYER_ETHERNET && !mlx5_get_roce_state(mdev))
 		profile = &raw_eth_profile;
 	else
 		profile = &pf_profile;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index bec8d6d0b5f6..4870a88cecb7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -494,6 +494,24 @@ static int max_uc_list_get_devlink_param(struct mlx5_core_dev *dev)
 	return err;
 }
 
+bool mlx5_is_roce_on(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	union devlink_param_value val;
+	int err;
+
+	err = devlink_param_driverinit_value_get(devlink,
+						 DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
+						 &val);
+
+	if (!err)
+		return val.vbool;
+
+	mlx5_core_dbg(dev, "Failed to get param. err = %d\n", err);
+	return MLX5_CAP_GEN(dev, roce);
+}
+EXPORT_SYMBOL(mlx5_is_roce_on);
+
 static int handle_hca_cap_2(struct mlx5_core_dev *dev, void *set_ctx)
 {
 	void *set_hca_cap;
@@ -597,7 +615,8 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 		 MLX5_CAP_GEN_MAX(dev, num_total_dynamic_vf_msix));
 
 	if (MLX5_CAP_GEN(dev, roce_rw_supported))
-		MLX5_SET(cmd_hca_cap, set_hca_cap, roce, mlx5_is_roce_init_enabled(dev));
+		MLX5_SET(cmd_hca_cap, set_hca_cap, roce,
+			 mlx5_is_roce_on(dev));
 
 	max_uc_list = max_uc_list_get_devlink_param(dev);
 	if (max_uc_list > 0)
@@ -623,7 +642,7 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
  */
 static bool is_roce_fw_disabled(struct mlx5_core_dev *dev)
 {
-	return (MLX5_CAP_GEN(dev, roce_rw_supported) && !mlx5_is_roce_init_enabled(dev)) ||
+	return (MLX5_CAP_GEN(dev, roce_rw_supported) && !mlx5_is_roce_on(dev)) ||
 		(!MLX5_CAP_GEN(dev, roce_rw_supported) && !MLX5_CAP_GEN(dev, roce));
 }
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 96b16fbe1aa4..d6338fb449c8 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1279,16 +1279,17 @@ enum {
 	MLX5_TRIGGERED_CMD_COMP = (u64)1 << 32,
 };
 
-static inline bool mlx5_is_roce_init_enabled(struct mlx5_core_dev *dev)
+bool mlx5_is_roce_on(struct mlx5_core_dev *dev);
+
+static inline bool mlx5_get_roce_state(struct mlx5_core_dev *dev)
 {
-	struct devlink *devlink = priv_to_devlink(dev);
-	union devlink_param_value val;
-	int err;
-
-	err = devlink_param_driverinit_value_get(devlink,
-						 DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
-						 &val);
-	return err ? MLX5_CAP_GEN(dev, roce) : val.vbool;
+	if (MLX5_CAP_GEN(dev, roce_rw_supported))
+		return MLX5_CAP_GEN(dev, roce);
+
+	/* If RoCE cap is read-only in FW, get RoCE state from devlink
+	 * in order to support RoCE enable/disable feature
+	 */
+	return mlx5_is_roce_on(dev);
 }
 
 #endif /* MLX5_DRIVER_H */

From patchwork Mon Aug 29 09:02:28 2022
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 12957567
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Chris Mi, linux-rdma@vger.kernel.org, Maher Sanalla, Maor Gottlieb,
    netdev@vger.kernel.org, Saeed Mahameed, Mark Bloch
Subject: [PATCH rdma-rc 2/3] RDMA/mlx5: Set local port to one when accessing counters
Date: Mon, 29 Aug 2022 12:02:28 +0300
Message-Id: <6c5086c295c76211169e58dbd610fb0402360bab.1661763459.git.leonro@nvidia.com>
X-Mailer: git-send-email 2.37.2
X-Mailing-List: linux-rdma@vger.kernel.org

From: Chris Mi

When accessing the Ports Performance Counters Register (PPCNT), the
local port must be one if the device is a Function-Per-Port HCA, i.e.
HCA_CAP.num_ports is 1. The offending patch can change the local port
to other values when accessing PPCNT after switchdev mode is enabled,
and the following syndrome will be printed:

 # cat /sys/class/infiniband/rdmap4s0f0/ports/2/counters/*
 # dmesg
 mlx5_core 0000:04:00.0: mlx5_cmd_check:756:(pid 12450): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x1e5585)

Fix it by setting the local port to one for a Function-Per-Port HCA.
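Put differently, the local port used for the PPCNT query resolves as
follows (a minimal sketch with a hypothetical helper name, not the
actual process_pma_cmd() code shown in the diff below):

static u32 ppcnt_local_port_sketch(struct mlx5_ib_dev *dev, u32 port_num)
{
	/* A Function-Per-Port HCA (HCA_CAP.num_ports == 1) must always
	 * be queried with local port 1, whatever port the MAD targets.
	 */
	if (MLX5_CAP_GEN(dev->mdev, num_ports) == 1)
		return 1;

	/* Otherwise keep the port number requested by the caller. */
	return port_num;
}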
Fixes: 210b1f78076f ("IB/mlx5: When not in dual port RoCE mode, use provided port as native")
Reviewed-by: Mark Bloch
Signed-off-by: Chris Mi
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/mad.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 293ed709e5ed..b4dc52392275 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -166,6 +166,12 @@ static int process_pma_cmd(struct mlx5_ib_dev *dev, u32 port_num,
 		mdev = dev->mdev;
 		mdev_port_num = 1;
 	}
+	if (MLX5_CAP_GEN(dev->mdev, num_ports) == 1) {
+		/* set local port to one for Function-Per-Port HCA. */
+		mdev = dev->mdev;
+		mdev_port_num = 1;
+	}
+
 	/* Declaring support of extended counters */
 	if (in_mad->mad_hdr.attr_id == IB_PMA_CLASS_PORT_INFO) {
 		struct ib_class_port_info cpi = {};

From patchwork Mon Aug 29 09:02:29 2022
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 12957568
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Maor Gottlieb, Chris Mi, linux-rdma@vger.kernel.org, Maher Sanalla,
    netdev@vger.kernel.org, Saeed Mahameed, Michael Guralnik
Subject: [PATCH rdma-rc 3/3] RDMA/mlx5: Fix UMR cleanup on error flow of driver init
Date: Mon, 29 Aug 2022 12:02:29 +0300
Message-Id: <4cfa61386cf202e9ce330e8d228ce3b25a36326e.1661763459.git.leonro@nvidia.com>
X-Mailer: git-send-email 2.37.2
X-Mailing-List: linux-rdma@vger.kernel.org

From: Maor Gottlieb

The cited commit removed the checks of whether the UMR resources were
actually created from the UMR cleanup flow. This could lead to a
null-ptr-deref if the mlx5_ib_stage_ib_reg_init stage failed.

Fix it by adding a new UMR state that indicates whether the resources
were created, and checking it in the UMR cleanup flow before destroying
the resources.

Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c")
Reviewed-by: Michael Guralnik
Signed-off-by: Maor Gottlieb
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 +
 drivers/infiniband/hw/mlx5/umr.c     | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2e2ad3918385..e66bf72f1f04 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -708,6 +708,7 @@ struct mlx5_ib_umr_context {
 };
 
 enum {
+	MLX5_UMR_STATE_UNINIT,
 	MLX5_UMR_STATE_ACTIVE,
 	MLX5_UMR_STATE_RECOVER,
 	MLX5_UMR_STATE_ERR,
diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c
index e00b94d1b1ea..d5105b5c9979 100644
--- a/drivers/infiniband/hw/mlx5/umr.c
+++ b/drivers/infiniband/hw/mlx5/umr.c
@@ -177,6 +177,7 @@ int mlx5r_umr_resource_init(struct mlx5_ib_dev *dev)
 
 	sema_init(&dev->umrc.sem, MAX_UMR_WR);
 	mutex_init(&dev->umrc.lock);
+	dev->umrc.state = MLX5_UMR_STATE_ACTIVE;
 
 	return 0;
 
@@ -191,6 +192,8 @@ int mlx5r_umr_resource_init(struct mlx5_ib_dev *dev)
 
 void mlx5r_umr_resource_cleanup(struct mlx5_ib_dev *dev)
 {
+	if (dev->umrc.state == MLX5_UMR_STATE_UNINIT)
+		return;
 	ib_destroy_qp(dev->umrc.qp);
 	ib_free_cq(dev->umrc.cq);
 	ib_dealloc_pd(dev->umrc.pd);
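The guard added here is a common pattern for cleanup paths that may run
from an error flow before init ever completed. Reduced to a
self-contained sketch with hypothetical names, and relying on
zero-initialized memory meaning "nothing created yet", it looks like
this:

#include <stddef.h>

enum res_state { RES_STATE_UNINIT = 0, RES_STATE_ACTIVE };

struct res_ctx {
	enum res_state state;
	void *qp, *cq, *pd;	/* stand-ins for the UMR QP/CQ/PD */
};

/* Init marks the context ACTIVE only once every resource exists. */
static void res_init(struct res_ctx *ctx, void *qp, void *cq, void *pd)
{
	ctx->qp = qp;
	ctx->cq = cq;
	ctx->pd = pd;
	ctx->state = RES_STATE_ACTIVE;
}

/* Cleanup bails out early if init never ran (state still UNINIT),
 * instead of dereferencing resources that were never created. */
static void res_cleanup(struct res_ctx *ctx)
{
	if (ctx->state == RES_STATE_UNINIT)
		return;
	/* destroy qp, cq and pd here (omitted in this sketch) */
	ctx->qp = ctx->cq = ctx->pd = NULL;
	ctx->state = RES_STATE_UNINIT;
}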