From patchwork Tue Jul 11 17:52:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 13309286 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9392EB64DC for ; Tue, 11 Jul 2023 17:53:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230258AbjGKRxO (ORCPT ); Tue, 11 Jul 2023 13:53:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230236AbjGKRxO (ORCPT ); Tue, 11 Jul 2023 13:53:14 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B0E0E6F for ; Tue, 11 Jul 2023 10:53:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689097990; x=1720633990; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=F2/NC+YiqsViZSHTmTllYqlZHqHRHyaOZjQOiEwU2Ks=; b=D3pFp1dUT6qItiuJxQn2OQ8mqSTAxgafPYPklZ8sdymYybfOiMfcoqwv JCarqjdBLfrt+r0xQxNy9OG8N69A0VCVrHbGmOXI8y5UqCeuFQtRhkhiX YgEeD4cgzpqlVcg6YXnEF1FvnQ2KevMwGI2f9eMNKBEZHRuGUr2+CCS7p Y+rWRSC4cmQomMqqDwvF8NH/KNVcYh5hC5EJZvfUsGL9GfeA6LVJfl6OF +5LwN1RfEUsRDPv1IFTX2CkTALBu1zoSelvoURUclp/WFBzp/BoJXgU83 oUz+NumAEesWCgNGcwn5Dpp8n1TroxMBXWse3aNFfXgVeZH7AtWsgZwAY A==; X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="395482607" X-IronPort-AV: E=Sophos;i="6.01,197,1684825200"; d="scan'208";a="395482607" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 10:53:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="724542495" X-IronPort-AV: E=Sophos;i="6.01,197,1684825200"; d="scan'208";a="724542495" Received: from ssaleem-mobl1.amr.corp.intel.com ([10.92.33.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 10:53:06 -0700 From: Shiraz Saleem To: jgg@nvidia.com, leon@kernel.org, linux-rdma@vger.kernel.org Cc: Shiraz Saleem Subject: [PATCH for-rc 1/3] RDMA/irdma: Add missing read barriers Date: Tue, 11 Jul 2023 12:52:51 -0500 Message-Id: <20230711175253.1289-2-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230711175253.1289-1-shiraz.saleem@intel.com> References: <20230711175253.1289-1-shiraz.saleem@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org On code inspection, there are many instances in the driver where CEQE and AEQE fields written to by HW are read without guaranteeing that the polarity bit has been read and checked first. Add a read barrier to avoid reordering of loads on the CEQE/AEQE fields prior to checking the polarity bit. Fixes: 3f49d684256 ("RDMA/irdma: Implement HW Admin Queue OPs") Signed-off-by: Shiraz Saleem --- drivers/infiniband/hw/irdma/ctrl.c | 9 ++++++++- drivers/infiniband/hw/irdma/puda.c | 6 ++++++ drivers/infiniband/hw/irdma/uk.c | 3 +++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/irdma/ctrl.c b/drivers/infiniband/hw/irdma/ctrl.c index d88c918..c91439f 100644 --- a/drivers/infiniband/hw/irdma/ctrl.c +++ b/drivers/infiniband/hw/irdma/ctrl.c @@ -3363,6 +3363,9 @@ int irdma_sc_ccq_get_cqe_info(struct irdma_sc_cq *ccq, if (polarity != ccq->cq_uk.polarity) return -ENOENT; + /* Ensure CEQE contents are read after valid bit is checked */ + dma_rmb(); + get_64bit_val(cqe, 8, &qp_ctx); cqp = (struct irdma_sc_cqp *)(unsigned long)qp_ctx; info->error = (bool)FIELD_GET(IRDMA_CQ_ERROR, temp); @@ -4009,13 +4012,17 @@ int irdma_sc_get_next_aeqe(struct irdma_sc_aeq *aeq, u8 polarity; aeqe = IRDMA_GET_CURRENT_AEQ_ELEM(aeq); - get_64bit_val(aeqe, 0, &compl_ctx); get_64bit_val(aeqe, 8, &temp); polarity = (u8)FIELD_GET(IRDMA_AEQE_VALID, temp); if (aeq->polarity != polarity) return -ENOENT; + /* Ensure AEQE contents are read after valid bit is checked */ + dma_rmb(); + + get_64bit_val(aeqe, 0, &compl_ctx); + print_hex_dump_debug("WQE: AEQ_ENTRY WQE", DUMP_PREFIX_OFFSET, 16, 8, aeqe, 16, false); diff --git a/drivers/infiniband/hw/irdma/puda.c b/drivers/infiniband/hw/irdma/puda.c index 4ec9639..5625317 100644 --- a/drivers/infiniband/hw/irdma/puda.c +++ b/drivers/infiniband/hw/irdma/puda.c @@ -230,6 +230,9 @@ static int irdma_puda_poll_info(struct irdma_sc_cq *cq, if (valid_bit != cq_uk->polarity) return -ENOENT; + /* Ensure CQE contents are read after valid bit is checked */ + dma_rmb(); + if (cq->dev->hw_attrs.uk_attrs.hw_rev >= IRDMA_GEN_2) ext_valid = (bool)FIELD_GET(IRDMA_CQ_EXTCQE, qword3); @@ -243,6 +246,9 @@ static int irdma_puda_poll_info(struct irdma_sc_cq *cq, if (polarity != cq_uk->polarity) return -ENOENT; + /* Ensure ext CQE contents are read after ext valid bit is checked */ + dma_rmb(); + IRDMA_RING_MOVE_HEAD_NOCHECK(cq_uk->cq_ring); if (!IRDMA_RING_CURRENT_HEAD(cq_uk->cq_ring)) cq_uk->polarity = !cq_uk->polarity; diff --git a/drivers/infiniband/hw/irdma/uk.c b/drivers/infiniband/hw/irdma/uk.c index dd428d9..ea2c077 100644 --- a/drivers/infiniband/hw/irdma/uk.c +++ b/drivers/infiniband/hw/irdma/uk.c @@ -1527,6 +1527,9 @@ void irdma_uk_clean_cq(void *q, struct irdma_cq_uk *cq) if (polarity != temp) break; + /* Ensure CQE contents are read after valid bit is checked */ + dma_rmb(); + get_64bit_val(cqe, 8, &comp_ctx); if ((void *)(unsigned long)comp_ctx == q) set_64bit_val(cqe, 8, 0); From patchwork Tue Jul 11 17:52:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 13309288 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 379AAEB64DD for ; Tue, 11 Jul 2023 17:53:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230236AbjGKRxR (ORCPT ); Tue, 11 Jul 2023 13:53:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50144 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231680AbjGKRxQ (ORCPT ); Tue, 11 Jul 2023 13:53:16 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8185EE6F for ; Tue, 11 Jul 2023 10:53:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689097995; x=1720633995; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fwvrV7ClSP8Dm1EuJES8KTdYFZ0HFJmM4MHfnwhgvLQ=; b=AcWwfMMp0JZVe6vXcI5Ss0e366TGwzKMD/vKdGwc0x2hC5VzNg26bWdO 2zZuA+SjblI0sTyZBN0IezZWQgBNevYpy09cLyTmz4MeDRXx0YuDuTMB5 g4bobSlIcQMlP3yqytBP9aE6+ympyzkjJ9mbhS37dR7kDeke8z1bjFfAH ixpDngSRAYi6v103o+zWVITnuT6wzbbuN//BfpLWL5dn+gfiyYhoKk/15 ZC3Lu1pxn+d8xv7tEakrcFFSHsR5pI7teHkFtyMo3sLzqluIq+qJ8Nffd K3YaKr8uRig+FqnHCk0nwCLPeA2mo0Ac+bqZKE8yN+kTmrFb5Kzc+vDKM w==; X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="395482611" X-IronPort-AV: E=Sophos;i="6.01,197,1684825200"; d="scan'208";a="395482611" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 10:53:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="724542502" X-IronPort-AV: E=Sophos;i="6.01,197,1684825200"; d="scan'208";a="724542502" Received: from ssaleem-mobl1.amr.corp.intel.com ([10.92.33.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 10:53:07 -0700 From: Shiraz Saleem To: jgg@nvidia.com, leon@kernel.org, linux-rdma@vger.kernel.org Cc: Shiraz Saleem Subject: [PATCH for-rc 2/3] RDMA/irdma: Fix data race on CQP completion stats Date: Tue, 11 Jul 2023 12:52:52 -0500 Message-Id: <20230711175253.1289-3-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230711175253.1289-1-shiraz.saleem@intel.com> References: <20230711175253.1289-1-shiraz.saleem@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org CQP completion statistics is read lockesly in irdma_wait_event and irdma_check_cqp_progress while it can be updated in the completion thread irdma_sc_ccq_get_cqe_info on another CPU as KCSAN reports. Make completion statistics an atomic variable to reflect coherent updates to it. This will also avoid load/store tearing logic bug potentially possible by compiler optimizations. [77346.170861] BUG: KCSAN: data-race in irdma_handle_cqp_op [irdma] / irdma_sc_ccq_get_cqe_info [irdma] [77346.171383] write to 0xffff8a3250b108e0 of 8 bytes by task 9544 on cpu 4: [77346.171483] irdma_sc_ccq_get_cqe_info+0x27a/0x370 [irdma] [77346.171658] irdma_cqp_ce_handler+0x164/0x270 [irdma] [77346.171835] cqp_compl_worker+0x1b/0x20 [irdma] [77346.172009] process_one_work+0x4d1/0xa40 [77346.172024] worker_thread+0x319/0x700 [77346.172037] kthread+0x180/0x1b0 [77346.172054] ret_from_fork+0x22/0x30 [77346.172136] read to 0xffff8a3250b108e0 of 8 bytes by task 9838 on cpu 2: [77346.172234] irdma_handle_cqp_op+0xf4/0x4b0 [irdma] [77346.172413] irdma_cqp_aeq_cmd+0x75/0xa0 [irdma] [77346.172592] irdma_create_aeq+0x390/0x45a [irdma] [77346.172769] irdma_rt_init_hw.cold+0x212/0x85d [irdma] [77346.172944] irdma_probe+0x54f/0x620 [irdma] [77346.173122] auxiliary_bus_probe+0x66/0xa0 [77346.173137] really_probe+0x140/0x540 [77346.173154] __driver_probe_device+0xc7/0x220 [77346.173173] driver_probe_device+0x5f/0x140 [77346.173190] __driver_attach+0xf0/0x2c0 [77346.173208] bus_for_each_dev+0xa8/0xf0 [77346.173225] driver_attach+0x29/0x30 [77346.173240] bus_add_driver+0x29c/0x2f0 [77346.173255] driver_register+0x10f/0x1a0 [77346.173272] __auxiliary_driver_register+0xbc/0x140 [77346.173287] irdma_init_module+0x55/0x1000 [irdma] [77346.173460] do_one_initcall+0x7d/0x410 [77346.173475] do_init_module+0x81/0x2c0 [77346.173491] load_module+0x1232/0x12c0 [77346.173506] __do_sys_finit_module+0x101/0x180 [77346.173522] __x64_sys_finit_module+0x3c/0x50 [77346.173538] do_syscall_64+0x39/0x90 [77346.173553] entry_SYSCALL_64_after_hwframe+0x63/0xcd [77346.173634] value changed: 0x0000000000000094 -> 0x0000000000000095 Fixes: 915cc7ac0f8 ("RDMA/irdma: Add miscellaneous utility definitions") Signed-off-by: Shiraz Saleem --- drivers/infiniband/hw/irdma/ctrl.c | 22 +++++++++--------- drivers/infiniband/hw/irdma/defs.h | 46 ++++++++++++++++++------------------- drivers/infiniband/hw/irdma/type.h | 2 ++ drivers/infiniband/hw/irdma/utils.c | 2 +- 4 files changed, 36 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/hw/irdma/ctrl.c b/drivers/infiniband/hw/irdma/ctrl.c index c91439f..45e3344 100644 --- a/drivers/infiniband/hw/irdma/ctrl.c +++ b/drivers/infiniband/hw/irdma/ctrl.c @@ -2712,13 +2712,13 @@ static int irdma_sc_cq_modify(struct irdma_sc_cq *cq, */ void irdma_check_cqp_progress(struct irdma_cqp_timeout *timeout, struct irdma_sc_dev *dev) { - if (timeout->compl_cqp_cmds != dev->cqp_cmd_stats[IRDMA_OP_CMPL_CMDS]) { - timeout->compl_cqp_cmds = dev->cqp_cmd_stats[IRDMA_OP_CMPL_CMDS]; + u64 completed_ops = atomic64_read(&dev->cqp->completed_ops); + + if (timeout->compl_cqp_cmds != completed_ops) { + timeout->compl_cqp_cmds = completed_ops; timeout->count = 0; - } else { - if (dev->cqp_cmd_stats[IRDMA_OP_REQ_CMDS] != - timeout->compl_cqp_cmds) - timeout->count++; + } else if (timeout->compl_cqp_cmds != dev->cqp->requested_ops) { + timeout->count++; } } @@ -2761,7 +2761,7 @@ static int irdma_cqp_poll_registers(struct irdma_sc_cqp *cqp, u32 tail, if (newtail != tail) { /* SUCCESS */ IRDMA_RING_MOVE_TAIL(cqp->sq_ring); - cqp->dev->cqp_cmd_stats[IRDMA_OP_CMPL_CMDS]++; + atomic64_inc(&cqp->completed_ops); return 0; } udelay(cqp->dev->hw_attrs.max_sleep_count); @@ -3121,8 +3121,8 @@ int irdma_sc_cqp_init(struct irdma_sc_cqp *cqp, info->dev->cqp = cqp; IRDMA_RING_INIT(cqp->sq_ring, cqp->sq_size); - cqp->dev->cqp_cmd_stats[IRDMA_OP_REQ_CMDS] = 0; - cqp->dev->cqp_cmd_stats[IRDMA_OP_CMPL_CMDS] = 0; + cqp->requested_ops = 0; + atomic64_set(&cqp->completed_ops, 0); /* for the cqp commands backlog. */ INIT_LIST_HEAD(&cqp->dev->cqp_cmd_head); @@ -3274,7 +3274,7 @@ __le64 *irdma_sc_cqp_get_next_send_wqe_idx(struct irdma_sc_cqp *cqp, u64 scratch if (ret_code) return NULL; - cqp->dev->cqp_cmd_stats[IRDMA_OP_REQ_CMDS]++; + cqp->requested_ops++; if (!*wqe_idx) cqp->polarity = !cqp->polarity; wqe = cqp->sq_base[*wqe_idx].elem; @@ -3400,7 +3400,7 @@ int irdma_sc_ccq_get_cqe_info(struct irdma_sc_cq *ccq, dma_wmb(); /* make sure shadow area is updated before moving tail */ IRDMA_RING_MOVE_TAIL(cqp->sq_ring); - ccq->dev->cqp_cmd_stats[IRDMA_OP_CMPL_CMDS]++; + atomic64_inc(&cqp->completed_ops); return ret_code; } diff --git a/drivers/infiniband/hw/irdma/defs.h b/drivers/infiniband/hw/irdma/defs.h index 6014b9d..d06e45d 100644 --- a/drivers/infiniband/hw/irdma/defs.h +++ b/drivers/infiniband/hw/irdma/defs.h @@ -191,32 +191,30 @@ enum irdma_cqp_op_type { IRDMA_OP_MANAGE_VF_PBLE_BP = 25, IRDMA_OP_QUERY_FPM_VAL = 26, IRDMA_OP_COMMIT_FPM_VAL = 27, - IRDMA_OP_REQ_CMDS = 28, - IRDMA_OP_CMPL_CMDS = 29, - IRDMA_OP_AH_CREATE = 30, - IRDMA_OP_AH_MODIFY = 31, - IRDMA_OP_AH_DESTROY = 32, - IRDMA_OP_MC_CREATE = 33, - IRDMA_OP_MC_DESTROY = 34, - IRDMA_OP_MC_MODIFY = 35, - IRDMA_OP_STATS_ALLOCATE = 36, - IRDMA_OP_STATS_FREE = 37, - IRDMA_OP_STATS_GATHER = 38, - IRDMA_OP_WS_ADD_NODE = 39, - IRDMA_OP_WS_MODIFY_NODE = 40, - IRDMA_OP_WS_DELETE_NODE = 41, - IRDMA_OP_WS_FAILOVER_START = 42, - IRDMA_OP_WS_FAILOVER_COMPLETE = 43, - IRDMA_OP_SET_UP_MAP = 44, - IRDMA_OP_GEN_AE = 45, - IRDMA_OP_QUERY_RDMA_FEATURES = 46, - IRDMA_OP_ALLOC_LOCAL_MAC_ENTRY = 47, - IRDMA_OP_ADD_LOCAL_MAC_ENTRY = 48, - IRDMA_OP_DELETE_LOCAL_MAC_ENTRY = 49, - IRDMA_OP_CQ_MODIFY = 50, + IRDMA_OP_AH_CREATE = 28, + IRDMA_OP_AH_MODIFY = 29, + IRDMA_OP_AH_DESTROY = 30, + IRDMA_OP_MC_CREATE = 31, + IRDMA_OP_MC_DESTROY = 32, + IRDMA_OP_MC_MODIFY = 33, + IRDMA_OP_STATS_ALLOCATE = 34, + IRDMA_OP_STATS_FREE = 35, + IRDMA_OP_STATS_GATHER = 36, + IRDMA_OP_WS_ADD_NODE = 37, + IRDMA_OP_WS_MODIFY_NODE = 38, + IRDMA_OP_WS_DELETE_NODE = 39, + IRDMA_OP_WS_FAILOVER_START = 40, + IRDMA_OP_WS_FAILOVER_COMPLETE = 41, + IRDMA_OP_SET_UP_MAP = 42, + IRDMA_OP_GEN_AE = 43, + IRDMA_OP_QUERY_RDMA_FEATURES = 44, + IRDMA_OP_ALLOC_LOCAL_MAC_ENTRY = 45, + IRDMA_OP_ADD_LOCAL_MAC_ENTRY = 46, + IRDMA_OP_DELETE_LOCAL_MAC_ENTRY = 47, + IRDMA_OP_CQ_MODIFY = 48, /* Must be last entry*/ - IRDMA_MAX_CQP_OPS = 51, + IRDMA_MAX_CQP_OPS = 49, }; /* CQP SQ WQES */ diff --git a/drivers/infiniband/hw/irdma/type.h b/drivers/infiniband/hw/irdma/type.h index 5ee6860..a207095 100644 --- a/drivers/infiniband/hw/irdma/type.h +++ b/drivers/infiniband/hw/irdma/type.h @@ -365,6 +365,8 @@ struct irdma_sc_cqp { struct irdma_dcqcn_cc_params dcqcn_params; __le64 *host_ctx; u64 *scratch_array; + u64 requested_ops; + atomic64_t completed_ops; u32 cqp_id; u32 sq_size; u32 hw_sq_size; diff --git a/drivers/infiniband/hw/irdma/utils.c b/drivers/infiniband/hw/irdma/utils.c index 71e1c5d..775a799 100644 --- a/drivers/infiniband/hw/irdma/utils.c +++ b/drivers/infiniband/hw/irdma/utils.c @@ -567,7 +567,7 @@ static int irdma_wait_event(struct irdma_pci_f *rf, bool cqp_error = false; int err_code = 0; - cqp_timeout.compl_cqp_cmds = rf->sc_dev.cqp_cmd_stats[IRDMA_OP_CMPL_CMDS]; + cqp_timeout.compl_cqp_cmds = atomic64_read(&rf->sc_dev.cqp->completed_ops); do { irdma_cqp_ce_handler(rf, &rf->ccq.sc_cq); if (wait_event_timeout(cqp_request->waitq, From patchwork Tue Jul 11 17:52:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 13309289 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80E56C001DE for ; Tue, 11 Jul 2023 17:53:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231742AbjGKRxS (ORCPT ); Tue, 11 Jul 2023 13:53:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231680AbjGKRxR (ORCPT ); Tue, 11 Jul 2023 13:53:17 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2714E55 for ; Tue, 11 Jul 2023 10:53:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689097996; x=1720633996; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UCMTpmw21y3uzOlGpdf3BM0hksxK2jGdaP/9qtBBAE0=; b=eCoM/6YBin7A1S8nNyz/IwMBnNHcX97SPhQ8t0xVZvy3iMjsV9d2m45e B8IS9YX6gQ+SwK2uFQTrPWkulPkSCNztBMTwxTQersTTq1N1NB6W4ZPet YUPYg8shwb7LP4w9BlPLpHJ7M8liiDT4/2vJ3sYVJyGPYi3+Iz3pAC1Cr A9A77rtu7t8ygN5lo14QGSOu5Mni+gQ82XxBIAH1so5rfgsjqRK80wy+G HGcjSRAgrZdsG4jSb1OxD9+MeZIiejxzhWlz7tT4XqABuobd9UhbmEeGq DxY2w1ggq6EjhM+QmigcrBF9nMi1r+M+CTY7sX51RitslDcRG2+c8E8c9 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="395482614" X-IronPort-AV: E=Sophos;i="6.01,197,1684825200"; d="scan'208";a="395482614" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 10:53:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10768"; a="724542505" X-IronPort-AV: E=Sophos;i="6.01,197,1684825200"; d="scan'208";a="724542505" Received: from ssaleem-mobl1.amr.corp.intel.com ([10.92.33.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jul 2023 10:53:08 -0700 From: Shiraz Saleem To: jgg@nvidia.com, leon@kernel.org, linux-rdma@vger.kernel.org Cc: Shiraz Saleem Subject: [PATCH for-rc 3/3] RDMA/irdma: Fix data race on CQP request done Date: Tue, 11 Jul 2023 12:52:53 -0500 Message-Id: <20230711175253.1289-4-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230711175253.1289-1-shiraz.saleem@intel.com> References: <20230711175253.1289-1-shiraz.saleem@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org KCSAN detects a data race on cqp_request->request_done memory location which is accessed locklessly in irdma_handle_cqp_op while being updated in irdma_cqp_ce_handler. Annotate lockless intent with READ_ONCE/WRITE_ONCE to avoid any compiler optimizations like load fusing and/or KCSAN warning. [222808.417128] BUG: KCSAN: data-race in irdma_cqp_ce_handler [irdma] / irdma_wait_event [irdma] [222808.417532] write to 0xffff8e44107019dc of 1 bytes by task 29658 on cpu 5: [222808.417610] irdma_cqp_ce_handler+0x21e/0x270 [irdma] [222808.417725] cqp_compl_worker+0x1b/0x20 [irdma] [222808.417827] process_one_work+0x4d1/0xa40 [222808.417835] worker_thread+0x319/0x700 [222808.417842] kthread+0x180/0x1b0 [222808.417852] ret_from_fork+0x22/0x30 [222808.417918] read to 0xffff8e44107019dc of 1 bytes by task 29688 on cpu 1: [222808.417995] irdma_wait_event+0x1e2/0x2c0 [irdma] [222808.418099] irdma_handle_cqp_op+0xae/0x170 [irdma] [222808.418202] irdma_cqp_cq_destroy_cmd+0x70/0x90 [irdma] [222808.418308] irdma_puda_dele_rsrc+0x46d/0x4d0 [irdma] [222808.418411] irdma_rt_deinit_hw+0x179/0x1d0 [irdma] [222808.418514] irdma_ib_dealloc_device+0x11/0x40 [irdma] [222808.418618] ib_dealloc_device+0x2a/0x120 [ib_core] [222808.418823] __ib_unregister_device+0xde/0x100 [ib_core] [222808.418981] ib_unregister_device+0x22/0x40 [ib_core] [222808.419142] irdma_ib_unregister_device+0x70/0x90 [irdma] [222808.419248] i40iw_close+0x6f/0xc0 [irdma] [222808.419352] i40e_client_device_unregister+0x14a/0x180 [i40e] [222808.419450] i40iw_remove+0x21/0x30 [irdma] [222808.419554] auxiliary_bus_remove+0x31/0x50 [222808.419563] device_remove+0x69/0xb0 [222808.419572] device_release_driver_internal+0x293/0x360 [222808.419582] driver_detach+0x7c/0xf0 [222808.419592] bus_remove_driver+0x8c/0x150 [222808.419600] driver_unregister+0x45/0x70 [222808.419610] auxiliary_driver_unregister+0x16/0x30 [222808.419618] irdma_exit_module+0x18/0x1e [irdma] [222808.419733] __do_sys_delete_module.constprop.0+0x1e2/0x310 [222808.419745] __x64_sys_delete_module+0x1b/0x30 [222808.419755] do_syscall_64+0x39/0x90 [222808.419763] entry_SYSCALL_64_after_hwframe+0x63/0xcd [222808.419829] value changed: 0x01 -> 0x03 Fixes: 915cc7ac0f8 ("RDMA/irdma: Add miscellaneous utility definitions") Signed-off-by: Shiraz Saleem --- drivers/infiniband/hw/irdma/hw.c | 2 +- drivers/infiniband/hw/irdma/main.h | 2 +- drivers/infiniband/hw/irdma/utils.c | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c index 795f7fd..1cfc03d 100644 --- a/drivers/infiniband/hw/irdma/hw.c +++ b/drivers/infiniband/hw/irdma/hw.c @@ -2075,7 +2075,7 @@ void irdma_cqp_ce_handler(struct irdma_pci_f *rf, struct irdma_sc_cq *cq) cqp_request->compl_info.error = info.error; if (cqp_request->waiting) { - cqp_request->request_done = true; + WRITE_ONCE(cqp_request->request_done, true); wake_up(&cqp_request->waitq); irdma_put_cqp_request(&rf->cqp, cqp_request); } else { diff --git a/drivers/infiniband/hw/irdma/main.h b/drivers/infiniband/hw/irdma/main.h index def6dd5..2323962 100644 --- a/drivers/infiniband/hw/irdma/main.h +++ b/drivers/infiniband/hw/irdma/main.h @@ -161,8 +161,8 @@ struct irdma_cqp_request { void (*callback_fcn)(struct irdma_cqp_request *cqp_request); void *param; struct irdma_cqp_compl_info compl_info; + bool request_done; /* READ/WRITE_ONCE macros operate on it */ bool waiting:1; - bool request_done:1; bool dynamic:1; }; diff --git a/drivers/infiniband/hw/irdma/utils.c b/drivers/infiniband/hw/irdma/utils.c index 775a799..eb083f7 100644 --- a/drivers/infiniband/hw/irdma/utils.c +++ b/drivers/infiniband/hw/irdma/utils.c @@ -481,7 +481,7 @@ void irdma_free_cqp_request(struct irdma_cqp *cqp, if (cqp_request->dynamic) { kfree(cqp_request); } else { - cqp_request->request_done = false; + WRITE_ONCE(cqp_request->request_done, false); cqp_request->callback_fcn = NULL; cqp_request->waiting = false; @@ -515,7 +515,7 @@ void irdma_put_cqp_request(struct irdma_cqp *cqp, { if (cqp_request->waiting) { cqp_request->compl_info.error = true; - cqp_request->request_done = true; + WRITE_ONCE(cqp_request->request_done, true); wake_up(&cqp_request->waitq); } wait_event_timeout(cqp->remove_wq, @@ -571,7 +571,7 @@ static int irdma_wait_event(struct irdma_pci_f *rf, do { irdma_cqp_ce_handler(rf, &rf->ccq.sc_cq); if (wait_event_timeout(cqp_request->waitq, - cqp_request->request_done, + READ_ONCE(cqp_request->request_done), msecs_to_jiffies(CQP_COMPL_WAIT_TIME_MS))) break;