From patchwork Tue Dec 3 13:52:21 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13892515
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Or Har-Toov, linux-rdma@vger.kernel.org, Maher Sanalla
Subject: [PATCH rdma-next 1/3] IB/mad: Replace MAD's refcount with a state machine
Date: Tue, 3 Dec 2024 15:52:21 +0200

From: Or Har-Toov

Replace the refcount mechanism with a 'state' field to track the status
of MAD send work requests (WRs). A state machine represents the stages
of the MAD lifecycle explicitly, in particular whether the MAD is
waiting for a response, for a send completion, or for both.

The existing refcount takes only two values:
 * 1 - MAD is waiting either for a completion or for a response.
 * 2 - MAD is waiting for both a response and a completion; this is
   also the value when a response arrives before the send completion
   notification.

With a bare counter, the state transitions are not visible, and
developers need to infer the current state from the refcount's value,
which is error-prone and difficult to follow.
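To illustrate, here is a condensed sketch of the old encoding, assembled
from the call sites touched below (not a verbatim excerpt):

	/* Posting: one counter encodes both wait conditions */
	mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0);

	/* Send completion: decrement and re-derive what remains */
	if (--mad_send_wr->refcount > 0) {
		if (mad_send_wr->refcount == 1 && mad_send_wr->timeout &&
		    mad_send_wr->status == IB_WC_SUCCESS)
			wait_for_response(mad_send_wr);
	}

	/* Response arrival: refcount == 1 means the send already completed */
	if (mad_send_wr->refcount == 1)
		/* move to done_list */;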
Thus, replace it with a state machine with the following states:
 * IB_MAD_STATE_SEND_START - MAD was sent to the QP and is waiting for
   a completion notification
 * IB_MAD_STATE_WAIT_RESP - MAD send completed successfully; waiting
   for a response
 * IB_MAD_STATE_EARLY_RESP - response arrived before the send
   completion notification
 * IB_MAD_STATE_DONE - MAD processing completed

Signed-off-by: Or Har-Toov
Reviewed-by: Maher Sanalla
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/mad.c      | 44 ++++++++++++++----------------
 drivers/infiniband/core/mad_priv.h | 10 ++++++-
 drivers/infiniband/core/mad_rmpp.c |  7 ++---
 3 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 1fd54d5c4dd8..9b101f91ca3e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1118,8 +1118,7 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
 		mad_send_wr->max_retries = send_buf->retries;
 		mad_send_wr->retries_left = send_buf->retries;
 		send_buf->retries = 0;
-		/* Reference for work request to QP + response */
-		mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0);
+		mad_send_wr->state = IB_MAD_STATE_SEND_START;
 		mad_send_wr->status = IB_WC_SUCCESS;
 
 		/* Reference MAD agent until send completes */
@@ -1773,9 +1772,13 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
 void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
 {
 	mad_send_wr->timeout = 0;
-	if (mad_send_wr->refcount == 1)
+	if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP) {
+		mad_send_wr->state = IB_MAD_STATE_DONE;
 		list_move_tail(&mad_send_wr->agent_list,
 			       &mad_send_wr->mad_agent_priv->done_list);
+	} else {
+		mad_send_wr->state = IB_MAD_STATE_EARLY_RESP;
+	}
 }
 
 static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
@@ -2195,6 +2198,7 @@ static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 		list_item = &mad_agent_priv->wait_list;
 	}
 
+	mad_send_wr->state = IB_MAD_STATE_WAIT_RESP;
 	list_add(&mad_send_wr->agent_list, list_item);
 
 	/* Reschedule a work item if we have a shorter timeout */
@@ -2222,6 +2226,11 @@ void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr,
 
 	mad_agent_priv = mad_send_wr->mad_agent_priv;
 	spin_lock_irqsave(&mad_agent_priv->lock, flags);
+	if (mad_send_wr->state == IB_MAD_STATE_EARLY_RESP) {
+		mad_send_wr->state = IB_MAD_STATE_DONE;
+		goto done;
+	}
+
 	if (ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent)) {
 		ret = ib_process_rmpp_send_wc(mad_send_wr, mad_send_wc);
 		if (ret == IB_RMPP_RESULT_CONSUMED)
@@ -2232,14 +2241,10 @@ void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr,
 	if (mad_send_wc->status != IB_WC_SUCCESS &&
 	    mad_send_wr->status == IB_WC_SUCCESS) {
 		mad_send_wr->status = mad_send_wc->status;
-		mad_send_wr->refcount -= (mad_send_wr->timeout > 0);
-	}
-
-	if (--mad_send_wr->refcount > 0) {
-		if (mad_send_wr->refcount == 1 && mad_send_wr->timeout &&
-		    mad_send_wr->status == IB_WC_SUCCESS) {
-			wait_for_response(mad_send_wr);
-		}
+	} else if (mad_send_wr->status == IB_WC_SUCCESS &&
+		   mad_send_wr->timeout &&
+		   mad_send_wr->state == IB_MAD_STATE_SEND_START) {
+		wait_for_response(mad_send_wr);
 		goto done;
 	}
 
@@ -2407,12 +2412,9 @@ static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv)
 	spin_lock_irqsave(&mad_agent_priv->lock, flags);
 	list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr,
-				 &mad_agent_priv->send_list, agent_list) {
-		if (mad_send_wr->status == IB_WC_SUCCESS) {
+				 &mad_agent_priv->send_list, agent_list)
+		if (mad_send_wr->status == IB_WC_SUCCESS)
 			mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
-			mad_send_wr->refcount -= (mad_send_wr->timeout > 0);
-		}
-	}
 
 	/* Empty wait list to prevent receives from finding a request */
 	list_splice_init(&mad_agent_priv->wait_list, &cancel_list);
@@ -2459,7 +2461,6 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms)
 	struct ib_mad_agent_private *mad_agent_priv;
 	struct ib_mad_send_wr_private *mad_send_wr;
 	unsigned long flags;
-	int active;
 
 	if (!send_buf)
 		return -EINVAL;
@@ -2473,14 +2474,11 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms)
 		return -EINVAL;
 	}
 
-	active = (!mad_send_wr->timeout || mad_send_wr->refcount > 1);
-	if (!timeout_ms) {
+	if (!timeout_ms)
 		mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
-		mad_send_wr->refcount -= (mad_send_wr->timeout > 0);
-	}
 
 	mad_send_wr->send_buf.timeout_ms = timeout_ms;
-	if (active)
+	if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
 		mad_send_wr->timeout = msecs_to_jiffies(timeout_ms);
 	else
 		ib_reset_mad_timeout(mad_send_wr, timeout_ms);
@@ -2607,7 +2605,7 @@ static int retry_send(struct ib_mad_send_wr_private *mad_send_wr)
 
 	ret = ib_send_mad(mad_send_wr);
 	if (!ret) {
-		mad_send_wr->refcount++;
+		mad_send_wr->state = IB_MAD_STATE_SEND_START;
 		list_add_tail(&mad_send_wr->agent_list,
 			      &mad_send_wr->mad_agent_priv->send_list);
 	}
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 1b7445a6f671..cc2de81ea6f6 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -118,6 +118,13 @@ struct ib_mad_snoop_private {
 	struct completion comp;
 };
 
+enum ib_mad_state {
+	IB_MAD_STATE_SEND_START,
+	IB_MAD_STATE_WAIT_RESP,
+	IB_MAD_STATE_EARLY_RESP,
+	IB_MAD_STATE_DONE
+};
+
 struct ib_mad_send_wr_private {
 	struct ib_mad_list_head mad_list;
 	struct list_head agent_list;
@@ -132,7 +139,6 @@ struct ib_mad_send_wr_private {
 	int max_retries;
 	int retries_left;
 	int retry;
-	int refcount;
 	enum ib_wc_status status;
 
 	/* RMPP control */
@@ -143,6 +149,8 @@ struct ib_mad_send_wr_private {
 	int seg_num;
 	int newwin;
 	int pad;
+
+	enum ib_mad_state state;
 };
 
 struct ib_mad_local_private {
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 8af0619a39cd..dff264e9bb68 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -717,13 +717,13 @@ static void process_rmpp_ack(struct ib_mad_agent_private *agent,
 			ib_mad_complete_send_wr(mad_send_wr, &wc);
 			return;
 		}
-		if (mad_send_wr->refcount == 1)
+		if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
 			ib_reset_mad_timeout(mad_send_wr,
 					     mad_send_wr->send_buf.timeout_ms);
 		spin_unlock_irqrestore(&agent->lock, flags);
 		ack_ds_ack(agent, mad_recv_wc);
 		return;
-	} else if (mad_send_wr->refcount == 1 &&
+	} else if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP &&
 		   mad_send_wr->seg_num < mad_send_wr->newwin &&
 		   mad_send_wr->seg_num < mad_send_wr->send_buf.seg_count) {
 		/* Send failure will just result in a timeout/retry */
@@ -731,7 +731,7 @@ static void process_rmpp_ack(struct ib_mad_agent_private *agent,
 		if (ret)
 			goto out;
 
-		mad_send_wr->refcount++;
+		mad_send_wr->state = IB_MAD_STATE_SEND_START;
 		list_move_tail(&mad_send_wr->agent_list,
 			       &mad_send_wr->mad_agent_priv->send_list);
 	}
@@ -890,7 +890,6 @@ int ib_send_rmpp_mad(struct ib_mad_send_wr_private *mad_send_wr)
 	mad_send_wr->newwin = init_newwin(mad_send_wr);
 
 	/* We need to wait for the final ACK even if there isn't a response */
-	mad_send_wr->refcount += (mad_send_wr->timeout == 0);
 	ret = send_next_seg(mad_send_wr);
 	if (!ret)
 		return IB_RMPP_RESULT_CONSUMED;

From patchwork Tue Dec 3 13:52:22 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13892517
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Or Har-Toov, linux-rdma@vger.kernel.org, Maher Sanalla
Subject: [PATCH rdma-next 2/3] IB/mad: Remove unnecessary done list by utilizing MAD states
Date: Tue, 3 Dec 2024 15:52:22 +0200
Message-ID: <8f746ee2eac86138b1051908b95a21fdff24af6c.1733233636.git.leonro@nvidia.com>

From: Or Har-Toov

Remove the done list, which has become unnecessary with the
introduction of the 'state' field. Previously, the done list ensured
that MADs removed from the wait list were still on some list, so that
the list_del in ib_mad_complete_send_wr could not operate on an
unlinked entry. With the new state management, a MAD is simply marked
as done when it completes, and the list removal is skipped for MADs
that are no longer on any list. Removing the done list eliminates
unnecessary memory usage and simplifies the code.
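As a rough sketch of the resulting invariant (not a hunk from the
patch): a send WR now lives on at most one list, determined solely by
its state, so completion can unlink conditionally instead of relying on
a catch-all done list:

	/*
	 * send_list: IB_MAD_STATE_SEND_START
	 * wait_list: IB_MAD_STATE_WAIT_RESP
	 * no list:   IB_MAD_STATE_EARLY_RESP and IB_MAD_STATE_DONE
	 */
	if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
		list_del(&mad_send_wr->agent_list);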
Signed-off-by: Or Har-Toov
Reviewed-by: Maher Sanalla
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/mad.c      | 13 ++++++-------
 drivers/infiniband/core/mad_priv.h |  1 -
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 9b101f91ca3e..e16bc396f6bc 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -391,7 +391,6 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 	spin_lock_init(&mad_agent_priv->lock);
 	INIT_LIST_HEAD(&mad_agent_priv->send_list);
 	INIT_LIST_HEAD(&mad_agent_priv->wait_list);
-	INIT_LIST_HEAD(&mad_agent_priv->done_list);
 	INIT_LIST_HEAD(&mad_agent_priv->rmpp_list);
 	INIT_DELAYED_WORK(&mad_agent_priv->timed_work, timeout_sends);
 	INIT_LIST_HEAD(&mad_agent_priv->local_list);
@@ -1772,13 +1771,11 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
 void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
 {
 	mad_send_wr->timeout = 0;
-	if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP) {
+	list_del(&mad_send_wr->agent_list);
+	if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
 		mad_send_wr->state = IB_MAD_STATE_DONE;
-		list_move_tail(&mad_send_wr->agent_list,
-			       &mad_send_wr->mad_agent_priv->done_list);
-	} else {
+	else
 		mad_send_wr->state = IB_MAD_STATE_EARLY_RESP;
-	}
 }
 
 static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
@@ -2249,7 +2246,9 @@ void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr,
 	}
 
 	/* Remove send from MAD agent and notify client of completion */
-	list_del(&mad_send_wr->agent_list);
+	if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
+		list_del(&mad_send_wr->agent_list);
+
 	adjust_timeout(mad_agent_priv);
 	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index cc2de81ea6f6..4af63c1664c2 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -96,7 +96,6 @@ struct ib_mad_agent_private {
 	spinlock_t lock;
 	struct list_head send_list;
 	struct list_head wait_list;
-	struct list_head done_list;
 	struct delayed_work timed_work;
 	unsigned long timeout;
 	struct list_head local_list;

From patchwork Tue Dec 3 13:52:23 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13892516
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Or Har-Toov, linux-rdma@vger.kernel.org, Maher Sanalla
Subject: [PATCH rdma-next 3/3] IB/mad: Add flow control for solicited MADs
Date: Tue, 3 Dec 2024 15:52:23 +0200

From: Or Har-Toov

Currently, MADs sent via an agent are forwarded directly to the
corresponding MAD QP layer. MADs that have a timeout value set and
require a response (solicited MADs) are resent if the timeout expires
without a response having arrived. In a congested subnet, flooding the
MAD QP layer with further solicited send requests only worsens the
situation by triggering more timeouts, and therefore more retries.

Thus, add flow control for non-user solicited MADs, blocking agents
from issuing new solicited MAD requests to the MAD QP until outstanding
requests complete and the MAD QP is ready to process additional ones.

To this end, keep track of the total number of outstanding solicited
MAD work requests (MADs on the agent's send list or wait list). The
number of outstanding send WRs is limited to a fraction of the receive
queue (RQ) size, and any new send WR that would exceed that limit is
held on a backlog list. Backlog MADs are moved to the agent's send list
only once the total number of outstanding send WRs falls below the
limit. For this purpose, a new state is introduced:
 * IB_MAD_STATE_QUEUED - MAD is on the backlog list

Unsolicited MADs, RMPP MADs, and MADs that are not SA, SMP or CM are
not subject to this flow control mechanism and are unaffected by this
change.
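In outline, the posting path added below makes the following decision
(a sketch only, using the fields introduced by this patch; locking,
RMPP and error handling omitted):

	/* Solicited-FC sends are capped at sol_fc_max outstanding WRs,
	 * a per-management-class fraction of the receive queue size. */
	if (mad_send_wr->is_solicited_fc &&
	    (!list_empty(&mad_agent_priv->backlog_list) ||
	     mad_agent_priv->sol_fc_send_count +
	     mad_agent_priv->sol_fc_wait_count >= mad_agent_priv->sol_fc_max)) {
		/* Hold back until earlier requests complete */
		list_add_tail(&mad_send_wr->agent_list,
			      &mad_agent_priv->backlog_list);
		mad_send_wr->state = IB_MAD_STATE_QUEUED;
	} else {
		list_add_tail(&mad_send_wr->agent_list,
			      &mad_agent_priv->send_list);
		mad_send_wr->state = IB_MAD_STATE_SEND_START;
		mad_agent_priv->sol_fc_send_count += mad_send_wr->is_solicited_fc;
	}

When a solicited-FC MAD later completes or times out,
process_mad_from_backlog() drains queued MADs into the send list until
the cap is reached again.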
Signed-off-by: Or Har-Toov
Reviewed-by: Maher Sanalla
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/mad.c      | 171 +++++++++++++++++++++++++++--
 drivers/infiniband/core/mad_priv.h |   8 ++
 2 files changed, 171 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index e16bc396f6bc..86e846b12e2f 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -210,6 +210,29 @@ int ib_response_mad(const struct ib_mad_hdr *hdr)
 }
 EXPORT_SYMBOL(ib_response_mad);
 
+#define SOL_FC_MAX_DEFAULT_FRAC 4
+#define SOL_FC_MAX_SA_FRAC 32
+
+static int get_sol_fc_max_outstanding(struct ib_mad_reg_req *mad_reg_req)
+{
+	if (!mad_reg_req)
+		/* Send only agent */
+		return mad_recvq_size / SOL_FC_MAX_DEFAULT_FRAC;
+
+	switch (mad_reg_req->mgmt_class) {
+	case IB_MGMT_CLASS_CM:
+		return mad_recvq_size / SOL_FC_MAX_DEFAULT_FRAC;
+	case IB_MGMT_CLASS_SUBN_ADM:
+		return mad_recvq_size / SOL_FC_MAX_SA_FRAC;
+	case IB_MGMT_CLASS_SUBN_LID_ROUTED:
+	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
+		return min(mad_recvq_size, IB_MAD_QP_RECV_SIZE) /
+		       SOL_FC_MAX_DEFAULT_FRAC;
+	default:
+		return 0;
+	}
+}
+
 /*
  * ib_register_mad_agent - Register to send/receive MADs
  *
@@ -392,12 +415,15 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 	INIT_LIST_HEAD(&mad_agent_priv->send_list);
 	INIT_LIST_HEAD(&mad_agent_priv->wait_list);
 	INIT_LIST_HEAD(&mad_agent_priv->rmpp_list);
+	INIT_LIST_HEAD(&mad_agent_priv->backlog_list);
 	INIT_DELAYED_WORK(&mad_agent_priv->timed_work, timeout_sends);
 	INIT_LIST_HEAD(&mad_agent_priv->local_list);
 	INIT_WORK(&mad_agent_priv->local_work, local_completions);
 	refcount_set(&mad_agent_priv->refcount, 1);
 	init_completion(&mad_agent_priv->comp);
+	mad_agent_priv->sol_fc_max =
+		get_sol_fc_max_outstanding(mad_reg_req);
 
 	ret2 = ib_mad_agent_security_setup(&mad_agent_priv->agent, qp_type);
 	if (ret2) {
 		ret = ERR_PTR(ret2);
@@ -1054,6 +1080,43 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 	return ret;
 }
 
+static bool is_solicited_fc_mad(struct ib_mad_send_wr_private *mad_send_wr)
+{
+	struct ib_rmpp_mad *rmpp_mad;
+	u8 mgmt_class;
+
+	if (!mad_send_wr->timeout)
+		return 0;
+
+	rmpp_mad = mad_send_wr->send_buf.mad;
+	if (mad_send_wr->mad_agent_priv->agent.rmpp_version &&
+	    (ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE))
+		return 0;
+
+	mgmt_class =
+		((struct ib_mad_hdr *)mad_send_wr->send_buf.mad)->mgmt_class;
+	return mgmt_class == IB_MGMT_CLASS_CM ||
+	       mgmt_class == IB_MGMT_CLASS_SUBN_ADM ||
+	       mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED ||
+	       mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE;
+}
+
+static bool mad_is_for_backlog(struct ib_mad_send_wr_private *mad_send_wr)
+{
+	struct ib_mad_agent_private *mad_agent_priv =
+		mad_send_wr->mad_agent_priv;
+
+	if (!mad_send_wr->is_solicited_fc || !mad_agent_priv->sol_fc_max)
+		return false;
+
+	if (!list_empty(&mad_agent_priv->backlog_list))
+		return true;
+
+	return mad_agent_priv->sol_fc_send_count +
+		       mad_agent_priv->sol_fc_wait_count >=
+	       mad_agent_priv->sol_fc_max;
+}
+
 /*
  * ib_post_send_mad - Posts MAD(s) to the send queue of the QP associated
  *  with the registered client
@@ -1117,14 +1180,26 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
 		mad_send_wr->max_retries = send_buf->retries;
 		mad_send_wr->retries_left = send_buf->retries;
 		send_buf->retries = 0;
-		mad_send_wr->state = IB_MAD_STATE_SEND_START;
 		mad_send_wr->status = IB_WC_SUCCESS;
 
 		/* Reference MAD agent until send completes */
 		refcount_inc(&mad_agent_priv->refcount);
 		spin_lock_irqsave(&mad_agent_priv->lock, flags);
+
+		mad_send_wr->is_solicited_fc = is_solicited_fc_mad(mad_send_wr);
+		if (mad_is_for_backlog(mad_send_wr)) {
+			list_add_tail(&mad_send_wr->agent_list,
+				      &mad_agent_priv->backlog_list);
+			mad_send_wr->state = IB_MAD_STATE_QUEUED;
+			spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+			return 0;
+		}
+
 		list_add_tail(&mad_send_wr->agent_list,
 			      &mad_agent_priv->send_list);
+		mad_send_wr->state = IB_MAD_STATE_SEND_START;
+		mad_agent_priv->sol_fc_send_count +=
+			mad_send_wr->is_solicited_fc;
 		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 
 		if (ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent)) {
@@ -1136,6 +1211,8 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
 		if (ret < 0) {
 			/* Fail send request */
 			spin_lock_irqsave(&mad_agent_priv->lock, flags);
+			mad_agent_priv->sol_fc_send_count -=
+				mad_send_wr->is_solicited_fc;
 			list_del(&mad_send_wr->agent_list);
 			spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 			deref_mad_agent(mad_agent_priv);
@@ -1768,14 +1845,59 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
 	return NULL;
 }
 
+static void
+process_mad_from_backlog(struct ib_mad_agent_private *mad_agent_priv)
+{
+	struct ib_mad_send_wr_private *mad_send_wr;
+	struct ib_mad_send_wc mad_send_wc = {};
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&mad_agent_priv->lock, flags);
+	while (!list_empty(&mad_agent_priv->backlog_list) &&
+	       (mad_agent_priv->sol_fc_send_count +
+			mad_agent_priv->sol_fc_wait_count <
+		mad_agent_priv->sol_fc_max)) {
+		mad_send_wr = list_entry(mad_agent_priv->backlog_list.next,
+					 struct ib_mad_send_wr_private,
+					 agent_list);
+		list_move_tail(&mad_send_wr->agent_list,
+			       &mad_agent_priv->send_list);
+		mad_agent_priv->sol_fc_send_count++;
+		mad_send_wr->state = IB_MAD_STATE_SEND_START;
+		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+		ret = ib_send_mad(mad_send_wr);
+		if (!ret)
+			return;
+
+		spin_lock_irqsave(&mad_agent_priv->lock, flags);
+		deref_mad_agent(mad_agent_priv);
+		mad_agent_priv->sol_fc_send_count--;
+		list_del(&mad_send_wr->agent_list);
+		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+		mad_send_wc.send_buf = &mad_send_wr->send_buf;
+		mad_send_wc.status = IB_WC_LOC_QP_OP_ERR;
+		mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
+						   &mad_send_wc);
+		spin_lock_irqsave(&mad_agent_priv->lock, flags);
+	}
+
+	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+}
+
 void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
 {
 	mad_send_wr->timeout = 0;
 	list_del(&mad_send_wr->agent_list);
-	if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
+	if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP) {
+		mad_send_wr->mad_agent_priv->sol_fc_wait_count -=
+			mad_send_wr->is_solicited_fc;
 		mad_send_wr->state = IB_MAD_STATE_DONE;
-	else
+	} else {
+		mad_send_wr->mad_agent_priv->sol_fc_send_count -=
+			mad_send_wr->is_solicited_fc;
 		mad_send_wr->state = IB_MAD_STATE_EARLY_RESP;
+	}
 }
 
 static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
@@ -2177,7 +2299,7 @@ static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 	unsigned long delay;
 
 	mad_agent_priv = mad_send_wr->mad_agent_priv;
-	list_del(&mad_send_wr->agent_list);
+	list_del_init(&mad_send_wr->agent_list);
 
 	delay = mad_send_wr->timeout;
 	mad_send_wr->timeout += jiffies;
@@ -2195,6 +2317,16 @@ static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 		list_item = &mad_agent_priv->wait_list;
 	}
 
+	if (mad_send_wr->state == IB_MAD_STATE_SEND_START) {
+		if (mad_send_wr->is_solicited_fc) {
+			mad_agent_priv->sol_fc_send_count--;
+			mad_agent_priv->sol_fc_wait_count++;
+		}
+	} else if (mad_send_wr->state == IB_MAD_STATE_QUEUED) {
+		mad_agent_priv->sol_fc_wait_count +=
+			mad_send_wr->is_solicited_fc;
+	}
+
 	mad_send_wr->state = IB_MAD_STATE_WAIT_RESP;
 	list_add(&mad_send_wr->agent_list, list_item);
 
@@ -2246,19 +2378,25 @@ void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr,
 	}
 
 	/* Remove send from MAD agent and notify client of completion */
-	if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
+	if (mad_send_wr->state == IB_MAD_STATE_SEND_START) {
 		list_del(&mad_send_wr->agent_list);
+		mad_agent_priv->sol_fc_send_count -=
+			mad_send_wr->is_solicited_fc;
+	}
 
 	adjust_timeout(mad_agent_priv);
 	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 
 	if (mad_send_wr->status != IB_WC_SUCCESS)
 		mad_send_wc->status = mad_send_wr->status;
-	if (ret == IB_RMPP_RESULT_INTERNAL)
+	if (ret == IB_RMPP_RESULT_INTERNAL) {
 		ib_rmpp_send_handler(mad_send_wc);
-	else
+	} else {
+		if (mad_send_wr->is_solicited_fc)
+			process_mad_from_backlog(mad_agent_priv);
 		mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
 						   mad_send_wc);
+	}
 
 	/* Release reference on agent taken when sending */
 	deref_mad_agent(mad_agent_priv);
@@ -2417,6 +2555,8 @@ static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv)
 
 	/* Empty wait list to prevent receives from finding a request */
 	list_splice_init(&mad_agent_priv->wait_list, &cancel_list);
+	mad_agent_priv->sol_fc_wait_count = 0;
+	list_splice_tail_init(&mad_agent_priv->backlog_list, &cancel_list);
 	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 
 	/* Report all cancelled requests */
@@ -2452,6 +2592,13 @@ find_send_wr(struct ib_mad_agent_private *mad_agent_priv,
 		    &mad_send_wr->send_buf == send_buf)
 			return mad_send_wr;
 	}
+
+	list_for_each_entry(mad_send_wr, &mad_agent_priv->backlog_list,
+			    agent_list) {
+		if (&mad_send_wr->send_buf == send_buf)
+			return mad_send_wr;
+	}
+
 	return NULL;
 }
 
@@ -2477,7 +2624,8 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms)
 		mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
 
 	mad_send_wr->send_buf.timeout_ms = timeout_ms;
-	if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
+	if (mad_send_wr->state == IB_MAD_STATE_SEND_START ||
+	    (mad_send_wr->state == IB_MAD_STATE_QUEUED && timeout_ms))
 		mad_send_wr->timeout = msecs_to_jiffies(timeout_ms);
 	else
 		ib_reset_mad_timeout(mad_send_wr, timeout_ms);
@@ -2607,7 +2755,10 @@ static int retry_send(struct ib_mad_send_wr_private *mad_send_wr)
 		mad_send_wr->state = IB_MAD_STATE_SEND_START;
 		list_add_tail(&mad_send_wr->agent_list,
 			      &mad_send_wr->mad_agent_priv->send_list);
+		mad_send_wr->mad_agent_priv->sol_fc_send_count +=
+			mad_send_wr->is_solicited_fc;
 	}
+
 	return ret;
 }
 
@@ -2641,6 +2792,8 @@ static void timeout_sends(struct work_struct *work)
 		}
 
 		list_del_init(&mad_send_wr->agent_list);
+		mad_agent_priv->sol_fc_wait_count -=
+			mad_send_wr->is_solicited_fc;
 		if (mad_send_wr->status == IB_WC_SUCCESS &&
 		    !retry_send(mad_send_wr))
 			continue;
@@ -2655,6 +2808,8 @@ static void timeout_sends(struct work_struct *work)
 		else
 			mad_send_wc.status = mad_send_wr->status;
 		mad_send_wc.send_buf = &mad_send_wr->send_buf;
+		if (mad_send_wr->is_solicited_fc)
+			process_mad_from_backlog(mad_agent_priv);
 		mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
 						   &mad_send_wc);
 		deref_mad_agent(mad_agent_priv);
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 4af63c1664c2..b2a12a82a62d 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -95,12 +95,16 @@ struct ib_mad_agent_private {
 
 	spinlock_t lock;
 	struct list_head send_list;
+	unsigned int sol_fc_send_count;
 	struct list_head wait_list;
+	unsigned int sol_fc_wait_count;
 	struct delayed_work timed_work;
 	unsigned long timeout;
 	struct list_head local_list;
 	struct work_struct local_work;
 	struct list_head rmpp_list;
+	unsigned int sol_fc_max;
+	struct list_head backlog_list;
 	refcount_t refcount;
 
 	union {
@@ -118,6 +122,7 @@ struct ib_mad_snoop_private {
 };
 
 enum ib_mad_state {
+	IB_MAD_STATE_QUEUED,
 	IB_MAD_STATE_SEND_START,
 	IB_MAD_STATE_WAIT_RESP,
 	IB_MAD_STATE_EARLY_RESP,
@@ -150,6 +155,9 @@ struct ib_mad_send_wr_private {
 	int pad;
 
 	enum ib_mad_state state;
+
+	/* Solicited MAD flow control */
+	bool is_solicited_fc;
 };
 
 struct ib_mad_local_private {