From patchwork Thu Dec 5 13:49:31 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13895400
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Vlad Dumitrescu, linux-rdma@vger.kernel.org, Sean Hefty
Subject: [PATCH rdma-next 1/9] IB/mad: Apply timeout modification (CM MRA) only once
Date: Thu, 5 Dec 2024 15:49:31 +0200
Message-ID: <14281da695b89e5d58890996f29d15145d85b960.1733405453.git.leon@kernel.org>

From: Vlad Dumitrescu

ib_modify_mad with a non-zero timeout is used exclusively for CM MRA [1].
The timeout is computed as the sum of the local timeout (set via
cm_init_av_by_path) and a constant 60s+ Service Timeout
(CMA_CM_MRA_SETTING) passed via the MRA message by the remote. MRAs are
generated when duplicate messages are received.

Overwriting send_buf.timeout_ms means the MRA Service Timeout requested
by the remote will apply to all remaining retry attempts. This can lead
to unnecessary and extreme delays when a receiver is only temporarily
overloaded. Do not save the MRA timeout, so that it only applies to the
next retry.

ib_modify_mad is also called with a zero timeout, to implement
ib_cancel_mad. The timeout was also saved in that case, but it is not
required, as timeout_sends will skip the retry_send call (which reads the
saved value) anyway, based on the non-successful status.
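The behavioral change can be sketched with a small userspace model (not kernel code; the 4-retry schedule and the 1s local timeout are made-up illustration values, with the 60s+ Service Timeout from the commit message):

```c
#include <assert.h>

/* Model: total wait across `retries` remaining attempts when an MRA
 * requesting `mra_ms` arrives before the first of them.  With the old
 * behavior the MRA timeout overwrote send_buf.timeout_ms and so applied
 * to every remaining retry; with this patch it applies once. */
static unsigned int total_wait_ms(unsigned int timeout_ms, unsigned int mra_ms,
                                  unsigned int retries, int mra_sticky)
{
	unsigned int total = mra_ms; /* the next retry waits the MRA timeout */
	unsigned int i;

	for (i = 1; i < retries; i++)
		total += mra_sticky ? mra_ms : timeout_ms;
	return total;
}
```

With a 1s local timeout, a 61s MRA timeout and 4 retries left, the old (sticky) behavior waits 244s in total while the fixed behavior waits 64s, which is the "unnecessary and extreme delays" the patch removes.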
[1] IBTA v1.7 - Section 12.6.6 - MRA - Message Receipt Acknowledgment

Signed-off-by: Vlad Dumitrescu
Reviewed-by: Sean Hefty
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/mad.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 86e846b12e2f..bcfbb2a5c02b 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2623,7 +2623,6 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms)
 	if (!timeout_ms)
 		mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
 
-	mad_send_wr->send_buf.timeout_ms = timeout_ms;
 	if (mad_send_wr->state == IB_MAD_STATE_SEND_START ||
 	    (mad_send_wr->state == IB_MAD_STATE_QUEUED && timeout_ms))
 		mad_send_wr->timeout = msecs_to_jiffies(timeout_ms);

From patchwork Thu Dec 5 13:49:32 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13895401
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Vlad Dumitrescu, linux-rdma@vger.kernel.org, Sean Hefty
Subject: [PATCH rdma-next 2/9] IB/mad: Add deadline for send MADs
Date: Thu, 5 Dec 2024 15:49:32 +0200
Message-ID: <3e9add3109a36c3238465b9ce11363084b9ddb14.1733405453.git.leon@kernel.org>

From: Vlad Dumitrescu

The MAD layer does not currently provide a way to enforce a deadline.
Callers which need that, like rdma_resolve_route / SA, make assumptions
about the MAD layer's retry algorithm and set the retries and timeout_ms
fields of struct ib_mad_send_buf accordingly.

For example, given today's retry algorithm - linear, with no significant
scheduling or queueing delays - callers expect the final timeout to
trigger roughly after (retries + 1) * timeout_ms.

Add a helper to set an internal deadline based on a relative timeout from
the current time. Callers can configure the deadline at any time, but
should account for delays they themselves introduce before calling
ib_post_send_mad. Otherwise, if the deadline has passed, the post fails.

When a deadline is not set or is too high, clamp it to 5 minutes after
post time. It is probably not a good idea to accept arbitrary timeouts.

Once callers have been converted to use this new parameter, the MAD layer
can evolve its retry algorithm (e.g., to prevent congestion) without
affecting those callers.

Note that the existing fields still need to be exposed:
- timeout_ms will be needed to reset the retry algorithm after a
  temporary delay requested by the remote via CM MRA [1], and
- retries is needed to implement CM REQ:Max CM Retries [2].

In the case of CM MRA (ib_modify_mad called with a non-zero timeout),
increase the deadline, as the sender can't plan for MRA-requested delays.

Ignore RMPP for now - it uses a different per-window retry algorithm.
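The clamping rule applied at post time can be modeled outside the kernel like this (a sketch of the rule only; it uses plain milliseconds instead of jiffies, and treats 0 as "no deadline set"):

```c
#include <assert.h>

#define MAX_DEADLINE_MS (5 * 60 * 1000) /* 5 minutes, as in the patch */

/* Model of ib_post_send_mad's behavior: a deadline that is unset (0) or
 * further than 5 minutes away is clamped to post time + 5 minutes.
 * Returns the absolute deadline, in ms since an arbitrary epoch. */
static unsigned long effective_deadline(unsigned long post_time_ms,
                                        unsigned long deadline_ms)
{
	if (!deadline_ms || deadline_ms > post_time_ms + MAX_DEADLINE_MS)
		return post_time_ms + MAX_DEADLINE_MS;
	return deadline_ms;
}
```

A deadline that has already passed at post time is the remaining case: the real code then fails the post with -ETIMEDOUT instead of clamping.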
[1] IBTA v1.7 - Section 12.6.6 - MRA - Message Receipt Acknowledgment
[2] IBTA v1.7 - Section 12.7.27 - Max CM Retries

Signed-off-by: Vlad Dumitrescu
Reviewed-by: Sean Hefty
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/mad.c      | 54 +++++++++++++++++++++++++++---
 drivers/infiniband/core/mad_priv.h |  1 +
 include/rdma/ib_mad.h              | 29 ++++++++++++++++
 3 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index bcfbb2a5c02b..5c255ee3db38 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -54,6 +54,8 @@
 #define CREATE_TRACE_POINTS
 #include
 
+#define IB_MAD_MAX_DEADLINE (jiffies + msecs_to_jiffies(5 * 60 * 1000))
+
 #ifdef CONFIG_TRACEPOINTS
 static void create_mad_addr_info(struct ib_mad_send_wr_private *mad_send_wr,
 				 struct ib_mad_qp_info *qp_info,
@@ -855,6 +857,26 @@ int ib_mad_kernel_rmpp_agent(const struct ib_mad_agent *agent)
 }
 EXPORT_SYMBOL(ib_mad_kernel_rmpp_agent);
 
+int ib_set_mad_deadline(struct ib_mad_send_buf *send_buf, u32 total_timeout_ms)
+{
+	struct ib_mad_send_wr_private *mad_send_wr =
+		container_of(send_buf, struct ib_mad_send_wr_private, send_buf);
+
+	if (WARN_ON_ONCE(!total_timeout_ms))
+		return -EINVAL;
+
+	if (WARN_ON_ONCE(mad_send_wr->deadline))
+		return -EINVAL;
+
+	if (WARN_ON_ONCE(ib_mad_kernel_rmpp_agent(
+			&mad_send_wr->mad_agent_priv->agent)))
+		return -EINVAL;
+
+	mad_send_wr->deadline = jiffies + msecs_to_jiffies(total_timeout_ms);
+	return 0;
+}
+EXPORT_SYMBOL(ib_set_mad_deadline);
+
 struct ib_mad_send_buf *ib_create_send_mad(struct ib_mad_agent *mad_agent,
 					   u32 remote_qpn, u16 pkey_index,
 					   int rmpp_active, int hdr_len,
@@ -1174,6 +1196,19 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
 			continue;
 		}
 
+		if (!ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent) &&
+		    send_buf->timeout_ms) {
+			if (!mad_send_wr->deadline ||
+			    time_after(mad_send_wr->deadline,
+				       IB_MAD_MAX_DEADLINE)) {
+				mad_send_wr->deadline = IB_MAD_MAX_DEADLINE;
+			} else if (time_after_eq(jiffies,
+						 mad_send_wr->deadline)) {
+				ret = -ETIMEDOUT;
+				goto error;
+			}
+		}
+
 		mad_send_wr->tid = ((struct ib_mad_hdr *)send_buf->mad)->tid;
 		/* Timeout will be updated after send completes */
 		mad_send_wr->timeout = msecs_to_jiffies(send_buf->timeout_ms);
@@ -2293,16 +2328,23 @@ static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
 
 static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
 {
-	struct ib_mad_agent_private *mad_agent_priv;
 	struct ib_mad_send_wr_private *temp_mad_send_wr;
+	struct ib_mad_agent_private *mad_agent_priv;
+	const unsigned long now = jiffies;
 	struct list_head *list_item;
 	unsigned long delay;
 
 	mad_agent_priv = mad_send_wr->mad_agent_priv;
 	list_del_init(&mad_send_wr->agent_list);
-	delay = mad_send_wr->timeout;
-	mad_send_wr->timeout += jiffies;
+	/* Caller must ensure mad_send_wr->timeout is relative */
+	if (!mad_send_wr->deadline)
+		delay = mad_send_wr->timeout;
+	else if (time_after_eq(now, mad_send_wr->deadline))
+		delay = 0; /* schedule ASAP */
+	else
+		delay = min(mad_send_wr->deadline - now, mad_send_wr->timeout);
+	mad_send_wr->timeout = now + delay;
 
 	if (delay) {
 		list_for_each_prev(list_item, &mad_agent_priv->wait_list) {
@@ -2623,6 +2665,9 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms)
 	if (!timeout_ms)
 		mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
 
+	if (mad_send_wr->deadline)
+		mad_send_wr->deadline += msecs_to_jiffies(timeout_ms);
+
 	if (mad_send_wr->state == IB_MAD_STATE_SEND_START ||
 	    (mad_send_wr->state == IB_MAD_STATE_QUEUED && timeout_ms))
 		mad_send_wr->timeout = msecs_to_jiffies(timeout_ms);
@@ -2726,7 +2771,8 @@ static int retry_send(struct ib_mad_send_wr_private *mad_send_wr)
 {
 	int ret;
 
-	if (!mad_send_wr->retries_left)
+	if (time_after_eq(jiffies, mad_send_wr->deadline) ||
+	    !mad_send_wr->retries_left)
 		return -ETIMEDOUT;
 
 	mad_send_wr->retries_left--;
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index b2a12a82a62d..24580ad2d428 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -140,6 +140,7 @@ struct ib_mad_send_wr_private {
 	struct ib_sge sg_list[IB_MAD_SEND_REQ_MAX_SG];
 	__be64 tid;
 	unsigned long timeout;
+	unsigned long deadline;
 	int max_retries;
 	int retries_left;
 	int retry;
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 3f1b58d8b4bf..69708170a0d6 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -727,6 +727,9 @@ void ib_free_recv_mad(struct ib_mad_recv_wc *mad_recv_wc);
  *
  * This call will reset the timeout value for a sent MAD to the specified
  * value.
+ *
+ * If called with a non-zero value and ib_set_mad_deadline was used, the
+ * deadline will be extended by the @timeout_ms.
  */
 int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms);
 
@@ -818,4 +821,30 @@ void ib_free_send_mad(struct ib_mad_send_buf *send_buf);
  */
 int ib_mad_kernel_rmpp_agent(const struct ib_mad_agent *agent);
 
+/**
+ * ib_set_mad_deadline - Sets send MAD's deadline based on current time.
+ * @send_buf: Previously allocated send data buffer.
+ * @total_timeout_ms: Time to wait before stopping retries.
+ *
+ * The deadline will start being enforced once ib_post_send_mad is called.
+ * It is NOT guaranteed that at least one send will be performed. Only valid
+ * for MADs waiting for response (ib_mad_send_buf.timeout_ms must also be set).
+ *
+ * This option allows callers to bound the time a MAD is owned by the MAD
+ * layer. It takes precedence over ib_mad_send_buf.{retries, timeout_ms} and
+ * is independent from the MAD layer's internal retry algorithm.
+ *
+ * Once this deadline expires, the MAD data buffer will be returned to the
+ * caller via the send_handler configured at agent registration time.
+ * Invocation of the send_handler might happen slightly later due to
+ * scheduling delays.
+ *
+ * The deadline will be extended if ib_modify_mad is called.
+ *
+ * Can only be called once.
+ *
+ * Might return errors for MADs which do not support deadline.
+ */
+int ib_set_mad_deadline(struct ib_mad_send_buf *send_buf, u32 total_timeout_ms);
+
 #endif /* IB_MAD_H */

From patchwork Thu Dec 5 13:49:33 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13895402
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Vlad Dumitrescu, linux-rdma@vger.kernel.org, Sean Hefty
Subject: [PATCH rdma-next 3/9] RDMA/sa_query: Enforce min retry interval and deadline
Date: Thu, 5 Dec 2024 15:49:33 +0200

From: Vlad Dumitrescu

SA query users, both in-kernel and userspace (via rdma_resolve_route),
pass in a total timeout and expect the SA query layer to handle retries.
SA query relies on the MAD layer to issue a fixed number of 10 retries at
the specified interval (timeout), set to 1/10 of the requested total
timeout.

When the caller-requested total timeout is low (e.g., 1s for IPoIB), the
resulting retry interval (e.g., 100ms) is too aggressive. There have been
reports of overloaded SA receivers. Hence, enforce a minimum. A follow-up
change will make this configurable via the rdma tool (netlink) at
per-port granularity.

Continue to enforce the caller's total timeout by using the new MAD layer
deadline option. Remove the small-timeout special case - the total
timeout option will take care of stopping the send even when more retries
are left. Moreover, this special case results in an extremely aggressive
1ms retry interval, which is definitely not desirable.
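The interval selection after this patch can be sketched as a standalone model (plain C mirroring the max() expression in the diff; the 10-retry count and 500ms default minimum come from the patch, other values are illustrative):

```c
#include <assert.h>

#define NMBR_SA_QUERY_RETRIES 10 /* fixed retry count used by the SA client */

/* Per-send retry interval picked by send_mad() after this patch: 1/10 of
 * the caller's total timeout, but never below the per-port minimum. */
static unsigned long retry_interval_ms(unsigned long total_timeout_ms,
                                       unsigned long min_timeout_ms)
{
	unsigned long interval = total_timeout_ms / NMBR_SA_QUERY_RETRIES;

	return interval > min_timeout_ms ? interval : min_timeout_ms;
}
```

With the 500ms default minimum, IPoIB's 1s total timeout now yields 500ms retries instead of 100ms, while the MAD-layer deadline still stops retries once the caller's total timeout elapses.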
Signed-off-by: Vlad Dumitrescu
Reviewed-by: Sean Hefty
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 53571e6b3162..ac0d53bf91c4 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -59,6 +60,7 @@
 #define IB_SA_LOCAL_SVC_TIMEOUT_MAX		200000
 #define IB_SA_CPI_MAX_RETRY_CNT			3
 #define IB_SA_CPI_RETRY_WAIT			1000 /*msecs */
+#define IB_SA_MIN_TIMEOUT_MS_DEFAULT		500
 
 static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
 
 struct ib_sa_sm_ah {
@@ -96,6 +98,7 @@ struct ib_sa_port {
 	spinlock_t classport_lock; /* protects class port info set */
 	spinlock_t ah_lock;
 	u32 port_num;
+	u32 min_timeout_ms;
 };
 
 struct ib_sa_device {
@@ -1344,13 +1347,14 @@ static int send_mad(struct ib_sa_query *query, unsigned long timeout_ms,
 	if (ret < 0)
 		return ret;
 
-	query->mad_buf->timeout_ms = timeout_ms / nmbr_sa_query_retries;
+	query->mad_buf->timeout_ms =
+		max(READ_ONCE(query->port->min_timeout_ms),
+		    timeout_ms / nmbr_sa_query_retries);
 	query->mad_buf->retries = nmbr_sa_query_retries;
-	if (!query->mad_buf->timeout_ms) {
-		/* Special case, very small timeout_ms */
-		query->mad_buf->timeout_ms = 1;
-		query->mad_buf->retries = timeout_ms;
-	}
+	ret = ib_set_mad_deadline(query->mad_buf, timeout_ms);
+	if (ret)
+		goto out;
+
 	query->mad_buf->context[0] = query;
 	query->id = id;
 
@@ -1364,18 +1368,22 @@ static int send_mad(struct ib_sa_query *query, unsigned long timeout_ms,
 	}
 
 	ret = ib_post_send_mad(query->mad_buf, NULL);
-	if (ret) {
-		xa_lock_irqsave(&queries, flags);
-		__xa_erase(&queries, id);
-		xa_unlock_irqrestore(&queries, flags);
-	}
 
 	/*
 	 * It's not safe to dereference query any more, because the
 	 * send may already have completed and freed the query in
 	 * another context.
 	 */
-	return ret ? ret : id;
+
+out:
+	if (ret) {
+		xa_lock_irqsave(&queries, flags);
+		__xa_erase(&queries, id);
+		xa_unlock_irqrestore(&queries, flags);
+		return ret;
+	}
+
+	return id;
 }
 
 void ib_sa_unpack_path(void *attribute, struct sa_path_rec *rec)
@@ -2192,6 +2200,8 @@ static int ib_sa_add_one(struct ib_device *device)
 		INIT_DELAYED_WORK(&sa_dev->port[i].ib_cpi_work,
 				  update_ib_cpi);
 
+		sa_dev->port[i].min_timeout_ms = IB_SA_MIN_TIMEOUT_MS_DEFAULT;
+
 		count++;
 	}

From patchwork Thu Dec 5 13:49:34 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13895406
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Vlad Dumitrescu, linux-rdma@vger.kernel.org, Sean Hefty
Subject: [PATCH rdma-next 4/9] RDMA/nldev: Add sa-min-timeout management attribute
Date: Thu, 5 Dec 2024 15:49:34 +0200
Message-ID: <69722b901037bf9727da09bc2cbea82e81f2f6e7.1733405453.git.leon@kernel.org>

From: Vlad Dumitrescu

Add a new namespace for MAD (Management Datagram) protocols, as we
expect more attributes in this area. Add the first such attribute, to
control the minimum initial timeout used by the SA client implementation.

The SA client relies on the MAD layer to issue retries, but has to
configure an initial timeout value for the first retry. While the SA
client provides a default, the right value likely depends on network
size, loss levels and capacity of the SA server. This attribute enables
system admins to tune the trade-off between speed of recovery under
transient loss and load (on the network and/or the SA server) generated
by unnecessary retries.

Enforce a reasonable range of 50ms - 10s.
Changes do not apply to existing SA queries, which were already posted
to the MAD layer.

Example usage:

 # rdma management show ibp1s0f0/1
 0: ibp1s0f0: 1 sa-min-timeout 500 ...
 # rdma management show
 0: ibp1s0f0: 1 sa-min-timeout 500 ...
 1: ibp1s0f1: 1 sa-min-timeout 500 ...
 # rdma management set ibp1s0f1/1 sa-min-timeout 1000 ...
 # rdma management show
 0: ibp1s0f0: 1 sa-min-timeout 500 ...
 1: ibp1s0f1: 1 sa-min-timeout 1000 ...

Signed-off-by: Vlad Dumitrescu
Reviewed-by: Sean Hefty
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/core_priv.h |   4 +
 drivers/infiniband/core/nldev.c     | 114 ++++++++++++++++++++++++++++
 drivers/infiniband/core/sa_query.c  |  47 ++++++++++++
 include/uapi/rdma/rdma_netlink.h    |   5 ++
 4 files changed, 170 insertions(+)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05102769a918..7a7326588297 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -199,6 +199,10 @@ void ib_mad_cleanup(void);
 int ib_sa_init(void);
 void ib_sa_cleanup(void);
 
+int ib_sa_min_timeout_set(struct ib_device *dev, u32 port_num, u32 val,
+			  struct netlink_ext_ack *extack);
+int ib_sa_min_timeout_get(struct ib_device *dev, u32 port_num, u32 *val);
+
 void rdma_nl_init(void);
 void rdma_nl_exit(void);
 
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index ff121e59b9c0..363742567dd2 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -171,6 +171,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_PARENT_NAME]		= { .type = NLA_NUL_STRING },
 	[RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE]	= { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_EVENT_TYPE]		= { .type = NLA_U8 },
+	[RDMA_NLDEV_MGMT_ATTR_SA_MIN_TIMEOUT]	= { .type = NLA_U32 },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -2621,6 +2622,112 @@ static int nldev_deldev(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return ib_del_sub_device_and_put(device);
 }
 
+static int nldev_mgmt_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	struct ib_device *device;
+	struct sk_buff *msg;
+	u32 index;
+	u32 port;
+	u32 sa_min_timeout;
+	int ret;
+
+	ret = __nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, nldev_policy,
+			    NL_VALIDATE_LIBERAL, extack);
+	if (ret ||
+	    !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+	    !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(sock_net(skb->sk), index);
+	if (!device)
+		return -EINVAL;
+
+	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+	if (!rdma_is_port_valid(device, port)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	if (rdma_cap_ib_sa(device, port)) {
+		ret = ib_sa_min_timeout_get(device, port, &sa_min_timeout);
+		if (ret)
+			goto err;
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	nlh = nlmsg_put(
+		msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
+		RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_MGMT_GET), 0, 0);
+	if (!nlh ||
+	    fill_nldev_handle(msg, device) ||
+	    nla_put_u32(msg, RDMA_NLDEV_ATTR_PORT_INDEX, port)) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
+	if (rdma_cap_ib_sa(device, port)) {
+		ret = nla_put_u32(msg, RDMA_NLDEV_MGMT_ATTR_SA_MIN_TIMEOUT,
+				  sa_min_timeout);
+		if (ret)
+			goto err_msg;
+	}
+
+	nlmsg_end(msg, nlh);
+	return rdma_nl_unicast(sock_net(skb->sk), msg, NETLINK_CB(skb).portid);
+
+err_msg:
+	nlmsg_free(msg);
+err:
+	ib_device_put(device);
+	return ret;
+}
+
+static int nldev_set_mgmt_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
+				   struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	struct ib_device *device;
+	u32 index;
+	u32 port;
+	u32 sa_min_timeout;
+	int ret;
+
+	ret = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, nldev_policy,
+			  extack);
+	if (ret ||
+	    !tb[RDMA_NLDEV_ATTR_DEV_INDEX] ||
+	    !tb[RDMA_NLDEV_ATTR_PORT_INDEX])
+		return -EINVAL;
+
+	index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
+	device = ib_device_get_by_index(sock_net(skb->sk), index);
+	if (!device)
+		return -EINVAL;
+
+	port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
+	if (!rdma_is_port_valid(device, port))
+		goto err;
+
+	if (tb[RDMA_NLDEV_MGMT_ATTR_SA_MIN_TIMEOUT]) {
+		sa_min_timeout =
+			nla_get_u32(tb[RDMA_NLDEV_MGMT_ATTR_SA_MIN_TIMEOUT]);
+		return ib_sa_min_timeout_set(device, port, sa_min_timeout,
+					     extack);
+	}
+
+err:
+	ib_device_put(device);
+	return -EINVAL;
+}
+
 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_GET] = {
 		.doit = nldev_get_doit,
@@ -2727,6 +2834,13 @@ static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 		.doit = nldev_deldev,
 		.flags = RDMA_NL_ADMIN_PERM,
 	},
+	[RDMA_NLDEV_CMD_MGMT_GET] = {
+		.doit = nldev_mgmt_get_doit,
+	},
+	[RDMA_NLDEV_CMD_MGMT_SET] = {
+		.doit = nldev_set_mgmt_set_doit,
+		.flags = RDMA_NL_ADMIN_PERM,
+	},
 };
 
 static int fill_mon_netdev_rename(struct sk_buff *msg,
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index ac0d53bf91c4..7f63cad3f212 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -60,7 +60,10 @@
 #define IB_SA_LOCAL_SVC_TIMEOUT_MAX		200000
 #define IB_SA_CPI_MAX_RETRY_CNT			3
 #define IB_SA_CPI_RETRY_WAIT			1000 /*msecs */
+#define IB_SA_MIN_TIMEOUT_MS_MIN		50
 #define IB_SA_MIN_TIMEOUT_MS_DEFAULT		500
+#define IB_SA_MIN_TIMEOUT_MS_MAX		10000
+
 static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
 
 struct ib_sa_sm_ah {
@@ -1334,6 +1337,50 @@ static void init_mad(struct ib_sa_query *query, struct ib_mad_agent *agent)
 	spin_unlock_irqrestore(&tid_lock, flags);
 }
 
+int ib_sa_min_timeout_set(struct ib_device *dev, u32 port_num, u32 val,
+			  struct netlink_ext_ack *extack)
+{
+	struct ib_sa_device *sa_dev = ib_get_client_data(dev, &sa_client);
+	struct ib_sa_port *port;
+
+	if (!rdma_cap_ib_sa(dev, port_num))
+		return -EOPNOTSUPP;
+
+	if (!sa_dev)
+		return -ENODEV;
+
+	port = &sa_dev->port[port_num - sa_dev->start_port];
+
+	if (val > IB_SA_MIN_TIMEOUT_MS_MAX || val < IB_SA_MIN_TIMEOUT_MS_MIN) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Valid range [%u-%u]ms",
+				       IB_SA_MIN_TIMEOUT_MS_MIN,
+				       IB_SA_MIN_TIMEOUT_MS_MAX);
+		return -EINVAL;
+	}
+
+	WRITE_ONCE(port->min_timeout_ms, val);
+
+	return 0;
+}
+
+int ib_sa_min_timeout_get(struct ib_device *dev, u32 port_num, u32 *val)
+{
+	struct ib_sa_device *sa_dev = ib_get_client_data(dev, &sa_client);
+	struct ib_sa_port *port;
+
+	if (!rdma_cap_ib_sa(dev, port_num))
+		return -EOPNOTSUPP;
+
+	if (!sa_dev)
+		return -ENODEV;
+
+	port = &sa_dev->port[port_num - sa_dev->start_port];
+
+	*val = READ_ONCE(port->min_timeout_ms);
+
+	return 0;
+}
+
 static int send_mad(struct ib_sa_query *query, unsigned long timeout_ms,
 		    gfp_t gfp_mask)
 {
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 9f9cf20c1cd8..2b1c4c55e51f 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -308,6 +308,9 @@ enum rdma_nldev_command {
 
 	RDMA_NLDEV_CMD_MONITOR,
 
+	RDMA_NLDEV_CMD_MGMT_GET,
+	RDMA_NLDEV_CMD_MGMT_SET,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
@@ -580,6 +583,8 @@ enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_EVENT_TYPE,		/* u8 */
 
 	RDMA_NLDEV_SYS_ATTR_MONITOR_MODE,	/* u8 */
+
+	RDMA_NLDEV_MGMT_ATTR_SA_MIN_TIMEOUT,	/* u32 */
 	/*
 	 * Always the end
 	 */

From patchwork Thu Dec 5 13:49:35 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13895403
From: Leon Romanovsky To: Jason Gunthorpe Cc: Vlad Dumitrescu , linux-rdma@vger.kernel.org, Sean Hefty Subject: [PATCH rdma-next 5/9] IB/umad: Set deadline when sending non-RMPP MADs Date: Thu, 5 Dec 2024 15:49:35 +0200 Message-ID:
<3ddefc7bd188b15b9f03aebf469630f30f62bea2.1733405453.git.leon@kernel.org> X-Mailer: git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Vlad Dumitrescu Change semantics of the UAPI struct ib_user_mad_hdr retries and timeout_ms fields. Given the current implementation, users likely expect the total timeout to be (retries + 1) * timeout_ms. Use that as MAD deadline. This allows changes to the MAD layer's internal retry algorithm, for non-RMPP agents, without affecting the total timeout experienced by userspace. Signed-off-by: Vlad Dumitrescu Reviewed-by: Sean Hefty Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/user_mad.c | 8 ++++++++ include/uapi/rdma/ib_user_mad.h | 12 ++++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index fd67fc9fe85a..3da6c0295657 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -650,6 +650,14 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf, } } + if (!ib_mad_kernel_rmpp_agent(agent) && packet->mad.hdr.timeout_ms) { + ret = ib_set_mad_deadline(packet->msg, + (packet->mad.hdr.retries + 1) * + packet->mad.hdr.timeout_ms); + if (ret) + goto err_send; + } + ret = ib_post_send_mad(packet->msg, NULL); if (ret) goto err_send; diff --git a/include/uapi/rdma/ib_user_mad.h b/include/uapi/rdma/ib_user_mad.h index 10b5f6a4c677..1e9c20a44e50 100644 --- a/include/uapi/rdma/ib_user_mad.h +++ b/include/uapi/rdma/ib_user_mad.h @@ -57,7 +57,11 @@ * received (transaction ID in data[] will be set to TID of original * request) (ignored on send) * @timeout_ms - Milliseconds to wait for response (unset on receive) - * @retries - Number of automatic retries to attempt + * before issuing a retry + * @retries - Maximum number of automatic retries to attempt. 
Actual + * number of retries could be less if (@retries + 1) * @timeout_ms + * is exceeded. When the registration request sets @rmpp_version, + * it applies per RMPP window * @qpn - Remote QP number received from/to be sent to * @qkey - Remote Q_Key to be sent with (unset on receive) * @lid - Remote lid received from/to be sent to @@ -100,7 +104,11 @@ struct ib_user_mad_hdr_old { * received (transaction ID in data[] will be set to TID of original * request) (ignored on send) * @timeout_ms - Milliseconds to wait for response (unset on receive) - * @retries - Number of automatic retries to attempt + * before issuing a retry + * @retries - Maximum number of automatic retries to attempt. Actual + * number of retries could be less if (@retries + 1) * @timeout_ms + * is exceeded. When the registration request sets @rmpp_version, + * it applies per RMPP window * @qpn - Remote QP number received from/to be sent to * @qkey - Remote Q_Key to be sent with (unset on receive) * @lid - Remote lid received from/to be sent to From patchwork Thu Dec 5 13:49:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13895404 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3FD66219A66 for ; Thu, 5 Dec 2024 13:50:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733406656; cv=none; b=G/6UE7sTnnzJKZzlA5uG6bHbeEgSd7gL8etpmfnztf3EmPtLL5fFWU+hnNSwum6ArRz9xleu5V613CEnFFJBFXqviKEQfYf6bR1w/l8uDHXq1MhnBTjwsOrv9g5MSK68rvo4QzRiKD5EOwEOkGSnKwGZfCK3blyfnUhQ8Td1eUU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733406656; c=relaxed/simple; 
From: Leon Romanovsky To: Jason Gunthorpe Cc: Vlad Dumitrescu , linux-rdma@vger.kernel.org, Sean Hefty Subject: [PATCH rdma-next 6/9] IB/cm: Set deadline when sending MADs Date: Thu, 5 Dec 2024 15:49:36 +0200 Message-ID: <94e82976688780ac43f5719d86c6630228c2e590.1733405453.git.leon@kernel.org> X-Mailer: git-send-email 2.47.0 From: Vlad Dumitrescu With the current MAD retry algorithm, the expected total timeout is roughly (retries + 1) * timeout_ms. This is an approximation because scheduling and completion delays are not strictly accounted for.
For CM, the number of retries is typically CMA_MAX_CM_RETRIES (15), unless the peer sets REQ:Max CM Retries [1] to a different value. In theory, the timeout could vary, being based on CMA_CM_RESPONSE_TIMEOUT + Packet Life Time, as well as the peer's MRA messages. In practice, for RoCE, the formula above results in 65536ms. Based on the above, set a constant deadline of a round 70s for all cases. Note that MRAs will end up calling ib_modify_mad, which will extend the deadline accordingly. This allows changes to the MAD layer's internal retry algorithm without affecting the total timeout experienced by CM. [1] IBTA v1.7 - Section 12.7.27 - Max CM Retries Signed-off-by: Vlad Dumitrescu Reviewed-by: Sean Hefty Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/cm.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 142170473e75..36649faf9842 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -36,6 +36,7 @@ MODULE_LICENSE("Dual BSD/GPL"); #define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */ #define CM_DIRECT_RETRY_CTX ((void *) 1UL) +#define CM_MAD_TOTAL_TIMEOUT 70000 /* msecs */ static const char * const ibcm_rej_reason_strs[] = { [IB_CM_REJ_NO_QP] = "no QP", @@ -279,6 +280,7 @@ static struct ib_mad_send_buf *cm_alloc_msg(struct cm_id_private *cm_id_priv) struct ib_mad_agent *mad_agent; struct ib_mad_send_buf *m; struct ib_ah *ah; + int ret; lockdep_assert_held(&cm_id_priv->lock); @@ -309,6 +311,17 @@ static struct ib_mad_send_buf *cm_alloc_msg(struct cm_id_private *cm_id_priv) } m->ah = ah; + m->retries = cm_id_priv->max_cm_retries; + ret = ib_set_mad_deadline(m, CM_MAD_TOTAL_TIMEOUT); + if (ret) { + ib_free_send_mad(m); + rdma_destroy_ah(ah, 0); + m = ERR_PTR(ret); + goto out; + } + + refcount_inc(&cm_id_priv->refcount); + m->context[0] = cm_id_priv; out: spin_unlock(&cm_id_priv->av.port->cm_dev->mad_agent_lock); From patchwork Thu Dec 5 13:49:37
2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13895405
From: Leon Romanovsky To: Jason Gunthorpe Cc: Vlad Dumitrescu , linux-rdma@vger.kernel.org, Sean Hefty Subject: [PATCH rdma-next 7/9] IB/mad: Exponential backoff when retrying sends Date: Thu, 5 Dec 2024 15:49:37 +0200 Message-ID: X-Mailer: git-send-email 2.47.0 From: Vlad Dumitrescu When a receiver is overloaded, MAD requests time out and are retried in a linear fashion. This can worsen congestion and reduce goodput. To help reduce the load over time, use exponential backoff after a preset number of retries. Cap delays between retries at 60s, even when in exponential mode. An MRA message from the recipient could request an even higher timeout, so continue to respect that for the next retry. However, reset the backoff algorithm to the beginning when an MRA is received. Exclude RMPP and OPA from exponential backoff.
Signed-off-by: Vlad Dumitrescu Reviewed-by: Sean Hefty Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/mad.c | 53 ++++++++++++++++++++++++++++-- drivers/infiniband/core/mad_priv.h | 3 ++ 2 files changed, 53 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 5c255ee3db38..a3a8cf4bbc20 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -54,7 +54,9 @@ #define CREATE_TRACE_POINTS #include -#define IB_MAD_MAX_DEADLINE (jiffies + msecs_to_jiffies(5 * 60 * 1000)) +#define IB_MAD_LINEAR_TIMEOUTS_DEFAULT 4 +#define IB_MAD_MAX_TIMEOUT_MS (60 * MSEC_PER_SEC) +#define IB_MAD_MAX_DEADLINE (jiffies + msecs_to_jiffies(5 * 60 * 1000)) #ifdef CONFIG_TRACEPOINTS static void create_mad_addr_info(struct ib_mad_send_wr_private *mad_send_wr, @@ -1210,10 +1212,12 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf, } mad_send_wr->tid = ((struct ib_mad_hdr *) send_buf->mad)->tid; + mad_send_wr->var_timeout_ms = send_buf->timeout_ms; /* Timeout will be updated after send completes */ mad_send_wr->timeout = msecs_to_jiffies(send_buf->timeout_ms); mad_send_wr->max_retries = send_buf->retries; mad_send_wr->retries_left = send_buf->retries; + mad_send_wr->backoff_retries = 0; send_buf->retries = 0; mad_send_wr->status = IB_WC_SUCCESS; @@ -2662,18 +2666,34 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms) return -EINVAL; } - if (!timeout_ms) + if (!timeout_ms) { mad_send_wr->status = IB_WC_WR_FLUSH_ERR; + goto apply; + } + + /* CM MRA requesting a lower timeout than ours. Could be a delayed MRA + * (variable backoff increased in the meantime) or remote using a const. + */ + if (timeout_ms < mad_send_wr->var_timeout_ms) + goto ignore; + + /* Assume remote will no longer be overloaded after MRA Service Timeout + * passes and restart variable backoff algorithm. 
+ */ + mad_send_wr->var_timeout_ms = mad_send_wr->send_buf.timeout_ms; + mad_send_wr->backoff_retries = 0; if (mad_send_wr->deadline) mad_send_wr->deadline += msecs_to_jiffies(timeout_ms); +apply: if (mad_send_wr->state == IB_MAD_STATE_SEND_START || (mad_send_wr->state == IB_MAD_STATE_QUEUED && timeout_ms)) mad_send_wr->timeout = msecs_to_jiffies(timeout_ms); else ib_reset_mad_timeout(mad_send_wr, timeout_ms); +ignore: spin_unlock_irqrestore(&mad_agent_priv->lock, flags); return 0; } @@ -2767,6 +2787,30 @@ static void local_completions(struct work_struct *work) spin_unlock_irqrestore(&mad_agent_priv->lock, flags); } +/* + * Applies a variable backoff to certain send MADs. + * + * Exists to scope down the initial variable backoff implementation. + */ +static void set_next_timeout(struct ib_mad_send_wr_private *mad_send_wr) +{ + const struct ib_mad_agent_private *agent = mad_send_wr->mad_agent_priv; + const struct ib_mad_port_private *port = agent->qp_info->port_priv; + const struct ib_mad_hdr *hdr = mad_send_wr->send_buf.mad; + + if (ib_mad_kernel_rmpp_agent(&agent->agent)) + return; + + if (hdr->base_version != IB_MGMT_BASE_VERSION) + return; + + if (++mad_send_wr->backoff_retries < READ_ONCE(port->linear_timeouts)) + return; + + mad_send_wr->var_timeout_ms = + min(mad_send_wr->var_timeout_ms << 1, IB_MAD_MAX_TIMEOUT_MS); +} + static int retry_send(struct ib_mad_send_wr_private *mad_send_wr) { int ret; @@ -2778,7 +2822,8 @@ static int retry_send(struct ib_mad_send_wr_private *mad_send_wr) mad_send_wr->retries_left--; mad_send_wr->send_buf.retries++; - mad_send_wr->timeout = msecs_to_jiffies(mad_send_wr->send_buf.timeout_ms); + set_next_timeout(mad_send_wr); + mad_send_wr->timeout = msecs_to_jiffies(mad_send_wr->var_timeout_ms); if (ib_mad_kernel_rmpp_agent(&mad_send_wr->mad_agent_priv->agent)) { ret = ib_retry_rmpp(mad_send_wr); @@ -3195,6 +3240,8 @@ static int ib_mad_port_open(struct ib_device *device, goto error8; } + port_priv->linear_timeouts = 
IB_MAD_LINEAR_TIMEOUTS_DEFAULT; + spin_lock_irqsave(&ib_mad_port_list_lock, flags); list_add_tail(&port_priv->port_list, &ib_mad_port_list); spin_unlock_irqrestore(&ib_mad_port_list_lock, flags); diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h index 24580ad2d428..076ebcea27b4 100644 --- a/drivers/infiniband/core/mad_priv.h +++ b/drivers/infiniband/core/mad_priv.h @@ -139,10 +139,12 @@ struct ib_mad_send_wr_private { struct ib_ud_wr send_wr; struct ib_sge sg_list[IB_MAD_SEND_REQ_MAX_SG]; __be64 tid; + unsigned int var_timeout_ms; unsigned long timeout; unsigned long deadline; int max_retries; int retries_left; + int backoff_retries; int retry; enum ib_wc_status status; @@ -222,6 +224,7 @@ struct ib_mad_port_private { struct ib_mad_mgmt_version_table version[MAX_MGMT_VERSION]; struct workqueue_struct *wq; struct ib_mad_qp_info qp_info[IB_MAD_QPS_CORE]; + u8 linear_timeouts; }; int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr); From patchwork Thu Dec 5 13:49:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13895408 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F7F421A42C for ; Thu, 5 Dec 2024 13:51:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733406671; cv=none; b=F2oE/Wjv993qAjMjded8zqFv+Uubb90G8Q4Krv8eRqxKwTSBPRBIfgo6B4nXn9wWEfn0C18GQ/drtjoRhmX7SsHwf9Jp6wt8rZdQu46SyY/m52nkRJpPoAamD4LbW35trntu5SIQuL2/KkJq+Rcx2Jvlwo4FaeNQ0bQGQS+oiEs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733406671; c=relaxed/simple; 
From: Leon Romanovsky To: Jason Gunthorpe Cc: Vlad Dumitrescu , linux-rdma@vger.kernel.org, Sean Hefty Subject: [PATCH rdma-next 8/9] RDMA/nldev: Add mad-linear-timeouts management attribute Date: Thu, 5 Dec 2024 15:49:38 +0200 Message-ID: <5328045b50805d019606f724b439104bbef3ff69.1733405453.git.leon@kernel.org> X-Mailer: git-send-email 2.47.0 From: Vlad Dumitrescu This attribute allows system admins to make a trade-off between speed of recovery under transient loss and reducing congestion under persistent loss or overload.
Set 15 as the max value, as it allows sys admins to effectively opt the CM out of exponential backoff. The CM currently uses the CMA_MAX_CM_RETRIES (15) constant to set retries. Other MAD layer callers use different values (e.g., sa_query uses 10, UMAD exposes the parameter to userspace), but a max of 15 linear retries should be enough. Example: # rdma management show rocep1s0f1/1 1: rocep1s0f1: 1 mad-linear-timeouts 4 ... # rdma management set rocep1s0f1/1 mad-linear-timeouts 6 # rdma management show 0: rocep1s0f0: 1 mad-linear-timeouts 4 ... 1: rocep1s0f1: 1 mad-linear-timeouts 6 ... Signed-off-by: Vlad Dumitrescu Reviewed-by: Sean Hefty Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/mad.c | 35 ++++++++++++++++++++++++++++++ drivers/infiniband/core/mad_priv.h | 4 ++++ drivers/infiniband/core/nldev.c | 19 ++++++++++++++++ include/uapi/rdma/rdma_netlink.h | 2 ++ 4 files changed, 60 insertions(+) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index a3a8cf4bbc20..7c4ac8ae0a3f 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -54,7 +54,9 @@ #define CREATE_TRACE_POINTS #include +#define IB_MAD_LINEAR_TIMEOUTS_MIN 1 #define IB_MAD_LINEAR_TIMEOUTS_DEFAULT 4 +#define IB_MAD_LINEAR_TIMEOUTS_MAX 15 #define IB_MAD_MAX_TIMEOUT_MS (60 * MSEC_PER_SEC) #define IB_MAD_MAX_DEADLINE (jiffies + msecs_to_jiffies(5 * 60 * 1000)) @@ -145,6 +147,39 @@ ib_get_mad_port(struct ib_device *device, u32 port_num) return entry; } +int ib_mad_linear_timeouts_set(struct ib_device *dev, u32 port_num, u8 val, + struct netlink_ext_ack *extack) +{ + struct ib_mad_port_private *port = ib_get_mad_port(dev, port_num); + + if (!port) + return -ENODEV; + + if (val > IB_MAD_LINEAR_TIMEOUTS_MAX || + val < IB_MAD_LINEAR_TIMEOUTS_MIN) { + NL_SET_ERR_MSG_FMT_MOD(extack, "Valid range [%u-%u]", + IB_MAD_LINEAR_TIMEOUTS_MIN, + IB_MAD_LINEAR_TIMEOUTS_MAX); + return -EINVAL; + } + + WRITE_ONCE(port->linear_timeouts, val); + + return 0; +} + +int
ib_mad_linear_timeouts_get(struct ib_device *dev, u32 port_num, u8 *val) +{ + struct ib_mad_port_private *port = ib_get_mad_port(dev, port_num); + + if (!port) + return -ENODEV; + + *val = READ_ONCE(port->linear_timeouts); + + return 0; +} + static inline u8 convert_mgmt_class(u8 mgmt_class) { /* Alias IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE to 0 */ diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h index 076ebcea27b4..e6b362c054a6 100644 --- a/drivers/infiniband/core/mad_priv.h +++ b/drivers/infiniband/core/mad_priv.h @@ -241,4 +241,8 @@ void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr); void ib_reset_mad_timeout(struct ib_mad_send_wr_private *mad_send_wr, unsigned long timeout_ms); +int ib_mad_linear_timeouts_set(struct ib_device *dev, u32 port_num, u8 val, + struct netlink_ext_ack *extack); +int ib_mad_linear_timeouts_get(struct ib_device *dev, u32 port_num, u8 *val); + #endif /* __IB_MAD_PRIV_H__ */ diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index 363742567dd2..acb02f8c87c0 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -172,6 +172,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = { [RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE] = { .type = NLA_U8 }, [RDMA_NLDEV_ATTR_EVENT_TYPE] = { .type = NLA_U8 }, [RDMA_NLDEV_MGMT_ATTR_SA_MIN_TIMEOUT] = { .type = NLA_U32 }, + [RDMA_NLDEV_MGMT_ATTR_MAD_LINEAR_TIMEOUTS] = { .type = NLA_U8 }, }; static int put_driver_name_print_type(struct sk_buff *msg, const char *name, @@ -2627,6 +2628,7 @@ static int nldev_mgmt_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh, { struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; struct ib_device *device; + u8 mad_linear_timeouts; struct sk_buff *msg; u32 index; u32 port; @@ -2657,6 +2659,10 @@ static int nldev_mgmt_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh, goto err; } + ret = ib_mad_linear_timeouts_get(device, port, &mad_linear_timeouts); + if (ret) + goto err; + 
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) { ret = -ENOMEM; @@ -2680,6 +2686,11 @@ static int nldev_mgmt_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh, goto err_msg; } + ret = nla_put_u8(msg, RDMA_NLDEV_MGMT_ATTR_MAD_LINEAR_TIMEOUTS, + mad_linear_timeouts); + if (ret) + goto err_msg; + nlmsg_end(msg, nlh); return rdma_nl_unicast(sock_net(skb->sk), msg, NETLINK_CB(skb).portid); @@ -2695,6 +2706,7 @@ static int nldev_set_mgmt_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh, { struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; struct ib_device *device; + u8 mad_linear_timeouts; u32 index; u32 port; u32 sa_min_timeout; @@ -2723,6 +2735,13 @@ static int nldev_set_mgmt_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh, extack); } + if (tb[RDMA_NLDEV_MGMT_ATTR_MAD_LINEAR_TIMEOUTS]) { + mad_linear_timeouts = nla_get_u8( + tb[RDMA_NLDEV_MGMT_ATTR_MAD_LINEAR_TIMEOUTS]); + return ib_mad_linear_timeouts_set(device, port, + mad_linear_timeouts, extack); + } + err: ib_device_put(device); return -EINVAL; diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h index 2b1c4c55e51f..d209a5973c8e 100644 --- a/include/uapi/rdma/rdma_netlink.h +++ b/include/uapi/rdma/rdma_netlink.h @@ -585,6 +585,8 @@ enum rdma_nldev_attr { RDMA_NLDEV_SYS_ATTR_MONITOR_MODE, /* u8 */ RDMA_NLDEV_MGMT_ATTR_SA_MIN_TIMEOUT, /* u32 */ + + RDMA_NLDEV_MGMT_ATTR_MAD_LINEAR_TIMEOUTS, /* u8 */ /* * Always the end */ From patchwork Thu Dec 5 13:49:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13895407 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80A82219A8B for ; Thu, 5 Dec 2024 13:51:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
From: Leon Romanovsky To: Jason Gunthorpe Cc: Vlad Dumitrescu , linux-rdma@vger.kernel.org, Sean Hefty Subject: [PATCH rdma-next 9/9] IB/cma: Lower response timeout to roughly 1s Date: Thu, 5 Dec 2024 15:49:39 +0200 Message-ID: <7ff4bf69f132a204570eded9ff6788316df3e3b4.1733405453.git.leon@kernel.org> X-Mailer:
git-send-email 2.47.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Vlad Dumitrescu Current CMA_CM_RESPONSE_TIMEOUT converts to 4096ms, which is high for (low latency) RDMA networks. Match TCP's initial RTO of 1s (RFC 6298). Rely on the recently added MAD layer exponential backoff to counter-balance this reduction in case of persistent loss. Signed-off-by: Vlad Dumitrescu Reviewed-by: Sean Hefty Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/cma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 64ace0b968f0..ec4141b84351 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -44,7 +44,7 @@ MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); MODULE_LICENSE("Dual BSD/GPL"); -#define CMA_CM_RESPONSE_TIMEOUT 20 +#define CMA_CM_RESPONSE_TIMEOUT 18 #define CMA_MAX_CM_RETRIES 15 #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24) #define CMA_IBOE_PACKET_LIFETIME 16