From patchwork Fri Nov 3 20:43:20 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13445128
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next 1/6] RDMA/rxe: Cleanup rxe_ah/av_chk_attr
Date: Fri, 3 Nov 2023 15:43:20 -0500
Message-Id: <20231103204324.9606-2-rpearsonhpe@gmail.com>
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>
References: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Replace rxe_ah_chk_attr() and rxe_av_chk_attr() by a single routine
rxe_chk_ah_attr().

Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_av.c    | 43 ++++-----------------
 drivers/infiniband/sw/rxe/rxe_loc.h   |  3 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    |  4 +--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  5 ++--
 4 files changed, 12 insertions(+), 43 deletions(-)
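[ Editorial illustration, not part of the patch: a self-contained toy C
  program showing the refactor pattern here -- two type-specific wrappers
  dispatching on (void *obj, bool obj_is_ah) collapse into one checker
  keyed on the device both callers already hold. All names below are
  stand-ins, not driver code.

#include <stdio.h>

/* toy stand-ins: both an "ah" and a "qp" reference the same device */
struct dev { int gid_tbl_len; };
struct ah  { struct dev *dev; };
struct qp  { struct dev *dev; };

/* one checker keyed on the shared device replaces the old pair of
 * wrappers that re-derived the device from an opaque pointer */
static int chk_ah_attr(struct dev *dev, int sgid_index)
{
	if (sgid_index > dev->gid_tbl_len) {
		fprintf(stderr, "invalid sgid index = %d\n", sgid_index);
		return -1;
	}
	return 0;
}

int main(void)
{
	struct dev d = { .gid_tbl_len = 16 };
	struct ah ah = { .dev = &d };
	struct qp qp = { .dev = &d };

	/* both paths now call the same routine with the device they hold */
	printf("ah path: %d\n", chk_ah_attr(ah.dev, 3));
	printf("qp path: %d\n", chk_ah_attr(qp.dev, 99));
	return 0;
} ]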
diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 889d7adbd455..4ac17b8def28 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -14,45 +14,24 @@ void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
 	memcpy(av->dmac, attr->roce.dmac, ETH_ALEN);
 }
 
-static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
+int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr)
 {
 	const struct ib_global_route *grh = rdma_ah_read_grh(attr);
-	struct rxe_port *port;
-	struct rxe_dev *rxe;
-	struct rxe_qp *qp;
-	struct rxe_ah *ah;
+	struct rxe_port *port = &rxe->port;
 	int type;
 
-	if (obj_is_ah) {
-		ah = obj;
-		rxe = to_rdev(ah->ibah.device);
-	} else {
-		qp = obj;
-		rxe = to_rdev(qp->ibqp.device);
-	}
-
-	port = &rxe->port;
-
 	if (rdma_ah_get_ah_flags(attr) & IB_AH_GRH) {
 		if (grh->sgid_index > port->attr.gid_tbl_len) {
-			if (obj_is_ah)
-				rxe_dbg_ah(ah, "invalid sgid index = %d\n",
-					   grh->sgid_index);
-			else
-				rxe_dbg_qp(qp, "invalid sgid index = %d\n",
-					   grh->sgid_index);
+			rxe_dbg_dev(rxe, "invalid sgid index = %d\n",
+				    grh->sgid_index);
 			return -EINVAL;
 		}
 
 		type = rdma_gid_attr_network_type(grh->sgid_attr);
 		if (type < RDMA_NETWORK_IPV4 || type > RDMA_NETWORK_IPV6) {
-			if (obj_is_ah)
-				rxe_dbg_ah(ah, "invalid network type for rdma_rxe = %d\n",
-					   type);
-			else
-				rxe_dbg_qp(qp, "invalid network type for rdma_rxe = %d\n",
-					   type);
+			rxe_dbg_dev(rxe, "invalid network type for rdma_rxe = %d\n",
+				    type);
 			return -EINVAL;
 		}
 	}
@@ -60,16 +39,6 @@ static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
 	return 0;
 }
 
-int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr)
-{
-	return chk_attr(qp, attr, false);
-}
-
-int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr)
-{
-	return chk_attr(ah, attr, true);
-}
-
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
 		      struct rdma_ah_attr *attr)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4d2a8ef52c85..3d2504a0ae56 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -9,8 +9,7 @@
 
 /* rxe_av.c */
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
-int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr);
-int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr);
+int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
 		      struct rdma_ah_attr *attr);
 void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 28e379c108bc..c28005db032d 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -456,11 +456,11 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
 		goto err1;
 	}
 
-	if (mask & IB_QP_AV && rxe_av_chk_attr(qp, &attr->ah_attr))
+	if (mask & IB_QP_AV && rxe_chk_ah_attr(rxe, &attr->ah_attr))
 		goto err1;
 
 	if (mask & IB_QP_ALT_PATH) {
-		if (rxe_av_chk_attr(qp, &attr->alt_ah_attr))
+		if (rxe_chk_ah_attr(rxe, &attr->alt_ah_attr))
 			goto err1;
 		if (!rdma_is_port_valid(&rxe->ib_dev, attr->alt_port_num)) {
 			rxe_dbg_qp(qp, "invalid alt port %d\n", attr->alt_port_num);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 48f86839d36a..6706d540f1f6 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -286,7 +286,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 	/* create index > 0 */
 	ah->ah_num = ah->elem.index;
 
-	err = rxe_ah_chk_attr(ah, init_attr->ah_attr);
+	err = rxe_chk_ah_attr(rxe, init_attr->ah_attr);
 	if (err) {
 		rxe_dbg_ah(ah, "bad attr");
 		goto err_cleanup;
@@ -322,10 +322,11 @@ static int rxe_create_ah(struct ib_ah *ibah,
 
 static int rxe_modify_ah(struct ib_ah *ibah, struct rdma_ah_attr *attr)
 {
+	struct rxe_dev *rxe = to_rdev(ibah->device);
 	struct rxe_ah *ah = to_rah(ibah);
 	int err;
 
-	err = rxe_ah_chk_attr(ah, attr);
+	err = rxe_chk_ah_attr(rxe, attr);
 	if (err) {
 		rxe_dbg_ah(ah, "bad attr");
 		goto err_out;

From patchwork Fri Nov 3 20:43:21 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13445126
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next 2/6] RDMA/rxe: Handle loopback of mcast packets
Date: Fri, 3 Nov 2023 15:43:21 -0500
Message-Id: <20231103204324.9606-3-rpearsonhpe@gmail.com>
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>
References: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Add a mask bit to indicate that a multicast packet has been locally
sent and use it to set the correct qpn for multicast packets. Add code
to rxe_xmit_packet() to correctly handle multicast packets, which must
be sent on the wire and also duplicated to any local qps that may
belong to the multicast group, not including the sender.
Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_av.c     |  7 +++++++
 drivers/infiniband/sw/rxe/rxe_loc.h    |  1 +
 drivers/infiniband/sw/rxe/rxe_net.c    | 25 ++++++++++++++++++++++++-
 drivers/infiniband/sw/rxe/rxe_opcode.h |  2 +-
 drivers/infiniband/sw/rxe/rxe_recv.c   |  4 ++++
 drivers/infiniband/sw/rxe/rxe_req.c    | 11 +++++++++--
 6 files changed, 46 insertions(+), 4 deletions(-)
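[ Editorial illustration, not part of the patch: a self-contained
  userspace C model of the dispatch rxe_xmit_packet() gains below.
  The names pkt, xmit, send_wire, loopback and the mask values are
  stand-ins for the driver's types and helpers.

#include <stdio.h>

#define MCAST_MASK 0x1 /* models RXE_MCAST_MASK */
#define LOOPB_MASK 0x2 /* models RXE_LOOPBACK_MASK */

struct pkt { unsigned int mask; const char *data; };

static int send_wire(struct pkt *p) { printf("wire: %s\n", p->data); return 0; }
static int loopback(struct pkt *p)  { printf("loop: %s\n", p->data); return 0; }

/* models rxe_xmit_packet()/rxe_loop_and_send(): an mcast packet goes on
 * the wire AND is duplicated to local receivers when any exist */
static int xmit(struct pkt *p, int local_members)
{
	int err, loc_err = 0;

	if (p->mask & MCAST_MASK) {
		if (local_members) {
			struct pkt copy = *p;	/* stands in for skb_clone() */
			loc_err = loopback(&copy);
		}
		err = send_wire(p);
		/* a loopback failure takes precedence, as in the patch */
		return loc_err ? loc_err : err;
	}
	if (p->mask & LOOPB_MASK)
		return loopback(p);
	return send_wire(p);
}

int main(void)
{
	struct pkt p = { MCAST_MASK, "mcast payload" };
	return xmit(&p, 1);	/* prints both a "loop:" and a "wire:" line */
} ]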
diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 4ac17b8def28..022173eb5d75 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -7,6 +7,13 @@
 #include "rxe.h"
 #include "rxe_loc.h"
 
+bool rxe_is_mcast_av(struct rxe_av *av)
+{
+	struct in6_addr *daddr = (struct in6_addr *)av->grh.dgid.raw;
+
+	return rdma_is_multicast_addr(daddr);
+}
+
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
 {
 	rxe_av_from_attr(rdma_ah_get_port_num(attr), av, attr);
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 3d2504a0ae56..62b2b25903fc 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -8,6 +8,7 @@
 #define RXE_LOC_H
 
 /* rxe_av.c */
+bool rxe_is_mcast_av(struct rxe_av *av);
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
 int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index cd59666158b1..2fad56fc95e7 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -412,6 +412,27 @@ static int rxe_loopback(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 	return 0;
 }
 
+/* for a multicast packet must send remotely and loopback to any local qps
+ * that may belong to the mcast group
+ */
+static int rxe_loop_and_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
+{
+	struct sk_buff *cskb;
+	int err, loc_err = 0;
+
+	if (atomic_read(&pkt->rxe->mcg_num)) {
+		loc_err = -ENOMEM;
+		cskb = skb_clone(skb, GFP_KERNEL);
+		if (cskb)
+			loc_err = rxe_loopback(cskb, pkt);
+	}
+
+	err = rxe_send(skb, pkt);
+	if (loc_err)
+		err = loc_err;
+	return err;
+}
+
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 		    struct sk_buff *skb)
 {
@@ -431,7 +452,9 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 
 	rxe_icrc_generate(skb, pkt);
 
-	if (pkt->mask & RXE_LOOPBACK_MASK)
+	if (pkt->mask & RXE_MCAST_MASK)
+		err = rxe_loop_and_send(skb, pkt);
+	else if (pkt->mask & RXE_LOOPBACK_MASK)
 		err = rxe_loopback(skb, pkt);
 	else
 		err = rxe_send(skb, pkt);
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
index 5686b691d6b8..c4cf672ea26d 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.h
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
@@ -85,7 +85,7 @@ enum rxe_hdr_mask {
 	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 11),
 
 	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
-
+	RXE_MCAST_MASK		= BIT(NUM_HDR_TYPES + 13),
 	RXE_ATOMIC_WRITE_MASK	= BIT(NUM_HDR_TYPES + 14),
 
 	RXE_READ_OR_ATOMIC_MASK	= (RXE_READ_MASK | RXE_ATOMIC_MASK),
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 5861e4244049..7153de0799fc 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -217,6 +217,10 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		qp = mca->qp;
 
+		/* don't reply packet to sender if locally sent */
+		if (pkt->mask & RXE_MCAST_MASK && qp_num(qp) == deth_sqp(pkt))
+			continue;
+
 		/* validate qp for incoming packet */
 		err = check_type_state(rxe, pkt, qp);
 		if (err)
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index d8c41fd626a9..599bec88cb54 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -442,8 +442,12 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 		 (pkt->mask & (RXE_WRITE_MASK | RXE_IMMDT_MASK)) ==
 		 (RXE_WRITE_MASK | RXE_IMMDT_MASK));
 
-	qp_num = (pkt->mask & RXE_DETH_MASK) ? ibwr->wr.ud.remote_qpn :
-					 qp->attr.dest_qp_num;
+	if (pkt->mask & RXE_MCAST_MASK)
+		qp_num = IB_MULTICAST_QPN;
+	else if (pkt->mask & RXE_DETH_MASK)
+		qp_num = ibwr->wr.ud.remote_qpn;
+	else
+		qp_num = qp->attr.dest_qp_num;
 
 	ack_req = ((pkt->mask & RXE_END_MASK) ||
 		   (qp->req.noack_pkts++ > RXE_MAX_PKT_PER_ACK));
@@ -809,6 +813,9 @@ int rxe_requester(struct rxe_qp *qp)
 		goto err;
 	}
 
+	if (rxe_is_mcast_av(av))
+		pkt.mask |= RXE_MCAST_MASK;
+
 	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
 	if (unlikely(!skb)) {
 		rxe_dbg_qp(qp, "Failed allocating skb\n");

From patchwork Fri Nov 3 20:43:22 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13445129
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next 3/6] RDMA/rxe: Register IP mcast address
Date: Fri, 3 Nov 2023 15:43:22 -0500
Message-Id: <20231103204324.9606-4-rpearsonhpe@gmail.com>
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>
References: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Add code to rxe_mcast_add() and rxe_mcast_del() to register/deregister
the IP multicast address. This is required for multicast traffic to
reach the rxe driver.

Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 110 +++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_net.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_net.h   |   1 +
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 4 files changed, 93 insertions(+), 21 deletions(-)
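[ Editorial illustration, not part of the patch: the userspace analogue
  of what the in-kernel ip_mc_join_group()/ip_mc_leave_group() calls
  below do -- without the join (and the MAC filter programmed via
  dev_mc_add()), multicast frames never reach the stack. This is a
  runnable Linux C sketch; the group address is an arbitrary example.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct ip_mreqn imr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0) { perror("socket"); return 1; }

	memset(&imr, 0, sizeof(imr));
	inet_pton(AF_INET, "239.1.1.1", &imr.imr_multiaddr); /* example group */
	imr.imr_ifindex = 0;	/* let the kernel choose the interface */

	/* counterpart of ip_mc_join_group(); the patch treats
	 * -EADDRINUSE ("already joined") as success */
	if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &imr, sizeof(imr)))
		perror("IP_ADD_MEMBERSHIP");
	else
		puts("joined 239.1.1.1");

	/* counterpart of ip_mc_leave_group() in rxe_mcast_del() */
	setsockopt(fd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &imr, sizeof(imr));
	close(fd);
	return 0;
} ]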
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 86cc2e18a7fd..ec757b955979 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -19,38 +19,107 @@
  * mcast packets in the rxe receive path.
  */
 
+#include
+
 #include "rxe.h"
 
-/**
- * rxe_mcast_add - add multicast address to rxe device
- * @rxe: rxe device object
- * @mgid: multicast address as a gid
- *
- * Returns 0 on success else an error
- */
-static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+/* register mcast IP and MAC addresses with net stack */
+static int rxe_mcast_add6(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
+	struct in6_addr *addr6 = (struct in6_addr *)mgid;
+	int err;
+
+	rtnl_lock();
+	err = ipv6_sock_mc_join(recv_sockets.sk6->sk, rxe->ndev->ifindex,
+				addr6);
+	rtnl_unlock();
+	if (err && err != -EADDRINUSE)
+		goto err_out;
 
 	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+	err = dev_mc_add(rxe->ndev, ll_addr);
+	if (err)
+		goto err_drop;
+
+	return 0;
 
-	return dev_mc_add(rxe->ndev, ll_addr);
+err_drop:
+	ipv6_sock_mc_drop(recv_sockets.sk6->sk, rxe->ndev->ifindex, addr6);
+err_out:
+	return err;
 }
 
-/**
- * rxe_mcast_del - delete multicast address from rxe device
- * @rxe: rxe device object
- * @mgid: multicast address as a gid
- *
- * Returns 0 on success else an error
- */
-static int rxe_mcast_del(struct rxe_dev *rxe, union ib_gid *mgid)
+static int rxe_mcast_add(struct rxe_mcg *mcg)
 {
+	struct rxe_dev *rxe = mcg->rxe;
+	union ib_gid *mgid = &mcg->mgid;
+	struct ip_mreqn imr = {};
 	unsigned char ll_addr[ETH_ALEN];
+	int err;
+
+	if (mcg->is_ipv6)
+		return rxe_mcast_add6(rxe, mgid);
+
+	imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
+	imr.imr_ifindex = rxe->ndev->ifindex;
+	rtnl_lock();
+	err = ip_mc_join_group(recv_sockets.sk4->sk, &imr);
+	rtnl_unlock();
+	if (err && err != -EADDRINUSE)
+		goto err_out;
+
+	ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
+	err = dev_mc_add(rxe->ndev, ll_addr);
+	if (err)
+		goto err_leave;
+
+	return 0;
+
+err_leave:
+	ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
+err_out:
+	return err;
+}
+
+/* deregister mcast IP and MAC addresses with net stack */
+static int rxe_mcast_del6(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	unsigned char ll_addr[ETH_ALEN];
+	int err, err2;
 
 	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+	err = dev_mc_del(rxe->ndev, ll_addr);
+
+	rtnl_lock();
+	err2 = ipv6_sock_mc_drop(recv_sockets.sk6->sk,
+				 rxe->ndev->ifindex, (struct in6_addr *)mgid);
+	rtnl_unlock();
+
+	return err ?: err2;
+}
+
+static int rxe_mcast_del(struct rxe_mcg *mcg)
+{
+	struct rxe_dev *rxe = mcg->rxe;
+	union ib_gid *mgid = &mcg->mgid;
+	struct ip_mreqn imr = {};
+	unsigned char ll_addr[ETH_ALEN];
+	int err, err2;
+
+	if (mcg->is_ipv6)
+		return rxe_mcast_del6(rxe, mgid);
+
+	imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
+	imr.imr_ifindex = rxe->ndev->ifindex;
+	ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
+	err = dev_mc_del(rxe->ndev, ll_addr);
+
+	rtnl_lock();
+	err2 = ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
+	rtnl_unlock();
 
-	return dev_mc_del(rxe->ndev, ll_addr);
+	return err ?: err2;
 }
 
 /**
@@ -164,6 +233,7 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 {
 	kref_init(&mcg->ref_cnt);
 	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
+	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
 	INIT_LIST_HEAD(&mcg->qp_list);
 	mcg->rxe = rxe;
 
@@ -225,7 +295,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 	spin_unlock_bh(&rxe->mcg_lock);
 
 	/* add mcast address outside of lock */
-	err = rxe_mcast_add(rxe, mgid);
+	err = rxe_mcast_add(mcg);
 	if (!err)
 		return mcg;
 
@@ -273,7 +343,7 @@ static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
 static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 {
 	/* delete mcast address outside of lock */
-	rxe_mcast_del(mcg->rxe, &mcg->mgid);
+	rxe_mcast_del(mcg);
 
 	spin_lock_bh(&mcg->rxe->mcg_lock);
 	__rxe_destroy_mcg(mcg);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 2fad56fc95e7..36617d07fddf 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -18,7 +18,7 @@
 #include "rxe_net.h"
 #include "rxe_loc.h"
 
-static struct rxe_recv_sockets recv_sockets;
+struct rxe_recv_sockets recv_sockets;
 
 static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
 					 struct net_device *ndev,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
index 45d80d00f86b..89cee7d5340f 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.h
+++ b/drivers/infiniband/sw/rxe/rxe_net.h
@@ -15,6 +15,7 @@ struct rxe_recv_sockets {
 	struct socket *sk4;
 	struct socket *sk6;
 };
+extern struct rxe_recv_sockets recv_sockets;
 
 int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..7be9e6232dd9 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,6 +352,7 @@ struct rxe_mcg {
 	atomic_t qp_num;
 	u32 qkey;
 	u16 pkey;
+	bool is_ipv6;
 };
 
 struct rxe_mca {

From patchwork Fri Nov 3 20:43:23 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13445127
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next 4/6] RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
Date: Fri, 3 Nov 2023 15:43:23 -0500
Message-Id: <20231103204324.9606-5-rpearsonhpe@gmail.com>
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>
References: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Change the locking of read side operations on the multicast group
red-black tree to RCU read locking. This will allow the mcast lock to
be changed to a mutex in the next patch without breaking rxe_recv.c.
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 35 +++++++--------------------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 2 files changed, 10 insertions(+), 26 deletions(-)
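[ Editorial illustration, not part of the patch: a kernel-style sketch
  (module code, not standalone-runnable) of the read-side pattern the
  patch adopts -- walk the rb-tree under rcu_read_lock() and free nodes
  only after a grace period. struct item and its helpers are invented
  names; writers must still link/erase nodes under their own lock.

#include <linux/rbtree.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct item {
	struct rb_node node;
	struct rcu_head rcu;
	int key;
};

/* reader: no spinlock needed; rcu_dereference_raw() is used because
 * rb_node pointers are not __rcu-annotated, mirroring the patch */
static struct item *item_lookup(struct rb_root *tree, int key)
{
	struct rb_node *node;
	struct item *it = NULL;

	rcu_read_lock();
	node = rcu_dereference_raw(tree->rb_node);
	while (node) {
		it = rb_entry(node, struct item, node);
		if (key < it->key)
			node = rcu_dereference_raw(node->rb_left);
		else if (key > it->key)
			node = rcu_dereference_raw(node->rb_right);
		else
			break;
	}
	rcu_read_unlock();

	return node ? it : NULL;
}

/* writer: after unlinking the node (under the writer-side lock), defer
 * the free so concurrent readers never touch freed memory */
static void item_free(struct item *it)
{
	kfree_rcu(it, rcu);
} ]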
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index ec757b955979..d7b8e31ab480 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -148,7 +148,7 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 			link = &(*link)->rb_right;
 	}
 
-	rb_link_node(&mcg->node, node, link);
+	rb_link_node_rcu(&mcg->node, node, link);
 	rb_insert_color(&mcg->node, tree);
 }
 
@@ -164,14 +164,13 @@ static void __rxe_remove_mcg(struct rxe_mcg *mcg)
 }
 
 /**
- * __rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
+ * rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
 * @rxe: rxe device object
 * @mgid: multicast IP address
 *
- * Context: caller must hold rxe->mcg_lock
 * Returns: mcg on success and takes a ref to mcg else NULL
 */
-static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
					union ib_gid *mgid)
 {
 	struct rb_root *tree = &rxe->mcg_tree;
@@ -179,7 +178,8 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 	struct rb_node *node;
 	int cmp;
 
-	node = tree->rb_node;
+	rcu_read_lock();
+	node = rcu_dereference_raw(tree->rb_node);
 
 	while (node) {
 		mcg = rb_entry(node, struct rxe_mcg, node);
@@ -187,12 +187,13 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 		cmp = memcmp(&mcg->mgid, mgid, sizeof(*mgid));
 
 		if (cmp > 0)
-			node = node->rb_left;
+			node = rcu_dereference_raw(node->rb_left);
 		else if (cmp < 0)
-			node = node->rb_right;
+			node = rcu_dereference_raw(node->rb_right);
 		else
 			break;
 	}
+	rcu_read_unlock();
 
 	if (node) {
 		kref_get(&mcg->ref_cnt);
@@ -202,24 +203,6 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 	return NULL;
 }
 
-/**
- * rxe_lookup_mcg - lookup up mcg in red-back tree
- * @rxe: rxe device object
- * @mgid: multicast IP address
- *
- * Returns: mcg if found else NULL
- */
-struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
-{
-	struct rxe_mcg *mcg;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	mcg = __rxe_lookup_mcg(rxe, mgid);
-	spin_unlock_bh(&rxe->mcg_lock);
-
-	return mcg;
-}
-
 /**
 * __rxe_init_mcg - initialize a new mcg
 * @rxe: rxe device
@@ -313,7 +296,7 @@ void rxe_cleanup_mcg(struct kref *kref)
 {
 	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
 
-	kfree(mcg);
+	kfree_rcu(mcg, rcu);
 }
 
 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 7be9e6232dd9..8058e5039322 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -345,6 +345,7 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rb_node node;
+	struct rcu_head rcu;
 	struct kref ref_cnt;
 	struct rxe_dev *rxe;
 	struct list_head qp_list;

From patchwork Fri Nov 3 20:43:24 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13445130
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next 5/6] RDMA/rxe: Split multicast lock
Date: Fri, 3 Nov 2023 15:43:24 -0500
Message-Id: <20231103204324.9606-6-rpearsonhpe@gmail.com>
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>
References: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Split rxe->mcg_lock into two locks: one to protect mcg->qp_list and
one to protect write side operations on rxe->mcg_tree (a red-black
tree) and provide serialization between rxe_attach_mcast() and
rxe_detach_mcast(). Make the qp_list lock a spin_lock_irqsave lock and
move it into the mcg struct. It protects the qp_list from simultaneous
access from rxe_mcast.c and rxe_recv.c when processing incoming
multicast packets.
In theory some ethernet driver could bypass NAPI, so an irq lock is
safer than a bh lock. Make the mcg_tree lock a mutex since the
attach/detach APIs are not called in atomic context. This allows some
significant cleanup: kzalloc() can now be called while holding the
mutex, so the speculative-allocation recheck code can be eliminated.
Use RCU to protect mcg_tree read side operations, as set up in the
previous patch. rxe_recv_mcast_pkt(), which does run in atomic
context, does not take the mcg_mutex.

Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe.c       |   2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 256 ++++++++++----------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   3 +-
 4 files changed, 106 insertions(+), 160 deletions(-)
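[ Editorial illustration, not part of the patch: a kernel-style sketch
  (module code, not standalone-runnable) of the two-lock scheme --
  a mutex for the sleepable attach/detach slow path, a per-group
  irqsave spinlock for the list the packet-receive path walks. The
  struct group/member names are invented for the sketch.

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>

struct group {
	struct list_head qp_list;
	spinlock_t lock;	/* protects qp_list; safe in any context */
};

struct member {
	struct list_head qp_list;
};

static DEFINE_MUTEX(mcg_mutex);	/* serializes attach/detach */

/* slow path: may sleep, so GFP_KERNEL allocation under the mutex is
 * fine and no speculative-alloc/recheck dance is needed */
static void attach(struct group *g, struct member *m)
{
	unsigned long flags;

	mutex_lock(&mcg_mutex);
	spin_lock_irqsave(&g->lock, flags);
	list_add_tail(&m->qp_list, &g->qp_list);
	spin_unlock_irqrestore(&g->lock, flags);
	mutex_unlock(&mcg_mutex);
}

/* fast path: runs in atomic context (possibly hard irq if a driver
 * bypasses NAPI), so it takes only the irqsave spinlock */
static int deliver(struct group *g)
{
	struct member *m;
	unsigned long flags;
	int n = 0;

	spin_lock_irqsave(&g->lock, flags);
	list_for_each_entry(m, &g->qp_list, qp_list)
		n++;	/* stand-in for handing the packet to each qp */
	spin_unlock_irqrestore(&g->lock, flags);

	return n;
} ]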
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..147cb16e937d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -142,7 +142,7 @@ static void rxe_init(struct rxe_dev *rxe)
 	INIT_LIST_HEAD(&rxe->pending_mmaps);
 
 	/* init multicast support */
-	spin_lock_init(&rxe->mcg_lock);
+	mutex_init(&rxe->mcg_mutex);
 	rxe->mcg_tree = RB_ROOT;
 
 	mutex_init(&rxe->usdev_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index d7b8e31ab480..bca5b022b797 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -126,7 +126,7 @@ static int rxe_mcast_del(struct rxe_mcg *mcg)
 * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
 * @mcg: mcg object with an embedded red-black tree node
 *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock and
+ * Context: caller must hold a reference to mcg and rxe->mcg_mutex and
 * is responsible to avoid adding the same mcg twice to the tree.
 */
 static void __rxe_insert_mcg(struct rxe_mcg *mcg)
@@ -156,7 +156,7 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
 * @mcg: mcast group object with an embedded red-black tree node
 *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ * Context: caller must hold a reference to mcg and rxe->mcg_mutex
 */
 static void __rxe_remove_mcg(struct rxe_mcg *mcg)
 {
@@ -203,34 +203,6 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
 	return NULL;
 }
 
-/**
- * __rxe_init_mcg - initialize a new mcg
- * @rxe: rxe device
- * @mgid: multicast address as a gid
- * @mcg: new mcg object
- *
- * Context: caller should hold rxe->mcg lock
- */
-static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
-			   struct rxe_mcg *mcg)
-{
-	kref_init(&mcg->ref_cnt);
-	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
-	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
-	INIT_LIST_HEAD(&mcg->qp_list);
-	mcg->rxe = rxe;
-
-	/* caller holds a ref on mcg but that will be
-	 * dropped when mcg goes out of scope. We need to take a ref
-	 * on the pointer that will be saved in the red-black tree
-	 * by __rxe_insert_mcg and used to lookup mcg from mgid later.
-	 * Inserting mcg makes it visible to outside so this should
-	 * be done last after the object is ready.
-	 */
-	kref_get(&mcg->ref_cnt);
-	__rxe_insert_mcg(mcg);
-}
-
 /**
 * rxe_get_mcg - lookup or allocate a mcg
 * @rxe: rxe device object
@@ -240,51 +212,48 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 */
 static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
-	struct rxe_mcg *mcg, *tmp;
+	struct rxe_mcg *mcg;
 	int err;
 
-	if (rxe->attr.max_mcast_grp == 0)
-		return ERR_PTR(-EINVAL);
-
-	/* check to see if mcg already exists */
+	mutex_lock(&rxe->mcg_mutex);
 	mcg = rxe_lookup_mcg(rxe, mgid);
 	if (mcg)
-		return mcg;
+		goto out;	/* nothing to do */
 
-	/* check to see if we have reached limit */
 	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
-		err = -ENOMEM;
+		err = -EINVAL;
 		goto err_dec;
 	}
 
-	/* speculative alloc of new mcg */
 	mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);
 	if (!mcg) {
 		err = -ENOMEM;
 		goto err_dec;
 	}
 
-	spin_lock_bh(&rxe->mcg_lock);
-	/* re-check to see if someone else just added it */
-	tmp = __rxe_lookup_mcg(rxe, mgid);
-	if (tmp) {
-		spin_unlock_bh(&rxe->mcg_lock);
-		atomic_dec(&rxe->mcg_num);
-		kfree(mcg);
-		return tmp;
-	}
-
-	__rxe_init_mcg(rxe, mgid, mcg);
-	spin_unlock_bh(&rxe->mcg_lock);
+	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
+	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
+	mcg->rxe = rxe;
+	kref_init(&mcg->ref_cnt);
+	INIT_LIST_HEAD(&mcg->qp_list);
+	spin_lock_init(&mcg->lock);
+	kref_get(&mcg->ref_cnt);
+	__rxe_insert_mcg(mcg);
 
-	/* add mcast address outside of lock */
 	err = rxe_mcast_add(mcg);
-	if (!err)
-		return mcg;
+	if (err)
+		goto err_free;
+out:
+	mutex_unlock(&rxe->mcg_mutex);
+	return mcg;
 
+err_free:
+	__rxe_remove_mcg(mcg);
 	kfree(mcg);
 err_dec:
 	atomic_dec(&rxe->mcg_num);
+	mutex_unlock(&rxe->mcg_mutex);
 	return ERR_PTR(err);
 }
 
@@ -300,10 +269,10 @@ void rxe_cleanup_mcg(struct kref *kref)
 }
 
 /**
- * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
+ * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_mutex
 * @mcg: the mcg object
 *
- * Context: caller is holding rxe->mcg_lock
+ * Context: caller is holding rxe->mcg_mutex
 * no qp's are attached to mcg
 */
 static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
@@ -328,151 +297,123 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 	/* delete mcast address outside of lock */
 	rxe_mcast_del(mcg);
 
-	spin_lock_bh(&mcg->rxe->mcg_lock);
+	mutex_lock(&mcg->rxe->mcg_mutex);
 	__rxe_destroy_mcg(mcg);
-	spin_unlock_bh(&mcg->rxe->mcg_lock);
+	mutex_unlock(&mcg->rxe->mcg_mutex);
 }
 
 /**
- * __rxe_init_mca - initialize a new mca holding lock
+ * rxe_attach_mcg - attach qp to mcg if not already attached
 * @qp: qp object
 * @mcg: mcg object
- * @mca: empty space for new mca
- *
- * Context: caller must hold references on qp and mcg, rxe->mcg_lock
- * and pass memory for new mca
 *
 * Returns: 0 on success else an error
 */
-static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
-			  struct rxe_mca *mca)
+static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
-	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
-	int n;
+	struct rxe_dev *rxe = mcg->rxe;
+	struct rxe_mca *mca;
+	unsigned long flags;
+	int err;
 
-	n = atomic_inc_return(&rxe->mcg_attach);
-	if (n > rxe->attr.max_total_mcast_qp_attach) {
-		atomic_dec(&rxe->mcg_attach);
-		return -ENOMEM;
+	mutex_lock(&rxe->mcg_mutex);
+	spin_lock_irqsave(&mcg->lock, flags);
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			spin_unlock_irqrestore(&mcg->lock, flags);
+			goto out;	/* nothing to do */
+		}
 	}
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
-	n = atomic_inc_return(&mcg->qp_num);
-	if (n > rxe->attr.max_mcast_qp_attach) {
-		atomic_dec(&mcg->qp_num);
-		atomic_dec(&rxe->mcg_attach);
-		return -ENOMEM;
+	if (atomic_inc_return(&rxe->mcg_attach) >
+	    rxe->attr.max_total_mcast_qp_attach) {
+		err = -EINVAL;
+		goto err_dec_attach;
 	}
 
-	atomic_inc(&qp->mcg_num);
+	if (atomic_inc_return(&mcg->qp_num) >
+	    rxe->attr.max_mcast_qp_attach) {
+		err = -EINVAL;
+		goto err_dec_qp_num;
+	}
 
+	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
+	if (!mca) {
+		err = -ENOMEM;
+		goto err_dec_qp_num;
+	}
+
+	atomic_inc(&qp->mcg_num);
 	rxe_get(qp);
 	mca->qp = qp;
 
+	spin_lock_irqsave(&mcg->lock, flags);
 	list_add_tail(&mca->qp_list, &mcg->qp_list);
-
+	spin_unlock_irqrestore(&mcg->lock, flags);
+out:
+	mutex_unlock(&rxe->mcg_mutex);
 	return 0;
+
+err_dec_qp_num:
+	atomic_dec(&mcg->qp_num);
+err_dec_attach:
+	atomic_dec(&rxe->mcg_attach);
+	mutex_unlock(&rxe->mcg_mutex);
+	return err;
 }
 
 /**
- * rxe_attach_mcg - attach qp to mcg if not already attached
- * @qp: qp object
+ * rxe_detach_mcg - detach qp from mcg
 * @mcg: mcg object
+ * @qp: qp object
 *
- * Context: caller must hold reference on qp and mcg.
- * Returns: 0 on success else an error
+ * Returns: 0 on success else an error if qp is not attached.
 */
-static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
 	struct rxe_dev *rxe = mcg->rxe;
-	struct rxe_mca *mca, *tmp;
-	int err;
+	struct rxe_mca *mca;
+	unsigned long flags;
+	int err = 0;
 
-	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&rxe->mcg_lock);
+	mutex_lock(&rxe->mcg_mutex);
+	spin_lock_irqsave(&mcg->lock, flags);
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&rxe->mcg_lock);
-			return 0;
+			spin_unlock_irqrestore(&mcg->lock, flags);
+			goto found;
 		}
 	}
-	spin_unlock_bh(&rxe->mcg_lock);
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
-	/* speculative alloc new mca without using GFP_ATOMIC */
-	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
-	if (!mca)
-		return -ENOMEM;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	/* re-check to see if someone else just attached qp */
-	list_for_each_entry(tmp, &mcg->qp_list, qp_list) {
-		if (tmp->qp == qp) {
-			kfree(mca);
-			err = 0;
-			goto out;
-		}
-	}
-
-	err = __rxe_init_mca(qp, mcg, mca);
-	if (err)
-		kfree(mca);
-out:
-	spin_unlock_bh(&rxe->mcg_lock);
-	return err;
-}
+	/* we didn't find the qp on the list */
+	err = -EINVAL;
+	goto err_out;
 
-/**
- * __rxe_cleanup_mca - cleanup mca object holding lock
- * @mca: mca object
- * @mcg: mcg object
- *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock
- */
-static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
-{
+found:
+	spin_lock_irqsave(&mcg->lock, flags);
 	list_del(&mca->qp_list);
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
 	atomic_dec(&mcg->qp_num);
 	atomic_dec(&mcg->rxe->mcg_attach);
 	atomic_dec(&mca->qp->mcg_num);
 	rxe_put(mca->qp);
-
 	kfree(mca);
-}
 
-/**
- * rxe_detach_mcg - detach qp from mcg
- * @mcg: mcg object
- * @qp: qp object
- *
- * Returns: 0 on success else an error if qp is not attached.
- */
-static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
-{
-	struct rxe_dev *rxe = mcg->rxe;
-	struct rxe_mca *mca, *tmp;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
-		if (mca->qp == qp) {
-			__rxe_cleanup_mca(mca, mcg);
-
-			/* if the number of qp's attached to the
-			 * mcast group falls to zero go ahead and
-			 * tear it down. This will not free the
-			 * object since we are still holding a ref
-			 * from the caller
-			 */
-			if (atomic_read(&mcg->qp_num) <= 0)
-				__rxe_destroy_mcg(mcg);
-
-			spin_unlock_bh(&rxe->mcg_lock);
-			return 0;
-		}
-	}
+	/* if the number of qp's attached to the
+	 * mcast group falls to zero go ahead and
+	 * tear it down. This will not free the
+	 * object since we are still holding a ref
+	 * from the caller
+	 */
+	if (atomic_read(&mcg->qp_num) <= 0)
+		__rxe_destroy_mcg(mcg);
 
-	/* we didn't find the qp on the list */
-	spin_unlock_bh(&rxe->mcg_lock);
-	return -EINVAL;
+err_out:
+	mutex_unlock(&rxe->mcg_mutex);
+	return err;
 }
 
 /**
@@ -490,6 +431,9 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_mcg *mcg;
 
+	if (rxe->attr.max_mcast_grp == 0)
+		return -EINVAL;
+
 	/* takes a ref on mcg if successful */
 	mcg = rxe_get_mcg(rxe, mgid);
 	if (IS_ERR(mcg))
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 7153de0799fc..6cf0da958864 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -194,6 +194,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	struct rxe_mca *mca;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
+	unsigned long flags;
 	int err;
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -207,7 +208,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
-	spin_lock_bh(&rxe->mcg_lock);
+	spin_lock_irqsave(&mcg->lock, flags);
 
 	/* this is unreliable datagram service so we let
 	 * failures to deliver a multicast packet to a
@@ -259,7 +260,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		}
 	}
 
-	spin_unlock_bh(&rxe->mcg_lock);
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
 	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 8058e5039322..f21963dcb2c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -351,6 +351,7 @@ struct rxe_mcg {
 	struct list_head qp_list;
 	union ib_gid mgid;
 	atomic_t qp_num;
+	spinlock_t lock; /* protect qp_list */
 	u32 qkey;
 	u16 pkey;
 	bool is_ipv6;
@@ -390,7 +391,7 @@ struct rxe_dev {
 	struct rxe_pool		mw_pool;
 
 	/* multicast support */
-	spinlock_t		mcg_lock;
+	struct mutex		mcg_mutex;
 	struct rb_root		mcg_tree;
 	atomic_t		mcg_num;
 	atomic_t		mcg_attach;

From patchwork Fri Nov 3 20:43:25 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13445131
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next 6/6] RDMA/rxe: Cleanup mcg lifetime
Date: Fri, 3 Nov 2023 15:43:25 -0500
Message-Id: <20231103204324.9606-7-rpearsonhpe@gmail.com>
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>
References: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Fix up mcg reference counting so the ref count drops to zero
correctly, and move code from rxe_destroy_mcg() to rxe_cleanup_mcg()
since rxe_destroy_mcg() is no longer needed. Also perform general code
cleanup: drop the kernel-doc comments on static functions, etc.
Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 190 ++++++++------------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   2 +-
 3 files changed, 58 insertions(+), 136 deletions(-)
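[ Editorial illustration, not part of the patch: a kernel-style sketch
  (module code, not standalone-runnable) of the lifetime rule the patch
  adopts -- every lookup or attach takes a reference that can fail if
  teardown already started, every finished path drops exactly one, and
  all teardown funnels through the single kref release callback.
  struct group and its helpers are invented names.

#include <linux/kref.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct group {
	struct kref ref_cnt;
	struct rcu_head rcu;
};

/* single release point: in the real driver this also removes the
 * group from the tree and deregisters the mcast address; memory is
 * freed only after an RCU grace period for concurrent readers */
static void group_cleanup(struct kref *kref)
{
	struct group *g = container_of(kref, struct group, ref_cnt);

	kfree_rcu(g, rcu);
}

/* returns 0 if the last ref is already gone, i.e. lookup lost the race */
static int group_get(struct group *g)
{
	return kref_get_unless_zero(&g->ref_cnt);
}

static int group_put(struct group *g)
{
	return kref_put(&g->ref_cnt, group_cleanup);
} ]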
- */ +static void __rxe_remove_mcg(struct rxe_mcg *mcg) +{ + rb_erase(&mcg->node, &mcg->rxe->mcg_tree); +} + +static void rxe_cleanup_mcg(struct kref *kref) +{ + struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt); + + __rxe_remove_mcg(mcg); + rxe_mcast_del(mcg); + atomic_dec(&mcg->rxe->mcg_num); + kfree_rcu(mcg, rcu); +} + +static int rxe_get_mcg(struct rxe_mcg *mcg) +{ + return kref_get_unless_zero(&mcg->ref_cnt); +} + +int rxe_put_mcg(struct rxe_mcg *mcg) +{ + return kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); +} + static void __rxe_insert_mcg(struct rxe_mcg *mcg) { struct rb_root *tree = &mcg->rxe->mcg_tree; @@ -144,34 +160,17 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg) cmp = memcmp(&tmp->mgid, &mcg->mgid, sizeof(mcg->mgid)); if (cmp > 0) link = &(*link)->rb_left; - else + else if (cmp < 0) link = &(*link)->rb_right; + else + WARN_ON_ONCE(1); } rb_link_node_rcu(&mcg->node, node, link); rb_insert_color(&mcg->node, tree); } -/** - * __rxe_remove_mcg - remove an mcg from red-black tree holding lock - * @mcg: mcast group object with an embedded red-black tree node - * - * Context: caller must hold a reference to mcg and rxe->mcg_mutex - */ -static void __rxe_remove_mcg(struct rxe_mcg *mcg) -{ - rb_erase(&mcg->node, &mcg->rxe->mcg_tree); -} - -/** - * rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock - * @rxe: rxe device object - * @mgid: multicast IP address - * - * Returns: mcg on success and takes a ref to mcg else NULL - */ -struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, - union ib_gid *mgid) +struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid) { struct rb_root *tree = &rxe->mcg_tree; struct rxe_mcg *mcg; @@ -196,21 +195,16 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, rcu_read_unlock(); if (node) { - kref_get(&mcg->ref_cnt); + /* take a ref on mcg for each lookup */ + rxe_get_mcg(mcg); return mcg; } return NULL; } -/** - * rxe_get_mcg - lookup or allocate a mcg - * @rxe: rxe device object - * @mgid: multicast IP address as a gid - * - * Returns: mcg on success else ERR_PTR(error) - */ -static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid) +/* find an existing mcg or allocate a new one */ +static struct rxe_mcg *rxe_alloc_mcg(struct rxe_dev *rxe, union ib_gid *mgid) { struct rxe_mcg *mcg; int err; @@ -234,22 +228,22 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid) memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid)); mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid); mcg->rxe = rxe; + /* take ref on mcg when created */ kref_init(&mcg->ref_cnt); INIT_LIST_HEAD(&mcg->qp_list); spin_lock_init(&mcg->lock); - kref_get(&mcg->ref_cnt); - __rxe_insert_mcg(mcg); err = rxe_mcast_add(mcg); if (err) goto err_free; + /* can insert into tree now that mcg is finished */ + __rxe_insert_mcg(mcg); out: mutex_unlock(&rxe->mcg_mutex); return mcg; err_free: - __rxe_remove_mcg(mcg); kfree(mcg); err_dec: atomic_dec(&rxe->mcg_num); @@ -257,64 +251,12 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid) return ERR_PTR(err); } -/** - * rxe_cleanup_mcg - cleanup mcg for kref_put - * @kref: struct kref embnedded in mcg - */ -void rxe_cleanup_mcg(struct kref *kref) -{ - struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt); - - kfree_rcu(mcg, rcu); -} - -/** - * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_mutex - * @mcg: the mcg object - * - * Context: caller is holding rxe->mcg_mutex - * no qp's are attached to mcg - */ -static void 
__rxe_destroy_mcg(struct rxe_mcg *mcg) -{ - struct rxe_dev *rxe = mcg->rxe; - - /* remove mcg from red-black tree then drop ref */ - __rxe_remove_mcg(mcg); - kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); - - atomic_dec(&rxe->mcg_num); -} - -/** - * rxe_destroy_mcg - destroy mcg object - * @mcg: the mcg object - * - * Context: no qp's are attached to mcg - */ -static void rxe_destroy_mcg(struct rxe_mcg *mcg) -{ - /* delete mcast address outside of lock */ - rxe_mcast_del(mcg); - - mutex_lock(&mcg->rxe->mcg_mutex); - __rxe_destroy_mcg(mcg); - mutex_unlock(&mcg->rxe->mcg_mutex); -} - -/** - * rxe_attach_mcg - attach qp to mcg if not already attached - * @qp: qp object - * @mcg: mcg object - * - * Returns: 0 on success else an error - */ -static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) +static int rxe_attach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg) { struct rxe_dev *rxe = mcg->rxe; struct rxe_mca *mca; unsigned long flags; - int err; + int err = 0; mutex_lock(&rxe->mcg_mutex); spin_lock_irqsave(&mcg->lock, flags); @@ -348,29 +290,28 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) rxe_get(qp); mca->qp = qp; + /* hold a ref on mcg for each attached qp + * protects the pointers in mca->qp_list + */ + rxe_get_mcg(mcg); + spin_lock_irqsave(&mcg->lock, flags); list_add_tail(&mca->qp_list, &mcg->qp_list); spin_unlock_irqrestore(&mcg->lock, flags); -out: - mutex_unlock(&rxe->mcg_mutex); - return 0; + goto out; err_dec_qp_num: atomic_dec(&mcg->qp_num); err_dec_attach: atomic_dec(&rxe->mcg_attach); +out: + /* drop the ref on mcg from rxe_alloc_mcg */ + rxe_put_mcg(mcg); mutex_unlock(&rxe->mcg_mutex); return err; } -/** - * rxe_detach_mcg - detach qp from mcg - * @mcg: mcg object - * @qp: qp object - * - * Returns: 0 on success else an error if qp is not attached. - */ -static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) +static int rxe_detach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg) { struct rxe_dev *rxe = mcg->rxe; struct rxe_mca *mca; @@ -387,7 +328,6 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) } spin_unlock_irqrestore(&mcg->lock, flags); - /* we didn't find the qp on the list */ err = -EINVAL; goto err_out; @@ -395,23 +335,18 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) spin_lock_irqsave(&mcg->lock, flags); list_del(&mca->qp_list); spin_unlock_irqrestore(&mcg->lock, flags); + /* drop the ref on mcg from rxe_attach_mcg */ + rxe_put_mcg(mcg); atomic_dec(&mcg->qp_num); atomic_dec(&mcg->rxe->mcg_attach); atomic_dec(&mca->qp->mcg_num); + /* drop the ref on qp that was protecting mca->qp */ rxe_put(mca->qp); kfree(mca); - - /* if the number of qp's attached to the - * mcast group falls to zero go ahead and - * tear it down. 
This will not free the - * object since we are still holding a ref - * from the caller - */ - if (atomic_read(&mcg->qp_num) <= 0) - __rxe_destroy_mcg(mcg); - err_out: + /* drop the ref on mcg from rxe_lookup_mcg */ + rxe_put_mcg(mcg); mutex_unlock(&rxe->mcg_mutex); return err; } @@ -426,7 +361,6 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) */ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) { - int err; struct rxe_dev *rxe = to_rdev(ibqp->device); struct rxe_qp *qp = to_rqp(ibqp); struct rxe_mcg *mcg; @@ -435,19 +369,11 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) return -EINVAL; /* takes a ref on mcg if successful */ - mcg = rxe_get_mcg(rxe, mgid); + mcg = rxe_alloc_mcg(rxe, mgid); if (IS_ERR(mcg)) return PTR_ERR(mcg); - err = rxe_attach_mcg(mcg, qp); - - /* if we failed to attach the first qp to mcg tear it down */ - if (atomic_read(&mcg->qp_num) == 0) - rxe_destroy_mcg(mcg); - - kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); - - return err; + return rxe_attach_mcg(qp, mcg); } /** @@ -463,14 +389,10 @@ int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) struct rxe_dev *rxe = to_rdev(ibqp->device); struct rxe_qp *qp = to_rqp(ibqp); struct rxe_mcg *mcg; - int err; mcg = rxe_lookup_mcg(rxe, mgid); if (!mcg) return -EINVAL; - err = rxe_detach_mcg(mcg, qp); - kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); - - return err; + return rxe_detach_mcg(qp, mcg); } diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c index 6cf0da958864..e3ec3dfc57f4 100644 --- a/drivers/infiniband/sw/rxe/rxe_recv.c +++ b/drivers/infiniband/sw/rxe/rxe_recv.c @@ -262,7 +262,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) spin_unlock_irqrestore(&mcg->lock, flags); - kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); + rxe_put_mcg(mcg); if (likely(!skb)) return;
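With this conversion struct rxe_mcg follows the standard kref lifecycle: kref_init() supplies the creation reference in rxe_alloc_mcg(), each lookup and each attached qp takes one more, and all teardown (tree removal, rxe_mcast_del(), the mcg_num decrement, the RCU-deferred free) funnels through the single release callback. A minimal sketch of the pattern, with a hypothetical struct obj standing in for struct rxe_mcg:

#include <linux/kref.h>
#include <linux/rbtree.h>
#include <linux/slab.h>

/* hypothetical stand-in for struct rxe_mcg */
struct obj {
	struct kref ref_cnt;
	struct rb_node node;
	struct rcu_head rcu;
};

/* single release callback: runs exactly once, on the final put */
static void obj_cleanup(struct kref *kref)
{
	struct obj *obj = container_of(kref, struct obj, ref_cnt);

	/* all teardown lives here; the memory itself is freed only
	 * after an RCU grace period so lockless readers stay safe
	 */
	kfree_rcu(obj, rcu);
}

/* returns 0 if the object is already dying; never resurrects it */
static int obj_get(struct obj *obj)
{
	return kref_get_unless_zero(&obj->ref_cnt);
}

/* returns nonzero if this put invoked obj_cleanup() */
static int obj_put(struct obj *obj)
{
	return kref_put(&obj->ref_cnt, obj_cleanup);
}

This is also why rxe_loc.h now exports rxe_put_mcg() rather than rxe_cleanup_mcg(): callers such as rxe_recv.c no longer need to name the release function.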
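One more note on the rewritten __rxe_insert_mcg(): the walk descends by memcmp() of the mgid and must terminate when it meets an equal key, since leaving link unchanged would spin forever; the WARN_ON_ONCE() marks a duplicate insert as a caller bug. A sketch of that walk under the same assumptions as above (obj_key() is a hypothetical accessor, not a real kernel helper):

#include <linux/rbtree.h>
#include <linux/string.h>
#include <linux/types.h>

/* hypothetical accessor returning the node's comparison key */
static const void *obj_key(const struct obj *obj);

/* insert @new keyed by @key; returns false if the key already exists */
static bool obj_tree_insert(struct rb_root *tree, struct obj *new,
			    const void *key, size_t key_len)
{
	struct rb_node **link = &tree->rb_node;
	struct rb_node *parent = NULL;

	while (*link) {
		struct obj *tmp;
		int cmp;

		parent = *link;
		tmp = rb_entry(parent, struct obj, node);
		cmp = memcmp(obj_key(tmp), key, key_len);
		if (cmp > 0)
			link = &(*link)->rb_left;
		else if (cmp < 0)
			link = &(*link)->rb_right;
		else
			return false;	/* duplicate key: refuse, don't loop */
	}

	/* the _rcu variant publishes the node safely for lockless readers */
	rb_link_node_rcu(&new->node, parent, link);
	rb_insert_color(&new->node, tree);
	return true;
}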
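The same lifecycle explains the kref_get() to rxe_get_mcg() switch in rxe_lookup_mcg(): the tree is walked under rcu_read_lock(), so a reader can find an mcg whose final rxe_put_mcg() is racing with it. kref_get_unless_zero() refuses to resurrect such an object and the lookup simply reports a miss. Continuing the sketch (obj_cmp() is again a hypothetical comparator):

#include <linux/rcupdate.h>

/* hypothetical comparator with memcmp() semantics */
static int obj_cmp(const struct obj *obj, const void *key);

/* returns the object with a reference held, or NULL on a miss */
static struct obj *obj_lookup(struct rb_root *tree, const void *key)
{
	struct obj *found = NULL;
	struct rb_node *node;

	rcu_read_lock();
	node = tree->rb_node;
	while (node) {
		struct obj *tmp = rb_entry(node, struct obj, node);
		int cmp = obj_cmp(tmp, key);

		if (cmp > 0) {
			node = node->rb_left;
		} else if (cmp < 0) {
			node = node->rb_right;
		} else {
			/* take the ref inside the RCU section; this
			 * fails if the final put has already run
			 */
			if (obj_get(tmp))
				found = tmp;
			break;
		}
	}
	rcu_read_unlock();

	return found;
}

With those pieces the attach/detach paths reduce to reference hand-offs: rxe_attach_mcg() takes a per-qp ref and then drops the ref it inherited from rxe_alloc_mcg(), while rxe_detach_mcg() drops the per-qp ref plus the one from rxe_lookup_mcg(), so a group now tears itself down automatically when the last qp detaches instead of needing the explicit rxe_destroy_mcg() calls this patch removes.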