From patchwork Tue Mar 4 12:43:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Guangguan Wang X-Patchwork-Id: 14000644 X-Patchwork-Delegate: kuba@kernel.org Received: from out30-124.freemail.mail.aliyun.com (out30-124.freemail.mail.aliyun.com [115.124.30.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91B981FDE05; Tue, 4 Mar 2025 12:43:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741092205; cv=none; b=WnivOzIyawbjXQoPCrMA7Dlh+Ybwlc8oXJtbMPvhMlLe3MR4ybtyDlY2JD2HkTib6OuTGNHiFMviARjxH5GvGe+Mp4lYwq+5ty9gJBET+sguPDrjk3cTRWrxH5N4CrIkwyv3FZFwurZnTZzbt0OC+eO2Joe931A/DY33WQN5ao4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741092205; c=relaxed/simple; bh=xHheJxk9MPOwMJe3gJyARK3kWzh1maczaHnY4ttiy6U=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=c/iw2bmRk0Tmzis9pB9uKQCGVEq9tX8Vyfhf3gsxyqfo1c0FhRJrWBJkHWEqMMjy4w/tk+YIyHntuM5cDVcpTh4RDwWajD5T8hogJx+umaS5xwQ7+WVH1YGrH1uu1HHSzSRlUcPktddSYi4zFPQ0Q5iH3PkpyDEeTYsOTbBTliA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=BYi1lU9i; arc=none smtp.client-ip=115.124.30.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="BYi1lU9i" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1741092191; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=kIBxrLXVX5Go0KqRs9bm+++JgxHYhnl8q+HXTDClonc=; b=BYi1lU9iZu9280PYF4WK0v/v0S8CulVv4Q5A6aCjlJhwhpYCavtYlkcoYY9seLeXtq2qcc4uAjWg4kxOY+mEM05tZoA6MwB6cykMUM7TXuGSVDjex9w60QbNWU78MgtI2L0aLGgqlmOkxIVw3xMBbetEe1W0HHFXB4OPYRF4Fd8= Received: from localhost.localdomain(mailfrom:guangguan.wang@linux.alibaba.com fp:SMTPD_---0WQiJ44j_1741092189 cluster:ay36) by smtp.aliyun-inc.com; Tue, 04 Mar 2025 20:43:10 +0800 From: Guangguan Wang To: wenjia@linux.ibm.com, pasic@linux.ibm.com, jaka@linux.ibm.com, alibuda@linux.alibaba.com, tonylu@linux.alibaba.com, guwen@linux.alibaba.com Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, horms@kernel.org, linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2] net/smc: use the correct ndev to find pnetid by pnetid table Date: Tue, 4 Mar 2025 20:43:04 +0800 Message-Id: <20250304124304.13732-1-guangguan.wang@linux.alibaba.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org When using smc_pnet in SMC, it will only search the pnetid in the base_ndev of the netdev hierarchy(both HW PNETID and User-defined sw pnetid). This may not work for some scenarios when using SMC in container on cloud environment. In container, there have choices of different container network, such as directly using host network, virtual network IPVLAN, veth, etc. Different choices of container network have different netdev hierarchy. Examples of netdev hierarchy show below. (eth0 and eth1 in host below is the netdev directly related to the physical device). _______________________________ | _________________ | | |POD | | | | | | | | eth0_________ | | | |____| |__| | | | | | | | | | | eth1|base_ndev| eth0_______ | | | | | RDMA || | host |_________| |_______|| --------------------------------- netdev hierarchy if directly using host network ________________________________ | _________________ | | |POD __________ | | | | |upper_ndev| | | | |eth0|__________| | | | |_______|_________| | | |lower netdev | | __|______ | | eth1| | eth0_______ | | |base_ndev| | RDMA || | host |_________| |_______|| --------------------------------- netdev hierarchy if using IPVLAN _______________________________ | _____________________ | | |POD _________ | | | | |base_ndev|| | | |eth0(veth)|_________|| | | |____________|________| | | |pairs | | _______|_ | | | | eth0_______ | | veth|base_ndev| | RDMA || | |_________| |_______|| | _________ | | eth1|base_ndev| | | host |_________| | --------------------------------- netdev hierarchy if using veth Due to some reasons, the eth1 in host is not RDMA attached netdevice, pnetid is needed to map the eth1(in host) with RDMA device so that POD can do SMC-R. Because the eth1(in host) is managed by CNI plugin(such as Terway, network management plugin in container environment), and in cloud environment the eth(in host) can dynamically be inserted by CNI when POD create and dynamically be removed by CNI when POD destroy and no POD related to the eth(in host) anymore. It is hard to config the pnetid to the eth1(in host). But it is easy to config the pnetid to the netdevice which can be seen in POD. When do SMC-R, both the container directly using host network and the container using veth network can successfully match the RDMA device, because the configured pnetid netdev is a base_ndev. But the container using IPVLAN can not successfully match the RDMA device and 0x03030000 fallback happens, because the configured pnetid netdev is not a base_ndev. Additionally, if config pnetid to the eth1(in host) also can not work for matching RDMA device when using veth network and doing SMC-R in POD. To resolve the problems list above, this patch extends to search user -defined sw pnetid in the clc handshake ndev when no pnetid can be found in the base_ndev, and the base_ndev take precedence over ndev for backward compatibility. This patch also can unify the pnetid setup of different network choices list above in container(Config user-defined sw pnetid in the netdevice can be seen in POD). Signed-off-by: Guangguan Wang --- net/smc/smc_pnet.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c index 716808f374a8..b391c2ef463f 100644 --- a/net/smc/smc_pnet.c +++ b/net/smc/smc_pnet.c @@ -1079,14 +1079,16 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev, struct smc_init_info *ini) { u8 ndev_pnetid[SMC_MAX_PNETID_LEN]; + struct net_device *base_ndev; struct net *net; - ndev = pnet_find_base_ndev(ndev); + base_ndev = pnet_find_base_ndev(ndev); net = dev_net(ndev); - if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port, + if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port, ndev_pnetid) && + smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid) && smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) { - smc_pnet_find_rdma_dev(ndev, ini); + smc_pnet_find_rdma_dev(base_ndev, ini); return; /* pnetid could not be determined */ } _smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);