Message ID | 20241227040455.91854-1-guangguan.wang@linux.alibaba.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [net] net/smc: use the correct ndev to find pnetid by pnetid table | expand |
On Fri, 27 Dec 2024 12:04:55 +0800 Guangguan Wang wrote: > The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> > to the pnetid table and will attach the <pnetid> to net device > whose name is <ethx>. But When do SMCR by <ethx>, in function > smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's > pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric > use of the pnetid seems weird. Sometimes it is difficult to know > the hierarchy of net device what may make it difficult to configure > the pnetid and to use the pnetid. Looking into the history of > commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") > that changes the ndev from the <ethx> to the <ethx>'s base ndev > when finding pnetid by pnetid table. It seems a mistake. > > This patch changes the ndev back to the <ethx> when finding pnetid > by pnetid table. SMC maintainers, please review..
On 2024/12/27 12:04, Guangguan Wang wrote: > The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> > to the pnetid table and will attach the <pnetid> to net device > whose name is <ethx>. But When do SMCR by <ethx>, in function > smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's > pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric > use of the pnetid seems weird. Sometimes it is difficult to know > the hierarchy of net device what may make it difficult to configure > the pnetid and to use the pnetid. Looking into the history of > commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") > that changes the ndev from the <ethx> to the <ethx>'s base ndev > when finding pnetid by pnetid table. It seems a mistake. > > This patch changes the ndev back to the <ethx> when finding pnetid > by pnetid table. > > Fixes: 890a2cb4a966 ("net/smc: rework pnet table") > Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> It makes sense to me, thanks! Reviewed-by: Wen Gu <guwen@linux.alibaba.com> > --- > net/smc/smc_pnet.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c > index 716808f374a8..cc098780970b 100644 > --- a/net/smc/smc_pnet.c > +++ b/net/smc/smc_pnet.c > @@ -1079,14 +1079,15 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev, > struct smc_init_info *ini) > { > u8 ndev_pnetid[SMC_MAX_PNETID_LEN]; > + struct net_device *base_ndev; > struct net *net; > > - ndev = pnet_find_base_ndev(ndev); > + base_ndev = pnet_find_base_ndev(ndev); > net = dev_net(ndev); > - if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port, > + if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port, > ndev_pnetid) && > smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) { > - smc_pnet_find_rdma_dev(ndev, ini); > + smc_pnet_find_rdma_dev(base_ndev, ini); > return; /* pnetid could not be determined */ > } > _smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);
On 12/27/24 5:04 AM, Guangguan Wang wrote: > The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> > to the pnetid table and will attach the <pnetid> to net device > whose name is <ethx>. But When do SMCR by <ethx>, in function > smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's > pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric > use of the pnetid seems weird. Sometimes it is difficult to know > the hierarchy of net device what may make it difficult to configure > the pnetid and to use the pnetid. Looking into the history of > commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") > that changes the ndev from the <ethx> to the <ethx>'s base ndev > when finding pnetid by pnetid table. It seems a mistake. > > This patch changes the ndev back to the <ethx> when finding pnetid > by pnetid table. > > Fixes: 890a2cb4a966 ("net/smc: rework pnet table") > Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> If I read correctly, this will break existing applications using the lookup schema introduced by the blamed commit - which is not very recent. Perhaps for a net patch would be better to support both lookup schemas i.e. (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) || smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid)) ? Thanks, Paolo
On Tue, 7 Jan 2025 09:44:30 +0100 Paolo Abeni <pabeni@redhat.com> wrote: > On 12/27/24 5:04 AM, Guangguan Wang wrote: @Guangguan Wang: please use my linux.ibm.com address in the future. > > The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> > > to the pnetid table and will attach the <pnetid> to net device > > whose name is <ethx>. But When do SMCR by <ethx>, in function > > smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's > > pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric > > use of the pnetid seems weird. Sometimes it is difficult to know > > the hierarchy of net device what may make it difficult to configure > > the pnetid and to use the pnetid. Looking into the history of > > commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") > > that changes the ndev from the <ethx> to the <ethx>'s base ndev > > when finding pnetid by pnetid table. It seems a mistake. > > > > This patch changes the ndev back to the <ethx> when finding pnetid > > by pnetid table. > > > > Fixes: 890a2cb4a966 ("net/smc: rework pnet table") > > Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> > > If I read correctly, this will break existing applications using the > lookup schema introduced by the blamed commit - which is not very > recent. Hi Paolo, sorry for chiming in late. Wenjia is on vacation and Jan is out sick! After some reading and thinking I could not figure out how 890a2cb4a966 ("net/smc: rework pnet table") is broken. Admittedly I'm not really a net guy,and I'm mostly guessing what that lower and upper device stuff is, so please bear with me. All that said, I do think that going to the lowest netdev in the hierarchy is a sane thing to do here. I assume that lower and upper devices are applicable to stuff like bonding. PNETID stands for "Physical Network Identifier" and the idea is that iff two ports are connected to the same physical network then they should have the same PNETID. And on s390 PNETID can come and often is comming "from the hardware". Now for something like a bond of two OSA interfaces, I would expect the two legs of the bond to probably have a "HW PNETID", but the netdev representing the bond itself won't have one unless the Linux admin defines a software PNETID, which is work, and can't have a HW PNETID because it is a software construct within Linux. Breaking for example an active-backup bond setup where the legs have HW PNETIDs and the admin did not bother to specify a PNETID for the bond is not acceptable. Let me also note that if ndev is a leaf (i.e. there is no lower device to it) then ndev == base_ndev, and the whole discussion does not matter for that case. Again I have to emphasize that my domain knowledge is very limited, but I really don't feel comfortable going forward with this without Jan or Wenjia weighing in on the matter. Paolo thanks for bringing this up! > > Perhaps for a net patch would be better to support both lookup schemas > i.e. > > (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) || > smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid)) > > ? > Hm, I guess the idea here is that if ndev has a PNETID then it should take precedence, but if not we should try to obtain the PNETID of its "base_ndev". I'm not sure this would make things better compared to the original idea of caring about the leaf. Which makes me question my understanding of the problem statement from the commit message. BTW to implement the logic proposed by you Paolo, as understood by me, we would have to use "&&" instead of "||". The whole expression is supposed to evaluate to false if a pnetid is found and to true if no pnet_id is found. smc_pnet_find_ndev_pnetid_by_table(ndev) returns false if a pnetid is found. I.e. if not found we would just short circuit to true and not call smc_pnet_find_ndev_pnetid_by_table(base_ndev), which is not what I believe you wanted to propose. To sum it up, please let us wait until Wenjia or Jan chime in. Copying Alexandra as well: she is more of a net person than I am, and maybe she has a more informed opinion. Regards, Halil [...]
On 2025/1/8 03:32, Halil Pasic wrote: > On Tue, 7 Jan 2025 09:44:30 +0100 > Paolo Abeni <pabeni@redhat.com> wrote: > >> On 12/27/24 5:04 AM, Guangguan Wang wrote: > > @Guangguan Wang: please use my linux.ibm.com address > in the future. Get it. > >>> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> >>> to the pnetid table and will attach the <pnetid> to net device >>> whose name is <ethx>. But When do SMCR by <ethx>, in function >>> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's >>> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric >>> use of the pnetid seems weird. Sometimes it is difficult to know >>> the hierarchy of net device what may make it difficult to configure >>> the pnetid and to use the pnetid. Looking into the history of >>> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") >>> that changes the ndev from the <ethx> to the <ethx>'s base ndev >>> when finding pnetid by pnetid table. It seems a mistake. >>> >>> This patch changes the ndev back to the <ethx> when finding pnetid >>> by pnetid table. >>> >>> Fixes: 890a2cb4a966 ("net/smc: rework pnet table") >>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> >> >> If I read correctly, this will break existing applications using the >> lookup schema introduced by the blamed commit - which is not very >> recent. > > Hi Paolo, > > sorry for chiming in late. Wenjia is on vacation and Jan is out sick! > After some reading and thinking I could not figure out how 890a2cb4a966 > ("net/smc: rework pnet table") is broken. Before commit 890a2cb4a966: smc_pnet_find_roce_resource smc_pnet_find_roce_by_pnetid(ndev, ...) /* lookup via hardware-defined pnetid */ smc_pnetid_by_dev_port(base_ndev, ...) smc_pnet_find_roce_by_table(ndev, ...) /* lookup via SMC PNET table */ { ... list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) { if (ndev == pnetelem->ndev) { /* notice here, it was ndev to matching pnetid element in pnet table */ ... } After commit 890a2cb4a966: smc_pnet_find_roce_resource smc_pnet_find_roce_by_pnetid { ... base_ndev = pnet_find_base_ndev(ndev); /* rename the variable name to base_ndev for better understanding */ smc_pnetid_by_dev_port(base_ndev, ...) smc_pnet_find_ndev_pnetid_by_table(base_ndev, ...) { ... list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) { if (base_ndev == pnetelem->ndev) { /* notice here, it is base_ndev to matching pnetid element in pnet table */ ... } } The commit 890a2cb4a966 has changed ndev to base_ndev when matching pnetid element in pnet table. But in the function smc_pnet_add_eth, the pnetid is attached to the ndev itself, not the base_ndev. smc_pnet_add_eth(...) { ... ndev = dev_get_by_name(net, eth_name); ... if (new_netdev) { if (ndev) { new_pe->ndev = ndev; netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_ATOMIC); } list_add_tail(&new_pe->list, &pnettable->pnetlist); mutex_unlock(&pnettable->lock); } else { ... } > > Admittedly I'm not really a net guy,and I'm mostly guessing what that > lower and upper device stuff is, so please bear with me. All that said, I > do think that going to the lowest netdev in the hierarchy is a sane > thing to do here. I assume that lower and upper devices are applicable > to stuff like bonding. > > PNETID stands for "Physical Network Identifier" and the idea is that iff > two ports are connected to the same physical network then they should > have the same PNETID. And on s390 PNETID can come and often is comming > "from the hardware". Now for something like a bond of two OSA > interfaces, I would expect the two legs of the bond to probably have a > "HW PNETID", but the netdev representing the bond itself won't have one > unless the Linux admin defines a software PNETID, which is work, and > can't have a HW PNETID because it is a software construct within Linux. > Breaking for example an active-backup bond setup where the legs have > HW PNETIDs and the admin did not bother to specify a PNETID for the bond > is not acceptable. > > Let me also note that if ndev is a leaf (i.e. there is no lower device to > it) then ndev == base_ndev, and the whole discussion does not matter for > that case. > > Again I have to emphasize that my domain knowledge is very limited, but > I really don't feel comfortable going forward with this without Jan or > Wenjia weighing in on the matter. > > Paolo thanks for bringing this up! > >> >> Perhaps for a net patch would be better to support both lookup schemas >> i.e. >> >> (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) || >> smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid)) >> >> ? >> > > Hm, I guess the idea here is that if ndev has a PNETID then it should > take precedence, but if not we should try to obtain the PNETID of its > "base_ndev". I'm not sure this would make things better compared to the > original idea of caring about the leaf. Which makes me question my > understanding of the problem statement from the commit message. > > BTW to implement the logic proposed by you Paolo, as understood by me, > we would have to use "&&" instead of "||". The whole expression is > supposed > to evaluate to false if a pnetid is found and to true if no pnet_id is > found. smc_pnet_find_ndev_pnetid_by_table(ndev) returns false if a pnetid > is found. I.e. if not found we would just short circuit to true and not > call smc_pnet_find_ndev_pnetid_by_table(base_ndev), which is not what I > believe you wanted to propose. Yes, it should be (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) && smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid)) if for the consideration of application's compatible usage of smc_pnet. Thanks, Guangguan Wang > > To sum it up, please let us wait until Wenjia or Jan chime in. Copying > Alexandra as well: she is more of a net person than I am, and maybe she > has a more informed opinion. > > Regards, > Halil > [...] >
On 07.01.25 20:32, Halil Pasic wrote: > On Tue, 7 Jan 2025 09:44:30 +0100 > Paolo Abeni <pabeni@redhat.com> wrote: > >> On 12/27/24 5:04 AM, Guangguan Wang wrote: ... >>> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> >>> to the pnetid table and will attach the <pnetid> to net device >>> whose name is <ethx>. But When do SMCR by <ethx>, in function >>> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's >>> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric >>> use of the pnetid seems weird. Sometimes it is difficult to know >>> the hierarchy of net device what may make it difficult to configure >>> the pnetid and to use the pnetid. Looking into the history of >>> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") >>> that changes the ndev from the <ethx> to the <ethx>'s base ndev >>> when finding pnetid by pnetid table. It seems a mistake. >>> >>> This patch changes the ndev back to the <ethx> when finding pnetid >>> by pnetid table. >>> >>> Fixes: 890a2cb4a966 ("net/smc: rework pnet table") >>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> >> >> If I read correctly, this will break existing applications using the >> lookup schema introduced by the blamed commit - which is not very >> recent. I agree that this patch may break existing applications or existing configuration automation scripts. ... > PNETID stands for "Physical Network Identifier" and the idea is that iff > two ports are connected to the same physical network then they should > have the same PNETID. And on s390 PNETID can come and often is comming > "from the hardware". ... HW pnetids (smc_pnetid_by_dev_port()) are only visible at the base netdevice. It seems that the pnetid table, managed by the smc_pnet tool, tries to mimick that behaviour, and the concept (recommendation?) would be to set the user-defined pnetid also for the base netdevice and then use the upper level netdevices for the tcp connection. Which makes some sense, all upper level devices have the same connectivity as the base device. So this patch would break a setup that follows this concept and only sets the pnetid at the base netdevice. Optionally you can set a user-defined pnetid on upper level devices (maybe for usability??), but as Guangguan noticed, that has no practical impact. In the documentation I see examples where the same pnetid is set for upper and base device. You cannot set a user-defined pnetid on a upper level device, if the base device has a HW pnetid (smc_pnet_add_eth()) which makes some sense, not even the same pnetid (makes less sense IMO). However you can set different user-defined pnetids on the upper netdevices and the base device, which makes no sense to me. >> >> Perhaps for a net patch would be better to support both lookup schemas >> i.e. >> >> (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) || >> smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid)) >> >> ? This may yield undesirable results, if base pnetid and upper pnetid differ.. But then maybe GIGO? ... > BTW to implement the logic proposed by you Paolo, as understood by me, > we would have to use "&&" instead of "||". +1 Another idea may be to change the setting of a user-defined pnetid on an upper level netdevice to - fail if the base netdevice has a different pnetid - set the pnetid of the base device , if it is currently unset.
On Wed, 8 Jan 2025 12:57:00 +0800 Guangguan Wang <guangguan.wang@linux.alibaba.com> wrote: > > sorry for chiming in late. Wenjia is on vacation and Jan is out sick! > > After some reading and thinking I could not figure out how 890a2cb4a966 > > ("net/smc: rework pnet table") is broken. > > Before commit 890a2cb4a966: > smc_pnet_find_roce_resource > smc_pnet_find_roce_by_pnetid(ndev, ...) /* lookup via hardware-defined pnetid */ > smc_pnetid_by_dev_port(base_ndev, ...) > smc_pnet_find_roce_by_table(ndev, ...) /* lookup via SMC PNET table */ > { > ... > list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) { > if (ndev == pnetelem->ndev) { /* notice here, it was ndev to matching pnetid element in pnet table */ > ... > } > > After commit 890a2cb4a966: > smc_pnet_find_roce_resource > smc_pnet_find_roce_by_pnetid > { > ... > base_ndev = pnet_find_base_ndev(ndev); /* rename the variable name to base_ndev for better understanding */ > smc_pnetid_by_dev_port(base_ndev, ...) > smc_pnet_find_ndev_pnetid_by_table(base_ndev, ...) > { > ... > list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) { > if (base_ndev == pnetelem->ndev) { /* notice here, it is base_ndev to matching pnetid element in pnet table */ > ... > } > > } > > The commit 890a2cb4a966 has changed ndev to base_ndev when matching pnetid element in pnet table. > But in the function smc_pnet_add_eth, the pnetid is attached to the ndev itself, not the base_ndev. > smc_pnet_add_eth(...) > { > ... > ndev = dev_get_by_name(net, eth_name); > ... > if (new_netdev) { > if (ndev) { > new_pe->ndev = ndev; > netdev_tracker_alloc(ndev, &new_pe->dev_tracker, > GFP_ATOMIC); > } > list_add_tail(&new_pe->list, &pnettable->pnetlist); > mutex_unlock(&pnettable->lock); > } else { > ... > } I still not understand why do you think that 890a2cb4a966~1 is better than 890a2cb4a966 even if things changed with 890a2cb4a966 which I did not verify for myself but am willing to assume. Is there some particular setup that you think would benefit from you patch? I.e. going back to the 890a2cb4a966~1 behavior I suppose. I think I showed a valid and practical setup that would break with your patch as is. Do you agree with that statement? Regards, Halil
diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c index 716808f374a8..cc098780970b 100644 --- a/net/smc/smc_pnet.c +++ b/net/smc/smc_pnet.c @@ -1079,14 +1079,15 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev, struct smc_init_info *ini) { u8 ndev_pnetid[SMC_MAX_PNETID_LEN]; + struct net_device *base_ndev; struct net *net; - ndev = pnet_find_base_ndev(ndev); + base_ndev = pnet_find_base_ndev(ndev); net = dev_net(ndev); - if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port, + if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port, ndev_pnetid) && smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) { - smc_pnet_find_rdma_dev(ndev, ini); + smc_pnet_find_rdma_dev(base_ndev, ini); return; /* pnetid could not be determined */ } _smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);
The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> to the pnetid table and will attach the <pnetid> to net device whose name is <ethx>. But When do SMCR by <ethx>, in function smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric use of the pnetid seems weird. Sometimes it is difficult to know the hierarchy of net device what may make it difficult to configure the pnetid and to use the pnetid. Looking into the history of commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") that changes the ndev from the <ethx> to the <ethx>'s base ndev when finding pnetid by pnetid table. It seems a mistake. This patch changes the ndev back to the <ethx> when finding pnetid by pnetid table. Fixes: 890a2cb4a966 ("net/smc: rework pnet table") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> --- net/smc/smc_pnet.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)