diff mbox series

[net] net/smc: use the correct ndev to find pnetid by pnetid table

Message ID 20241227040455.91854-1-guangguan.wang@linux.alibaba.com (mailing list archive)
State Not Applicable
Headers show
Series [net] net/smc: use the correct ndev to find pnetid by pnetid table | expand

Commit Message

Guangguan Wang Dec. 27, 2024, 4:04 a.m. UTC
The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid>
to the pnetid table and will attach the <pnetid> to net device
whose name is <ethx>. But When do SMCR by <ethx>, in function
smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's
pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric
use of the pnetid seems weird. Sometimes it is difficult to know
the hierarchy of net device what may make it difficult to configure
the pnetid and to use the pnetid. Looking into the history of
commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table")
that changes the ndev from the <ethx> to the <ethx>'s base ndev
when finding pnetid by pnetid table. It seems a mistake.

This patch changes the ndev back to the <ethx> when finding pnetid
by pnetid table.

Fixes: 890a2cb4a966 ("net/smc: rework pnet table")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
---
 net/smc/smc_pnet.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Jakub Kicinski Jan. 4, 2025, 4:40 p.m. UTC | #1
On Fri, 27 Dec 2024 12:04:55 +0800 Guangguan Wang wrote:
> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid>
> to the pnetid table and will attach the <pnetid> to net device
> whose name is <ethx>. But When do SMCR by <ethx>, in function
> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's
> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric
> use of the pnetid seems weird. Sometimes it is difficult to know
> the hierarchy of net device what may make it difficult to configure
> the pnetid and to use the pnetid. Looking into the history of
> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table")
> that changes the ndev from the <ethx> to the <ethx>'s base ndev
> when finding pnetid by pnetid table. It seems a mistake.
> 
> This patch changes the ndev back to the <ethx> when finding pnetid
> by pnetid table.

SMC maintainers, please review..
Wen Gu Jan. 7, 2025, 2:17 a.m. UTC | #2
On 2024/12/27 12:04, Guangguan Wang wrote:
> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid>
> to the pnetid table and will attach the <pnetid> to net device
> whose name is <ethx>. But When do SMCR by <ethx>, in function
> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's
> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric
> use of the pnetid seems weird. Sometimes it is difficult to know
> the hierarchy of net device what may make it difficult to configure
> the pnetid and to use the pnetid. Looking into the history of
> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table")
> that changes the ndev from the <ethx> to the <ethx>'s base ndev
> when finding pnetid by pnetid table. It seems a mistake.
> 
> This patch changes the ndev back to the <ethx> when finding pnetid
> by pnetid table.
> 
> Fixes: 890a2cb4a966 ("net/smc: rework pnet table")
> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>

It makes sense to me, thanks!

Reviewed-by: Wen Gu <guwen@linux.alibaba.com>

> ---
>   net/smc/smc_pnet.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
> index 716808f374a8..cc098780970b 100644
> --- a/net/smc/smc_pnet.c
> +++ b/net/smc/smc_pnet.c
> @@ -1079,14 +1079,15 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev,
>   					 struct smc_init_info *ini)
>   {
>   	u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
> +	struct net_device *base_ndev;
>   	struct net *net;
>   
> -	ndev = pnet_find_base_ndev(ndev);
> +	base_ndev = pnet_find_base_ndev(ndev);
>   	net = dev_net(ndev);
> -	if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port,
> +	if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port,
>   				   ndev_pnetid) &&
>   	    smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) {
> -		smc_pnet_find_rdma_dev(ndev, ini);
> +		smc_pnet_find_rdma_dev(base_ndev, ini);
>   		return; /* pnetid could not be determined */
>   	}
>   	_smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);
Paolo Abeni Jan. 7, 2025, 8:44 a.m. UTC | #3
On 12/27/24 5:04 AM, Guangguan Wang wrote:
> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid>
> to the pnetid table and will attach the <pnetid> to net device
> whose name is <ethx>. But When do SMCR by <ethx>, in function
> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's
> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric
> use of the pnetid seems weird. Sometimes it is difficult to know
> the hierarchy of net device what may make it difficult to configure
> the pnetid and to use the pnetid. Looking into the history of
> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table")
> that changes the ndev from the <ethx> to the <ethx>'s base ndev
> when finding pnetid by pnetid table. It seems a mistake.
> 
> This patch changes the ndev back to the <ethx> when finding pnetid
> by pnetid table.
> 
> Fixes: 890a2cb4a966 ("net/smc: rework pnet table")
> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>

If I read correctly, this will break existing applications using the
lookup schema introduced by the blamed commit - which is not very recent.

Perhaps for a net patch would be better to support both lookup schemas i.e.

	(smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) ||
	 smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid))

?

Thanks,

Paolo
Halil Pasic Jan. 7, 2025, 7:32 p.m. UTC | #4
On Tue, 7 Jan 2025 09:44:30 +0100
Paolo Abeni <pabeni@redhat.com> wrote:

> On 12/27/24 5:04 AM, Guangguan Wang wrote:

@Guangguan Wang: please use my linux.ibm.com address
in the future.

> > The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid>
> > to the pnetid table and will attach the <pnetid> to net device
> > whose name is <ethx>. But When do SMCR by <ethx>, in function
> > smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's
> > pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric
> > use of the pnetid seems weird. Sometimes it is difficult to know
> > the hierarchy of net device what may make it difficult to configure
> > the pnetid and to use the pnetid. Looking into the history of
> > commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table")
> > that changes the ndev from the <ethx> to the <ethx>'s base ndev
> > when finding pnetid by pnetid table. It seems a mistake.
> > 
> > This patch changes the ndev back to the <ethx> when finding pnetid
> > by pnetid table.
> > 
> > Fixes: 890a2cb4a966 ("net/smc: rework pnet table")
> > Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>  
> 
> If I read correctly, this will break existing applications using the
> lookup schema introduced by the blamed commit - which is not very
> recent.

Hi Paolo,

sorry for chiming in late. Wenjia is on vacation and Jan is out sick!
After some reading and thinking I could not figure out how 890a2cb4a966
("net/smc: rework pnet table") is broken.

Admittedly I'm not really a net guy,and I'm mostly guessing what that
lower and upper device stuff is, so please bear with me. All that said, I
do think that going to the lowest netdev in the hierarchy is a sane
thing to do here.  I assume  that lower and upper devices are applicable
to stuff like bonding. 

PNETID stands for "Physical Network Identifier" and the idea is that iff
two ports are connected to the same physical network then they should
have the same PNETID. And on s390 PNETID can come and often is comming
"from the hardware". Now for something like a bond of two OSA
interfaces, I would expect the two legs of the bond to probably have a
"HW PNETID", but the netdev representing the bond itself won't have one
unless the Linux admin defines a software PNETID, which is work, and
can't have a HW PNETID because it is a software construct within Linux.
Breaking for example an active-backup bond setup where the legs have
HW PNETIDs and the admin did not bother to specify a PNETID for the bond
is not acceptable.

Let me also note that if ndev is a leaf (i.e. there is no lower device to
it) then ndev == base_ndev, and the whole discussion does not matter for
that case.

Again I have to emphasize that my domain knowledge is very limited, but
I really don't feel comfortable going forward with this without Jan or
Wenjia weighing in on the matter.

Paolo thanks for bringing this up!

> 
> Perhaps for a net patch would be better to support both lookup schemas
> i.e.
> 
> 	(smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) ||
> 	 smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid))
> 
> ?
>

Hm, I guess the idea here is that if ndev has a PNETID then it should
take precedence, but if not we should try to obtain the PNETID of its
"base_ndev". I'm not sure this would make things better compared to the
original idea of caring about the leaf. Which makes me question my
understanding of the problem statement from the commit message.

BTW to implement the logic proposed by you Paolo, as understood by me,
we would have to use "&&" instead of "||". The whole expression is
supposed
to evaluate to false if a pnetid is found and to true if no pnet_id is
found. smc_pnet_find_ndev_pnetid_by_table(ndev) returns false if a pnetid
is found. I.e. if not found we would just short circuit to true and not
call smc_pnet_find_ndev_pnetid_by_table(base_ndev), which is not what I
believe you wanted to propose.

To sum it up, please let us wait until Wenjia or Jan chime in. Copying
Alexandra as well: she is more of a net person than I am, and maybe she
has a more informed opinion.

Regards,
Halil
[...]
Guangguan Wang Jan. 8, 2025, 4:57 a.m. UTC | #5
On 2025/1/8 03:32, Halil Pasic wrote:
> On Tue, 7 Jan 2025 09:44:30 +0100
> Paolo Abeni <pabeni@redhat.com> wrote:
> 
>> On 12/27/24 5:04 AM, Guangguan Wang wrote:
> 
> @Guangguan Wang: please use my linux.ibm.com address
> in the future.
Get it.

> 
>>> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid>
>>> to the pnetid table and will attach the <pnetid> to net device
>>> whose name is <ethx>. But When do SMCR by <ethx>, in function
>>> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's
>>> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric
>>> use of the pnetid seems weird. Sometimes it is difficult to know
>>> the hierarchy of net device what may make it difficult to configure
>>> the pnetid and to use the pnetid. Looking into the history of
>>> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table")
>>> that changes the ndev from the <ethx> to the <ethx>'s base ndev
>>> when finding pnetid by pnetid table. It seems a mistake.
>>>
>>> This patch changes the ndev back to the <ethx> when finding pnetid
>>> by pnetid table.
>>>
>>> Fixes: 890a2cb4a966 ("net/smc: rework pnet table")
>>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>  
>>
>> If I read correctly, this will break existing applications using the
>> lookup schema introduced by the blamed commit - which is not very
>> recent.
> 
> Hi Paolo,
> 
> sorry for chiming in late. Wenjia is on vacation and Jan is out sick!
> After some reading and thinking I could not figure out how 890a2cb4a966
> ("net/smc: rework pnet table") is broken.

Before commit 890a2cb4a966:
smc_pnet_find_roce_resource
    smc_pnet_find_roce_by_pnetid(ndev, ...) /* lookup via hardware-defined pnetid */
        smc_pnetid_by_dev_port(base_ndev, ...)
    smc_pnet_find_roce_by_table(ndev, ...) /* lookup via SMC PNET table */
    {
        ...
        list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) {
                if (ndev == pnetelem->ndev) { /* notice here, it was ndev to matching pnetid element in pnet table */
        ...
    }

After commit 890a2cb4a966:
smc_pnet_find_roce_resource
    smc_pnet_find_roce_by_pnetid
    {
        ...
        base_ndev = pnet_find_base_ndev(ndev); /* rename the variable name to base_ndev for better understanding */
        smc_pnetid_by_dev_port(base_ndev, ...)
        smc_pnet_find_ndev_pnetid_by_table(base_ndev, ...)
        {
                ...
                list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) {
                if (base_ndev == pnetelem->ndev) { /* notice here, it is base_ndev to matching pnetid element in pnet table */
                ...
        }

    }

The commit 890a2cb4a966 has changed ndev to base_ndev when matching pnetid element in pnet table.
But in the function smc_pnet_add_eth, the pnetid is attached to the ndev itself, not the base_ndev.
smc_pnet_add_eth(...)
{
    ...
    ndev = dev_get_by_name(net, eth_name);
    ...
        if (new_netdev) {
            if (ndev) {
                new_pe->ndev = ndev;
                netdev_tracker_alloc(ndev, &new_pe->dev_tracker,
                    GFP_ATOMIC);
            }
            list_add_tail(&new_pe->list, &pnettable->pnetlist);
            mutex_unlock(&pnettable->lock);
        } else {
    ...
}

> 
> Admittedly I'm not really a net guy,and I'm mostly guessing what that
> lower and upper device stuff is, so please bear with me. All that said, I
> do think that going to the lowest netdev in the hierarchy is a sane
> thing to do here.  I assume  that lower and upper devices are applicable
> to stuff like bonding. 
> 
> PNETID stands for "Physical Network Identifier" and the idea is that iff
> two ports are connected to the same physical network then they should
> have the same PNETID. And on s390 PNETID can come and often is comming
> "from the hardware". Now for something like a bond of two OSA
> interfaces, I would expect the two legs of the bond to probably have a
> "HW PNETID", but the netdev representing the bond itself won't have one
> unless the Linux admin defines a software PNETID, which is work, and
> can't have a HW PNETID because it is a software construct within Linux.
> Breaking for example an active-backup bond setup where the legs have
> HW PNETIDs and the admin did not bother to specify a PNETID for the bond
> is not acceptable.
> 
> Let me also note that if ndev is a leaf (i.e. there is no lower device to
> it) then ndev == base_ndev, and the whole discussion does not matter for
> that case.
> 
> Again I have to emphasize that my domain knowledge is very limited, but
> I really don't feel comfortable going forward with this without Jan or
> Wenjia weighing in on the matter.
> 
> Paolo thanks for bringing this up!
> 
>>
>> Perhaps for a net patch would be better to support both lookup schemas
>> i.e.
>>
>> 	(smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) ||
>> 	 smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid))
>>
>> ?
>>
> 
> Hm, I guess the idea here is that if ndev has a PNETID then it should
> take precedence, but if not we should try to obtain the PNETID of its
> "base_ndev". I'm not sure this would make things better compared to the
> original idea of caring about the leaf. Which makes me question my
> understanding of the problem statement from the commit message.
> 
> BTW to implement the logic proposed by you Paolo, as understood by me,
> we would have to use "&&" instead of "||". The whole expression is
> supposed
> to evaluate to false if a pnetid is found and to true if no pnet_id is
> found. smc_pnet_find_ndev_pnetid_by_table(ndev) returns false if a pnetid
> is found. I.e. if not found we would just short circuit to true and not
> call smc_pnet_find_ndev_pnetid_by_table(base_ndev), which is not what I
> believe you wanted to propose.

Yes, it should be
        (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) &&
 	 smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid))
if for the consideration of application's compatible usage of smc_pnet.

Thanks,
Guangguan Wang

> 
> To sum it up, please let us wait until Wenjia or Jan chime in. Copying
> Alexandra as well: she is more of a net person than I am, and maybe she
> has a more informed opinion.
> 
> Regards,
> Halil
> [...]
>
diff mbox series

Patch

diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 716808f374a8..cc098780970b 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -1079,14 +1079,15 @@  static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev,
 					 struct smc_init_info *ini)
 {
 	u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
+	struct net_device *base_ndev;
 	struct net *net;
 
-	ndev = pnet_find_base_ndev(ndev);
+	base_ndev = pnet_find_base_ndev(ndev);
 	net = dev_net(ndev);
-	if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port,
+	if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port,
 				   ndev_pnetid) &&
 	    smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) {
-		smc_pnet_find_rdma_dev(ndev, ini);
+		smc_pnet_find_rdma_dev(base_ndev, ini);
 		return; /* pnetid could not be determined */
 	}
 	_smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);