Message ID | 1620219241-24979-1-git-send-email-haakon.bugge@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | [for-rc] IB/cma: Fix false P_Key mismatch messages | expand |
On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote: > There are three conditions that must be fulfilled in order to consider > a partition match. Those are: > > 1. Both P_Keys must valid > 2. At least one must be a full member > 3. The partitions (lower 15 bits) must match > > In system employing both limited and full membership ports, we see > these false warning messages: > > RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00) > RDMA CMA: in the future this may cause the request to be dropped > > even though the partition is the same. > > See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for > a reference. > > Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys") > Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> > --- > drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) What is this trying to fix? IMHO it is a bug on the sender side to send GMPs to use a pkey that doesn't exactly match the data path pkey. Jason
> On 10 May 2021, at 19:04, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote: >> There are three conditions that must be fulfilled in order to consider >> a partition match. Those are: >> >> 1. Both P_Keys must valid >> 2. At least one must be a full member >> 3. The partitions (lower 15 bits) must match >> >> In system employing both limited and full membership ports, we see >> these false warning messages: >> >> RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00) >> RDMA CMA: in the future this may cause the request to be dropped >> >> even though the partition is the same. >> >> See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for >> a reference. >> >> Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys") >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >> --- >> drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++-- >> 1 file changed, 20 insertions(+), 2 deletions(-) > > What is this trying to fix? The false warning messages. The wrong way though:-) > IMHO it is a bug on the sender side to send GMPs to use a pkey that > doesn't exactly match the data path pkey. The active connector calls ib_addr_get_pkey(). This function extracts the pkey from byte 8/9 in the device's bcast address. However, RFC 4391 explicitly states: 4.1. Broadcast-GID Parameters The broadcast-GID is set up with the following attributes: 1. P_Key A "Full Membership" P_Key (high-order bit is set to 1) MUST be used so that all members may communicate with one another. In other words, ib_addr_get_pkey() will sometimes wrongly return the full-member version of the partition, when the port is given only the limited member. Let me do some post-coffee, pre-lunch work tomorrow to come up with a solution, aka ib_find_cached_pkey() followed by an ib_get_cached_pkey()? Thxs, Håkon
On Mon, May 10, 2021 at 06:52:54PM +0000, Haakon Bugge wrote: > > > > On 10 May 2021, at 19:04, Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote: > >> There are three conditions that must be fulfilled in order to consider > >> a partition match. Those are: > >> > >> 1. Both P_Keys must valid > >> 2. At least one must be a full member > >> 3. The partitions (lower 15 bits) must match > >> > >> In system employing both limited and full membership ports, we see > >> these false warning messages: > >> > >> RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00) > >> RDMA CMA: in the future this may cause the request to be dropped > >> > >> even though the partition is the same. > >> > >> See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for > >> a reference. > >> > >> Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys") > >> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> > >> drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++-- > >> 1 file changed, 20 insertions(+), 2 deletions(-) > > > > What is this trying to fix? > > The false warning messages. The wrong way though:-) > > > IMHO it is a bug on the sender side to send GMPs to use a pkey that > > doesn't exactly match the data path pkey. > > The active connector calls ib_addr_get_pkey(). This function > extracts the pkey from byte 8/9 in the device's bcast > address. However, RFC 4391 explicitly states: pkeys in CM come only from path records that the SM returns, the above should only be used to feed into a path record query which could then return back a limited pkey. Everything thereafter should use the SM's version of the pkey. Jason
> On 10 May 2021, at 21:12, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Mon, May 10, 2021 at 06:52:54PM +0000, Haakon Bugge wrote: >> >> >>> On 10 May 2021, at 19:04, Jason Gunthorpe <jgg@nvidia.com> wrote: >>> >>> On Wed, May 05, 2021 at 02:54:01PM +0200, Håkon Bugge wrote: >>>> There are three conditions that must be fulfilled in order to consider >>>> a partition match. Those are: >>>> >>>> 1. Both P_Keys must valid >>>> 2. At least one must be a full member >>>> 3. The partitions (lower 15 bits) must match >>>> >>>> In system employing both limited and full membership ports, we see >>>> these false warning messages: >>>> >>>> RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00) >>>> RDMA CMA: in the future this may cause the request to be dropped >>>> >>>> even though the partition is the same. >>>> >>>> See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for >>>> a reference. >>>> >>>> Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys") >>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> >>>> drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++-- >>>> 1 file changed, 20 insertions(+), 2 deletions(-) >>> >>> What is this trying to fix? >> >> The false warning messages. The wrong way though:-) >> >>> IMHO it is a bug on the sender side to send GMPs to use a pkey that >>> doesn't exactly match the data path pkey. >> >> The active connector calls ib_addr_get_pkey(). This function >> extracts the pkey from byte 8/9 in the device's bcast >> address. However, RFC 4391 explicitly states: > > pkeys in CM come only from path records that the SM returns, the above > should only be used to feed into a path record query which could then > return back a limited pkey. > > Everything thereafter should use the SM's version of the pkey. Revisiting this. I think I mis-interpreted the scenario that led to the P_Key mismatch messages. The CM retrieves the pkey_index that matched the P_Key in the BTH (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get the P_Key value of the particular pkey_index. Assume a full-member sends a REQ. In that case, both P_Keys (BTH and primary path_rec) are full. Further, assume the recipient is only a limited member. Since full and limited members of the same partition are eligible to communicate, the P_Key retrieved by cm_get_bth_pkey() will be the limited one. The CMA will then give a warning message, because the P_Key in the primary path and the one incorrectly assumed to be in the BTH, doesn't match. The point is that cm_get_bth_pkey() may return the limited member, even though the packet on the wire had the full-member partition in it's BTH P_Key. So, I think my first commit here isn't that bad after all :-) Thxs, Håkon
On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote: > >>> IMHO it is a bug on the sender side to send GMPs to use a pkey that > >>> doesn't exactly match the data path pkey. > >> > >> The active connector calls ib_addr_get_pkey(). This function > >> extracts the pkey from byte 8/9 in the device's bcast > >> address. However, RFC 4391 explicitly states: > > > > pkeys in CM come only from path records that the SM returns, the above > > should only be used to feed into a path record query which could then > > return back a limited pkey. > > > > Everything thereafter should use the SM's version of the pkey. > > Revisiting this. I think I mis-interpreted the scenario that led to > the P_Key mismatch messages. > > The CM retrieves the pkey_index that matched the P_Key in the BTH > (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get > the P_Key value of the particular pkey_index. > > Assume a full-member sends a REQ. In that case, both P_Keys (BTH and > primary path_rec) are full. Further, assume the recipient is only a > limited member. Since full and limited members of the same partition > are eligible to communicate, the P_Key retrieved by > cm_get_bth_pkey() will be the limited one. It is incorrect for the issuer of the REQ to put a full pkey in the REQ message when the target is a limited member. The CM model in IB has the target fully under the control of the initiator, and it is up to the initiator to ask the SM how the target should generate its return traffic. The SM is reponsible to say that limited->full communication is done using the limited pkey. The initiator is reponsible to place that limited pkey in the REQ message. Somewhere in your system this isn't happening properly, and it is a bug that the CM is correctly identifying. Jason
> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote: > >>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that >>>>> doesn't exactly match the data path pkey. >>>> >>>> The active connector calls ib_addr_get_pkey(). This function >>>> extracts the pkey from byte 8/9 in the device's bcast >>>> address. However, RFC 4391 explicitly states: >>> >>> pkeys in CM come only from path records that the SM returns, the above >>> should only be used to feed into a path record query which could then >>> return back a limited pkey. >>> >>> Everything thereafter should use the SM's version of the pkey. >> >> Revisiting this. I think I mis-interpreted the scenario that led to >> the P_Key mismatch messages. >> >> The CM retrieves the pkey_index that matched the P_Key in the BTH >> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get >> the P_Key value of the particular pkey_index. >> >> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and >> primary path_rec) are full. Further, assume the recipient is only a >> limited member. Since full and limited members of the same partition >> are eligible to communicate, the P_Key retrieved by >> cm_get_bth_pkey() will be the limited one. > > It is incorrect for the issuer of the REQ to put a full pkey in the > REQ message when the target is a limited member. Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix. Thxs, Håkon > > The CM model in IB has the target fully under the control of the > initiator, and it is up to the initiator to ask the SM how the target > should generate its return traffic. The SM is reponsible to say that > limited->full communication is done using the limited pkey. > > The initiator is reponsible to place that limited pkey in the REQ > message. > > Somewhere in your system this isn't happening properly, and it is a > bug that the CM is correctly identifying. > > Jason
> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote: > > > >> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote: >> >> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote: >> >>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that >>>>>> doesn't exactly match the data path pkey. >>>>> >>>>> The active connector calls ib_addr_get_pkey(). This function >>>>> extracts the pkey from byte 8/9 in the device's bcast >>>>> address. However, RFC 4391 explicitly states: >>>> >>>> pkeys in CM come only from path records that the SM returns, the above >>>> should only be used to feed into a path record query which could then >>>> return back a limited pkey. >>>> >>>> Everything thereafter should use the SM's version of the pkey. >>> >>> Revisiting this. I think I mis-interpreted the scenario that led to >>> the P_Key mismatch messages. >>> >>> The CM retrieves the pkey_index that matched the P_Key in the BTH >>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get >>> the P_Key value of the particular pkey_index. >>> >>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and >>> primary path_rec) are full. Further, assume the recipient is only a >>> limited member. Since full and limited members of the same partition >>> are eligible to communicate, the P_Key retrieved by >>> cm_get_bth_pkey() will be the limited one. >> >> It is incorrect for the issuer of the REQ to put a full pkey in the >> REQ message when the target is a limited member. > > Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix. On the systems I have access to (running Oracle flavour OpenSM in our NM2 switches), the behaviour is exactly the opposite of what you say. So, if we (Oracle) are the only ones seeing this warning (I repeat it here to catch some interest): RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00) RDMA CMA: in the future this may cause the request to be dropped then there is no fix in the RDMA stack. It must be fixed in Oracle's OpenSM. The only thing I can do here is to straighten up the warning message, which is imprecise. What about: "the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)" My literal interpretation of the old warning message confused me! Thxs, Håkon > > > Thxs, Håkon > >> >> The CM model in IB has the target fully under the control of the >> initiator, and it is up to the initiator to ask the SM how the target >> should generate its return traffic. The SM is reponsible to say that >> limited->full communication is done using the limited pkey. >> >> The initiator is reponsible to place that limited pkey in the REQ >> message. >> >> Somewhere in your system this isn't happening properly, and it is a >> bug that the CM is correctly identifying. >> >> Jason
On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote: > > > > On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote: > > > > > > > >> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote: > >> > >> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote: > >> > >>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that > >>>>>> doesn't exactly match the data path pkey. > >>>>> > >>>>> The active connector calls ib_addr_get_pkey(). This function > >>>>> extracts the pkey from byte 8/9 in the device's bcast > >>>>> address. However, RFC 4391 explicitly states: > >>>> > >>>> pkeys in CM come only from path records that the SM returns, the above > >>>> should only be used to feed into a path record query which could then > >>>> return back a limited pkey. > >>>> > >>>> Everything thereafter should use the SM's version of the pkey. > >>> > >>> Revisiting this. I think I mis-interpreted the scenario that led to > >>> the P_Key mismatch messages. > >>> > >>> The CM retrieves the pkey_index that matched the P_Key in the BTH > >>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get > >>> the P_Key value of the particular pkey_index. > >>> > >>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and > >>> primary path_rec) are full. Further, assume the recipient is only a > >>> limited member. Since full and limited members of the same partition > >>> are eligible to communicate, the P_Key retrieved by > >>> cm_get_bth_pkey() will be the limited one. > >> > >> It is incorrect for the issuer of the REQ to put a full pkey in the > >> REQ message when the target is a limited member. > > > > Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix. > > On the systems I have access to (running Oracle flavour OpenSM in > our NM2 switches), the behaviour is exactly the opposite of what you > say. Check with saquery what is happening, if you request a reversible path from the CM target (limited pkey) to the CM client (full) you should get the limited pkey or the SM is broken. If the SM is working then probably something in the stack is using a reversed src/dest when doing the PR query. It is not intuitive but the PR query should have SGID as the CM Target even though it is running on the CM Client. This is because the REQ is supposed to contain a path that is relative to the target. Everything will be the same except for this small detail about full/limited pkeys. The client can figure out what to do with its own pkey table locally. > "the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)" "The REQ contains a PKey (0x1234) that is not found in this device's PKey table. Using alternative limited Pkey (0x9234) instead. This is a client bug" Jason
> On 8 Jul 2021, at 20:52, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote: >> >> >>> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote: >>> >>> >>> >>>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote: >>>> >>>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote: >>>> >>>>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that >>>>>>>> doesn't exactly match the data path pkey. >>>>>>> >>>>>>> The active connector calls ib_addr_get_pkey(). This function >>>>>>> extracts the pkey from byte 8/9 in the device's bcast >>>>>>> address. However, RFC 4391 explicitly states: >>>>>> >>>>>> pkeys in CM come only from path records that the SM returns, the above >>>>>> should only be used to feed into a path record query which could then >>>>>> return back a limited pkey. >>>>>> >>>>>> Everything thereafter should use the SM's version of the pkey. >>>>> >>>>> Revisiting this. I think I mis-interpreted the scenario that led to >>>>> the P_Key mismatch messages. >>>>> >>>>> The CM retrieves the pkey_index that matched the P_Key in the BTH >>>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get >>>>> the P_Key value of the particular pkey_index. >>>>> >>>>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and >>>>> primary path_rec) are full. Further, assume the recipient is only a >>>>> limited member. Since full and limited members of the same partition >>>>> are eligible to communicate, the P_Key retrieved by >>>>> cm_get_bth_pkey() will be the limited one. >>>> >>>> It is incorrect for the issuer of the REQ to put a full pkey in the >>>> REQ message when the target is a limited member. >>> >>> Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix. >> >> On the systems I have access to (running Oracle flavour OpenSM in >> our NM2 switches), the behaviour is exactly the opposite of what you >> say. > > Check with saquery what is happening, if you request a reversible path > from the CM target (limited pkey) to the CM client (full) you should > get the limited pkey or the SM is broken. > > If the SM is working then probably something in the stack is using a > reversed src/dest when doing the PR query. > > It is not intuitive but the PR query should have SGID as the CM Target > even though it is running on the CM Client. That is not how it is today. And because of that, all accesses to the PR assume the d{gid,lid} is the remote peer. To fix this, I have to swap dgid/sgid and ib.dlid/ib.slid all over to get this working. That is pervasive. E.g., even includes ipoib. Let me know if that is what you want. Thxs, Håkon > > This is because the REQ is supposed to contain a path that is relative > to the target. > > Everything will be the same except for this small detail about > full/limited pkeys. > > The client can figure out what to do with its own pkey table locally. > >> "the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)" > > "The REQ contains a PKey (0x1234) that is not found in this device's > PKey table. Using alternative limited Pkey (0x9234) instead. This is a > client bug" > > Jason
On Fri, Jul 09, 2021 at 04:45:21PM +0000, Haakon Bugge wrote: > > > > On 8 Jul 2021, at 20:52, Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote: > >> > >> > >>> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@oracle.com> wrote: > >>> > >>> > >>> > >>>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@nvidia.com> wrote: > >>>> > >>>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote: > >>>> > >>>>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that > >>>>>>>> doesn't exactly match the data path pkey. > >>>>>>> > >>>>>>> The active connector calls ib_addr_get_pkey(). This function > >>>>>>> extracts the pkey from byte 8/9 in the device's bcast > >>>>>>> address. However, RFC 4391 explicitly states: > >>>>>> > >>>>>> pkeys in CM come only from path records that the SM returns, the above > >>>>>> should only be used to feed into a path record query which could then > >>>>>> return back a limited pkey. > >>>>>> > >>>>>> Everything thereafter should use the SM's version of the pkey. > >>>>> > >>>>> Revisiting this. I think I mis-interpreted the scenario that led to > >>>>> the P_Key mismatch messages. > >>>>> > >>>>> The CM retrieves the pkey_index that matched the P_Key in the BTH > >>>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get > >>>>> the P_Key value of the particular pkey_index. > >>>>> > >>>>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and > >>>>> primary path_rec) are full. Further, assume the recipient is only a > >>>>> limited member. Since full and limited members of the same partition > >>>>> are eligible to communicate, the P_Key retrieved by > >>>>> cm_get_bth_pkey() will be the limited one. > >>>> > >>>> It is incorrect for the issuer of the REQ to put a full pkey in the > >>>> REQ message when the target is a limited member. > >>> > >>> Sorry, I mis-interpreted the spec. I though the PKey in the Path record should be that of the initiator, not the target's. OK. Will come up with a fix. > >> > >> On the systems I have access to (running Oracle flavour OpenSM in > >> our NM2 switches), the behaviour is exactly the opposite of what you > >> say. > > > > Check with saquery what is happening, if you request a reversible path > > from the CM target (limited pkey) to the CM client (full) you should > > get the limited pkey or the SM is broken. > > > > If the SM is working then probably something in the stack is using a > > reversed src/dest when doing the PR query. > > > > It is not intuitive but the PR query should have SGID as the CM Target > > even though it is running on the CM Client. > > That is not how it is today. And because of that, all accesses to > the PR assume the d{gid,lid} is the remote peer. To fix this, I have > to swap dgid/sgid and ib.dlid/ib.slid all over to get this > working. That is pervasive. E.g., even includes ipoib. Let me know > if that is what you want. It is only things that use the paths to generate CM REQ messages, and yes it is the right thing to do. Jason
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 2b9ffc2..f5bcf7d 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1368,6 +1368,24 @@ static int cma_save_net_info(struct sockaddr *src_addr, return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id); } +/* + * If at least one of the pkeys is a full member, none of them are + * invalid, and the partitions (lower 15 bits) are equal, we have a + * match. + * + * See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching + */ + +static bool partition_match(u16 pkey_a, u16 pkey_b) +{ + const u16 fmb = 0x8000; /* Full Member Bit */ + const bool valid_pkeys = (pkey_a & ~fmb) && (pkey_b & ~fmb); + const bool one_full = (pkey_a | pkey_b) & fmb; + const bool same_partition = (pkey_a | fmb) == (pkey_b | fmb); + + return valid_pkeys && one_full && same_partition; +} + static int cma_save_req_info(const struct ib_cm_event *ib_event, struct cma_req_info *req) { @@ -1385,7 +1403,7 @@ static int cma_save_req_info(const struct ib_cm_event *ib_event, req->has_gid = true; req->service_id = req_param->primary_path->service_id; req->pkey = be16_to_cpu(req_param->primary_path->pkey); - if (req->pkey != req_param->bth_pkey) + if (!partition_match(req->pkey, req_param->bth_pkey)) pr_warn_ratelimited("RDMA CMA: got different BTH P_Key (0x%x) and primary path P_Key (0x%x)\n" "RDMA CMA: in the future this may cause the request to be dropped\n", req_param->bth_pkey, req->pkey); @@ -1396,7 +1414,7 @@ static int cma_save_req_info(const struct ib_cm_event *ib_event, req->has_gid = false; req->service_id = sidr_param->service_id; req->pkey = sidr_param->pkey; - if (req->pkey != sidr_param->bth_pkey) + if (!partition_match(req->pkey, sidr_param->bth_pkey)) pr_warn_ratelimited("RDMA CMA: got different BTH P_Key (0x%x) and SIDR request payload P_Key (0x%x)\n" "RDMA CMA: in the future this may cause the request to be dropped\n", sidr_param->bth_pkey, req->pkey);
There are three conditions that must be fulfilled in order to consider a partition match. Those are: 1. Both P_Keys must valid 2. At least one must be a full member 3. The partitions (lower 15 bits) must match In system employing both limited and full membership ports, we see these false warning messages: RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00) RDMA CMA: in the future this may cause the request to be dropped even though the partition is the same. See IBTA 10.9.1.2 Special P_Keys and 10.9.3 Partition Key Matching for a reference. Fixes: 84424a7fc793 ("IB/cma: Print warning on different inner and header P_Keys") Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> --- drivers/infiniband/core/cma.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)