[1/1] Ibacm: default pkey for partitioned fabrics

Message ID 1449595982-20781-1-git-send-email-kaike.wan@intel.com (mailing list archive)
State Superseded

Commit Message

Wan, Kaike Dec. 8, 2015, 5:33 p.m. UTC
From: Kaike Wan <kaike.wan@intel.com>

In an insecure IB fabric, the default pkey in a port is 0xffff, where each
node is allowed to talk to any other node in the fabric, including the SA
node. However, in a secure fabric, to limit member access, not all nodes
can have the full-member default pkey 0xffff. A typical configuration is
to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in
addition, each node can be assigned some other full-member pkeys, such as
0x8001 and 0x8002, so that it can be assigned to different partitions.
In this case, each node can access SA, and yet limits its other access to
only those nodes in its assigned partitions. In such a secure fabric,
however, ibacm will not work by interpreting "default" in its default
address file as 0xffff.

To solve the problem, this patch introduces the following priority to
interpret default pkey:
1. Find the first non-management full-member pkey;
2. If it fails, find pkey 0xffff;
3. If pkey 0xffff is not available, use the first pkey.
This approach will work in both securely and insecurely partitioned
fabrics.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
---
 src/acm.c |   52 +++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 39 insertions(+), 13 deletions(-)

Comments

Jason Gunthorpe Dec. 8, 2015, 9:21 p.m. UTC | #1
On Tue, Dec 08, 2015 at 12:33:02PM -0500, kaike.wan@intel.com wrote:
> From: Kaike Wan <kaike.wan@intel.com>
> 
> In an insecure IB fabric, the default pkey in a port is 0xffff, where each
> node is allowed to talk to any other node in the fabric, including the SA
> node. However, in a secure fabric, to limit member access, not all nodes
> can have the full-member default pkey 0xffff. A typical configuration is
> to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in
> addition, each node can be assigned some other full-member pkeys, such as
> 0x8001 and 0x8002, so that it can be assigned to different partitions.
> In this case, each node can access SA, and yet limits its other access to
> only those nodes in its assigned partitions. In such a secure fabric,
> however, ibacm will not work by interpreting "default" in its default
> address file as 0xffff.

ipoib always uses the 0 pkey index to create the default ipoib
interface. (see eg, update_parent_pkey)

When operating securely the SA should place the pkey for default ipoib
operation in pkey index 0, and place 0x7FFF in another index. I run
a lot of networks exactly like this and it works very well.

This ensures that ipoib works out of the box without additional
configuration.

> +	/* Determine the default pkey index for SA access first.
> +	 *   Order of preference: 0xffff, 0x7fff, first pkey.

No, IBA says that only the default pkey should be used to talk to the
SA; every port needs 0x7FFF or the full-membership version. Do not
search for the first pkey.

> +	 * Determine the default pkey for parsing address file as well.
> +	 *   order of preference: first full-member non-management pkey,
> +	 *   0xffff, first pkey.
> +	 */

This really should just be the 0 index pkey, which exactly matches how
IPoIB determines the default pkey, which is what matters when talking
rdmacm..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean Dec. 9, 2015, 12:26 a.m. UTC | #2
> > +	 * Determine the default pkey for parsing address file as well.
> > +	 *   order of preference: first full-member non-management pkey,
> > +	 *   0xffff, first pkey.
> > +	 */
> 
> This really should just be the 0 index pkey, which exactly matches how
> IPoIB determines the default pkey, which is what matters when talking
> rdmacm..

Ibacm currently hard-codes the 'default' pkey to 0xffff for ibacm <-> ibacm communication.  If there's no disagreement to switching to pkey[0], I'm fine with that.  I did not realize that ipoib uses this same default.
Hal Rosenstock Dec. 9, 2015, 12:50 p.m. UTC | #3
On 12/8/2015 12:33 PM, kaike.wan@intel.com wrote:
> From: Kaike Wan <kaike.wan@intel.com>
> 
> In an insecure IB fabric, the default pkey in a port is 0xffff, where each
> node is allowed to talk to any other node in the fabric, including the SA
> node. However, in a secure fabric, to limit member access, not all nodes
> can have the full-member default pkey 0xffff. A typical configuration is
> to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in
> addition, each node can be assigned some other full-member pkeys, such as
> 0x8001 and 0x8002, so that it can be assigned to different partitions.
> In this case, each node can access SA, and yet limits its other access to
> only those nodes in its assigned partitions. In such a secure fabric,
> however, ibacm will not work by interpreting "default" in its default
> address file as 0xffff.
> 
> To solve the problem, this patch introduces the following priority to
> interpret default pkey:
> 1. Find the first non-management full-member pkey;
> 2. If it fails, find pkey 0xffff;
> 3. If pkey 0xffff is not available, use the first pkey.
> This approach will work in both securely and insecurely partitions
> fabrics.

Shouldn't the pkey to be used for such interACM communication be
configured ? First full member pkey is non-deterministic. Isn't it the
case that it may not include proper set of ACMs to communicate with ?

-- Hal
Hal Rosenstock Dec. 9, 2015, 12:51 p.m. UTC | #4
On 12/8/2015 7:26 PM, Hefty, Sean wrote:
>>> +	 * Determine the default pkey for parsing address file as well.
>>> +	 *   order of preference: first full-member non-management pkey,
>>> +	 *   0xffff, first pkey.
>>> +	 */
>>
>> This really should just be the 0 index pkey, which exactly matches how
>> IPoIB determines the default pkey, which is what matters when talking
>> rdmacm..
> 
> Ibacm currently hard-codes the 'default' pkey to 0xffff for ibacm <-> ibacm communication.   
> If there's no disagreement to switching to pkey[0], I'm fine with that.  

There's no IBA requirement that 0xffff pkey is always in index 0. It's
only a requirement on device bootup with non volatile storage per C10-123.

Furthermore, there's no requirement that 0xffff pkey is present in the
table. It may be that there's only 0x7fff pkey.

> I did not realize that ipoib uses this same default.

I think this is a problem in limiting partition use and should be
changed. Rather than assuming index 0, pkey table should be searched for
this pkey.
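
[Editor's sketch] Searching the table instead of assuming index 0 could look roughly like the following. This is a minimal illustration, not code from the patch: the helper name find_pkey_index is hypothetical, the table is assumed to be a plain array in host byte order, and the choice to match on the low 15 bits and prefer a full-member entry is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the index of the entry whose 15-bit base value
 * matches `wanted`, ignoring the membership bit (bit 15).
 * A full-member entry is preferred; a limited-member entry is used only if
 * no full-member entry exists. Returns -1 if the partition is absent. */
static int find_pkey_index(const uint16_t *pkeys, size_t n, uint16_t wanted)
{
    int limited = -1;

    for (size_t i = 0; i < n; i++) {
        if ((pkeys[i] & 0x7fff) != (wanted & 0x7fff))
            continue;               /* different partition */
        if (pkeys[i] & 0x8000)
            return (int)i;          /* full member: best match */
        if (limited < 0)
            limited = (int)i;       /* remember first limited match */
    }
    return limited;
}
```

With this helper, a port holding only 0x7fff still yields a usable index for the default partition, which covers the case where 0xffff is not in the table at all.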

-- Hal
Hal Rosenstock Dec. 9, 2015, 12:51 p.m. UTC | #5
On 12/8/2015 4:21 PM, Jason Gunthorpe wrote:
> On Tue, Dec 08, 2015 at 12:33:02PM -0500, kaike.wan@intel.com wrote:
>> From: Kaike Wan <kaike.wan@intel.com>
>>
>> In an insecure IB fabric, the default pkey in a port is 0xffff, where each
>> node is allowed to talk to any other node in the fabric, including the SA
>> node. However, in a secure fabric, to limit member access, not all nodes
>> can have the full-member default pkey 0xffff. A typical configuration is
>> to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in
>> addition, each node can be assigned some other full-member pkeys, such as
>> 0x8001 and 0x8002, so that it can be assigned to different partitions.
>> In this case, each node can access SA, and yet limits its other access to
>> only those nodes in its assigned partitions. In such a secure fabric,
>> however, ibacm will not work by interpreting "default" in its default
>> address file as 0xffff.
> 
> ipoib always uses the 0 pkey index to create the default ipoib
> interface. (see eg, update_parent_pkey)

This is beyond IBA spec and is currently a linux convention for IPoIB.
IMO it should be changed to search for this pkey rather than assume it's
in index 0. There's no requirement that it be in index 0 other than at
bootup with non volatile storage (C10-123).

> When operating securely the SA should place the pkey for default ipoib
> operation in pkey index 0, and place 0x7FFF in another index. I run
> alot of networks exactly like this and it works very well.

Yes, it can run that way but more secure is without the full default
pkey. When full default pkey is in every port, the rest of the
partitioning doesn't really matter...

-- Hal
Wan, Kaike Dec. 9, 2015, 1:07 p.m. UTC | #6
> From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com]
> Sent: Tuesday, December 08, 2015 4:22 PM
> To: Wan, Kaike
> Cc: Hefty, Sean; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
> 
> On Tue, Dec 08, 2015 at 12:33:02PM -0500, kaike.wan@intel.com wrote:
> > From: Kaike Wan <kaike.wan@intel.com>
> >
> > In an insecure IB fabric, the default pkey in a port is 0xffff, where
> > each node is allowed to talk to any other node in the fabric,
> > including the SA node. However, in a secure fabric, to limit member
> > access, not all nodes can have the full-member default pkey 0xffff. A
> > typical configuration is to let SA node have pkey 0xffff while all
> > other nodes have pkey 0x7fff; in addition, each node can be assigned
> > some other full-member pkeys, such as
> > 0x8001 and 0x8002, so that it can be assigned to different partitions.
> > In this case, each node can access SA, and yet limits its other access
> > to only those nodes in its assigned partitions. In such a secure
> > fabric, however, ibacm will not work by interpreting "default" in its
> > default address file as 0xffff.
> 
> ipoib always uses the 0 pkey index to create the default ipoib interface. (see
> eg, update_parent_pkey)
> 
> When operating securely the SA should place the pkey for default ipoib
> operation in pkey index 0, and place 0x7FFF in another index. I run alot of
> networks exactly like this and it works very well.

In such a configuration, this patch will enable ibacm to use the pkey at index 0 for address resolution through multicast while using 0x7fff for SA access, exactly matching what ipoib is currently doing.

> 
> This ensures that ipoib works out of the box without additional configuration.

This is exactly the purpose of this patch.

> 
> > +	/* Determine the default pkey index for SA access first.
> > +	 *   Order of preference: 0xffff, 0x7fff, first pkey.
> 
> No, IBA says that only the default pkey should be used to talk to the SA,
> every port needs 0x7FFF or the full mebership version. Do not search for the
> first pkey.

We use the first pkey only if there is neither 0x7fff nor 0xffff in this port. If the port is in compliance with IB Spec, then we will be using either 0xffff or 0x7fff for SA access.
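
[Editor's sketch] The SA-access preference quoted from the patch comment (0xffff, then 0x7fff, then the first pkey) can be illustrated as below. The function name pick_sa_pkey_index and the plain host-byte-order array are assumptions made for the example; the actual code in src/acm.c is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: choose the pkey table index used for SA access.
 * Order of preference: 0xffff, then 0x7fff, then index 0.
 * Returns -1 only when the table is empty. */
static int pick_sa_pkey_index(const uint16_t *pkeys, size_t n)
{
    int limited = -1;

    for (size_t i = 0; i < n; i++) {
        if (pkeys[i] == 0xffff)
            return (int)i;          /* full-member default pkey */
        if (pkeys[i] == 0x7fff && limited < 0)
            limited = (int)i;       /* limited-member default pkey */
    }
    if (limited >= 0)
        return limited;
    return n > 0 ? 0 : -1;          /* non-compliant port: fall back */
}
```

On a port that complies with the spec, the final fallback is never reached, which is the point made in the reply above.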

> 
> > +	 * Determine the default pkey for parsing address file as well.
> > +	 *   order of preference: first full-member non-management pkey,
> > +	 *   0xffff, first pkey.
> > +	 */
> 
> This really should just be the 0 index pkey, which exactly matches how IPoIB
> determines the default pkey, which is what matters when talking rdmacm..

It is true in most default configurations. However, since ibacm will use the default pkey for multicast, we want to make sure that it will not use a limited-member pkey to create/join a multicast group (practically of little use in this case) if such a pkey is placed at index 0.

> 
> Jason
Wan, Kaike Dec. 9, 2015, 1:24 p.m. UTC | #7
> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
> Sent: Wednesday, December 09, 2015 7:50 AM
> To: Wan, Kaike; Hefty, Sean
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
> 
> On 12/8/2015 12:33 PM, kaike.wan@intel.com wrote:
> > From: Kaike Wan <kaike.wan@intel.com>
> >
> > In an insecure IB fabric, the default pkey in a port is 0xffff, where
> > each node is allowed to talk to any other node in the fabric,
> > including the SA node. However, in a secure fabric, to limit member
> > access, not all nodes can have the full-member default pkey 0xffff. A
> > typical configuration is to let SA node have pkey 0xffff while all
> > other nodes have pkey 0x7fff; in addition, each node can be assigned
> > some other full-member pkeys, such as
> > 0x8001 and 0x8002, so that it can be assigned to different partitions.
> > In this case, each node can access SA, and yet limits its other access
> > to only those nodes in its assigned partitions. In such a secure
> > fabric, however, ibacm will not work by interpreting "default" in its
> > default address file as 0xffff.
> >
> > To solve the problem, this patch introduces the following priority to
> > interpret default pkey:
> > 1. Find the first non-management full-member pkey; 2. If it fails,
> > find pkey 0xffff; 3. If pkey 0xffff is not available, use the first
> > pkey.
> > This approach will work in both securely and insecurely partitions
> > fabrics.
> 
> Shouldn't the pkey to be used for such interACM communication be
> configured ?

Yes. The purpose of this patch is only to make a secure system work out of box (default configuration). When a specific pkey is given in the ibacm_addr.cfg file, there will be no need to interpret the "default" pkey.

> First full member pkey is non-deterministic. Isn't it the case that
> it may not include proper set of ACMs to communicate with ?

This is only for the default configuration, where a reasonable assumption is that members of an intended partition (group of ports) will all have the same full-member pkey. One could argue that a port could have two or more full-member non-management pkeys because it is assigned to multiple partitions. In this case, the port will join only one multicast group, not all the multicast groups. The reply is that the default ibacm_addr.cfg has only one endpoint with pkey "default" anyway. To make it really work, one needs to edit ibacm_addr.cfg.

Kaike
Hal Rosenstock Dec. 9, 2015, 1:45 p.m. UTC | #8
On 12/9/2015 8:24 AM, Wan, Kaike wrote:
>> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
>> Sent: Wednesday, December 09, 2015 7:50 AM
>> To: Wan, Kaike; Hefty, Sean
>> Cc: linux-rdma@vger.kernel.org
>> Subject: Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
>>
>> On 12/8/2015 12:33 PM, kaike.wan@intel.com wrote:
>>> From: Kaike Wan <kaike.wan@intel.com>
>>>
>>> In an insecure IB fabric, the default pkey in a port is 0xffff, where
>>> each node is allowed to talk to any other node in the fabric,
>>> including the SA node. However, in a secure fabric, to limit member
>>> access, not all nodes can have the full-member default pkey 0xffff. A
>>> typical configuration is to let SA node have pkey 0xffff while all
>>> other nodes have pkey 0x7fff; in addition, each node can be assigned
>>> some other full-member pkeys, such as
>>> 0x8001 and 0x8002, so that it can be assigned to different partitions.
>>> In this case, each node can access SA, and yet limits its other access
>>> to only those nodes in its assigned partitions. In such a secure
>>> fabric, however, ibacm will not work by interpreting "default" in its
>>> default address file as 0xffff.
>>>
>>> To solve the problem, this patch introduces the following priority to
>>> interpret default pkey:
>>> 1. Find the first non-management full-member pkey; 2. If it fails,
>>> find pkey 0xffff; 3. If pkey 0xffff is not available, use the first
>>> pkey.
>>> This approach will work in both securely and insecurely partitions
>>> fabrics.
>>
>> Shouldn't the pkey to be used for such interACM communication be
>> configured ?
> Yes. The purpose of this patch is only to make a secure system work out of box (default configuration). When a specific pkey is given in the ibacm_addr.cfg file, there will be no need to interpret the "default" pkey.
> 
>> First full member pkey is non-deterministic. Isn't it the case that
>> it may not include proper set of ACMs to communicate with ?
> 
> This is only for the default configuration, where a reasonable assumption is that members of an intended 
> partition (group of ports) will all have the same full-member pkey.

Yes, but it may not be first (lowest index) pkey in table of different
ports.

> One could argue that a port could have two or more full-member non-management pkeys because
> it is assigned to multiple partitions. 

Yes, that's a perfectly valid configuration.

> In this case, the port will only join only one multicast group, not all the multicast groups. The reply is 
> that the default ibacm_addr.cfg have only one endpoint with pkey "default" anyway.

In this case, the non default partitions are not useful for ACM and all
ACMs need to share "default" partition.

> To make it really work, one needs to edit ibacm_addr.cfg.

It may work without config depending on a number of factors but can
cause issues to be debugged.

Only sure way is config :-(

-- Hal

> Kaike
> 
Wan, Kaike Dec. 9, 2015, 1:55 p.m. UTC | #9
> -----Original Message-----
> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
> Sent: Wednesday, December 09, 2015 8:46 AM

> >>> To solve the problem, this patch introduces the following priority
> >>> to interpret default pkey:
> >>> 1. Find the first non-management full-member pkey; 2. If it fails,
> >>> find pkey 0xffff; 3. If pkey 0xffff is not available, use the first
> >>> pkey.
> >>> This approach will work in both securely and insecurely partitions
> >>> fabrics.
> >>
> >> Shouldn't the pkey to be used for such interACM communication be
> >> configured ?
> > Yes. The purpose of this patch is only to make a secure system work out of
> box (default configuration). When a specific pkey is given in the
> ibacm_addr.cfg file, there will be no need to interpret the "default" pkey.
> >
> >> First full member pkey is non-deterministic. Isn't it the case that
> >> it may not include proper set of ACMs to communicate with ?
> >
> > This is only for the default configuration, where a reasonable
> > assumption is that members of an intended partition (group of ports) will
> all have the same full-member pkey.
> 
> Yes, but it may not be first (lowest index) pkey in table of different ports.
> 

This is the best effort and it should work for most common configurations, but may not work for more complicated cases.

Any other suggestions?

> > To make it really work, one needs to edit ibacm_addr.cfg.
> 
> It may work without config depending on a number of factors but can cause
> issues to be debugged.
> 
> Only sure way is config :-(

Exactly.

Kaike
Hal Rosenstock Dec. 9, 2015, 2:06 p.m. UTC | #10
On 12/9/2015 8:55 AM, Wan, Kaike wrote:
> This is the best effort and it should work for most common configurations, 
> but may not work for more complicated cases.

Right, there are various scenarios where it will not work. This was one
of them but there are others I can think of.

> Any other suggestions?

Unfortunately not. It comes down to whether the out of box cases
outweigh the debug when it's an exception case. The premise of this
patch is that that's the case.
Wan, Kaike Dec. 9, 2015, 2:27 p.m. UTC | #11
> -----Original Message-----
> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
> Sent: Wednesday, December 09, 2015 9:06 AM

> > This is the best effort and it should work for most common
> > configurations, but may not work for more complicated cases.
> 
> Right, there are various scenarios where it will not work. This was one of
> them but there are others I can think of.
> 
> > Any other suggestions?
> 
> Unfortunately not. It comes down to whether the out of box cases outweigh
> the debug when it's an exception case. The premise of this patch is that
> that's the case.

I would argue that it does. Without this patch, ibacm will not work on secure fabric out of box (where "default" is interpret as 0xffff), and it will be equal or more likely not to work by default in more complicated configurations, where debugging is required anyway. This patch enables ibacm to work properly in most common configurations out of box.

Kaike
Hal Rosenstock Dec. 9, 2015, 2:36 p.m. UTC | #12
>> Unfortunately not. It comes down to whether the out of box cases outweigh
>> the debug when it's an exception case. The premise of this patch is that
>> that's the case.
> 
> I would argue that it does. Without this patch, ibacm will not work on secure fabric out of box 
> (where "default" is interpret as 0xffff), and it will be equal or more likely not to work by default 
> in more complicated configurations, where debugging is required anyway. 

Why is debugging required anyway in those more complicated cases ? It's
configuration that's required.

> This patch enables ibacm to work properly in most common configurations out of box.

Agreed.

-- Hal

> Kaike
> 
Wan, Kaike Dec. 9, 2015, 3:04 p.m. UTC | #13
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Hal Rosenstock
> Sent: Wednesday, December 09, 2015 9:36 AM

> >> Unfortunately not. It comes down to whether the out of box cases
> >> outweigh the debug when it's an exception case. The premise of this
> >> patch is that that's the case.
> >
> > I would argue that it does. Without this patch, ibacm will not work on
> > secure fabric out of box (where "default" is interpret as 0xffff), and
> > it will be equal or more likely not to work by default in more complicated
> configurations, where debugging is required anyway.
> 
> Why is debugging required anyway in those more complicated cases ? It's
> configuration that's required.
> 

I mean that the user needs to investigate why the fabric is not working out of box. This patch itself does not make configuration of the fabric harder. On the contrary, it relieves the user from having to configure ibacm on each node in those common cases.

Kaike
Hal Rosenstock Dec. 9, 2015, 3:15 p.m. UTC | #14
On 12/9/2015 10:04 AM, Wan, Kaike wrote:
> I mean that the user needs to investigate why the fabric is not working out of box. 

My point is that an educated admin should _know_ to configure in these
cases and that debug is only when things are broken not by default in
these more complex cases. This means the limitations of the out of box
approach needs to be explained in the docs.

> This patch itself does not make configuration of the fabric harder.

Agreed.

> On the contrary, it relieves the user from having to configure ibacm on each node in those common cases.

Agreed.

-- Hal
Hefty, Sean Dec. 9, 2015, 4:26 p.m. UTC | #15
> > I mean that the user needs to investigate why the fabric is not working
> out of box.
> 
> My point is that an educated admin should _know_ to configure in these
> cases and that debug is only when things are broken not by default in
> these more complex cases. This means the limitations of the out of box
> approach needs to be explained in the docs.

When IP addresses are used, the corresponding pkey is used.  The issue this patch is addressing is the mapping of 'hostnames' to pkeys, as shown in this ibacm_addr.cfg example:

#Name      device port pkey
cst-lin0   mlx4_0 1    default
cst-lin0-1 mlx4_0 1    default
cst-lin0-2 mlx4_0 2    default
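
[Editor's sketch] Reading one such line and recognising the 'default' keyword might look like this. It is only an illustration of the four-column format shown above: struct addr_entry and parse_addr_line are hypothetical names, the buffer sizes are arbitrary, and accepting a numeric pkey via strtoul with base 0 is an assumption rather than ibacm's documented parsing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one ibacm_addr.cfg line. */
struct addr_entry {
    char name[64];          /* host name or address */
    char dev[32];           /* device name, e.g. mlx4_0 */
    unsigned port;          /* port number on the device */
    int pkey_is_default;    /* 1 if the pkey column said "default" */
    unsigned pkey;          /* numeric pkey when pkey_is_default == 0 */
};

/* Returns 0 on success, -1 for comments, blank or malformed lines. */
static int parse_addr_line(const char *line, struct addr_entry *e)
{
    char pkey_str[16];

    while (*line == ' ' || *line == '\t')
        line++;                         /* skip leading whitespace */
    if (*line == '#' || *line == '\0' || *line == '\n')
        return -1;                      /* comment or empty line */

    if (sscanf(line, "%63s %31s %u %15s",
               e->name, e->dev, &e->port, pkey_str) != 4)
        return -1;

    if (strcmp(pkey_str, "default") == 0) {
        e->pkey_is_default = 1;         /* resolved later from the pkey table */
        e->pkey = 0;
    } else {
        e->pkey_is_default = 0;
        e->pkey = (unsigned)strtoul(pkey_str, NULL, 0);
    }
    return 0;
}
```

The interesting part is the "default" branch: the parser only records that a default was requested, and the choice of an actual pkey is deferred until the port's pkey table is known.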

Currently, 'default' is hard-coded to a pkey of 0xffff.  The intent is to define a better default value.  Kaike has suggested this be the new default:

1. Find the first non-management full-member pkey;
2. If it fails, find pkey 0xffff;
3. If pkey 0xffff is not available, use the first pkey.
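
[Editor's sketch] The three steps above can be sketched in C as follows. This is an illustration under stated assumptions, not the patch itself: pick_default_pkey_index is a hypothetical name, and the pkey table is modelled as a plain array in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER  0x8000u   /* bit 15 set: full membership */
#define PKEY_DEFAULT_FULL 0xffffu   /* full-member default (management) pkey */

/* Hypothetical helper: return the table index that "default" resolves to,
 * or -1 if the table is empty. */
static int pick_default_pkey_index(const uint16_t *pkeys, size_t n)
{
    int full_default = -1;

    for (size_t i = 0; i < n; i++) {
        /* Step 1: the first full-member pkey other than 0xffff wins. */
        if ((pkeys[i] & PKEY_FULL_MEMBER) && pkeys[i] != PKEY_DEFAULT_FULL)
            return (int)i;
        /* Remember where 0xffff sits in case step 1 finds nothing. */
        if (pkeys[i] == PKEY_DEFAULT_FULL && full_default < 0)
            full_default = (int)i;
    }
    if (full_default >= 0)
        return full_default;        /* Step 2: fall back to 0xffff */
    return n > 0 ? 0 : -1;          /* Step 3: fall back to the first pkey */
}
```

By contrast, the pkey[0] alternative amounts to returning index 0 unconditionally, which is simpler but depends on the SA placing the intended pkey first.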

Is there a better alternative for what default should be?  Jason was suggesting using pkey[0], which seems less robust in theory, but is simple and may cover the vast majority of real use cases.

The fewer cases where manual configuration is necessary, the fewer emails I receive, and the better off I am.  :)

- Sean
Hal Rosenstock Dec. 9, 2015, 4:49 p.m. UTC | #16
On 12/9/2015 11:26 AM, Hefty, Sean wrote:
>>> I mean that the user needs to investigate why the fabric is not working
>> out of box.
>>
>> My point is that an educated admin should _know_ to configure in these
>> cases and that debug is only when things are broken not by default in
>> these more complex cases. This means the limitations of the out of box
>> approach needs to be explained in the docs.
> 
> When IP addresses are used, the corresponding pkey is used.  The issue this patch is addressing is the mapping of 'hostnames' to pkeys, as show in this ibacm_addr.cfg example:
> 
> #Name      device port pkey
> cst-lin0   mlx4_0 1    default
> cst-lin0-1 mlx4_0 1    default
> cst-lin0-2 mlx4_0 2    default
> 
> Currently, 'default' is hard-coded to a pkey of 0xffff.  The intent is to define a better default value.  Kaike has suggested this be the new default:
> 
> 1. Find the first non-management full-member pkey;

By "non-management full-member pkey", I think you mean "pkey which is
other than a full member of default (0x7fff) partition".

> 2. If it fails, find pkey 0xffff;

To me, finding pkey 0xffff is better/safer than assuming it's in index 0
although it's likely there.

> 3. If pkey 0xffff is not available, use the first pkey.
> 
> Is there better alternative for what default should be?  

Order of 1 and 2 depends on use models for full default partition and
other partitions. Reversing 1 and 2 (full default partition first) would
handle the most common use models and handles Jason's case.

The only common case that I'm aware of where that might fall down is in
the virtualized case. I'm not sure what policy is best there and would
need to think about that scenario some more (and there is more
fundamental issue with ACM in those environments).

> Jason was suggesting use pkey[0], which seems less robust in theory, but is simple and may cover the vast majority of real use cases.

Seems less robust to me too.

> The fewer cases where manual configuration is necessary, the fewer emails I receive, and the better off I am.  :)

Understood.

-- Hal

> - Sean
Wan, Kaike Dec. 9, 2015, 5:06 p.m. UTC | #17
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Hal Rosenstock
> Sent: Wednesday, December 09, 2015 11:49 AM

> > When IP addresses are used, the corresponding pkey is used.  The issue this
> patch is addressing is the mapping of 'hostnames' to pkeys, as show in this
> ibacm_addr.cfg example:
> >
> > #Name      device port pkey
> > cst-lin0   mlx4_0 1    default
> > cst-lin0-1 mlx4_0 1    default
> > cst-lin0-2 mlx4_0 2    default
> >
> > Currently, 'default' is hard-coded to a pkey of 0xffff.  The intent is to define
> a better default value.  Kaike has suggested this be the new default:
> >
> > 1. Find the first non-management full-member pkey;
> 
> By "non-management full-member pkey", I think you mean "pkey which is
> other than a full member of default (0x7fff) partition".

Full-member pkey (with bit 15 set or 0x8000) other than 0xffff.

> 
> > 2. If it fails, find pkey 0xffff;
> 
> To me, finding pkey 0xffff is better/safer than assuming it's in index 0
> although it's likely there.
> 
> > 3. If pkey 0xffff is not available, use the first pkey.
> >
> > Is there better alternative for what default should be?
> 
> Order of 1 and 2 depends on use models for full default partition and other
> partitions. Reversing 1 and 2 (full default partition first) would handle the
> most common use models and handles Jason's case.

Reversing 1 and 2 will cause problem for the SA node in a secure fabric:

Node 1: 0x8001, 0xffff (SA node)
Node 2: 0x8001, 0x7fff
Node 3: 0x8001, 0x7fff

In this case, Node 1 will use 0xffff while Nodes 2 and 3 will use 0x8001. Keeping the order will enable all nodes to use 0x8001 as the default, which also handles Jason's case.

Kaike
Hefty, Sean Dec. 9, 2015, 5:13 p.m. UTC | #18
> > 1. Find the first non-management full-member pkey;

I.e. a pkey with the high-order bit set that is not 0xffff

> > 2. If it fails, find pkey 0xffff;
> 
> Order of 1 and 2 depends on use models for full default partition and
> other partitions. Reversing 1 and 2 (full default partition first) would
> handle the most common use models and handles Jason's case.
> 
> The only common case that I'm aware of where that might fall down is in
> the virtualized case. I'm not sure what policy is best there and would
> need to think about that scenario some more (and there is more
> fundamental issue with ACM in those environments).

Offline I had asked about reversing 1 and 2.  The reasoning given was that ibacm could be running on a 'management' node, but needed to communicate with compute nodes.
 
IOW, the occurrence of a 'non-management, full member' pkey in the pkey table strongly indicates that a node is being used in a secure environment, and ibacm should prefer using that pkey over 0xffff.

Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff.  A node running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's also the backup SA).  Ibacm would need to select pkey 0x8000 for communication.

This seems like a reasonable argument to me.
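
The order being argued for here (first full-member pkey other than 0xffff, then 0xffff, then the first pkey) can be sketched in C as follows. This is only an illustration and not the code from the patch: proposed_default_pkey and the two macros are hypothetical names, and the table is assumed to hold only valid pkeys in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER  0x8000u /* bit 15 set: full membership */
#define PKEY_FULL_DEFAULT 0xffffu /* full-member default (management) pkey */

/*
 * Hypothetical helper, not ibacm code: resolve "default" using the
 * proposed order. tbl holds valid pkeys in host byte order.
 */
uint16_t proposed_default_pkey(const uint16_t *tbl, size_t len)
{
	int have_full_default = 0;
	size_t i;

	assert(tbl != NULL && len > 0);

	for (i = 0; i < len; i++) {
		if (tbl[i] == PKEY_FULL_DEFAULT)
			have_full_default = 1;
		else if (tbl[i] & PKEY_FULL_MEMBER)
			return tbl[i];	/* 1. first non-management full-member pkey */
	}

	if (have_full_default)
		return PKEY_FULL_DEFAULT;	/* 2. fall back to 0xffff */

	return tbl[0];	/* 3. otherwise the first pkey in the table */
}
```

With the tables from the example above, {0x8000, 0x7fff} on a compute node and {0xffff, 0x8000} on the scheduler node, both calls return 0x8000.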

- Sean
Jason Gunthorpe Dec. 9, 2015, 5:39 p.m. UTC | #19
On Wed, Dec 09, 2015 at 07:51:46AM -0500, Hal Rosenstock wrote:
> > ipoib always uses the 0 pkey index to create the default ipoib
> > interface. (see eg, update_parent_pkey)
> 
> This is beyond IBA spec and is currently a linux convention for IPoIB.
> IMO it should be changed to search for this pkey rather than assume
> it's

I don't think you are following. It uses pkey[0] as the pkey for
ipoib, not necessarily for SA communication.

Since there is no way to know what the desired pkey is for ipoib there
is no possibility to search. Using pkey index 0 is a good solution
since it allows the SM to configure ipoib defaults centrally.

> > When operating securely the SA should place the pkey for default ipoib
> > operation in pkey index 0, and place 0x7FFF in another index. I run
> > a lot of networks exactly like this and it works very well.
> 
> Yes, it can run that way but more secure is without the full default
> pkey. When full default pkey is in every port, the rest of the
> partitioning doesn't really matter...

That isn't what I said, I said the pkey for the default ipoib
interface is in pkey[0], eg, the network runs with [0x8001,0x7FFF] as
the pkey table. There is no 0xFFFF pkey except on SA nodes.

Linux automatically creates ib0 on 0x8001 and the rest of the
in-kernel stack (should?) correctly find and use 0x7FFF as the pkey to
use to talk to the SA.

acm should follow the ipoib convention for creating its multicast groups
and set up its default multicast group using pkey[0].
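
The layout described above ([0x8001, 0x7FFF] on ordinary nodes) implies two separate choices: pkey[0] for the default multicast group, and a search for the default partition key for SA traffic. A minimal sketch, assuming a table of pkeys in host byte order; struct pkey_choice and choose_pkeys are hypothetical names, not ibacm or kernel APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_BASE_MASK    0x7fffu /* low 15 bits: partition number */
#define PKEY_DEFAULT_BASE 0x7fffu /* default partition, either membership */

/* Hypothetical result type for this sketch only. */
struct pkey_choice {
	int default_index;	/* index used for the default multicast group */
	int sa_index;		/* index used for SA queries, -1 if none found */
};

struct pkey_choice choose_pkeys(const uint16_t *tbl, size_t len)
{
	/* The default group always follows the ipoib convention: index 0. */
	struct pkey_choice c = { 0, -1 };
	size_t i;

	assert(tbl != NULL && len > 0);

	for (i = 0; i < len; i++) {
		if ((tbl[i] & PKEY_BASE_MASK) != PKEY_DEFAULT_BASE)
			continue;
		/* Prefer the full-member 0xffff over the limited 0x7fff. */
		if (c.sa_index < 0 || tbl[i] == 0xffffu)
			c.sa_index = (int)i;
	}
	return c;
}
```

For the table [0x8001, 0x7FFF] this yields default_index 0 and sa_index 1. A table with no default partition key leaves sa_index at -1 rather than falling back to some other pkey.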

Jason
Jason Gunthorpe Dec. 9, 2015, 5:46 p.m. UTC | #20
On Wed, Dec 09, 2015 at 01:07:14PM +0000, Wan, Kaike wrote:

> > > +	/* Determine the default pkey index for SA access first.
> > > +	 *   Order of preference: 0xffff, 0x7fff, first pkey.
> > 
> > No, IBA says that only the default pkey should be used to talk to the SA,
> > every port needs 0x7FFF or the full membership version. Do not search for the
> > first pkey.
> 
> We use the first pkey only if there is neither 0x7fff nor 0xffff in
> this port. If the port is in compliance with IB Spec, then we will
> be using either 0xffff or 0x7fff for SA access.

This is just confusing for readers. The IBA spec is very clear on which
pkey must be used to talk to the SA; don't ever use something
else. Follow the spec.

> > > +	 * Determine the default pkey for parsing address file as well.
> > > +	 *   order of preference: first full-member non-management pkey,
> > > +	 *   0xffff, first pkey.
> > > +	 */
> > 
> > This really should just be the 0 index pkey, which exactly matches how IPoIB
> > determines the default pkey, which is what matters when talking rdmacm..
> 
> It is true in most default configurations. However, since ibacm will
> use the default pkey for multicast, we want to make sure that it
> will not use a limited-member pkey to create/join a multicast group
> (practically of little use in this case) if such a pkey is placed at
> index 0.

If you don't follow the exact ipoib algorithm then you get different
answers in some cases and ugly subtle failure modes. Ie this algorithm
will not choose 0xFFFF as the pkey in cases where ipoib would - which
is not acceptable, IMHO.

If the 0 index pkey is not usable for ipoib then ipoib will be broken
too and that is far more likely to be noticed than if acm is broken.

Jason
Jason Gunthorpe Dec. 9, 2015, 6:22 p.m. UTC | #21
On Wed, Dec 09, 2015 at 05:13:49PM +0000, Hefty, Sean wrote:
 
> Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff.  A node
> running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's
> also the backup SA).  Ibacm would need to select pkey 0x8000 for
> communication.

I've also seen the reverse, eg 0xFFFF is used for default ipoib
communication and 0x8001 is assigned to only some nodes as a child
vlan.

Choosing 0x8001 in that case won't work either.

pkey[0] at least has the logic that the admin will configure things so
that the default ipoib device reaches the broadest audience, which makes
the most sense to me. That is what most sites I've seen want to do.
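
To see when the two candidate rules diverge, the following sketch compares "first full-member pkey other than 0xffff, then 0xffff, then the first pkey" against plain pkey[0] for a given table. rules_agree is a hypothetical helper written for illustration, and the table is assumed to hold valid pkeys in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: returns 1 when the proposed priority order and
 * the pkey[0] convention select the same pkey, 0 otherwise.
 */
int rules_agree(const uint16_t *tbl, size_t len)
{
	size_t i;

	assert(tbl != NULL && len > 0);

	/* Proposed step 1: first full-member pkey that is not 0xffff. */
	for (i = 0; i < len; i++) {
		if ((tbl[i] & 0x8000u) && tbl[i] != 0xffffu)
			return tbl[i] == tbl[0];
	}

	/* Proposed step 2: 0xffff if it is present anywhere in the table. */
	for (i = 0; i < len; i++) {
		if (tbl[i] == 0xffffu)
			return tbl[0] == 0xffffu;
	}

	/* Proposed step 3 is pkey[0] itself, so the rules agree. */
	return 1;
}
```

[0x8001, 0x7FFF] agrees (both rules pick 0x8001), while [0xFFFF, 0x8001], the layout described above, does not: pkey[0] gives 0xFFFF and the priority order gives 0x8001.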

I'm not quite sure what the acm algorithm is, but can't it just figure
out the pkey from the IP routing? Ie if you have an IP address to
resolve a few netlink queries will tell you what pkey to use, and that
is where the acm multicast should go?

Jason
Hefty, Sean Dec. 9, 2015, 6:37 p.m. UTC | #22
> pkey[0] at least has the logic that the admin will configure things so
> that the default ipoib device reaches the broadest audience, which makes
> the most sense to me. That is what most sites I've seen want to do.

Kaike, will pkey[0] work in the configurations that you're targeting with this change?

This seems like a very simple solution that's better than what we have now. 

> I'm not quite sure what the acm algorithm is, but can't it just figure
> out the pkey from the IP routing? Ie if you have an IP address to
> resolve a few netlink queries will tell you what pkey to use, and that
> is where the acm multicast should go?

When an IP address is used, it uses the correct pkey (based on routing data).  I mentioned this in a separate email, but this addresses the case when hostnames are used and ipoib may not be involved.

- Sean
Wan, Kaike Dec. 9, 2015, 6:39 p.m. UTC | #23
> -----Original Message-----
> From: Hefty, Sean
> Sent: Wednesday, December 09, 2015 1:37 PM
> To: Jason Gunthorpe
> Cc: Hal Rosenstock; Wan, Kaike; linux-rdma@vger.kernel.org
> Subject: RE: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
> 
> > pkey[0] at least has the logic that the admin will configure things so
> > that the default ipoib device reaches the broadest audience, which makes
> > the most sense to me. That is what most sites I've seen want to do.
> 
> Kaike, will pkey[0] work in the configurations that you're targeting with this
> change?

Yes. That will work for me.

> 
> This seems like a very simple solution that's better than what we have now.
> 
> > I'm not quite sure what the acm algorithm is, but can't it just figure
> > out the pkey from the IP routing? Ie if you have an IP address to
> > resolve a few netlink queries will tell you what pkey to use, and that
> > is where the acm multicast should go?
> 
> When an IP address is used, it uses the correct pkey (based on routing data).
> I mentioned this in a separate email, but this addresses the case when
> hostnames are used and ipoib may not be involved.
> 
> - Sean
Doug Ledford Dec. 9, 2015, 9:35 p.m. UTC | #24
On 12/09/2015 01:22 PM, Jason Gunthorpe wrote:
> On Wed, Dec 09, 2015 at 05:13:49PM +0000, Hefty, Sean wrote:
>  
>> Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff.  A node
>> running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's
>> also the backup SA).  Ibacm would need to select pkey 0x8000 for
>> communication.
> 
> I've also seen the reverse, eg 0xFFFF is used for default ipoib
> communication and 0x8001 is assigned to only some nodes as a child
> vlan.

That's what I use internally in our test lab.

> Choosing 0x8001 in that case won't work either.

Nope.  The suggestion here would break our setup.  Well, *would* being
the operative word where what I mean is would if we used ibacm.  But
that's a story for another email...
Hefty, Sean Dec. 9, 2015, 9:52 p.m. UTC | #25
> >> Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff.  A node
> >> running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's
> >> also the backup SA).  Ibacm would need to select pkey 0x8000 for
> >> communication.
> >
> > I've also seen the reverse, eg 0xFFFF is used for default ipoib
> > communication and 0x8001 is assigned to only some nodes as a child
> > vlan.
> 
> That's what I use internally in our test lab.
> 
> > Choosing 0x8001 in that case won't work either.
> 
> Nope.  The suggestion here would break our setup.  Well, *would* being
> the operative word where what I mean is would if we used ibacm.  But
> that's a story for another email...

In this case, Kaike, please change the default to just be pkey[0].

- Sean
Patch

diff --git a/src/acm.c b/src/acm.c
index ada0bfb..ce2797c 100644
--- a/src/acm.c
+++ b/src/acm.c
@@ -114,7 +114,8 @@  struct acmc_port {
 	union ibv_gid       *gid_tbl;
 	uint16_t            lid;
 	uint16_t            lid_mask;
-	int                 default_pkey_index;
+	int                 sa_pkey_index;
+	uint16_t            def_acm_pkey;
 };
 
 struct acmc_device {
@@ -2009,7 +2010,7 @@  static int acm_assign_ep_names(struct acmc_ep *ep)
 				continue;
 			}
 		} else {
-			pkey = 0xFFFF;
+			pkey = ep->port->def_acm_pkey;
 		}
 
 		if (!stricmp(dev_name, dev) &&
@@ -2202,7 +2203,11 @@  static void acm_port_up(struct acmc_port *port)
 	uint16_t pkey;
 	int i, ret;
 	struct acmc_prov_context *dev_ctx;
-	int index = -1;
+	int sa_index = -1;
+	int full_mgmt_index = -1;
+	uint16_t def_pkey = 0;
+	int first_pkey_index = -1;
+	uint16_t first_pkey = 0;
 
 	acm_log(1, "%s %d\n", port->dev->device.verbs->device->name, 
 		port->port.port_num);
@@ -2248,24 +2253,45 @@  static void acm_port_up(struct acmc_port *port)
 		goto err1;
 	}
 
-	/* Determine the default pkey first.
-	   Order of preference: 0xffff, 0x7fff, first pkey
-	*/
+	/* Determine the default pkey index for SA access first.
+	 *   Order of preference: 0xffff, 0x7fff, first pkey.
+	 * Determine the default pkey for parsing address file as well.
+	 *   order of preference: first full-member non-management pkey,
+	 *   0xffff, first pkey.
+	 */
 	for (i = 0; i < attr.pkey_tbl_len; i++) {
 		ret = ibv_query_pkey(port->dev->device.verbs, 
 				     port->port.port_num, i, &pkey);
 		if (ret)
 			continue;
 		pkey = ntohs(pkey);
-		if (pkey == 0xffff) {
-			index = i;
-			break;
-		}
-		else if (pkey == 0x7fff) {
-			index = i;
+		if (!(pkey & 0x7fff))
+			continue;
+
+		if (first_pkey_index < 0) {
+			first_pkey_index = i;
+			first_pkey = pkey;
 		}
-	}
-	port->default_pkey_index = index < 0 ? 0: index;
+
+		if (pkey == 0xffff) {
+			sa_index = i;
+			full_mgmt_index = i;
+		} else if (pkey == 0x7fff) {
+			if (sa_index < 0)
+				sa_index = i;
+		} else if ((def_pkey == 0) && (pkey & 0x8000)) {
+			/* First full-member non-management pkey */
+			def_pkey = pkey;
+		}
+	}
+	port->sa_pkey_index = (sa_index < 0) ?
+		first_pkey_index : sa_index;
+	if (def_pkey)
+		port->def_acm_pkey = def_pkey;
+	else if (full_mgmt_index >= 0)
+		port->def_acm_pkey = 0xffff;
+	else
+		port->def_acm_pkey = first_pkey;
 
 	for (i = 0; i < attr.pkey_tbl_len; i++) {
 		ret = ibv_query_pkey(port->dev->device.verbs, 
@@ -2775,7 +2801,7 @@  int acm_send_sa_mad(struct acm_sa_mad *mad)
 	mad->umad.addr.qkey = port->sa_addr.qkey;
 	mad->umad.addr.lid = htons(port->sa_addr.lid);
 	mad->umad.addr.sl = port->sa_addr.sl;
-	mad->umad.addr.pkey_index = req->ep->port->default_pkey_index;
+	mad->umad.addr.pkey_index = req->ep->port->sa_pkey_index;
 
 	lock_acquire(&port->lock);
 	if (port->sa_credits && DListEmpty(&port->sa_wait)) {