Message ID | 1449595982-20781-1-git-send-email-kaike.wan@intel.com (mailing list archive) |
---|---|
State | Superseded |
On Tue, Dec 08, 2015 at 12:33:02PM -0500, kaike.wan@intel.com wrote:
> From: Kaike Wan <kaike.wan@intel.com>
>
> In an insecure IB fabric, the default pkey in a port is 0xffff, where each node is allowed to talk to any other node in the fabric, including the SA node. However, in a secure fabric, to limit member access, not all nodes can have the full-member default pkey 0xffff. A typical configuration is to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in addition, each node can be assigned some other full-member pkeys, such as 0x8001 and 0x8002, so that it can be assigned to different partitions. In this case, each node can access SA, and yet limits its other access to only those nodes in its assigned partitions. In such a secure fabric, however, ibacm will not work by interpreting "default" in its default address file as 0xffff.

ipoib always uses the 0 pkey index to create the default ipoib interface (see e.g. update_parent_pkey).

When operating securely the SA should place the pkey for default ipoib operation in pkey index 0, and place 0x7FFF in another index. I run a lot of networks exactly like this and it works very well.

This ensures that ipoib works out of the box without additional configuration.

> + /* Determine the default pkey index for SA access first.
> + * Order of preference: 0xffff, 0x7fff, first pkey.

No, IBA says that only the default pkey should be used to talk to the SA; every port needs 0x7FFF or the full-membership version. Do not search for the first pkey.

> + * Determine the default pkey for parsing address file as well.
> + * order of preference: first full-member non-management pkey,
> + * 0xffff, first pkey.
> + */

This really should just be the 0 index pkey, which exactly matches how IPoIB determines the default pkey, which is what matters when talking rdmacm.

Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > + * Determine the default pkey for parsing address file as well.
> > + * order of preference: first full-member non-management pkey,
> > + * 0xffff, first pkey.
> > + */
>
> This really should just be the 0 index pkey, which exactly matches how IPoIB determines the default pkey, which is what matters when talking rdmacm..

Ibacm currently hard-codes the 'default' pkey to 0xffff for ibacm <-> ibacm communication. If there's no disagreement to switching to pkey[0], I'm fine with that. I did not realize that ipoib uses this same default.

On 12/8/2015 12:33 PM, kaike.wan@intel.com wrote:
> From: Kaike Wan <kaike.wan@intel.com>
>
> In an insecure IB fabric, the default pkey in a port is 0xffff, where each node is allowed to talk to any other node in the fabric, including the SA node. However, in a secure fabric, to limit member access, not all nodes can have the full-member default pkey 0xffff. A typical configuration is to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in addition, each node can be assigned some other full-member pkeys, such as 0x8001 and 0x8002, so that it can be assigned to different partitions. In this case, each node can access SA, and yet limits its other access to only those nodes in its assigned partitions. In such a secure fabric, however, ibacm will not work by interpreting "default" in its default address file as 0xffff.
>
> To solve the problem, this patch introduces the following priority to interpret default pkey:
> 1. Find the first non-management full-member pkey;
> 2. If it fails, find pkey 0xffff;
> 3. If pkey 0xffff is not available, use the first pkey.
> This approach will work in both securely and insecurely partitioned fabrics.

Shouldn't the pkey to be used for such interACM communication be configured? First full member pkey is non-deterministic. Isn't it the case that it may not include proper set of ACMs to communicate with?

-- Hal

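For readers following the thread, the three-step priority quoted above can be sketched in C roughly as follows. This is not the code from the patch: the function name, the macros and the plain `uint16_t` array standing in for a port's pkey table are all assumptions made for illustration, and "non-management full-member" is taken to mean "bit 15 set and not 0xffff".

```c
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER_BIT 0x8000u /* bit 15: full membership */
#define PKEY_DEFAULT_FULL    0xffffu /* full-member default pkey */

/* Hypothetical helper: return the table index that "default" would map
 * to under the proposed priority, or -1 if the table is empty. */
int pick_default_pkey_index(const uint16_t *table, size_t len)
{
    int full_default = -1;

    for (size_t i = 0; i < len; i++) {
        if (table[i] == PKEY_DEFAULT_FULL) {
            /* Remember 0xffff, but keep looking for step 1. */
            if (full_default < 0)
                full_default = (int)i;
        } else if (table[i] & PKEY_FULL_MEMBER_BIT) {
            return (int)i; /* step 1: first non-management full member */
        }
    }
    if (full_default >= 0)
        return full_default; /* step 2: fall back to 0xffff */
    return len > 0 ? 0 : -1; /* step 3: fall back to the first pkey */
}
```

Note that with a table of {0xffff, 0x8001} this sketch returns index 1, which is exactly the case where the result differs from a plain pkey[0] lookup.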
On 12/8/2015 7:26 PM, Hefty, Sean wrote:
>>> + * Determine the default pkey for parsing address file as well.
>>> + * order of preference: first full-member non-management pkey,
>>> + * 0xffff, first pkey.
>>> + */
>>
>> This really should just be the 0 index pkey, which exactly matches how IPoIB determines the default pkey, which is what matters when talking rdmacm..
>
> Ibacm currently hard-codes the 'default' pkey to 0xffff for ibacm <-> ibacm communication.
> If there's no disagreement to switching to pkey[0], I'm fine with that.

There's no IBA requirement that 0xffff pkey is always in index 0. It's only a requirement on device bootup with non volatile storage per C10-123. Furthermore, there's no requirement that 0xffff pkey is present in the table. It may be that there's only 0x7fff pkey.

> I did not realize that ipoib uses this same default.

I think this is a problem in limiting partition use and should be changed. Rather than assuming index 0, pkey table should be searched for this pkey.

-- Hal

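Searching the table instead of assuming index 0 amounts to a linear scan. A minimal sketch, assuming the port's pkey table has already been read into a `uint16_t` array (the function name is hypothetical and not an ibacm or kernel API):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the first entry equal to
 * `wanted`, or -1 when the pkey is not present in the table. */
int find_pkey_index(const uint16_t *table, size_t len, uint16_t wanted)
{
    for (size_t i = 0; i < len; i++) {
        if (table[i] == wanted)
            return (int)i;
    }
    return -1;
}
```

For a table of {0x8001, 0x7fff}, looking up 0x7fff yields index 1, and looking up 0xffff reports -1 rather than silently using whatever sits at index 0.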
On 12/8/2015 4:21 PM, Jason Gunthorpe wrote:
> On Tue, Dec 08, 2015 at 12:33:02PM -0500, kaike.wan@intel.com wrote:
>> From: Kaike Wan <kaike.wan@intel.com>
>>
>> In an insecure IB fabric, the default pkey in a port is 0xffff, where each node is allowed to talk to any other node in the fabric, including the SA node. However, in a secure fabric, to limit member access, not all nodes can have the full-member default pkey 0xffff. A typical configuration is to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in addition, each node can be assigned some other full-member pkeys, such as 0x8001 and 0x8002, so that it can be assigned to different partitions. In this case, each node can access SA, and yet limits its other access to only those nodes in its assigned partitions. In such a secure fabric, however, ibacm will not work by interpreting "default" in its default address file as 0xffff.
>
> ipoib always uses the 0 pkey index to create the default ipoib interface. (see eg, update_parent_pkey)

This is beyond IBA spec and is currently a Linux convention for IPoIB. IMO it should be changed to search for this pkey rather than assume it's in index 0. There's no requirement that it be in index 0 other than at bootup with non volatile storage (C10-123).

> When operating securely the SA should place the pkey for default ipoib operation in pkey index 0, and place 0x7FFF in another index. I run a lot of networks exactly like this and it works very well.

Yes, it can run that way but more secure is without the full default pkey. When full default pkey is in every port, the rest of the partitioning doesn't really matter...

-- Hal

> From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com]
> Sent: Tuesday, December 08, 2015 4:22 PM
> To: Wan, Kaike
> Cc: Hefty, Sean; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
>
> On Tue, Dec 08, 2015 at 12:33:02PM -0500, kaike.wan@intel.com wrote:
> > From: Kaike Wan <kaike.wan@intel.com>
> >
> > In an insecure IB fabric, the default pkey in a port is 0xffff, where each node is allowed to talk to any other node in the fabric, including the SA node. However, in a secure fabric, to limit member access, not all nodes can have the full-member default pkey 0xffff. A typical configuration is to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in addition, each node can be assigned some other full-member pkeys, such as 0x8001 and 0x8002, so that it can be assigned to different partitions. In this case, each node can access SA, and yet limits its other access to only those nodes in its assigned partitions. In such a secure fabric, however, ibacm will not work by interpreting "default" in its default address file as 0xffff.
>
> ipoib always uses the 0 pkey index to create the default ipoib interface. (see eg, update_parent_pkey)
>
> When operating securely the SA should place the pkey for default ipoib operation in pkey index 0, and place 0x7FFF in another index. I run a lot of networks exactly like this and it works very well.

In such a configuration, this patch will enable ibacm to use pkey[0] for address resolution through multicast while using 0x7fff for SA access, exactly matching what ipoib is currently doing.

> This ensures that ipoib works out of the box without additional configuration.

This is exactly the purpose of this patch.

> > + /* Determine the default pkey index for SA access first.
> > + * Order of preference: 0xffff, 0x7fff, first pkey.
>
> No, IBA says that only the default pkey should be used to talk to the SA, every port needs 0x7FFF or the full membership version. Do not search for the first pkey.

We use the first pkey only if there is neither 0x7fff nor 0xffff in this port. If the port is in compliance with IB Spec, then we will be using either 0xffff or 0x7fff for SA access.

> > + * Determine the default pkey for parsing address file as well.
> > + * order of preference: first full-member non-management pkey,
> > + * 0xffff, first pkey.
> > + */
>
> This really should just be the 0 index pkey, which exactly matches how IPoIB determines the default pkey, which is what matters when talking rdmacm..

It is true in most default configurations. However, since ibacm will use the default pkey for multicast, we want to make sure that it will not use a limited-member pkey to create/join a multicast group (practically of little use in this case) if such a pkey is placed at index 0.

> Jason

> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
> Sent: Wednesday, December 09, 2015 7:50 AM
> To: Wan, Kaike; Hefty, Sean
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
>
> On 12/8/2015 12:33 PM, kaike.wan@intel.com wrote:
> > From: Kaike Wan <kaike.wan@intel.com>
> >
> > In an insecure IB fabric, the default pkey in a port is 0xffff, where each node is allowed to talk to any other node in the fabric, including the SA node. However, in a secure fabric, to limit member access, not all nodes can have the full-member default pkey 0xffff. A typical configuration is to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in addition, each node can be assigned some other full-member pkeys, such as 0x8001 and 0x8002, so that it can be assigned to different partitions. In this case, each node can access SA, and yet limits its other access to only those nodes in its assigned partitions. In such a secure fabric, however, ibacm will not work by interpreting "default" in its default address file as 0xffff.
> >
> > To solve the problem, this patch introduces the following priority to interpret default pkey:
> > 1. Find the first non-management full-member pkey;
> > 2. If it fails, find pkey 0xffff;
> > 3. If pkey 0xffff is not available, use the first pkey.
> > This approach will work in both securely and insecurely partitioned fabrics.
>
> Shouldn't the pkey to be used for such interACM communication be configured?

Yes. The purpose of this patch is only to make a secure system work out of box (default configuration). When a specific pkey is given in the ibacm_addr.cfg file, there will be no need to interpret the "default" pkey.

> First full member pkey is non-deterministic. Isn't it the case that it may not include proper set of ACMs to communicate with?

This is only for the default configuration, where a reasonable assumption is that members of an intended partition (group of ports) will all have the same full-member pkey.

One could argue that a port could have two or more full-member non-management pkeys because it is assigned to multiple partitions. In this case, the port will join only one multicast group, not all the multicast groups. The reply is that the default ibacm_addr.cfg has only one endpoint with pkey "default" anyway. To make it really work, one needs to edit ibacm_addr.cfg.

Kaike

On 12/9/2015 8:24 AM, Wan, Kaike wrote:
>> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
>> Sent: Wednesday, December 09, 2015 7:50 AM
>> To: Wan, Kaike; Hefty, Sean
>> Cc: linux-rdma@vger.kernel.org
>> Subject: Re: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
>>
>> On 12/8/2015 12:33 PM, kaike.wan@intel.com wrote:
>>> From: Kaike Wan <kaike.wan@intel.com>
>>>
>>> In an insecure IB fabric, the default pkey in a port is 0xffff, where each node is allowed to talk to any other node in the fabric, including the SA node. However, in a secure fabric, to limit member access, not all nodes can have the full-member default pkey 0xffff. A typical configuration is to let SA node have pkey 0xffff while all other nodes have pkey 0x7fff; in addition, each node can be assigned some other full-member pkeys, such as 0x8001 and 0x8002, so that it can be assigned to different partitions. In this case, each node can access SA, and yet limits its other access to only those nodes in its assigned partitions. In such a secure fabric, however, ibacm will not work by interpreting "default" in its default address file as 0xffff.
>>>
>>> To solve the problem, this patch introduces the following priority to interpret default pkey:
>>> 1. Find the first non-management full-member pkey;
>>> 2. If it fails, find pkey 0xffff;
>>> 3. If pkey 0xffff is not available, use the first pkey.
>>> This approach will work in both securely and insecurely partitions fabrics.
>>
>> Shouldn't the pkey to be used for such interACM communication be configured ?
> Yes. The purpose of this patch is only to make a secure system work out of box (default configuration). When a specific pkey is given in the ibacm_addr.cfg file, there will be no need to interpret the "default" pkey.
>
>> First full member pkey is non-deterministic. Isn't it the case that it may not include proper set of ACMs to communicate with ?
>
> This is only for the default configuration, where a reasonable assumption is that members of an intended partition (group of ports) will all have the same full-member pkey.

Yes, but it may not be first (lowest index) pkey in table of different ports.

> One could argue that a port could have two or more full-member non-management pkeys because it is assigned to multiple partitions.

Yes, that's a perfectly valid configuration.

> In this case, the port will only join only one multicast group, not all the multicast groups. The reply is that the default ibacm_addr.cfg have only one endpoint with pkey "default" anyway.

In this case, the non default partitions are not useful for ACM and all ACMs need to share "default" partition.

> To make it really work, one needs to edit ibacm_addr.cfg.

It may work without config depending on a number of factors but can cause issues to be debugged. Only sure way is config :-(

-- Hal

> Kaike

> -----Original Message-----
> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
> Sent: Wednesday, December 09, 2015 8:46 AM
>
> >>> To solve the problem, this patch introduces the following priority to interpret default pkey:
> >>> 1. Find the first non-management full-member pkey;
> >>> 2. If it fails, find pkey 0xffff;
> >>> 3. If pkey 0xffff is not available, use the first pkey.
> >>> This approach will work in both securely and insecurely partitions fabrics.
> >>
> >> Shouldn't the pkey to be used for such interACM communication be configured ?
> > Yes. The purpose of this patch is only to make a secure system work out of box (default configuration). When a specific pkey is given in the ibacm_addr.cfg file, there will be no need to interpret the "default" pkey.
> >
> >> First full member pkey is non-deterministic. Isn't it the case that it may not include proper set of ACMs to communicate with ?
> >
> > This is only for the default configuration, where a reasonable assumption is that members of an intended partition (group of ports) will all have the same full-member pkey.
>
> Yes, but it may not be first (lowest index) pkey in table of different ports.

This is the best effort and it should work for most common configurations, but may not work for more complicated cases.

Any other suggestions?

> > To make it really work, one needs to edit ibacm_addr.cfg.
>
> It may work without config depending on a number of factors but can cause issues to be debugged.
>
> Only sure way is config :-(

Exactly.

Kaike

On 12/9/2015 8:55 AM, Wan, Kaike wrote:
> This is the best effort and it should work for most common configurations, but may not work for more complicated cases.

Right, there are various scenarios where it will not work. This was one of them but there are others I can think of.

> Any other suggestions?

Unfortunately not. It comes down to whether the out of box cases outweigh the debug when it's an exception case. The premise of this patch is that that's the case.

> -----Original Message-----
> From: Hal Rosenstock [mailto:hal@dev.mellanox.co.il]
> Sent: Wednesday, December 09, 2015 9:06 AM
>
> > This is the best effort and it should work for most common configurations, but may not work for more complicated cases.
>
> Right, there are various scenarios where it will not work. This was one of them but there are others I can think of.
>
> > Any other suggestions?
>
> Unfortunately not. It comes down to whether the out of box cases outweigh the debug when it's an exception case. The premise of this patch is that that's the case.

I would argue that it does. Without this patch, ibacm will not work on secure fabric out of box (where "default" is interpret as 0xffff), and it will be equal or more likely not to work by default in more complicated configurations, where debugging is required anyway. This patch enables ibacm to work properly in most common configurations out of box.

Kaike

>> Unfortunately not. It comes down to whether the out of box cases outweigh the debug when it's an exception case. The premise of this patch is that that's the case.
>
> I would argue that it does. Without this patch, ibacm will not work on secure fabric out of box (where "default" is interpreted as 0xffff), and it will be equal or more likely not to work by default in more complicated configurations, where debugging is required anyway.

Why is debugging required anyway in those more complicated cases? It's configuration that's required.

> This patch enables ibacm to work properly in most common configurations out of box.

Agreed.

-- Hal

> Kaike

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Hal Rosenstock
> Sent: Wednesday, December 09, 2015 9:36 AM
>
> >> Unfortunately not. It comes down to whether the out of box cases outweigh the debug when it's an exception case. The premise of this patch is that that's the case.
> >
> > I would argue that it does. Without this patch, ibacm will not work on secure fabric out of box (where "default" is interpreted as 0xffff), and it will be equal or more likely not to work by default in more complicated configurations, where debugging is required anyway.
>
> Why is debugging required anyway in those more complicated cases ? It's configuration that's required.

I mean that the user needs to investigate why the fabric is not working out of box. This patch itself does not make configuration of the fabric harder. On the contrary, it relieves the user from having to configure ibacm on each node in those common cases.

Kaike

On 12/9/2015 10:04 AM, Wan, Kaike wrote:
> I mean that the user needs to investigate why the fabric is not working out of box.

My point is that an educated admin should _know_ to configure in these cases, and that debugging is needed only when things are broken, not by default, in these more complex cases. This means the limitations of the out of box approach need to be explained in the docs.

> This patch itself does not make configuration of the fabric harder.

Agreed.

> On the contrary, it relieves the user from having to configure ibacm on each node in those common cases.

Agreed.

-- Hal

> > I mean that the user needs to investigate why the fabric is not working out of box.
>
> My point is that an educated admin should _know_ to configure in these cases and that debug is only when things are broken not by default in these more complex cases. This means the limitations of the out of box approach needs to be explained in the docs.

When IP addresses are used, the corresponding pkey is used. The issue this patch is addressing is the mapping of 'hostnames' to pkeys, as shown in this ibacm_addr.cfg example:

#Name      device port pkey
cst-lin0   mlx4_0 1    default
cst-lin0-1 mlx4_0 1    default
cst-lin0-2 mlx4_0 2    default

Currently, 'default' is hard-coded to a pkey of 0xffff. The intent is to define a better default value. Kaike has suggested this be the new default:

1. Find the first non-management full-member pkey;
2. If it fails, find pkey 0xffff;
3. If pkey 0xffff is not available, use the first pkey.

Is there a better alternative for what default should be? Jason was suggesting use pkey[0], which seems less robust in theory, but is simple and may cover the vast majority of real use cases.

The fewer cases where manual configuration is necessary, the fewer emails I receive, and the better off I am. :)

- Sean

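To make the role of the 'default' keyword concrete, here is a hedged sketch of parsing one line of the address file shown above. It is not ibacm's parser: the struct, the function name and the buffer sizes are invented for illustration, and accepting an explicit pkey as a hex number is an assumption, since the thread only shows the 'default' form.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one address-file entry. */
struct addr_entry {
    char name[64];
    char device[32];
    int port;
    int pkey_is_default; /* 1 when the file said "default" */
    uint16_t pkey;       /* meaningful only when pkey_is_default == 0 */
};

/* Returns 1 for a parsed entry, 0 for blank or comment lines, -1 on error. */
int parse_addr_line(const char *line, struct addr_entry *e)
{
    char pkey_text[16];
    char *end;
    unsigned long v;

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;

    if (sscanf(line, "%63s %31s %d %15s",
               e->name, e->device, &e->port, pkey_text) != 4)
        return -1;

    if (strcmp(pkey_text, "default") == 0) {
        /* Resolution of "default" is deferred to the selection policy. */
        e->pkey_is_default = 1;
        e->pkey = 0;
        return 1;
    }

    /* Assumed explicit form: a hex value, with or without a 0x prefix. */
    v = strtoul(pkey_text, &end, 16);
    if (end == pkey_text || *end != '\0' || v > 0xffff)
        return -1;
    e->pkey_is_default = 0;
    e->pkey = (uint16_t)v;
    return 1;
}
```

Keeping 'default' as a flag rather than substituting 0xffff at parse time is the design point under discussion: the value can then be resolved per port, by whichever policy the list settles on.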
On 12/9/2015 11:26 AM, Hefty, Sean wrote:
>>> I mean that the user needs to investigate why the fabric is not working out of box.
>>
>> My point is that an educated admin should _know_ to configure in these cases and that debug is only when things are broken not by default in these more complex cases. This means the limitations of the out of box approach needs to be explained in the docs.
>
> When IP addresses are used, the corresponding pkey is used. The issue this patch is addressing is the mapping of 'hostnames' to pkeys, as show in this ibacm_addr.cfg example:
>
> #Name      device port pkey
> cst-lin0   mlx4_0 1    default
> cst-lin0-1 mlx4_0 1    default
> cst-lin0-2 mlx4_0 2    default
>
> Currently, 'default' is hard-coded to a pkey of 0xffff. The intent is to define a better default value. Kaike has suggested this be the new default:
>
> 1. Find the first non-management full-member pkey;

By "non-management full-member pkey", I think you mean "pkey which is other than a full member of default (0x7fff) partition".

> 2. If it fails, find pkey 0xffff;

To me, finding pkey 0xffff is better/safer than assuming it's in index 0 although it's likely there.

> 3. If pkey 0xffff is not available, use the first pkey.
>
> Is there better alternative for what default should be?

Order of 1 and 2 depends on use models for full default partition and other partitions. Reversing 1 and 2 (full default partition first) would handle the most common use models and handles Jason's case.

The only common case that I'm aware of where that might fall down is in the virtualized case. I'm not sure what policy is best there and would need to think about that scenario some more (and there is more fundamental issue with ACM in those environments).

> Jason was suggesting use pkey[0], which seems less robust in theory, but is simple and may cover the vast majority of real use cases.

Seems less robust to me too.

> The fewer cases where manual configuration is necessary, the fewer emails I receive, and the better off I am. :)

Understood.

-- Hal

> - Sean

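Since the meaning of "non-management full-member pkey" is being pinned down here, a few small predicates may help. They assume the standard P_Key layout (bit 15 is the membership bit, the low 15 bits are the partition number) and the usual rule that two ports can exchange packets only if the partition numbers match and at least one side is a full member. The function names are illustrative, not taken from any library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 set means full membership; clear means limited membership. */
bool pkey_is_full_member(uint16_t p)
{
    return (p & 0x8000u) != 0;
}

/* The low 15 bits identify the partition itself. */
uint16_t pkey_base(uint16_t p)
{
    return (uint16_t)(p & 0x7fffu);
}

/* Reading used in this thread: full member of anything but the default
 * partition, i.e. bit 15 set and not 0xffff. */
bool pkey_is_non_mgmt_full(uint16_t p)
{
    return pkey_is_full_member(p) && p != 0xffffu;
}

/* Same partition, and not limited-to-limited. */
bool pkeys_can_talk(uint16_t a, uint16_t b)
{
    return pkey_base(a) == pkey_base(b) &&
           (pkey_is_full_member(a) || pkey_is_full_member(b));
}
```

This is why a fabric where compute nodes hold only 0x7fff in the default partition can still reach an SA holding 0xffff, while the compute nodes cannot use 0x7fff to talk to each other.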
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Hal Rosenstock
> Sent: Wednesday, December 09, 2015 11:49 AM
>
> > When IP addresses are used, the corresponding pkey is used.  The issue this patch is addressing is the mapping of 'hostnames' to pkeys, as show in this ibacm_addr.cfg example:
> >
> > #Name      device port pkey
> > cst-lin0   mlx4_0 1    default
> > cst-lin0-1 mlx4_0 1    default
> > cst-lin0-2 mlx4_0 2    default
> >
> > Currently, 'default' is hard-coded to a pkey of 0xffff.  The intent is to define a better default value.  Kaike has suggested this be the new default:
> >
> > 1. Find the first non-management full-member pkey;
>
> By "non-management full-member pkey", I think you mean "pkey which is other than a full member of default (0x7fff) partition".

Full-member pkey (with bit 15 set or 0x8000) other than 0xffff.

> > 2. If it fails, find pkey 0xffff;
>
> To me, finding pkey 0xffff is better/safer than assuming it's in index 0 although it's likely there.
>
> > 3. If pkey 0xffff is not available, use the first pkey.
> >
> > Is there better alternative for what default should be?
>
> Order of 1 and 2 depends on use models for full default partition and other partitions. Reversing 1 and 2 (full default partition first) would handle the most common use models and handles Jason's case.

Reversing 1 and 2 will cause problem for the SA node in a secure fabric:

Node 1: 0x8001, 0xffff (SA node)
Node 2: 0x8001, 0x7fff
Node 3: 0x8001, 0x7fff

In this case, Node 1 will use 0xffff while Nodes 2 and 3 will use 0x8001. Keeping the order will enable all nodes to use 0x8001 as the default, which also handles Jason's case.

Kaike

> > 1. Find the first non-management full-member pkey;

I.e. a pkey with the high-order bit set that is not 0xffff

> > 2. If it fails, find pkey 0xffff;
>
> Order of 1 and 2 depends on use models for full default partition and other partitions. Reversing 1 and 2 (full default partition first) would handle the most common use models and handles Jason's case.
>
> The only common case that I'm aware of where that might fall down is in the virtualized case. I'm not sure what policy is best there and would need to think about that scenario some more (and there is more fundamental issue with ACM in those environments).

Offline I had asked about reversing 1 and 2. The reasoning given was that ibacm could be running on a 'management' node, but needed to communicate with compute nodes. IOW, the occurrence of a 'non-management, full member' pkey in the pkey table strongly indicates that a node is being used in a secure environment, and ibacm should prefer using that pkey over 0xffff.

Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff. A node running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's also the backup SA). Ibacm would need to select pkey 0x8000 for communication.

This seems like a reasonable argument to me.

- Sean

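The ordering argument can be checked mechanically. The sketch below applies both orderings to the two pkey tables from the example above (compute node and job-scheduler node) and returns the pkey each node would end up using. All names are hypothetical, and returning 0 for an empty table is just a convention of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Full member of a non-default partition: bit 15 set, not 0xffff. */
static int is_custom_full(uint16_t p)
{
    return (p & 0x8000u) != 0 && p != 0xffffu;
}

/* Proposed order: custom full-member pkey, then 0xffff, then pkey[0]. */
uint16_t pick_custom_first(const uint16_t *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (is_custom_full(t[i]))
            return t[i];
    for (size_t i = 0; i < n; i++)
        if (t[i] == 0xffffu)
            return t[i];
    return n ? t[0] : 0;
}

/* Reversed order: 0xffff, then custom full-member pkey, then pkey[0]. */
uint16_t pick_ffff_first(const uint16_t *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (t[i] == 0xffffu)
            return t[i];
    for (size_t i = 0; i < n; i++)
        if (is_custom_full(t[i]))
            return t[i];
    return n ? t[0] : 0;
}
```

With a compute table of {0x8000, 0x7fff} and a scheduler table of {0xffff, 0x8000}, `pick_custom_first` gives 0x8000 on both nodes, whereas `pick_ffff_first` gives 0x8000 on the compute node and 0xffff on the scheduler, so the two would not agree on a pkey.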
On Wed, Dec 09, 2015 at 07:51:46AM -0500, Hal Rosenstock wrote:
> > ipoib always uses the 0 pkey index to create the default ipoib interface. (see eg, update_parent_pkey)
>
> This is beyond IBA spec and is currently a linux convention for IPoIB. IMO it should be changed to search for this pkey rather than assume it's

I don't think you are following. It uses pkey[0] as the pkey for ipoib, not necessarily for SA communication. Since there is no way to know what the desired pkey is for ipoib there is no possibility to search.

Using pkey index 0 is a good solution since it allows the SM to configure ipoib defaults centrally.

> > When operating securely the SA should place the pkey for default ipoib operation in pkey index 0, and place 0x7FFF in another index. I run a lot of networks exactly like this and it works very well.
>
> Yes, it can run that way but more secure is without the full default pkey. When full default pkey is in every port, the rest of the partitioning doesn't really matter...

That isn't what I said, I said the pkey for the default ipoib interface is in pkey[0], e.g. the network runs with [0x8001,0x7FFF] as the pkey table. There is no 0xFFFF pkey except on SA nodes. Linux automatically creates ib0 on 0x8001 and the rest of the in-kernel stack (should?) correctly find and use 0x7FFF as the pkey to use to talk to the SA.

acm should follow ipoib convention for creating its multicast groups and set up its default multicast group using pkey[0].

Jason

On Wed, Dec 09, 2015 at 01:07:14PM +0000, Wan, Kaike wrote:
> > > + /* Determine the default pkey index for SA access first.
> > > + * Order of preference: 0xffff, 0x7fff, first pkey.
> >
> > No, IBA says that only the default pkey should be used to talk to the SA,
> > every port needs 0x7FFF or the full membership version. Do not search for the
> > first pkey.
>
> We use the first pkey only if there is neither 0x7fff nor 0xffff in
> this port. If the port is in compliance with IB Spec, then we will
> be using either 0xffff or 0x7fff for SA access.

This is just confusing for readers, the IBA spec is very clear on what
pkey must be used to talk to the SA, don't ever use something else.
Follow the spec.

> > > + * Determine the default pkey for parsing address file as well.
> > > + * order of preference: first full-member non-management pkey,
> > > + * 0xffff, first pkey.
> > > + */
> >
> > This really should just be the 0 index pkey, which exactly matches how IPoIB
> > determines the default pkey, which is what matters when talking rdmacm..
>
> It is true in most default configurations. However, since ibacm will
> use the default pkey for multicast, we want to make sure that it
> will not use a limited-member pkey to create/join a multicast group
> (practically of little use in this case) if such a pkey is placed at
> index 0.

If you don't follow the exact ipoib algorithm then you get different
answers in some cases and ugly subtle failure modes. Ie this algorithm
will not choose 0xFFFF as the pkey in cases where ipoib would - which
is not acceptable, IMHO.

If the 0 index pkey is not usable for ipoib then ipoib will be broken
too and that is far more likely to be noticed than if acm is broken.

Jason
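To make the disagreement concrete, the sketch below puts the two selection rules side by side. Both functions are hypothetical illustrations working on a host-byte-order array: `patch_default_pkey` follows the preference order from the quoted patch comment (simplified by assuming the table has no empty entries), and `index0_default_pkey` is the pkey[0] rule.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Order from the quoted patch comment: first full-member
 * non-management pkey, then 0xffff, then the first pkey.
 * Simplification: every table entry is assumed to be in use.
 */
static uint16_t patch_default_pkey(const uint16_t *tbl, size_t len)
{
	int have_full_default = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (tbl[i] == 0xffff)
			have_full_default = 1;
		else if (tbl[i] & 0x8000)
			return tbl[i];
	}
	if (have_full_default)
		return 0xffff;
	return len ? tbl[0] : 0;
}

/* The pkey[0] rule: whatever sits at index 0 is the default. */
static uint16_t index0_default_pkey(const uint16_t *tbl, size_t len)
{
	return len ? tbl[0] : 0;
}
```

For a table of {0xffff, 0x8001} the first rule returns 0x8001 while the pkey[0] rule returns 0xffff, which is the mismatch described above. For {0x7fff, 0x8001} the pkey[0] rule returns the limited-member 0x7fff, which is the case the quoted reply was worried about.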
On Wed, Dec 09, 2015 at 05:13:49PM +0000, Hefty, Sean wrote:

> Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff. A node
> running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's
> also the backup SA). Ibacm would need to select pkey 0x8000 for
> communication.

I've also seen the reverse, eg 0xFFFF is used for default ipoib
communication and 0x8001 is assigned to only some nodes as a child
vlan.

Choosing 0x8001 in that case won't work either.

pkey[0] at least has the logic that the admin will configure things so
that the default ipoib device reaches the broadest audience, which
makes the most sense to me. That is what most sites I've seen want to
do.

I'm not quite sure what the acm algorithm is, but can't it just figure
out the pkey from the IP routing? Ie if you have an IP address to
resolve a few netlink queries will tell you what pkey to use, and that
is where the acm multicast should go?

Jason
> pkey[0] at least has the logic that the admin will configure things so
> that the default ipoib device reaches the broadest audience, which
> makes the most sense to me. That is what most sites I've seen want to do.

Kaike, will pkey[0] work in the configurations that you're targeting
with this change?

This seems like a very simple solution that's better than what we have
now.

> I'm not quite sure what the acm algorithm is, but can't it just figure
> out the pkey from the IP routing? Ie if you have an IP address to
> resolve a few netlink queries will tell you what pkey to use, and that
> is where the acm multicast should go?

When an IP address is used, it uses the correct pkey (based on routing
data). I mentioned this in a separate email, but this addresses the
case when hostnames are used and ipoib may not be involved.

- Sean
> -----Original Message-----
> From: Hefty, Sean
> Sent: Wednesday, December 09, 2015 1:37 PM
> To: Jason Gunthorpe
> Cc: Hal Rosenstock; Wan, Kaike; linux-rdma@vger.kernel.org
> Subject: RE: [PATCH 1/1] Ibacm: default pkey for partitioned fabrics
>
> > pkey[0] at least has the logic that the admin will configure things so
> > that the default ipoib device reaches the broadest audience, which
> > makes the most sense to me. That is what most sites I've seen want to do.
>
> Kaike, will pkey[0] work in the configurations that you're targeting with this
> change?

Yes. That will work for me.

>
> This seems like a very simple solution that's better than what we have now.
>
> > I'm not quite sure what the acm algorithm is, but can't it just figure
> > out the pkey from the IP routing? Ie if you have an IP address to
> > resolve a few netlink queries will tell you what pkey to use, and that
> > is where the acm multicast should go?
>
> When an IP address is used, it uses the correct pkey (based on routing data).
> I mentioned this in a separate email, but this addresses the case when
> hostnames are used and ipoib may not be involved.
>
> - Sean
On 12/09/2015 01:22 PM, Jason Gunthorpe wrote:
> On Wed, Dec 09, 2015 at 05:13:49PM +0000, Hefty, Sean wrote:
>
>> Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff. A node
>> running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's
>> also the backup SA). Ibacm would need to select pkey 0x8000 for
>> communication.
>
> I've also seen the reverse, eg 0xFFFF is used for default ipoib
> communication and 0x8001 is assigned to only some nodes as a child
> vlan.

That's what I use internally in our test lab.

> Choosing 0x8001 in that case won't work either.

Nope. The suggestion here would break our setup. Well, *would* being
the operative word where what I mean is would if we used ibacm. But
that's a story for another email...
> >> Example: Compute nodes are assigned pkeys 0x8000 and 0x7fff. A node
> >> running the job scheduler has pkeys 0xffff and 0x8000 (maybe it's
> >> also the backup SA). Ibacm would need to select pkey 0x8000 for
> >> communication.
> >
> > I've also seen the reverse, eg 0xFFFF is used for default ipoib
> > communication and 0x8001 is assigned to only some nodes as a child
> > vlan.
>
> That's what I use internally in our test lab.
>
> > Choosing 0x8001 in that case won't work either.
>
> Nope. The suggestion here would break our setup. Well, *would* being
> the operative word where what I mean is would if we used ibacm. But
> that's a story for another email...

In this case, Kaike, please change the default to just be pkey[0].

- Sean
diff --git a/src/acm.c b/src/acm.c
index ada0bfb..ce2797c 100644
--- a/src/acm.c
+++ b/src/acm.c
@@ -114,7 +114,8 @@ struct acmc_port {
 	union ibv_gid *gid_tbl;
 	uint16_t lid;
 	uint16_t lid_mask;
-	int default_pkey_index;
+	int sa_pkey_index;
+	uint16_t def_acm_pkey;
 };
 
 struct acmc_device {
@@ -2009,7 +2010,7 @@ static int acm_assign_ep_names(struct acmc_ep *ep)
 				continue;
 			}
 		} else {
-			pkey = 0xFFFF;
+			pkey = ep->port->def_acm_pkey;
 		}
 
 		if (!stricmp(dev_name, dev) &&
@@ -2202,7 +2203,11 @@ static void acm_port_up(struct acmc_port *port)
 	uint16_t pkey;
 	int i, ret;
 	struct acmc_prov_context *dev_ctx;
-	int index = -1;
+	int sa_index = -1;
+	int full_mgmt_index = -1;
+	uint16_t def_pkey = 0;
+	int first_pkey_index = -1;
+	uint16_t first_pkey = 0;
 
 	acm_log(1, "%s %d\n", port->dev->device.verbs->device->name,
 		port->port.port_num);
@@ -2248,24 +2253,45 @@ static void acm_port_up(struct acmc_port *port)
 		goto err1;
 	}
 
-	/* Determine the default pkey first.
-	   Order of preference: 0xffff, 0x7fff, first pkey
-	*/
+	/* Determine the default pkey index for SA access first.
+	 * Order of preference: 0xffff, 0x7fff, first pkey.
+	 * Determine the default pkey for parsing address file as well.
+	 * order of preference: first full-member non-management pkey,
+	 * 0xffff, first pkey.
+	 */
 	for (i = 0; i < attr.pkey_tbl_len; i++) {
 		ret = ibv_query_pkey(port->dev->device.verbs,
 				     port->port.port_num, i, &pkey);
 		if (ret)
 			continue;
 		pkey = ntohs(pkey);
-		if (pkey == 0xffff) {
-			index = i;
-			break;
-		}
-		else if (pkey == 0x7fff) {
-			index = i;
+		if (!(pkey & 0x7ffff))
+			continue;
+
+		if (first_pkey_index < 0) {
+			first_pkey_index = i;
+			first_pkey = pkey;
 		}
-	}
-	port->default_pkey_index = index < 0 ? 0: index;
+
+		if (pkey == 0xffff) {
+			sa_index = i;
+			full_mgmt_index = i;
+		} else if (pkey == 0x7fff) {
+			if (sa_index < 0)
+				sa_index = i;
+		} else if ((def_pkey == 0) && (pkey & 0x8000)) {
+			/* First full-member non-management pkey */
+			def_pkey = pkey;
+		}
+	}
+	port->sa_pkey_index = (sa_index < 0) ?
+		first_pkey_index : sa_index;
+	if (def_pkey)
+		port->def_acm_pkey = def_pkey;
+	else if (full_mgmt_index >= 0)
+		port->def_acm_pkey = 0xffff;
+	else
+		port->def_acm_pkey = first_pkey;
 
 	for (i = 0; i < attr.pkey_tbl_len; i++) {
 		ret = ibv_query_pkey(port->dev->device.verbs,
@@ -2775,7 +2801,7 @@ int acm_send_sa_mad(struct acm_sa_mad *mad)
 	mad->umad.addr.qkey = port->sa_addr.qkey;
 	mad->umad.addr.lid = htons(port->sa_addr.lid);
 	mad->umad.addr.sl = port->sa_addr.sl;
-	mad->umad.addr.pkey_index = req->ep->port->default_pkey_index;
+	mad->umad.addr.pkey_index = req->ep->port->sa_pkey_index;
 
 	lock_acquire(&port->lock);
 	if (port->sa_credits && DListEmpty(&port->sa_wait)) {
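The bit tests in the acm_port_up() hunk rely on the pkey layout: the high-order bit is the membership bit and the low 15 bits are the partition base value. The helpers below are a hypothetical sketch of those tests, not code from ibacm. One assumption to flag: the hunk masks with 0x7ffff, which on a 16-bit pkey only rejects 0x0000; the sketch uses the 15-bit mask 0x7fff, which also rejects 0x8000, on the guess that skipping entries with a zero base value was the intent.

```c
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u	/* membership bit */
#define PKEY_BASE_MASK   0x7fffu	/* 15-bit partition base value */

/* All four helpers are hypothetical names used for illustration. */
static int pkey_is_full_member(uint16_t pkey)
{
	return (pkey & PKEY_FULL_MEMBER) != 0;
}

static uint16_t pkey_base(uint16_t pkey)
{
	return (uint16_t)(pkey & PKEY_BASE_MASK);
}

/* Assumption: a zero base value marks an unused or invalid entry. */
static int pkey_is_valid(uint16_t pkey)
{
	return pkey_base(pkey) != 0;
}

/* Same partition, and at least one side is a full member. */
static int pkeys_can_talk(uint16_t a, uint16_t b)
{
	return pkey_is_valid(a) && pkey_base(a) == pkey_base(b) &&
	       (pkey_is_full_member(a) || pkey_is_full_member(b));
}
```

Under these definitions a node holding 0x7fff can reach an SA node holding 0xffff, but two nodes that both hold only 0x7fff cannot reach each other, which is why a limited-member pkey is of little use for an ibacm multicast group.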