Message ID | 1456851143-138332-4-git-send-email-eli@mellanox.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Tue, Mar 01, 2016 at 06:52:16PM +0200, Eli Cohen wrote: > Per the ongoing standardisation process, when virtual HCAs are present > in a network, traffic is routed based on a destination GID. In order to > access the SA we use the well known SA GID. Should we be merging patches based on on-going standards work? > + ah_attr.ah_flags = IB_AH_GRH; > + ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX); > + ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID); I'm surprised this hard wired to fe80::2 - surely this should use the current subnet prefix? There should be no traffic prefixed with fe80:: if a subnet is configured with another prefix. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 01, 2016 at 10:44:01AM -0700, Jason Gunthorpe wrote: > On Tue, Mar 01, 2016 at 06:52:16PM +0200, Eli Cohen wrote: > > Per the ongoing standardisation process, when virtual HCAs are present > > in a network, traffic is routed based on a destination GID. In order to > > access the SA we use the well known SA GID. > > Should we be merging patches based on on-going standards work? The spec is in a very advanced state so I think it makes sense to merge. > > > + ah_attr.ah_flags = IB_AH_GRH; > > + ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX); > > + ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID); > > I'm surprised this hard wired to fe80::2 - surely this should use > the current subnet prefix? There should be no traffic prefixed with > fe80:: if a subnet is configured with another prefix. > Hmm... seems like no such thing IB_SA_WELL_KNOWN_GID_PREFIX. The subnet prefix is part of the port info but struct ib_port_attr does not define a field for the subnet prefix. How do you think it should be obtained? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 01, 2016 at 08:17:42PM +0200, Eli Cohen wrote: > On Tue, Mar 01, 2016 at 10:44:01AM -0700, Jason Gunthorpe wrote: > > On Tue, Mar 01, 2016 at 06:52:16PM +0200, Eli Cohen wrote: > > > Per the ongoing standardisation process, when virtual HCAs are present > > > in a network, traffic is routed based on a destination GID. In order to > > > access the SA we use the well known SA GID. > > > > Should we be merging patches based on on-going standards work? > The spec is in a very advanced state so I think it makes sense to > merge. How does this interact with the existing SRIOV stuff? It is already very annoying that mlx5 is incompatible with the old scheme. Is the proposal to shift all IB to this new scheme, or still keep mlx4/mlx5 with different approaches? We can't just have the kernel become incompatible with existing SMs. Eg opensm only supports the old scheme last I looked. > > > + ah_attr.ah_flags = IB_AH_GRH; > > > + ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX); > > > + ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID); > > > > I'm surprised this hard wired to fe80::2 - surely this should use > > the current subnet prefix? There should be no traffic prefixed with > > fe80:: if a subnet is configured with another prefix. > > > Hmm... seems like no such thing IB_SA_WELL_KNOWN_GID_PREFIX. The > subnet prefix is part of the port info but struct ib_port_attr does > not define a field for the subnet prefix. How do you think it should > be obtained? It makes sense to add it to ib_port_attr, the code already has to call that to get the sm_lid. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 01, 2016 at 11:32:56AM -0700, Jason Gunthorpe wrote: > > How does this interact with the existing SRIOV stuff? It is already > very annoying that mlx5 is incompatible with the old scheme. Is the > proposal to shift all IB to this new scheme, or still keep mlx4/mlx5 with > different approaches? Now we have proper interfaces so we can use these interfaces to implement the same functionality in mlx4 which is there but accessible in a different manner. > > We can't just have the kernel become incompatible with existing > SMs. Eg opensm only supports the old scheme last I looked. > You need a SM which support virtualizaion to have virtualization supported but if you don't you can still work with physical functions in the same way you did before so we don't break anything, we just adding new functionality. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 01, 2016 at 09:07:42PM +0200, Eli Cohen wrote: > On Tue, Mar 01, 2016 at 11:32:56AM -0700, Jason Gunthorpe wrote: > > > > How does this interact with the existing SRIOV stuff? It is already > > very annoying that mlx5 is incompatible with the old scheme. Is the > > proposal to shift all IB to this new scheme, or still keep mlx4/mlx5 with > > different approaches? > > Now we have proper interfaces so we can use these interfaces to > implement the same functionality in mlx4 which is there but accessible > in a different manner. > > We can't just have the kernel become incompatible with existing > > SMs. Eg opensm only supports the old scheme last I looked. > > You need a SM which support virtualizaion to have virtualization > supported but if you don't you can still work with physical functions > in the same way you did before so we don't break anything, we just > adding new functionality. I mean opensm supports the GUID Alias scheme for virtualization, this new virtualization scheme is not compatible, and we shouldn't have the kernel drop support for existing working SMs, by, eg, replacing the mlx4 guid alias scheme with this new scheme. I'm guessing a user controlled switch is going to be necessary here to pick GUID alias or port port virtualization. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote: > > I mean opensm supports the GUID Alias scheme for virtualization, this > new virtualization scheme is not compatible, and we shouldn't have the > kernel drop support for existing working SMs, by, eg, replacing the > mlx4 guid alias scheme with this new scheme. > > I'm guessing a user controlled switch is going to be necessary here to > pick GUID alias or port port virtualization. > The alias GUID mechanism remains and can be used with mlx4 devices. With this scheme the admin configures the port and node GUIDs using iprout2 which ends up in the hardware driver configuring the deivce. A virtualization aware SM can read this configuration through MADs. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 1, 2016 at 9:46 PM, Eli Cohen <eli@mellanox.com> wrote: > On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote: >> I mean opensm supports the GUID Alias scheme for virtualization, this >> new virtualization scheme is not compatible, and we shouldn't have the >> kernel drop support for existing working SMs, by, eg, replacing the >> mlx4 guid alias scheme with this new scheme. [...] > The alias GUID mechanism remains and can be used with mlx4 devices. > With this scheme the admin configures the port and node GUIDs using > iproute2 which ends up in the hardware driver configuring the device. Indeed. The provisioning through standard means (rtnetlink, iproute2) would work for both mlx4 (uses Alias GUIDs) and mlx5 (uses vHCA/vGUIDs). Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/01/2016 02:46 PM, Eli Cohen wrote: > On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote: >> >> I mean opensm supports the GUID Alias scheme for virtualization, this >> new virtualization scheme is not compatible, and we shouldn't have the >> kernel drop support for existing working SMs, by, eg, replacing the >> mlx4 guid alias scheme with this new scheme. >> >> I'm guessing a user controlled switch is going to be necessary here to >> pick GUID alias or port port virtualization. >> > > The alias GUID mechanism remains and can be used with mlx4 devices. > With this scheme the admin configures the port and node GUIDs using > iprout2 which ends up in the hardware driver configuring the deivce. A > virtualization aware SM can read this configuration through MADs. > If the alias GUID mechanism is to be retained, then we need another NDO entry point for setting P_Keys on alias GUID VFs. If we are going to switch over to using iproute2, then the solution needs to be complete.
On Fri, Mar 4, 2016 at 4:37 PM, Doug Ledford <dledford@redhat.com> wrote: > On 03/01/2016 02:46 PM, Eli Cohen wrote: >> On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote: >>> >>> I mean opensm supports the GUID Alias scheme for virtualization, this >>> new virtualization scheme is not compatible, and we shouldn't have the >>> kernel drop support for existing working SMs, by, eg, replacing the >>> mlx4 guid alias scheme with this new scheme. >>> >>> I'm guessing a user controlled switch is going to be necessary here to >>> pick GUID alias or port port virtualization. >> The alias GUID mechanism remains and can be used with mlx4 devices. >> With this scheme the admin configures the port and node GUIDs using >> iprout2 which ends up in the hardware driver configuring the deivce. A >> virtualization aware SM can read this configuration through MADs. > If the alias GUID mechanism is to be retained, then we need another NDO > entry point for setting P_Keys on alias GUID VFs. If we are going to > switch over to using iproute2, then the solution needs to be complete. Doug, as you commented on the net/core patch, iproute2 et al can't be really used today to build mlx4 based SRIOV cloud systems. This series doesn't introduce regressions for the way mlx4 setups are made. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 00da80e02154..24926acd4bcd 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -652,6 +652,7 @@ int ib_query_port(struct ib_device *device, if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device)) return -EINVAL; + memset(port_attr, 0, sizeof(*port_attr)); return device->query_port(device, port_num, port_attr); } EXPORT_SYMBOL(ib_query_port); diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index f334090bb612..833d2a99a311 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -886,6 +886,11 @@ static void update_sm_ah(struct work_struct *work) ah_attr.dlid = port_attr.sm_lid; ah_attr.sl = port_attr.sm_sl; ah_attr.port_num = port->port_num; + if (port_attr.grh_required) { + ah_attr.ah_flags = IB_AH_GRH; + ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX); + ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID); + } new_ah->ah = ib_create_ah(port->agent->qp->pd, &ah_attr); if (IS_ERR(new_ah->ah)) { diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 284b00c8fea4..5c1f11742c65 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -97,6 +97,11 @@ enum rdma_node_type { RDMA_NODE_USNIC_UDP, }; +enum { + IB_SA_WELL_KNOWN_GID_PREFIX = 0xfe80000000000000ull, + IB_SA_WELL_KNOWN_GUID = 2, +}; + enum rdma_transport_type { RDMA_TRANSPORT_IB, RDMA_TRANSPORT_IWARP, @@ -508,6 +513,7 @@ struct ib_port_attr { u8 active_width; u8 active_speed; u8 phys_state; + bool grh_required; }; enum ib_device_modify_flags {
Per the ongoing standardisation process, when virtual HCAs are present in a network, traffic is routed based on a destination GID. In order to access the SA we use the well known SA GID. We also add a GRH required boolean field to the port attributes which is used to report to the verbs consumer whether this port is connected to a virtual network. We use this field to realize whether we need to create an address vector with GRH to access the subnet administrator. We clear the port attributes struct before calling the hardware driver to make sure the default remains that GRH is not required. Signed-off-by: Eli Cohen <eli@mellanox.com> --- drivers/infiniband/core/device.c | 1 + drivers/infiniband/core/sa_query.c | 5 +++++ include/rdma/ib_verbs.h | 6 ++++++ 3 files changed, 12 insertions(+)