diff mbox

[for-next,03/10] IB/core: Support accessing SA in virtualized environment

Message ID 1456851143-138332-4-git-send-email-eli@mellanox.com (mailing list archive)
State Superseded
Headers show

Commit Message

Eli Cohen March 1, 2016, 4:52 p.m. UTC
Per the ongoing standardisation process, when virtual HCAs are present
in a network, traffic is routed based on a destination GID. In order to
access the SA we use the well known SA GID.

We also add a GRH required boolean field to the port attributes which is
used to report to the verbs consumer whether this port is connected to a
virtual network. We use this field to realize whether we need to create
an address vector with GRH to access the subnet administrator. We clear
the port attributes struct before calling the hardware driver to make
sure the default remains that GRH is not required.

Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/infiniband/core/device.c   | 1 +
 drivers/infiniband/core/sa_query.c | 5 +++++
 include/rdma/ib_verbs.h            | 6 ++++++
 3 files changed, 12 insertions(+)

Comments

Jason Gunthorpe March 1, 2016, 5:44 p.m. UTC | #1
On Tue, Mar 01, 2016 at 06:52:16PM +0200, Eli Cohen wrote:
> Per the ongoing standardisation process, when virtual HCAs are present
> in a network, traffic is routed based on a destination GID. In order to
> access the SA we use the well known SA GID.

Should we be merging patches based on on-going standards work?

> +		ah_attr.ah_flags = IB_AH_GRH;
> +		ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX);
> +		ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID);

I'm surprised this hard wired to fe80::2 - surely this should use
the current subnet prefix? There should be no traffic prefixed with
fe80:: if a subnet is configured with another prefix.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen March 1, 2016, 6:17 p.m. UTC | #2
On Tue, Mar 01, 2016 at 10:44:01AM -0700, Jason Gunthorpe wrote:
> On Tue, Mar 01, 2016 at 06:52:16PM +0200, Eli Cohen wrote:
> > Per the ongoing standardisation process, when virtual HCAs are present
> > in a network, traffic is routed based on a destination GID. In order to
> > access the SA we use the well known SA GID.
> 
> Should we be merging patches based on on-going standards work?
The spec is in a very advanced state so I think it makes sense to
merge.

> 
> > +		ah_attr.ah_flags = IB_AH_GRH;
> > +		ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX);
> > +		ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID);
> 
> I'm surprised this hard wired to fe80::2 - surely this should use
> the current subnet prefix? There should be no traffic prefixed with
> fe80:: if a subnet is configured with another prefix.
> 
Hmm... seems like no such thing IB_SA_WELL_KNOWN_GID_PREFIX. The
subnet prefix is part of the port info but struct ib_port_attr does
not define a field for the subnet prefix. How do you think it should
be obtained?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe March 1, 2016, 6:32 p.m. UTC | #3
On Tue, Mar 01, 2016 at 08:17:42PM +0200, Eli Cohen wrote:
> On Tue, Mar 01, 2016 at 10:44:01AM -0700, Jason Gunthorpe wrote:
> > On Tue, Mar 01, 2016 at 06:52:16PM +0200, Eli Cohen wrote:
> > > Per the ongoing standardisation process, when virtual HCAs are present
> > > in a network, traffic is routed based on a destination GID. In order to
> > > access the SA we use the well known SA GID.
> > 
> > Should we be merging patches based on on-going standards work?
> The spec is in a very advanced state so I think it makes sense to
> merge.

How does this interact with the existing SRIOV stuff? It is already
very annoying that mlx5 is incompatible with the old scheme. Is the
proposal to shift all IB to this new scheme, or still keep mlx4/mlx5 with
different approaches?

We can't just have the kernel become incompatible with existing
SMs. Eg opensm only supports the old scheme last I looked.

> > > +		ah_attr.ah_flags = IB_AH_GRH;
> > > +		ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX);
> > > +		ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID);
> > 
> > I'm surprised this hard wired to fe80::2 - surely this should use
> > the current subnet prefix? There should be no traffic prefixed with
> > fe80:: if a subnet is configured with another prefix.
> > 
> Hmm... seems like no such thing IB_SA_WELL_KNOWN_GID_PREFIX. The
> subnet prefix is part of the port info but struct ib_port_attr does
> not define a field for the subnet prefix. How do you think it should
> be obtained?

It makes sense to add it to ib_port_attr, the code already has to call
that to get the sm_lid.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen March 1, 2016, 7:07 p.m. UTC | #4
On Tue, Mar 01, 2016 at 11:32:56AM -0700, Jason Gunthorpe wrote:
> 
> How does this interact with the existing SRIOV stuff? It is already
> very annoying that mlx5 is incompatible with the old scheme. Is the
> proposal to shift all IB to this new scheme, or still keep mlx4/mlx5 with
> different approaches?

Now we have proper interfaces so we can use these interfaces to
implement the same functionality in mlx4 which is there but accessible
in a different manner.
> 
> We can't just have the kernel become incompatible with existing
> SMs. Eg opensm only supports the old scheme last I looked.
> 
You need a SM which support virtualizaion to have virtualization
supported but if you don't you can still work with physical functions
in the same way you did before so we don't break anything, we just
adding new functionality.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe March 1, 2016, 7:31 p.m. UTC | #5
On Tue, Mar 01, 2016 at 09:07:42PM +0200, Eli Cohen wrote:
> On Tue, Mar 01, 2016 at 11:32:56AM -0700, Jason Gunthorpe wrote:
> > 
> > How does this interact with the existing SRIOV stuff? It is already
> > very annoying that mlx5 is incompatible with the old scheme. Is the
> > proposal to shift all IB to this new scheme, or still keep mlx4/mlx5 with
> > different approaches?
> 
> Now we have proper interfaces so we can use these interfaces to
> implement the same functionality in mlx4 which is there but accessible
> in a different manner.

> > We can't just have the kernel become incompatible with existing
> > SMs. Eg opensm only supports the old scheme last I looked.
>
> You need a SM which support virtualizaion to have virtualization
> supported but if you don't you can still work with physical functions
> in the same way you did before so we don't break anything, we just
> adding new functionality.

I mean opensm supports the GUID Alias scheme for virtualization, this
new virtualization scheme is not compatible, and we shouldn't have the
kernel drop support for existing working SMs, by, eg, replacing the
mlx4 guid alias scheme with this new scheme.

I'm guessing a user controlled switch is going to be necessary here to
pick GUID alias or port port virtualization.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen March 1, 2016, 7:46 p.m. UTC | #6
On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote:
> 
> I mean opensm supports the GUID Alias scheme for virtualization, this
> new virtualization scheme is not compatible, and we shouldn't have the
> kernel drop support for existing working SMs, by, eg, replacing the
> mlx4 guid alias scheme with this new scheme.
> 
> I'm guessing a user controlled switch is going to be necessary here to
> pick GUID alias or port port virtualization.
> 

The alias GUID mechanism remains and can be used with mlx4 devices.
With this scheme the admin configures the port and node GUIDs using
iprout2 which ends up in the hardware driver configuring the deivce. A
virtualization aware SM can read this configuration through MADs.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz March 1, 2016, 9:15 p.m. UTC | #7
On Tue, Mar 1, 2016 at 9:46 PM, Eli Cohen <eli@mellanox.com> wrote:
> On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote:

>> I mean opensm supports the GUID Alias scheme for virtualization, this
>> new virtualization scheme is not compatible, and we shouldn't have the
>> kernel drop support for existing working SMs, by, eg, replacing the
>> mlx4 guid alias scheme with this new scheme.

[...]

> The alias GUID mechanism remains and can be used with mlx4 devices.
> With this scheme the admin configures the port and node GUIDs using
> iproute2 which ends up in the hardware driver configuring the device.

Indeed. The provisioning through standard means (rtnetlink, iproute2)
would work for both mlx4 (uses Alias GUIDs) and mlx5 (uses
vHCA/vGUIDs).

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford March 4, 2016, 2:37 p.m. UTC | #8
On 03/01/2016 02:46 PM, Eli Cohen wrote:
> On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote:
>>
>> I mean opensm supports the GUID Alias scheme for virtualization, this
>> new virtualization scheme is not compatible, and we shouldn't have the
>> kernel drop support for existing working SMs, by, eg, replacing the
>> mlx4 guid alias scheme with this new scheme.
>>
>> I'm guessing a user controlled switch is going to be necessary here to
>> pick GUID alias or port port virtualization.
>>
> 
> The alias GUID mechanism remains and can be used with mlx4 devices.
> With this scheme the admin configures the port and node GUIDs using
> iprout2 which ends up in the hardware driver configuring the deivce. A
> virtualization aware SM can read this configuration through MADs.
> 

If the alias GUID mechanism is to be retained, then we need another NDO
entry point for setting P_Keys on alias GUID VFs.  If we are going to
switch over to using iproute2, then the solution needs to be complete.
Or Gerlitz March 7, 2016, 7:52 a.m. UTC | #9
On Fri, Mar 4, 2016 at 4:37 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 03/01/2016 02:46 PM, Eli Cohen wrote:
>> On Tue, Mar 01, 2016 at 12:31:53PM -0700, Jason Gunthorpe wrote:
>>>
>>> I mean opensm supports the GUID Alias scheme for virtualization, this
>>> new virtualization scheme is not compatible, and we shouldn't have the
>>> kernel drop support for existing working SMs, by, eg, replacing the
>>> mlx4 guid alias scheme with this new scheme.
>>>
>>> I'm guessing a user controlled switch is going to be necessary here to
>>> pick GUID alias or port port virtualization.

>> The alias GUID mechanism remains and can be used with mlx4 devices.
>> With this scheme the admin configures the port and node GUIDs using
>> iprout2 which ends up in the hardware driver configuring the deivce. A
>> virtualization aware SM can read this configuration through MADs.

> If the alias GUID mechanism is to be retained, then we need another NDO
> entry point for setting P_Keys on alias GUID VFs.  If we are going to
> switch over to using iproute2, then the solution needs to be complete.

Doug, as you commented on the net/core patch, iproute2 et al can't be
really used today to build mlx4 based SRIOV cloud systems. This series
doesn't introduce regressions for the way mlx4 setups are made.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 00da80e02154..24926acd4bcd 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -652,6 +652,7 @@  int ib_query_port(struct ib_device *device,
 	if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
 		return -EINVAL;
 
+	memset(port_attr, 0, sizeof(*port_attr));
 	return device->query_port(device, port_num, port_attr);
 }
 EXPORT_SYMBOL(ib_query_port);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index f334090bb612..833d2a99a311 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -886,6 +886,11 @@  static void update_sm_ah(struct work_struct *work)
 	ah_attr.dlid     = port_attr.sm_lid;
 	ah_attr.sl       = port_attr.sm_sl;
 	ah_attr.port_num = port->port_num;
+	if (port_attr.grh_required) {
+		ah_attr.ah_flags = IB_AH_GRH;
+		ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(IB_SA_WELL_KNOWN_GID_PREFIX);
+		ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID);
+	}
 
 	new_ah->ah = ib_create_ah(port->agent->qp->pd, &ah_attr);
 	if (IS_ERR(new_ah->ah)) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 284b00c8fea4..5c1f11742c65 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -97,6 +97,11 @@  enum rdma_node_type {
 	RDMA_NODE_USNIC_UDP,
 };
 
+enum {
+	IB_SA_WELL_KNOWN_GID_PREFIX	= 0xfe80000000000000ull,
+	IB_SA_WELL_KNOWN_GUID		= 2,
+};
+
 enum rdma_transport_type {
 	RDMA_TRANSPORT_IB,
 	RDMA_TRANSPORT_IWARP,
@@ -508,6 +513,7 @@  struct ib_port_attr {
 	u8			active_width;
 	u8			active_speed;
 	u8                      phys_state;
+	bool			grh_required;
 };
 
 enum ib_device_modify_flags {