diff mbox

[for-next,01/10] net/core: Add support for configuring VF GUIDs

Message ID 1456851143-138332-2-git-send-email-eli@mellanox.com (mailing list archive)
State Superseded
Headers show

Commit Message

Eli Cohen March 1, 2016, 4:52 p.m. UTC
Add two new NLAs to support configuration of Infiniband node or port
GUIDs. New applications can choose to use this interface to configure
GUIDs with iproute2 with commands such as:

ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

For backwards compatibility, old applications which set the MAC of a VF
may set the VF's port GUID for an infiniband port also via set MAC. The
GUID will be generated from the 6-byte MAC per IETF RFC 7042. Note that
when using set MAC to set a port GUID, the node GUID is set as well (to
the port guid value).

A new ndo, ndo_sef_vf_guid is introduced to notify the net device of the
request to change the GUID.

Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 include/linux/netdevice.h    |  3 ++
 include/uapi/linux/if_link.h |  7 ++++
 net/core/rtnetlink.c         | 79 ++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 83 insertions(+), 6 deletions(-)

Comments

Jason Gunthorpe March 1, 2016, 5:37 p.m. UTC | #1
On Tue, Mar 01, 2016 at 06:52:14PM +0200, Eli Cohen wrote:
> Add two new NLAs to support configuration of Infiniband node or port
> GUIDs. New applications can choose to use this interface to configure
> GUIDs with iproute2 with commands such as:
> 
> ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
> ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

I like this idea better than the last version..

> +static int handle_vf_mac(struct net_device *dev, struct ifla_vf_mac *ivm)
> +{
[..]
> +	return handle_infiniband_guid(dev, &ivt, IFLA_VF_IB_PORT_GUID);

But is this emulation really necessary? It seems dangerous and
continues the bad practice of assuming IFLA_VF_MAC is fixed to 6 bytes
in size, and is not just LLADDR bytes. I'd rather see mac sets fail on
IB.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen March 1, 2016, 5:49 p.m. UTC | #2
On Tue, Mar 01, 2016 at 10:37:51AM -0700, Jason Gunthorpe wrote:
> > +	return handle_infiniband_guid(dev, &ivt, IFLA_VF_IB_PORT_GUID);
> 
> But is this emulation really necessary? It seems dangerous and
> continues the bad practice of assuming IFLA_VF_MAC is fixed to 6 bytes
> in size, and is not just LLADDR bytes. I'd rather see mac sets fail on
> IB.
> 
struct ifla_vf_mac  already defines mac as 32 bytes but the idea here
is that applications that configure six byte Ethernet MACs to VFs will
continue to work without any change. 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe March 1, 2016, 6:25 p.m. UTC | #3
On Tue, Mar 01, 2016 at 07:49:51PM +0200, Eli Cohen wrote:
> On Tue, Mar 01, 2016 at 10:37:51AM -0700, Jason Gunthorpe wrote:
> > > +	return handle_infiniband_guid(dev, &ivt, IFLA_VF_IB_PORT_GUID);
> > 
> > But is this emulation really necessary? It seems dangerous and
> > continues the bad practice of assuming IFLA_VF_MAC is fixed to 6 bytes
> > in size, and is not just LLADDR bytes. I'd rather see mac sets fail on
> > IB.
> > 
> struct ifla_vf_mac  already defines mac as 32 bytes but the idea here
> is that applications that configure six byte Ethernet MACs to VFs will
> continue to work without any change. 

In my view it is incorrect for an application to try and set a 6 byte
mac on an *infiniband* interface, the kernel should refuse to do it.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz March 1, 2016, 9:08 p.m. UTC | #4
On Tue, Mar 1, 2016 at 8:25 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Tue, Mar 01, 2016 at 07:49:51PM +0200, Eli Cohen wrote:
>> On Tue, Mar 01, 2016 at 10:37:51AM -0700, Jason Gunthorpe wrote:
>> > > + return handle_infiniband_guid(dev, &ivt, IFLA_VF_IB_PORT_GUID);
>> >
>> > But is this emulation really necessary? It seems dangerous and
>> > continues the bad practice of assuming IFLA_VF_MAC is fixed to 6 bytes
>> > in size, and is not just LLADDR bytes. I'd rather see mac sets fail on
>> > IB.
>> >
>> struct ifla_vf_mac  already defines mac as 32 bytes but the idea here
>> is that applications that configure six byte Ethernet MACs to VFs will
>> continue to work without any change.
>
> In my view it is incorrect for an application to try and set a 6 byte
> mac on an *infiniband* interface, the kernel should refuse to do it.

As Eli wrote, there's a well defined way to extend MAC to GUID. With
that in hand, the idea here is to allow staged/evolved support for IB
Virtualization using un-touched provisioning systems which assign VMs
with 6-byte MACs along with the fully IB aware solution where the
upper level does provision IB GUIDs.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford March 2, 2016, 4:50 p.m. UTC | #5
On 3/1/2016 4:08 PM, Or Gerlitz wrote:
> On Tue, Mar 1, 2016 at 8:25 PM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
>> On Tue, Mar 01, 2016 at 07:49:51PM +0200, Eli Cohen wrote:
>>> On Tue, Mar 01, 2016 at 10:37:51AM -0700, Jason Gunthorpe wrote:
>>>>> + return handle_infiniband_guid(dev, &ivt, IFLA_VF_IB_PORT_GUID);
>>>>
>>>> But is this emulation really necessary? It seems dangerous and
>>>> continues the bad practice of assuming IFLA_VF_MAC is fixed to 6 bytes
>>>> in size, and is not just LLADDR bytes. I'd rather see mac sets fail on
>>>> IB.
>>>>
>>> struct ifla_vf_mac  already defines mac as 32 bytes but the idea here
>>> is that applications that configure six byte Ethernet MACs to VFs will
>>> continue to work without any change.
>>
>> In my view it is incorrect for an application to try and set a 6 byte
>> mac on an *infiniband* interface, the kernel should refuse to do it.
> 
> As Eli wrote, there's a well defined way to extend MAC to GUID. With
> that in hand, the idea here is to allow staged/evolved support for IB
> Virtualization using un-touched provisioning systems which assign VMs
> with 6-byte MACs

Exactly *what* provisioning system tries to set the VF_MAC on an IPoIB
interface and expects it to set the GUID of an underlying IB device?

> along with the fully IB aware solution where the
> upper level does provision IB GUIDs.

There has never been upstream support for this MAC->GUID stuff you refer
to.  I'm not convinced we should add it now versus just doing things
right, period.
Or Gerlitz March 2, 2016, 6:40 p.m. UTC | #6
On Wed, Mar 2, 2016 at 6:50 PM, Doug Ledford <dledford@redhat.com> wrote:

> Exactly *what* provisioning system tries to set the VF_MAC on an IPoIB
> interface and expects it to set the GUID of an underlying IB device?

The provisioning system need not be fully aware in all their
components this is IB here, there's PCI linkage that tells these are
VFs of this PF and they have to be used for these VMs.

>> along with the fully IB aware solution where the
>> upper level does provision IB GUIDs.

> There has never been upstream support for this MAC->GUID stuff you refer
> to.  I'm not convinced we should add it now versus just doing things
> right, period.

We **are** doing things right with the new ndo.

Using the small MAC->GUID addition, people could be using non-modified
(or almost non-modified) provisioning systems that assign SRIOV VMs
with a MACs --- just use these patches on their hosts and get DHCP
server supplying IP addresses based on the derived GUIDs (this is
supported today).
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford March 4, 2016, 2:35 p.m. UTC | #7
On 03/02/2016 01:40 PM, Or Gerlitz wrote:
> On Wed, Mar 2, 2016 at 6:50 PM, Doug Ledford <dledford@redhat.com> wrote:
> 
>> Exactly *what* provisioning system tries to set the VF_MAC on an IPoIB
>> interface and expects it to set the GUID of an underlying IB device?
> 
> The provisioning system need not be fully aware in all their
> components this is IB here, there's PCI linkage that tells these are
> VFs of this PF and they have to be used for these VMs.

If I understand you correctly, then I don't think I agree.

From what I read, I gather you mean:

libvirt can be used to control guests today, and you can list a PCI
device as "managed" and specify a MAC address (which has libvirt
assuming the device is an ethernet device).  In that case, libvirt
automatically detaches the device from the host (if attached), figures
out if it's a PF or VF, sets the MAC address using either PF or VF MAC
setting methods in ethtool, attaches the device to the guest, then
starts the guest.  And you're saying we should put the MAC->GUID
transformation into this code for IB so that libvirt can be blissfully
ignorant and people can tell libvirt it's an ethernet device with a MAC
and libvirt will treat it as such and life will be grand.

Except it won't.  Along with setting the GUID, we also need to set the
P_Keys allowed list (at least using the alias GUIDs method of mlx4 you
do, so unless you add a patch to this series to switch mlx4 to this new
method, that's a valid concern).  And nothing in libvirt can do that as
long libvirt thinks this is ethernet because libvirt doesn't control the
vlans on a guest's ethernet device, the guest does.  In that sense, IB
and ethernet vary greatly.

So, at *best*, the solution you are suggesting for existing setups is a
partial solution that leaves things only half done.

I don't see the justification to clutter up upstream code with a
solution that isn't at least completely functional in its
implementation.  If the solution is only partial, then I would rather
leave it out and tell people to upgrade their libvirt to know about IB
devices.

This is actually further backed up, in my mind, by the fact that you can
have RoCE/iWARP Ethernet devices and regular Ethernet devices, and
libvirt needs to be taught the concept of an RDMA capable device,
whether Ethernet or IB, so that when trying to select a host for
migration it can make sure that the migration target has the same
capabilities as the hardware you are migrating from.  So, to me, doing
this right *requires* a libvirt upgrade, and there is no sense in this
middle ground GUID from MAC hack that you are suggesting.

>>> along with the fully IB aware solution where the
>>> upper level does provision IB GUIDs.
> 
>> There has never been upstream support for this MAC->GUID stuff you refer
>> to.  I'm not convinced we should add it now versus just doing things
>> right, period.
> 
> We **are** doing things right with the new ndo.
> 
> Using the small MAC->GUID addition, people could be using non-modified
> (or almost non-modified) provisioning systems that assign SRIOV VMs
> with a MACs --- just use these patches on their hosts and get DHCP
> server supplying IP addresses based on the derived GUIDs (this is
> supported today).
>
Or Gerlitz March 7, 2016, 7:23 a.m. UTC | #8
On Fri, Mar 4, 2016 at 4:35 PM, Doug Ledford <dledford@redhat.com> wrote:

[...]
> So, at *best*, the solution you are suggesting for existing setups is a
> partial solution that leaves things only half done.
[...]

Doug,

Point/s taken, justifying the code re-route of libvirt attempting to
issue set_vf_mac on the PF to go through
set_vf_guid needs handling of more aspects. We will remove it from this patch.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eugene Syromiatnikov Oct. 26, 2021, 3:16 p.m. UTC | #9
On Tue, Mar 01, 2016 at 06:52:14PM +0200, Eli Cohen wrote:
> +struct ifla_vf_guid {
> +	__u32 vf;
> +	__u64 guid;
> +};

This type definition differs in size on 64-bit and (most of) 32-bit
architectures, and it breaks 32-on-64-bit compat applications, as a result.
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5440b7b705eb..7b4ae218b90b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1147,6 +1147,9 @@  struct net_device_ops {
 						   struct nlattr *port[]);
 	int			(*ndo_get_vf_port)(struct net_device *dev,
 						   int vf, struct sk_buff *skb);
+	int			(*ndo_set_vf_guid)(struct net_device *dev,
+						   int vf, u64 guid,
+						   int guid_type);
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b78090594..1d01e8a4e5dd 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -556,6 +556,8 @@  enum {
 				 */
 	IFLA_VF_STATS,		/* network device statistics */
 	IFLA_VF_TRUST,		/* Trust VF */
+	IFLA_VF_IB_NODE_GUID,	/* VF Infiniband node GUID */
+	IFLA_VF_IB_PORT_GUID,	/* VF Infiniband port GUID */
 	__IFLA_VF_MAX,
 };
 
@@ -588,6 +590,11 @@  struct ifla_vf_spoofchk {
 	__u32 setting;
 };
 
+struct ifla_vf_guid {
+	__u32 vf;
+	__u64 guid;
+};
+
 enum {
 	IFLA_VF_LINK_STATE_AUTO,	/* link state of the uplink */
 	IFLA_VF_LINK_STATE_ENABLE,	/* link always up */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d735e854f916..9db6e5bde786 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1387,6 +1387,8 @@  static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
 	[IFLA_VF_RSS_QUERY_EN]	= { .len = sizeof(struct ifla_vf_rss_query_en) },
 	[IFLA_VF_STATS]		= { .type = NLA_NESTED },
 	[IFLA_VF_TRUST]		= { .len = sizeof(struct ifla_vf_trust) },
+	[IFLA_VF_IB_NODE_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
+	[IFLA_VF_IB_PORT_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
 };
 
 static const struct nla_policy ifla_vf_stats_policy[IFLA_VF_STATS_MAX + 1] = {
@@ -1534,6 +1536,58 @@  static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
 	return 0;
 }
 
+static int handle_infiniband_guid(struct net_device *dev, struct ifla_vf_guid *ivt,
+				  int guid_type)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	return ops->ndo_set_vf_guid(dev, ivt->vf, ivt->guid, guid_type);
+}
+
+static int handle_vf_guid(struct net_device *dev, struct ifla_vf_guid *ivt, int guid_type)
+{
+	if (dev->type != ARPHRD_INFINIBAND)
+		return -EOPNOTSUPP;
+
+	return handle_infiniband_guid(dev, ivt, guid_type);
+}
+
+static int handle_vf_mac(struct net_device *dev, struct ifla_vf_mac *ivm)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct ifla_vf_guid ivt;
+	u8 *s = ivm->mac;
+	u8 d[8];
+	int err;
+
+	if (dev->type != ARPHRD_INFINIBAND) {
+		if (!ops->ndo_set_vf_mac)
+			return -EOPNOTSUPP;
+
+		return ops->ndo_set_vf_mac(dev, ivm->vf, ivm->mac);
+	}
+
+	if (!ops->ndo_set_vf_guid)
+		return -EOPNOTSUPP;
+
+	d[0] = s[0];
+	d[1] = s[1];
+	d[2] = s[2];
+	d[3] = 0xff;
+	d[4] = 0xfe;
+	d[5] = s[3];
+	d[6] = s[4];
+	d[7] = s[5];
+
+	ivt.vf = ivm->vf;
+	ivt.guid = be64_to_cpu(*(__be64 *)d);
+	err = handle_infiniband_guid(dev, &ivt, IFLA_VF_IB_NODE_GUID);
+	if (err)
+		return err;
+
+	return handle_infiniband_guid(dev, &ivt, IFLA_VF_IB_PORT_GUID);
+}
+
 static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -1542,12 +1596,7 @@  static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 	if (tb[IFLA_VF_MAC]) {
 		struct ifla_vf_mac *ivm = nla_data(tb[IFLA_VF_MAC]);
 
-		err = -EOPNOTSUPP;
-		if (ops->ndo_set_vf_mac)
-			err = ops->ndo_set_vf_mac(dev, ivm->vf,
-						  ivm->mac);
-		if (err < 0)
-			return err;
+		return handle_vf_mac(dev, ivm);
 	}
 
 	if (tb[IFLA_VF_VLAN]) {
@@ -1636,6 +1685,24 @@  static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 			return err;
 	}
 
+	if (tb[IFLA_VF_IB_NODE_GUID]) {
+		struct ifla_vf_guid *ivt = nla_data(tb[IFLA_VF_IB_NODE_GUID]);
+
+		if (!ops->ndo_set_vf_guid)
+			return -EOPNOTSUPP;
+
+		return handle_vf_guid(dev, ivt, IFLA_VF_IB_NODE_GUID);
+	}
+
+	if (tb[IFLA_VF_IB_PORT_GUID]) {
+		struct ifla_vf_guid *ivt = nla_data(tb[IFLA_VF_IB_PORT_GUID]);
+
+		if (!ops->ndo_set_vf_guid)
+			return -EOPNOTSUPP;
+
+		return handle_vf_guid(dev, ivt, IFLA_VF_IB_PORT_GUID);
+	}
+
 	return err;
 }