
[net-next,v2,07/12] rtnetlink: add new rtm tunnel api for tunnel id filtering

Message ID 20220222025230.2119189-8-roopa@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series vxlan metadata device vnifiltering support

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4827 this patch: 4827
netdev/cc_maintainers warning 7 maintainers not CCed: stephen.smalley.work@gmail.com petrm@nvidia.com selinux@vger.kernel.org eparis@parisplace.org me@cooperlees.com paul@paul-moore.com matt@codeconstruct.com.au
netdev/build_clang success Errors and warnings before: 822 this patch: 822
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4982 this patch: 4982
netdev/checkpatch warning CHECK: Please use a blank line after function/struct/union/enum declarations
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Roopa Prabhu Feb. 22, 2022, 2:52 a.m. UTC
This patch adds a new rtm tunnel msg and api for tunnel id
filtering in dst_metadata devices. The first dst_metadata
device to use the api is the vxlan driver, with the AF_BRIDGE
family.

This and later changes add the ability in the vxlan driver to do
tunnel id filtering (or vni filtering) on dst_metadata devices.
This is similar to the vlan api in the vlan filtering bridge.

This patch includes selinux nlmsg_route_perms support for the
RTM_*TUNNEL api from Benjamin Poirier.

Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
---
 include/uapi/linux/if_link.h   | 26 ++++++++++++++++++++++++++
 include/uapi/linux/rtnetlink.h |  9 +++++++++
 security/selinux/nlmsgtab.c    |  5 ++++-
 3 files changed, 39 insertions(+), 1 deletion(-)
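
As an illustration of the intended uapi (not part of the patch), the sketch
below builds an RTM_NEWTUNNEL request that adds a vni range to a metadata
vxlan device. It assumes libmnl, headers that already include this patch, a
hypothetical device named vxlan0, and that the
VXLAN_VNIFILTER_ENTRY_START/END attributes carry host-order 32-bit VNIs.

/* Hypothetical example only (not from this series): add VNIs 100-200 to a
 * vnifilter-enabled metadata vxlan device via RTM_NEWTUNNEL.
 */
#include <stdio.h>
#include <time.h>
#include <net/if.h>
#include <sys/socket.h>
#include <libmnl/libmnl.h>
#include <linux/if_link.h>
#include <linux/rtnetlink.h>

int main(void)
{
	char buf[MNL_SOCKET_BUFFER_SIZE];
	struct mnl_socket *nl;
	struct nlmsghdr *nlh;
	struct tunnel_msg *tmsg;
	struct nlattr *entry;

	nlh = mnl_nlmsg_put_header(buf);
	nlh->nlmsg_type = RTM_NEWTUNNEL;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nlh->nlmsg_seq = time(NULL);

	tmsg = mnl_nlmsg_put_extra_header(nlh, sizeof(*tmsg));
	tmsg->family = AF_BRIDGE;
	tmsg->ifindex = if_nametoindex("vxlan0");	/* hypothetical device */

	/* one VXLAN_VNIFILTER_ENTRY nest per vni range */
	entry = mnl_attr_nest_start(nlh, VXLAN_VNIFILTER_ENTRY);
	mnl_attr_put_u32(nlh, VXLAN_VNIFILTER_ENTRY_START, 100);
	mnl_attr_put_u32(nlh, VXLAN_VNIFILTER_ENTRY_END, 200);
	mnl_attr_nest_end(nlh, entry);

	nl = mnl_socket_open(NETLINK_ROUTE);
	if (!nl || mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID) < 0) {
		perror("mnl_socket");
		return 1;
	}
	if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
		perror("mnl_socket_sendto");
		return 1;
	}
	/* handling of the kernel ack omitted for brevity */
	mnl_socket_close(nl);
	return 0;
}

RTM_DELTUNNEL would presumably remove the same range, and RTM_GETTUNNEL
would dump the configured entries.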

Comments

Jakub Kicinski Feb. 23, 2022, 1:26 a.m. UTC | #1
On Tue, 22 Feb 2022 02:52:25 +0000 Roopa Prabhu wrote:
> +	RTM_NEWTUNNEL = 120,
> +#define RTM_NEWTUNNEL	RTM_NEWTUNNEL
> +	RTM_DELTUNNEL,
> +#define RTM_DELTUNNEL	RTM_DELTUNNEL
> +	RTM_GETTUNNEL,
> +#define RTM_GETTUNNEL	RTM_GETTUNNEL

Why create new RTM_ commands instead of using changelink?

I thought we had to add special commands for bridge because
the target of the command is not necessarily a bridge device but possibly
a bridge port, which could be anything. That's not the case here.

Is it only about the convenience of add/del vs changelink where
we'd potentially have to pass and parse the entire vni list each time?
Roopa Prabhu Feb. 23, 2022, 2:49 a.m. UTC | #2
On 2/22/22 5:26 PM, Jakub Kicinski wrote:
> On Tue, 22 Feb 2022 02:52:25 +0000 Roopa Prabhu wrote:
>> +	RTM_NEWTUNNEL = 120,
>> +#define RTM_NEWTUNNEL	RTM_NEWTUNNEL
>> +	RTM_DELTUNNEL,
>> +#define RTM_DELTUNNEL	RTM_DELTUNNEL
>> +	RTM_GETTUNNEL,
>> +#define RTM_GETTUNNEL	RTM_GETTUNNEL
> Why create new RTM_ commands instead of using changelink?
>
> I thought we had to add special commands for bridge because
> if the target of the command is not a bridge device but possibly
> a bridge port, which could be anything. That's not the case here.
>
> Is it only about the convenience of add/del vs changelink where
> we'd potentially have to pass and parse the entire vni list each time?

Yes, exactly, that's the reason. My first internal version used
changelink and I soon realized it was too limiting.

Especially for notifications: it's too heavy to notify the full vni list
every time.

IIRC bridge also went through a similar transition. Now bridge also has 
RTM_*VLAN commands.

I couldn't think of another way than adding a new msg. I tried to keep the
name generic for potential use by other dst/collect metadata devices.
Jakub Kicinski Feb. 23, 2022, 3:50 a.m. UTC | #3
On Tue, 22 Feb 2022 18:49:03 -0800 Roopa Prabhu wrote:
> On 2/22/22 5:26 PM, Jakub Kicinski wrote:
> > Why create new RTM_ commands instead of using changelink?
> >
> > I thought we had to add special commands for bridge because
> > if the target of the command is not a bridge device but possibly
> > a bridge port, which could be anything. That's not the case here.
> >
> > Is it only about the convenience of add/del vs changelink where
> > we'd potentially have to pass and parse the entire vni list each time?  
> 
> yes, exactly. that's the reason. My first internal version used 
> changelink and soon realized it was too limiting.
> 
> especially notifications. Its too heavy to notify the full vni list 
> every-time.
> 
> IIRC bridge also went through a similar transition. Now bridge also has 
> RTM_*VLAN commands.

Makes sense. I wasn't quite sure if this isn't over-engineering
 - do deployments really use VxLAN devs with many VNIs?

> Couldn't think of another way than adding a new msg. Tried to keep the 
> name generic for use by potentially other dst/collect metadata devices

Ack, I don't have any better ideas either :)
Roopa Prabhu Feb. 23, 2022, 6:31 a.m. UTC | #4
On 2/22/22 7:50 PM, Jakub Kicinski wrote:
> On Tue, 22 Feb 2022 18:49:03 -0800 Roopa Prabhu wrote:
>> On 2/22/22 5:26 PM, Jakub Kicinski wrote:
>>> Why create new RTM_ commands instead of using changelink?
>>>
>>> I thought we had to add special commands for bridge because
>>> if the target of the command is not a bridge device but possibly
>>> a bridge port, which could be anything. That's not the case here.
>>>
>>> Is it only about the convenience of add/del vs changelink where
>>> we'd potentially have to pass and parse the entire vni list each time?
>> yes, exactly. that's the reason. My first internal version used
>> changelink and soon realized it was too limiting.
>>
>> especially notifications. Its too heavy to notify the full vni list
>> every-time.
>>
>> IIRC bridge also went through a similar transition. Now bridge also has
>> RTM_*VLAN commands.
> Makes sense. I wasn't quite sure if this isn't over-engineering
>   - do deployments really use VxLAN devs with many VNIs?


Yes, vnis are unfortunately used to stretch layer 2 domains beyond
the 4k vlan limit.

In the example provided in this series, vnis can be deployed in
ranges of more than 15k if each customer is allocated a 4k vlan range
(bridge domain), and a single switch can host 3-4 customers.

The scale is also the reason to switch to dst metadata devices when
deployed vni numbers get large. The traditional vxlan netdev per vni
does not work well at this scale.



>
>> Couldn't think of another way than adding a new msg. Tried to keep the
>> name generic for use by potentially other dst/collect metadata devices
> Ack, I don't have any better ideas either :)
Jakub Kicinski Feb. 23, 2022, 4:22 p.m. UTC | #5
On Tue, 22 Feb 2022 22:31:18 -0800 Roopa Prabhu wrote:
> > Makes sense. I wasn't quite sure if this isn't over-engineering
> >   - do deployments really use VxLAN devs with many VNIs?  
> 
> yes, vnis are unfortunately used to stretch layer 2 domains exceeding 
> the 4k vlan limit
> 
> In the example provided in this series, vnis can be deployed in the 
> ranges of more than 15k if each customer is allocated a 4k vlan range 
> (bridge domain),
> 
> and a single switch can host 3-4 customers.
> 
> the scale is also the reason to switch to dst metadata devices when 
> deployed vni numbers get large. The traditional vxlan netdev per vni 
> does not work well at this scale.

Ah, now it all clicked together, thanks!

Patch

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index e1ba2d51b717..3343514db47d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -712,7 +712,32 @@  enum ipvlan_mode {
 #define IPVLAN_F_PRIVATE	0x01
 #define IPVLAN_F_VEPA		0x02
 
+/* Tunnel RTM header */
+struct tunnel_msg {
+	__u8 family;
+	__u8 reserved1;
+	__u16 reserved2;
+	__u32 ifindex;
+};
+
 /* VXLAN section */
+enum {
+	VXLAN_VNIFILTER_ENTRY_UNSPEC,
+	VXLAN_VNIFILTER_ENTRY_START,
+	VXLAN_VNIFILTER_ENTRY_END,
+	VXLAN_VNIFILTER_ENTRY_GROUP,
+	VXLAN_VNIFILTER_ENTRY_GROUP6,
+	__VXLAN_VNIFILTER_ENTRY_MAX
+};
+#define VXLAN_VNIFILTER_ENTRY_MAX	(__VXLAN_VNIFILTER_ENTRY_MAX - 1)
+
+enum {
+	VXLAN_VNIFILTER_UNSPEC,
+	VXLAN_VNIFILTER_ENTRY,
+	__VXLAN_VNIFILTER_MAX
+};
+#define VXLAN_VNIFILTER_MAX	(__VXLAN_VNIFILTER_MAX - 1)
+
 enum {
 	IFLA_VXLAN_UNSPEC,
 	IFLA_VXLAN_ID,
@@ -744,6 +769,7 @@  enum {
 	IFLA_VXLAN_GPE,
 	IFLA_VXLAN_TTL_INHERIT,
 	IFLA_VXLAN_DF,
+	IFLA_VXLAN_VNIFILTER, /* only applicable with COLLECT_METADATA mode */
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
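
For context, a receiver of these attributes sees a struct tunnel_msg
ancillary header followed by one VXLAN_VNIFILTER_ENTRY nest per vni range.
The sketch below is hypothetical (the real handler is added to the vxlan
driver later in this series) and assumes the START/END attributes carry
u32 VNIs:

/* Hypothetical sketch only: walk the VXLAN_VNIFILTER_ENTRY attributes in
 * an RTM_NEWTUNNEL/RTM_DELTUNNEL request.
 */
#include <net/netlink.h>
#include <linux/if_link.h>

static int tunnel_msg_walk_vnis(const struct nlmsghdr *nlh)
{
	const struct tunnel_msg *tmsg = nlmsg_data(nlh);
	const struct nlattr *attr;
	int rem;

	if (tmsg->family != AF_BRIDGE)
		return -EOPNOTSUPP;

	nlmsg_for_each_attr(attr, nlh, sizeof(*tmsg), rem) {
		struct nlattr *tb[VXLAN_VNIFILTER_ENTRY_MAX + 1];
		u32 vni_start = 0, vni_end = 0;
		int err;

		if (nla_type(attr) != VXLAN_VNIFILTER_ENTRY)
			continue;

		err = nla_parse_nested(tb, VXLAN_VNIFILTER_ENTRY_MAX,
				       attr, NULL, NULL);
		if (err)
			return err;

		if (tb[VXLAN_VNIFILTER_ENTRY_START])
			vni_start = nla_get_u32(tb[VXLAN_VNIFILTER_ENTRY_START]);
		if (tb[VXLAN_VNIFILTER_ENTRY_END])
			vni_end = nla_get_u32(tb[VXLAN_VNIFILTER_ENTRY_END]);

		/* an entry describes either a single vni (END absent) or an
		 * inclusive vni range [vni_start, vni_end]
		 */
		pr_debug("vnifilter entry: %u-%u\n", vni_start,
			 vni_end ?: vni_start);
	}
	return 0;
}
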
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 93d934cc4613..0970cb4b1b88 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -185,6 +185,13 @@  enum {
 	RTM_GETNEXTHOPBUCKET,
 #define RTM_GETNEXTHOPBUCKET	RTM_GETNEXTHOPBUCKET
 
+	RTM_NEWTUNNEL = 120,
+#define RTM_NEWTUNNEL	RTM_NEWTUNNEL
+	RTM_DELTUNNEL,
+#define RTM_DELTUNNEL	RTM_DELTUNNEL
+	RTM_GETTUNNEL,
+#define RTM_GETTUNNEL	RTM_GETTUNNEL
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
@@ -756,6 +763,8 @@  enum rtnetlink_groups {
 #define RTNLGRP_BRVLAN		RTNLGRP_BRVLAN
 	RTNLGRP_MCTP_IFADDR,
 #define RTNLGRP_MCTP_IFADDR	RTNLGRP_MCTP_IFADDR
+	RTNLGRP_TUNNEL,
+#define RTNLGRP_TUNNEL		RTNLGRP_TUNNEL
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 94ea2a8b2bb7..6ad3ee02e023 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -91,6 +91,9 @@  static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_NEWTUNNEL,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_DELTUNNEL,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_GETTUNNEL,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -176,7 +179,7 @@  int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_NEWTUNNEL + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;
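
For context on the constants: rtnetlink message types are allocated in
blocks of four (NEW/DEL/GET/SET), which is why RTM_NEWTUNNEL is pinned to
120, the next multiple-of-4 slot after the nexthop bucket commands. With
RTM_DELTUNNEL = 121 and RTM_GETTUNNEL = 122, __RTM_MAX becomes 123, so
RTM_MAX = (((123 + 3) & ~3) - 1) = 123 = RTM_NEWTUNNEL + 3, matching the
updated BUILD_BUG_ON() bound in nlmsgtab.c above.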