Message ID:   20220222025230.2119189-8-roopa@nvidia.com
State:        Superseded
Delegated to: Netdev Maintainers
Series:       vxlan metadata device vnifiltering support
On Tue, 22 Feb 2022 02:52:25 +0000 Roopa Prabhu wrote:
> +	RTM_NEWTUNNEL = 120,
> +#define RTM_NEWTUNNEL	RTM_NEWTUNNEL
> +	RTM_DELTUNNEL,
> +#define RTM_DELTUNNEL	RTM_DELTUNNEL
> +	RTM_GETTUNNEL,
> +#define RTM_GETTUNNEL	RTM_GETTUNNEL

Why create new RTM_ commands instead of using changelink?

I thought we had to add special commands for bridge because the target of
the command is not necessarily a bridge device but possibly a bridge port,
which could be anything. That's not the case here.

Is it only about the convenience of add/del vs changelink, where we'd
potentially have to pass and parse the entire vni list each time?
On 2/22/22 5:26 PM, Jakub Kicinski wrote:
> On Tue, 22 Feb 2022 02:52:25 +0000 Roopa Prabhu wrote:
>> +	RTM_NEWTUNNEL = 120,
>> +#define RTM_NEWTUNNEL	RTM_NEWTUNNEL
>> +	RTM_DELTUNNEL,
>> +#define RTM_DELTUNNEL	RTM_DELTUNNEL
>> +	RTM_GETTUNNEL,
>> +#define RTM_GETTUNNEL	RTM_GETTUNNEL
> Why create new RTM_ commands instead of using changelink?
>
> I thought we had to add special commands for bridge because the target of
> the command is not necessarily a bridge device but possibly a bridge port,
> which could be anything. That's not the case here.
>
> Is it only about the convenience of add/del vs changelink, where we'd
> potentially have to pass and parse the entire vni list each time?

Yes, exactly, that's the reason. My first internal version used changelink,
and I soon realized it was too limiting, especially for notifications: it's
too heavy to notify the full vni list every time.

IIRC bridge also went through a similar transition; now bridge also has
RTM_*VLAN commands.

I couldn't think of another way than adding a new msg. I tried to keep the
name generic for use by potentially other dst/collect metadata devices.
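For context on the bridge precedent mentioned above: the bridge's per-vlan
RTM_NEWVLAN/RTM_DELVLAN/RTM_GETVLAN messages use a small fixed header
(struct br_vlan_msg in include/uapi/linux/if_bridge.h) of exactly the same
shape as the tunnel header this patch introduces. In both designs the
individual vlan/vni entries travel as nested attributes, so an incremental
add, delete, or notification stays small regardless of how many entries the
device carries:

struct br_vlan_msg {		/* bridge: header for RTM_*VLAN */
	__u8	family;
	__u8	reserved1;
	__u16	reserved2;
	__u32	ifindex;
};

struct tunnel_msg {		/* this patch: header for RTM_*TUNNEL */
	__u8	family;
	__u8	reserved1;
	__u16	reserved2;
	__u32	ifindex;
};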
On Tue, 22 Feb 2022 18:49:03 -0800 Roopa Prabhu wrote:
> On 2/22/22 5:26 PM, Jakub Kicinski wrote:
> > Why create new RTM_ commands instead of using changelink?
> >
> > I thought we had to add special commands for bridge because the target of
> > the command is not necessarily a bridge device but possibly a bridge port,
> > which could be anything. That's not the case here.
> >
> > Is it only about the convenience of add/del vs changelink, where we'd
> > potentially have to pass and parse the entire vni list each time?
>
> Yes, exactly, that's the reason. My first internal version used changelink,
> and I soon realized it was too limiting, especially for notifications: it's
> too heavy to notify the full vni list every time.
>
> IIRC bridge also went through a similar transition; now bridge also has
> RTM_*VLAN commands.

Makes sense. I wasn't quite sure whether this is over-engineering - do
deployments really use VxLAN devs with many VNIs?

> I couldn't think of another way than adding a new msg. I tried to keep the
> name generic for use by potentially other dst/collect metadata devices.

Ack, I don't have any better ideas either :)
On 2/22/22 7:50 PM, Jakub Kicinski wrote:
> On Tue, 22 Feb 2022 18:49:03 -0800 Roopa Prabhu wrote:
>> On 2/22/22 5:26 PM, Jakub Kicinski wrote:
>>> Why create new RTM_ commands instead of using changelink?
>>>
>>> I thought we had to add special commands for bridge because the target of
>>> the command is not necessarily a bridge device but possibly a bridge port,
>>> which could be anything. That's not the case here.
>>>
>>> Is it only about the convenience of add/del vs changelink, where we'd
>>> potentially have to pass and parse the entire vni list each time?
>> Yes, exactly, that's the reason. My first internal version used changelink,
>> and I soon realized it was too limiting, especially for notifications: it's
>> too heavy to notify the full vni list every time.
>>
>> IIRC bridge also went through a similar transition; now bridge also has
>> RTM_*VLAN commands.
> Makes sense. I wasn't quite sure whether this is over-engineering - do
> deployments really use VxLAN devs with many VNIs?

Yes, vnis are unfortunately used to stretch layer 2 domains beyond the 4k
vlan limit.

In the example provided in this series, deployments can run to more than
15k vnis: each customer is allocated a 4k vlan range (bridge domain), and a
single switch can host 3-4 customers, so 3-4 customers x 4k vlans each puts
a switch in the 12k-16k vni range.

The scale is also the reason to switch to dst metadata devices when
deployed vni numbers get large; the traditional vxlan-netdev-per-vni model
does not work well at this scale.

>> I couldn't think of another way than adding a new msg. I tried to keep the
>> name generic for use by potentially other dst/collect metadata devices.
> Ack, I don't have any better ideas either :)
On Tue, 22 Feb 2022 22:31:18 -0800 Roopa Prabhu wrote:
> > Makes sense. I wasn't quite sure whether this is over-engineering - do
> > deployments really use VxLAN devs with many VNIs?
>
> Yes, vnis are unfortunately used to stretch layer 2 domains beyond the 4k
> vlan limit.
>
> In the example provided in this series, deployments can run to more than
> 15k vnis: each customer is allocated a 4k vlan range (bridge domain), and a
> single switch can host 3-4 customers, so 3-4 customers x 4k vlans each puts
> a switch in the 12k-16k vni range.
>
> The scale is also the reason to switch to dst metadata devices when
> deployed vni numbers get large; the traditional vxlan-netdev-per-vni model
> does not work well at this scale.

Ah, now it all clicked together, thanks!
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index e1ba2d51b717..3343514db47d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -712,7 +712,32 @@ enum ipvlan_mode {
 #define IPVLAN_F_PRIVATE	0x01
 #define IPVLAN_F_VEPA		0x02
 
+/* Tunnel RTM header */
+struct tunnel_msg {
+	__u8 family;
+	__u8 reserved1;
+	__u16 reserved2;
+	__u32 ifindex;
+};
+
 /* VXLAN section */
+enum {
+	VXLAN_VNIFILTER_ENTRY_UNSPEC,
+	VXLAN_VNIFILTER_ENTRY_START,
+	VXLAN_VNIFILTER_ENTRY_END,
+	VXLAN_VNIFILTER_ENTRY_GROUP,
+	VXLAN_VNIFILTER_ENTRY_GROUP6,
+	__VXLAN_VNIFILTER_ENTRY_MAX
+};
+#define VXLAN_VNIFILTER_ENTRY_MAX	(__VXLAN_VNIFILTER_ENTRY_MAX - 1)
+
+enum {
+	VXLAN_VNIFILTER_UNSPEC,
+	VXLAN_VNIFILTER_ENTRY,
+	__VXLAN_VNIFILTER_MAX
+};
+#define VXLAN_VNIFILTER_MAX	(__VXLAN_VNIFILTER_MAX - 1)
+
 enum {
 	IFLA_VXLAN_UNSPEC,
 	IFLA_VXLAN_ID,
@@ -744,6 +769,7 @@ enum {
 	IFLA_VXLAN_GPE,
 	IFLA_VXLAN_TTL_INHERIT,
 	IFLA_VXLAN_DF,
+	IFLA_VXLAN_VNIFILTER, /* only applicable with COLLECT_METADATA mode */
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 93d934cc4613..0970cb4b1b88 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -185,6 +185,13 @@ enum {
 	RTM_GETNEXTHOPBUCKET,
 #define RTM_GETNEXTHOPBUCKET	RTM_GETNEXTHOPBUCKET
 
+	RTM_NEWTUNNEL = 120,
+#define RTM_NEWTUNNEL	RTM_NEWTUNNEL
+	RTM_DELTUNNEL,
+#define RTM_DELTUNNEL	RTM_DELTUNNEL
+	RTM_GETTUNNEL,
+#define RTM_GETTUNNEL	RTM_GETTUNNEL
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
@@ -756,6 +763,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_BRVLAN		RTNLGRP_BRVLAN
 	RTNLGRP_MCTP_IFADDR,
 #define RTNLGRP_MCTP_IFADDR	RTNLGRP_MCTP_IFADDR
+	RTNLGRP_TUNNEL,
+#define RTNLGRP_TUNNEL		RTNLGRP_TUNNEL
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 94ea2a8b2bb7..6ad3ee02e023 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -91,6 +91,9 @@ static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_NEWTUNNEL,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_DELTUNNEL,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
+	{ RTM_GETTUNNEL,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -176,7 +179,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_NEWTUNNEL + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;
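To make the intended userspace flow concrete, here is a minimal,
hypothetical sketch (not part of the patch, and not taken from iproute2) of
adding one vni to a vnifiltering vxlan device over a raw NETLINK_ROUTE
socket. The RTM_NEWTUNNEL, struct tunnel_msg, and VXLAN_VNIFILTER_* names
come from the uapi above; the vnifilter_add() helper, the renamed header
struct, and the buffer size are invented for illustration:

#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/types.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef RTM_NEWTUNNEL
#define RTM_NEWTUNNEL 120		/* from the rtnetlink.h hunk above */
#endif

/* Same layout as the struct tunnel_msg added to if_link.h above; declared
 * under a different name so this sketch also builds against patched
 * headers (linux/rtnetlink.h pulls in linux/if_link.h). */
struct tunnel_msg_hdr {
	__u8  family;
	__u8  reserved1;
	__u16 reserved2;
	__u32 ifindex;
};

/* Values taken from the VXLAN_VNIFILTER_* enums added above. */
#define VNIFILTER_ENTRY		1	/* VXLAN_VNIFILTER_ENTRY */
#define VNIFILTER_ENTRY_START	1	/* VXLAN_VNIFILTER_ENTRY_START */

static int vnifilter_add(const char *dev, __u32 vni)
{
	struct {
		struct nlmsghdr nh;
		struct tunnel_msg_hdr tm;
		char attrs[64];		/* room for one nested entry */
	} req;
	struct rtattr *nest, *rta;
	int fd, ret;

	memset(&req, 0, sizeof(req));
	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.tm));
	req.nh.nlmsg_type = RTM_NEWTUNNEL;
	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req.tm.family = AF_BRIDGE;
	req.tm.ifindex = if_nametoindex(dev);

	/* One VXLAN_VNIFILTER_ENTRY nest carrying a single-vni START attr. */
	nest = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nh.nlmsg_len));
	nest->rta_type = NLA_F_NESTED | VNIFILTER_ENTRY;
	rta = (struct rtattr *)((char *)nest + RTA_ALIGN(sizeof(*nest)));
	rta->rta_type = VNIFILTER_ENTRY_START;
	rta->rta_len = RTA_LENGTH(sizeof(vni));
	memcpy(RTA_DATA(rta), &vni, sizeof(vni));
	nest->rta_len = RTA_ALIGN(sizeof(*nest)) + RTA_ALIGN(rta->rta_len);
	req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + nest->rta_len;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;
	ret = send(fd, &req, req.nh.nlmsg_len, 0);
	close(fd);
	return ret < 0 ? -1 : 0;
}

A delete would be the same message with RTM_DELTUNNEL instead, and the
VXLAN_VNIFILTER_ENTRY_END attribute in the same nest looks intended for
contiguous vni ranges.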
This patch adds a new rtm tunnel msg and api for tunnel id filtering in
dst_metadata devices. The first dst_metadata device to use the api is the
vxlan driver with the AF_BRIDGE family.

This and later changes add the ability in the vxlan driver to do tunnel id
filtering (or vni filtering) on dst_metadata devices. This is similar to
the vlan api in the vlan filtering bridge.

This patch includes selinux nlmsg_route_perms support for the RTM_*TUNNEL
api from Benjamin Poirier.

Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
---
 include/uapi/linux/if_link.h   | 26 ++++++++++++++++++++++++++
 include/uapi/linux/rtnetlink.h |  9 +++++++++
 security/selinux/nlmsgtab.c    |  5 ++++-
 3 files changed, 39 insertions(+), 1 deletion(-)
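Since per-vni notifications were the main motivation for the new message
type, a second hypothetical sketch (again illustrative only, requiring
headers with this patch applied, with an invented buffer size) shows the
consumer side: joining the new RTNLGRP_TUNNEL multicast group and watching
add/remove events, which is the per-entry rather than full-list
notification behavior discussed in the thread above:

#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270			/* matches linux/socket.h */
#endif

int main(void)
{
	struct sockaddr_nl addr;
	int grp = RTNLGRP_TUNNEL;	/* new group from this patch */
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	char buf[4096];

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		   &grp, sizeof(grp));

	for (;;) {
		int len = recv(fd, buf, sizeof(buf), 0);
		struct nlmsghdr *nh;

		for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
		     nh = NLMSG_NEXT(nh, len)) {
			/* Payload: struct tunnel_msg + nested vni attrs. */
			struct tunnel_msg *tm = NLMSG_DATA(nh);

			if (nh->nlmsg_type == RTM_NEWTUNNEL)
				printf("vni(s) added on ifindex %u\n",
				       tm->ifindex);
			else if (nh->nlmsg_type == RTM_DELTUNNEL)
				printf("vni(s) removed on ifindex %u\n",
				       tm->ifindex);
		}
	}
	return 0;
}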