diff mbox series

[net-next,v4] vxlan: try to send a packet normally if local bypass fails

Message ID 20230322070414.21257-1-vladimir@nikishkin.pw (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v4] vxlan: try to send a packet normally if local bypass fails | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 19 this patch: 20
netdev/cc_maintainers warning 3 maintainers not CCed: idosch@nvidia.com corbet@lwn.net linux-doc@vger.kernel.org
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 19 this patch: 20
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 95 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vladimir Nikishkin March 22, 2023, 7:04 a.m. UTC
In vxlan_core, if an fdb entry is pointing to a local
address with some port, the system tries to get the packet to
deliver the packet to the vxlan directly, bypassing the network
stack.

This patch makes it still try canonical delivery, if there is no
linux kernel vxlan listening on this port. This will be useful
for the cases when there is some userspace daemon expecting
vxlan packets for post-processing, or some other implementation
of vxlan.

Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
---
 Documentation/networking/vxlan.rst | 13 ++++++++++
 drivers/net/vxlan/vxlan_core.c     | 39 ++++++++++++++++++++++++++++--
 2 files changed, 50 insertions(+), 2 deletions(-)

Comments

Ido Schimmel March 22, 2023, 8:26 a.m. UTC | #1
On Wed, Mar 22, 2023 at 03:04:14PM +0800, Vladimir Nikishkin wrote:
> --- a/Documentation/networking/vxlan.rst
> +++ b/Documentation/networking/vxlan.rst
> @@ -86,3 +86,16 @@ offloaded ports can be interrogated with `ethtool`::
>        Types: geneve, vxlan-gpe
>        Entries (1):
>            port 1230, vxlan-gpe
> +
> +=================
> +Sysctls
> +=================
> +
> +One sysctl influences the behaviour of the vxlan driver.
> +
> + - `vxlan.disable_local_bypass`
> +
> +If set to 1, and if there is a packet destined to the local address, for which the
> +driver cannot find a corresponding vni, it is forwarded to the userspace networking
> +stack. This is useful if there is some userspace UDP tunnel waiting for such
> +packets.

Hi,

I don't believe sysctl is the right interface for this. VXLAN options
are usually / always added as netlink attributes. See ip-link man page
under "VXLAN Type Support".

Also, please add a selftest under tools/testing/selftests/net/. We
already have a bunch of VXLAN tests that you can use as a reference.

Thanks
Nikolay Aleksandrov March 22, 2023, 9:47 a.m. UTC | #2
On 22/03/2023 10:26, Ido Schimmel wrote:
> On Wed, Mar 22, 2023 at 03:04:14PM +0800, Vladimir Nikishkin wrote:
>> --- a/Documentation/networking/vxlan.rst
>> +++ b/Documentation/networking/vxlan.rst
>> @@ -86,3 +86,16 @@ offloaded ports can be interrogated with `ethtool`::
>>        Types: geneve, vxlan-gpe
>>        Entries (1):
>>            port 1230, vxlan-gpe
>> +
>> +=================
>> +Sysctls
>> +=================
>> +
>> +One sysctl influences the behaviour of the vxlan driver.
>> +
>> + - `vxlan.disable_local_bypass`
>> +
>> +If set to 1, and if there is a packet destined to the local address, for which the
>> +driver cannot find a corresponding vni, it is forwarded to the userspace networking
>> +stack. This is useful if there is some userspace UDP tunnel waiting for such
>> +packets.
> 
> Hi,
> 
> I don't believe sysctl is the right interface for this. VXLAN options
> are usually / always added as netlink attributes. See ip-link man page
> under "VXLAN Type Support".
> 
> Also, please add a selftest under tools/testing/selftests/net/. We
> already have a bunch of VXLAN tests that you can use as a reference.
> 
> Thanks

Right, that is what I meant when I suggested making it optional.
Sorry for not being explicit. Please use vxlan netlink attributes.

Cheers,
 Nik
Vladimir Nikishkin March 23, 2023, 6:07 a.m. UTC | #3
Nikolay Aleksandrov <razor@blackwall.org> writes:

> On 22/03/2023 10:26, Ido Schimmel wrote:
>> On Wed, Mar 22, 2023 at 03:04:14PM +0800, Vladimir Nikishkin wrote:
>>> --- a/Documentation/networking/vxlan.rst
>>> +++ b/Documentation/networking/vxlan.rst
>>> @@ -86,3 +86,16 @@ offloaded ports can be interrogated with `ethtool`::
>>>        Types: geneve, vxlan-gpe
>>>        Entries (1):
>>>            port 1230, vxlan-gpe
>>> +
>>> +=================
>>> +Sysctls
>>> +=================
>>> +
>>> +One sysctl influences the behaviour of the vxlan driver.
>>> +
>>> + - `vxlan.disable_local_bypass`
>>> +
>>> +If set to 1, and if there is a packet destined to the local address, for which the
>>> +driver cannot find a corresponding vni, it is forwarded to the userspace networking
>>> +stack. This is useful if there is some userspace UDP tunnel waiting for such
>>> +packets.
>> 
>> Hi,
>> 
>> I don't believe sysctl is the right interface for this. VXLAN options
>> are usually / always added as netlink attributes. See ip-link man page
>> under "VXLAN Type Support".
>> 
>> Also, please add a selftest under tools/testing/selftests/net/. We
>> already have a bunch of VXLAN tests that you can use as a reference.
>> 
>> Thanks
>
> Right, that is what I meant when I suggested making it optional.
> Sorry for not being explicit. Please use vxlan netlink attributes.
>
> Cheers,
>  Nik

Sorry for misunderstanding. I have send two patches to the mailing list,
one for the kernel, and one for the ip-link. If this is the correct way
to do it, I will write a test case. I wouldn't want to test something
which is incorrect by design. Sorry for so many wrong attempts.
diff mbox series

Patch

diff --git a/Documentation/networking/vxlan.rst b/Documentation/networking/vxlan.rst
index 2759dc1cc525..0ac5681093ef 100644
--- a/Documentation/networking/vxlan.rst
+++ b/Documentation/networking/vxlan.rst
@@ -86,3 +86,16 @@  offloaded ports can be interrogated with `ethtool`::
       Types: geneve, vxlan-gpe
       Entries (1):
           port 1230, vxlan-gpe
+
+=================
+Sysctls
+=================
+
+One sysctl influences the behaviour of the vxlan driver.
+
+ - `vxlan.disable_local_bypass`
+
+If set to 1, and if there is a packet destined to the local address, for which the
+driver cannot find a corresponding vni, it is forwarded to the userspace networking
+stack. This is useful if there is some userspace UDP tunnel waiting for such
+packets.
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 561fe1b314f5..cef15b9d3c9e 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -15,6 +15,7 @@ 
 #include <linux/igmp.h>
 #include <linux/if_ether.h>
 #include <linux/ethtool.h>
+#include <linux/sysctl.h>
 #include <net/arp.h>
 #include <net/ndisc.h>
 #include <net/gro.h>
@@ -53,6 +54,30 @@  static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+static int disable_local_bypass;
+struct ctl_table_header *vxlan_sysctl_header;
+static struct ctl_table vxlan_sysctl_child[] = {
+	{
+		.procname = "disable_local_bypass",
+		.data = &disable_local_bypass,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec_minmax,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE,
+	},
+	{}
+};
+
+static struct ctl_table vxlan_sysctl_parent[] = {
+	{
+		.procname = "vxlan",
+		.mode = 0555,
+		.child = vxlan_sysctl_child,
+	},
+	{}
+};
+
 unsigned int vxlan_net_id;
 
 const u8 all_zeros_mac[ETH_ALEN + 2];
@@ -2355,18 +2380,21 @@  static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
 	    !(rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
 		struct vxlan_dev *dst_vxlan;
 
-		dst_release(dst);
 		dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
 					   daddr->sa.sa_family, dst_port,
 					   vxlan->cfg.flags);
-		if (!dst_vxlan) {
+		if (!dst_vxlan && !disable_local_bypass) {
+			dst_release(dst);
 			dev->stats.tx_errors++;
 			vxlan_vnifilter_count(vxlan, vni, NULL,
 					      VXLAN_VNI_STATS_TX_ERRORS, 0);
 			kfree_skb(skb);
 
 			return -ENOENT;
+		} else if (!dst_vxlan && disable_local_bypass) {
+			return 0;
 		}
+		dst_release(dst);
 		vxlan_encap_bypass(skb, vxlan, dst_vxlan, vni, true);
 		return 1;
 	}
@@ -4671,6 +4699,12 @@  static struct pernet_operations vxlan_net_ops = {
 static int __init vxlan_init_module(void)
 {
 	int rc;
+	vxlan_sysctl_header =
+		register_sysctl_table(vxlan_sysctl_parent);
+	if (!vxlan_sysctl_header) {
+		pr_alert("Error: Failed to register vxlan sysctl subtree\n");
+		return -EFAULT;
+	}
 
 	get_random_bytes(&vxlan_salt, sizeof(vxlan_salt));
 
@@ -4706,6 +4740,7 @@  late_initcall(vxlan_init_module);
 
 static void __exit vxlan_cleanup_module(void)
 {
+	unregister_sysctl_table(vxlan_sysctl_header);
 	vxlan_vnifilter_uninit();
 	rtnl_link_unregister(&vxlan_link_ops);
 	unregister_switchdev_notifier(&vxlan_switchdev_notifier_block);