diff mbox series

[bpf,1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter

Message ID 160312349501.7917.13131363910387009253.stgit@toke.dk (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller | expand

Commit Message

Toke Høiland-Jørgensen Oct. 19, 2020, 4:04 p.m. UTC
From: Toke Høiland-Jørgensen <toke@redhat.com>

Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply be
passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH.

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/filter.h         |    9 ++
 include/uapi/linux/bpf.h       |   22 +++++-
 net/core/filter.c              |  159 +++++++++++++++++++++++++---------------
 scripts/bpf_helpers_doc.py     |    1 
 tools/include/uapi/linux/bpf.h |   22 +++++-
 5 files changed, 145 insertions(+), 68 deletions(-)

Comments

Daniel Borkmann Oct. 19, 2020, 6:23 p.m. UTC | #1
On 10/19/20 6:04 PM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> Based on the discussion in [0], update the bpf_redirect_neigh() helper to
> accept an optional parameter specifying the nexthop information. This makes
> it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
> incurring a duplicate FIB lookup - since the FIB lookup helper will return
> the nexthop information even if no neighbour is present, this can simply be
> passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
> BPF_FIB_LKUP_RET_NO_NEIGH.
> 
> [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Looks good to me, thanks! I'll wait till David gets a chance as well to review.
One thing that would have made sense to me (probably worth a v2) is to keep the
fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH which
would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh lookup inside
the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally use the
combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
bpf_redirect_neigh([...]) extension in that case and not do this bpf_redirect()
vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.

Thanks,
Daniel
kernel test robot Oct. 19, 2020, 8:23 p.m. UTC | #2
Hi "Toke,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf/master]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/bpf-Rework-bpf_redirect_neigh-to-allow-supplying-nexthop-from-caller/20201020-000711
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
config: x86_64-randconfig-a016-20201019 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 094e9f4779eb9b5c6a49014f2f80b8cbb833572f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/250f45235b06a53ae71587457c94d9c3a3a99f9a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Toke-H-iland-J-rgensen/bpf-Rework-bpf_redirect_neigh-to-allow-supplying-nexthop-from-caller/20201020-000711
        git checkout 250f45235b06a53ae71587457c94d9c3a3a99f9a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> net/core/filter.c:2268:27: error: redefinition of parameter 'dev'
                                      struct net_device *dev, struct bpf_nh_params *nh)
                                                         ^
   net/core/filter.c:2267:76: note: previous declaration is here
   static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
                                                                              ^
>> net/core/filter.c:2268:27: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
                                      struct net_device *dev, struct bpf_nh_params *nh)
                                                         ^
   1 warning and 1 error generated.

vim +/dev +2268 net/core/filter.c

  2225	
  2226	static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
  2227					   struct bpf_nh_params *nh)
  2228	{
  2229		const struct ipv6hdr *ip6h = ipv6_hdr(skb);
  2230		struct net *net = dev_net(dev);
  2231		int err, ret = NET_XMIT_DROP;
  2232	
  2233		if (!nh) {
  2234			struct dst_entry *dst;
  2235			struct flowi6 fl6 = {
  2236				.flowi6_flags = FLOWI_FLAG_ANYSRC,
  2237				.flowi6_mark  = skb->mark,
  2238				.flowlabel    = ip6_flowinfo(ip6h),
  2239				.flowi6_oif   = dev->ifindex,
  2240				.flowi6_proto = ip6h->nexthdr,
  2241				.daddr	      = ip6h->daddr,
  2242				.saddr	      = ip6h->saddr,
  2243			};
  2244	
  2245			dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
  2246			if (IS_ERR(dst))
  2247				goto out_drop;
  2248	
  2249			skb_dst_set(skb, dst);
  2250		} else if (nh->nh_family != AF_INET6) {
  2251			goto out_drop;
  2252		}
  2253	
  2254		err = bpf_out_neigh_v6(net, skb, dev, nh);
  2255		if (unlikely(net_xmit_eval(err)))
  2256			dev->stats.tx_errors++;
  2257		else
  2258			ret = NET_XMIT_SUCCESS;
  2259		goto out_xmit;
  2260	out_drop:
  2261		dev->stats.tx_errors++;
  2262		kfree_skb(skb);
  2263	out_xmit:
  2264		return ret;
  2265	}
  2266	#else
  2267	static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
> 2268					   struct net_device *dev, struct bpf_nh_params *nh)
  2269	{
  2270		kfree_skb(skb);
  2271		return NET_XMIT_DROP;
  2272	}
  2273	#endif /* CONFIG_IPV6 */
  2274	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
David Ahern Oct. 20, 2020, 3:08 a.m. UTC | #3
On 10/19/20 10:04 AM, Toke Høiland-Jørgensen wrote:
> @@ -4906,6 +4910,16 @@ struct bpf_fib_lookup {
>  	__u8	dmac[6];     /* ETH_ALEN */
>  };
>  
> +struct bpf_redir_neigh {
> +	/* network family for lookup (AF_INET, AF_INET6) */
> +	__u8	nh_family;

Define every byte. This is a 3-byte hole that should have unused's and
the helper verifies they are set to 0.

> +	/* network address of nexthop; skips fib lookup to find gateway */
> +	union {
> +		__be32		ipv4_nh;
> +		__u32		ipv6_nh[4];  /* in6_addr; network order */
> +	};
> +};
> +
>  enum bpf_task_fd_type {
>  	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
>  	BPF_FD_TYPE_TRACEPOINT,		/* tp name */
David Ahern Oct. 20, 2020, 3:12 a.m. UTC | #4
On 10/19/20 12:23 PM, Daniel Borkmann wrote:
> Looks good to me, thanks! I'll wait till David gets a chance as well to
> review.
> One thing that would have made sense to me (probably worth a v2) is to
> keep the
> fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH
> which
> would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh
> lookup inside
> the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally
> use the
> combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
> bpf_redirect_neigh([...]) extension in that case and not do this
> bpf_redirect()
> vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.

That puts the overhead on bpf_ipv{4,6}_fib_lookup. The existiong helpers
return BPF_FIB_LKUP_RET_NO_NEIGH which is the key to the bpf program to
call the bpf_redirect_neigh - making the program deal with the overhead
as needed on failures.
Daniel Borkmann Oct. 20, 2020, 8:59 a.m. UTC | #5
On 10/20/20 5:12 AM, David Ahern wrote:
> On 10/19/20 12:23 PM, Daniel Borkmann wrote:
>> Looks good to me, thanks! I'll wait till David gets a chance as well to
>> review.
>> One thing that would have made sense to me (probably worth a v2) is to
>> keep the
>> fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH
>> which
>> would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh
>> lookup inside
>> the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally
>> use the
>> combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
>> bpf_redirect_neigh([...]) extension in that case and not do this
>> bpf_redirect()
>> vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.
> 
> That puts the overhead on bpf_ipv{4,6}_fib_lookup. The existiong helpers
> return BPF_FIB_LKUP_RET_NO_NEIGH which is the key to the bpf program to
> call the bpf_redirect_neigh - making the program deal with the overhead
> as needed on failures.

But if I know there's high chance anyway that __ipv*_neigh_lookup_noref*()
is failing for bpf_ipv{4,6}_fib_lookup() why even paying the price of the
hash table lookup in there? Simple test to skip and return early would be
much cheaper, and branch predictor should work it out just fine given that
setting is pretty much static anyway; I'm not sure I'm seeing why this would
be much of a concern..
David Ahern Oct. 20, 2020, 1:45 p.m. UTC | #6
On 10/20/20 2:59 AM, Daniel Borkmann wrote:
> On 10/20/20 5:12 AM, David Ahern wrote:
>> On 10/19/20 12:23 PM, Daniel Borkmann wrote:
>>> Looks good to me, thanks! I'll wait till David gets a chance as well to
>>> review.
>>> One thing that would have made sense to me (probably worth a v2) is to
>>> keep the
>>> fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH
>>> which
>>> would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh
>>> lookup inside
>>> the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally
>>> use the
>>> combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
>>> bpf_redirect_neigh([...]) extension in that case and not do this
>>> bpf_redirect()
>>> vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.
>>
>> That puts the overhead on bpf_ipv{4,6}_fib_lookup. The existiong helpers
>> return BPF_FIB_LKUP_RET_NO_NEIGH which is the key to the bpf program to
>> call the bpf_redirect_neigh - making the program deal with the overhead
>> as needed on failures.
> 
> But if I know there's high chance anyway that __ipv*_neigh_lookup_noref*()
> is failing for bpf_ipv{4,6}_fib_lookup() why even paying the price of the
> hash table lookup in there? Simple test to skip and return early would be
> much cheaper, and branch predictor should work it out just fine given that
> setting is pretty much static anyway; I'm not sure I'm seeing why this
> would
> be much of a concern..

The death by a 1000 paper cuts mantra.

The new bpf_redirect_neigh helper only works for skb's; the older
bpf_fib_lookup helpers work for XDP and skb's.

The primary reason for a program to need both helpers back to back is
when the neighbor entry is invalid. A condition where the nexthop
address is valid yet the neighbor entry is not resolving or in an
invalid state should be a rare event - like startup.

The existing helper has enough 'if' checks in it, forced by the
multitude of features the stack supports. Rare runtime events should be
handled by the bpf programs.
diff mbox series

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 20fc24c9779a..ba9de7188cd0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -607,12 +607,21 @@  struct bpf_skb_data_end {
 	void *data_end;
 };
 
+struct bpf_nh_params {
+	u8 nh_family;
+	union {
+		__u32 ipv4_nh;
+		struct in6_addr ipv6_nh;
+	};
+};
+
 struct bpf_redirect_info {
 	u32 flags;
 	u32 tgt_index;
 	void *tgt_value;
 	struct bpf_map *map;
 	u32 kern_flags;
+	struct bpf_nh_params nh;
 };
 
 DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bf5a99d803e4..82fd73ac1a13 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@  union bpf_attr {
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
  *
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*
  * 		and fill in L2 addresses from neighboring subsystem. This helper
  * 		is somewhat similar to **bpf_redirect**\ (), except that it
  * 		populates L2 addresses as well, meaning, internally, the helper
- * 		performs a FIB lookup based on the skb's networking header to
- * 		get the address of the next hop and then relies on the neighbor
- * 		lookup for the L2 address of the nexthop.
+ * 		relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * 		The helper will perform a FIB lookup based on the skb's
+ * 		networking header to get the address of the next hop, unless
+ * 		this is supplied by the caller in the *params* argument. The
+ * 		*plen* argument indicates the len of *params* and should be set
+ * 		to 0 if *params* is NULL.
  *
  * 		The *flags* argument is reserved and must be 0. The helper is
  * 		currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,16 @@  struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+struct bpf_redir_neigh {
+	/* network family for lookup (AF_INET, AF_INET6) */
+	__u8	nh_family;
+	/* network address of nexthop; skips fib lookup to find gateway */
+	union {
+		__be32		ipv4_nh;
+		__u32		ipv6_nh[4];  /* in6_addr; network order */
+	};
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */
diff --git a/net/core/filter.c b/net/core/filter.c
index c5e2a1c5fd8d..af8634cd4ba2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2165,12 +2165,12 @@  static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
+			    struct net_device *dev, struct bpf_nh_params *nh)
 {
-	struct dst_entry *dst = skb_dst(skb);
-	struct net_device *dev = dst->dev;
 	u32 hh_len = LL_RESERVED_SPACE(dev);
 	const struct in6_addr *nexthop;
+	struct dst_entry *dst = NULL;
 	struct neighbour *neigh;
 
 	if (dev_xmit_recursion()) {
@@ -2196,8 +2196,13 @@  static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
-			      &ipv6_hdr(skb)->daddr);
+	if (!nh) {
+		dst = skb_dst(skb);
+		nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
+				      &ipv6_hdr(skb)->daddr);
+	} else {
+		nexthop = &nh->ipv6_nh;
+	}
 	neigh = ip_neigh_gw6(dev, nexthop);
 	if (likely(!IS_ERR(neigh))) {
 		int ret;
@@ -2210,36 +2215,43 @@  static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
 		return ret;
 	}
 	rcu_read_unlock_bh();
-	IP6_INC_STATS(dev_net(dst->dev),
-		      ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
+	if (dst)
+		IP6_INC_STATS(dev_net(dst->dev),
+			      ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
 out_drop:
 	kfree_skb(skb);
 	return -ENETDOWN;
 }
 
-static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
 	struct net *net = dev_net(dev);
 	int err, ret = NET_XMIT_DROP;
-	struct dst_entry *dst;
-	struct flowi6 fl6 = {
-		.flowi6_flags	= FLOWI_FLAG_ANYSRC,
-		.flowi6_mark	= skb->mark,
-		.flowlabel	= ip6_flowinfo(ip6h),
-		.flowi6_oif	= dev->ifindex,
-		.flowi6_proto	= ip6h->nexthdr,
-		.daddr		= ip6h->daddr,
-		.saddr		= ip6h->saddr,
-	};
 
-	dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
-	if (IS_ERR(dst))
-		goto out_drop;
+	if (!nh) {
+		struct dst_entry *dst;
+		struct flowi6 fl6 = {
+			.flowi6_flags = FLOWI_FLAG_ANYSRC,
+			.flowi6_mark  = skb->mark,
+			.flowlabel    = ip6_flowinfo(ip6h),
+			.flowi6_oif   = dev->ifindex,
+			.flowi6_proto = ip6h->nexthdr,
+			.daddr	      = ip6h->daddr,
+			.saddr	      = ip6h->saddr,
+		};
+
+		dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
+		if (IS_ERR(dst))
+			goto out_drop;
 
-	skb_dst_set(skb, dst);
+		skb_dst_set(skb, dst);
+	} else if (nh->nh_family != AF_INET6) {
+		goto out_drop;
+	}
 
-	err = bpf_out_neigh_v6(net, skb);
+	err = bpf_out_neigh_v6(net, skb, dev, nh);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;
 	else
@@ -2252,7 +2264,8 @@  static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 #else
-static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
+				   struct net_device *dev, struct bpf_nh_params *nh)
 {
 	kfree_skb(skb);
 	return NET_XMIT_DROP;
@@ -2260,11 +2273,9 @@  static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
 #endif /* CONFIG_IPV6 */
 
 #if IS_ENABLED(CONFIG_INET)
-static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb,
+			    struct net_device *dev, struct bpf_nh_params *nh)
 {
-	struct dst_entry *dst = skb_dst(skb);
-	struct rtable *rt = container_of(dst, struct rtable, dst);
-	struct net_device *dev = dst->dev;
 	u32 hh_len = LL_RESERVED_SPACE(dev);
 	struct neighbour *neigh;
 	bool is_v6gw = false;
@@ -2292,7 +2303,21 @@  static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+	if (!nh) {
+		struct dst_entry *dst = skb_dst(skb);
+		struct rtable *rt = container_of(dst, struct rtable, dst);
+
+		neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+	} else if (nh->nh_family == AF_INET6) {
+		neigh = ip_neigh_gw6(dev, &nh->ipv6_nh);
+		is_v6gw = true;
+	} else if (nh->nh_family == AF_INET) {
+		neigh = ip_neigh_gw4(dev, nh->ipv4_nh);
+	} else {
+		rcu_read_unlock_bh();
+		goto out_drop;
+	}
+
 	if (likely(!IS_ERR(neigh))) {
 		int ret;
 
@@ -2309,33 +2334,37 @@  static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
 	return -ENETDOWN;
 }
 
-static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	const struct iphdr *ip4h = ip_hdr(skb);
 	struct net *net = dev_net(dev);
 	int err, ret = NET_XMIT_DROP;
-	struct rtable *rt;
-	struct flowi4 fl4 = {
-		.flowi4_flags	= FLOWI_FLAG_ANYSRC,
-		.flowi4_mark	= skb->mark,
-		.flowi4_tos	= RT_TOS(ip4h->tos),
-		.flowi4_oif	= dev->ifindex,
-		.flowi4_proto	= ip4h->protocol,
-		.daddr		= ip4h->daddr,
-		.saddr		= ip4h->saddr,
-	};
 
-	rt = ip_route_output_flow(net, &fl4, NULL);
-	if (IS_ERR(rt))
-		goto out_drop;
-	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
-		ip_rt_put(rt);
-		goto out_drop;
-	}
+	if (!nh) {
+		struct flowi4 fl4 = {
+			.flowi4_flags = FLOWI_FLAG_ANYSRC,
+			.flowi4_mark  = skb->mark,
+			.flowi4_tos   = RT_TOS(ip4h->tos),
+			.flowi4_oif   = dev->ifindex,
+			.flowi4_proto = ip4h->protocol,
+			.daddr	      = ip4h->daddr,
+			.saddr	      = ip4h->saddr,
+		};
+		struct rtable *rt;
+
+		rt = ip_route_output_flow(net, &fl4, NULL);
+		if (IS_ERR(rt))
+			goto out_drop;
+		if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+			ip_rt_put(rt);
+			goto out_drop;
+		}
 
-	skb_dst_set(skb, &rt->dst);
+		skb_dst_set(skb, &rt->dst);
+	}
 
-	err = bpf_out_neigh_v4(net, skb);
+	err = bpf_out_neigh_v4(net, skb, dev, nh);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;
 	else
@@ -2348,14 +2377,16 @@  static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 #else
-static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
+				   struct net_device *dev, struct bpf_nh_params *nh)
 {
 	kfree_skb(skb);
 	return NET_XMIT_DROP;
 }
 #endif /* CONFIG_INET */
 
-static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev,
+				struct bpf_nh_params *nh)
 {
 	struct ethhdr *ethh = eth_hdr(skb);
 
@@ -2370,9 +2401,9 @@  static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_network_header(skb);
 
 	if (skb->protocol == htons(ETH_P_IP))
-		return __bpf_redirect_neigh_v4(skb, dev);
+		return __bpf_redirect_neigh_v4(skb, dev, nh);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		return __bpf_redirect_neigh_v6(skb, dev);
+		return __bpf_redirect_neigh_v6(skb, dev, nh);
 out:
 	kfree_skb(skb);
 	return -ENOTSUPP;
@@ -2382,7 +2413,8 @@  static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
 enum {
 	BPF_F_NEIGH	= (1ULL << 1),
 	BPF_F_PEER	= (1ULL << 2),
-#define BPF_F_REDIRECT_INTERNAL	(BPF_F_NEIGH | BPF_F_PEER)
+	BPF_F_NEXTHOP	= (1ULL << 3),
+#define BPF_F_REDIRECT_INTERNAL	(BPF_F_NEIGH | BPF_F_PEER | BPF_F_NEXTHOP)
 };
 
 BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
@@ -2455,8 +2487,8 @@  int skb_do_redirect(struct sk_buff *skb)
 		return -EAGAIN;
 	}
 	return flags & BPF_F_NEIGH ?
-	       __bpf_redirect_neigh(skb, dev) :
-	       __bpf_redirect(skb, dev, flags);
+		__bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL) :
+		__bpf_redirect(skb, dev, flags);
 out_drop:
 	kfree_skb(skb);
 	return -EINVAL;
@@ -2504,16 +2536,21 @@  static const struct bpf_func_proto bpf_redirect_peer_proto = {
 	.arg2_type      = ARG_ANYTHING,
 };
 
-BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
+BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
+	   int, plen, u64, flags)
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 
-	if (unlikely(flags))
+	if (unlikely((plen && plen < sizeof(*params)) || flags))
 		return TC_ACT_SHOT;
 
-	ri->flags = BPF_F_NEIGH;
+	ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0);
 	ri->tgt_index = ifindex;
 
+	BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params));
+	if (plen)
+		memcpy(&ri->nh, params, sizeof(ri->nh));
+
 	return TC_ACT_REDIRECT;
 }
 
@@ -2522,7 +2559,9 @@  static const struct bpf_func_proto bpf_redirect_neigh_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_ANYTHING,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type      = ARG_PTR_TO_MEM_OR_NULL,
+	.arg3_type      = ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
 };
 
 BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg *, msg, u32, bytes)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 7d86fdd190be..6769caae142f 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -453,6 +453,7 @@  class PrinterHelpers(Printer):
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
+            'struct bpf_redir_neigh',
             'struct bpf_sk_lookup',
             'struct bpf_sock',
             'struct bpf_sock_addr',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bf5a99d803e4..82fd73ac1a13 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@  union bpf_attr {
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
  *
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*
  * 		and fill in L2 addresses from neighboring subsystem. This helper
  * 		is somewhat similar to **bpf_redirect**\ (), except that it
  * 		populates L2 addresses as well, meaning, internally, the helper
- * 		performs a FIB lookup based on the skb's networking header to
- * 		get the address of the next hop and then relies on the neighbor
- * 		lookup for the L2 address of the nexthop.
+ * 		relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * 		The helper will perform a FIB lookup based on the skb's
+ * 		networking header to get the address of the next hop, unless
+ * 		this is supplied by the caller in the *params* argument. The
+ * 		*plen* argument indicates the len of *params* and should be set
+ * 		to 0 if *params* is NULL.
  *
  * 		The *flags* argument is reserved and must be 0. The helper is
  * 		currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,16 @@  struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+struct bpf_redir_neigh {
+	/* network family for lookup (AF_INET, AF_INET6) */
+	__u8	nh_family;
+	/* network address of nexthop; skips fib lookup to find gateway */
+	union {
+		__be32		ipv4_nh;
+		__u32		ipv6_nh[4];  /* in6_addr; network order */
+	};
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */