diff mbox series

[RFC,net] vrf: fix incorrect dereferencing of skb->cb

Message ID 1644844229-54977-1-git-send-email-alibuda@linux.alibaba.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net] vrf: fix incorrect dereferencing of skb->cb | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5964 this patch: 5964
netdev/cc_maintainers fail 1 blamed authors not CCed: ssuryaextr@gmail.com; 11 maintainers not CCed: andrii@kernel.org kpsingh@kernel.org daniel@iogearbox.net john.fastabend@gmail.com kafai@fb.com songliubraving@fb.com bpf@vger.kernel.org ast@kernel.org keescook@chromium.org ssuryaextr@gmail.com yhs@fb.com
netdev/build_clang success Errors and warnings before: 869 this patch: 869
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 6120 this patch: 6120
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 41 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

D. Wythe Feb. 14, 2022, 1:10 p.m. UTC
From: "D. Wythe" <alibuda@linux.alibaba.com>

There is a failed test case in vrf scenario, it can be reproduced
with script:

./vrf_route_leaking.sh -t ipv6_ping_frag

Which output is similar to following:

TEST: Basic IPv6 connectivity			[ OK ]
TEST: Ping received ICMP Packet too big		[FAIL]

The direct cause of this issue is because the Packet too big in ICMPv6
is sent by the message whose source address is IPv6 loopback address and
is discarded dues to its input interface is not the local loopback device.
But the root cause is there's a bug that affects the correct source
address selection.

In the above case the calling stack is as follows:

    icmp6_send+1
    ip6_fragment+543
    ip6_output+98
    vrf_ip6_local_out+78
    vrf_xmit+424
    dev_hard_start_xmit+221
    __dev_queue_xmit+2792
    ip6_finish_output2+705
    ip6_output+98
    ip6_forward+1464
    ipv6_rcv+67

To be more specific:

ipv6_rcv()
	init skb->cb as struct inet6_skb_parm
...
__dev_queue_xmit()
	qdisc_pkt_len_init()
	init skb->cb as struct qdisc_skb_cb
...
vrf_xmit
	fill skb->cb to zero.
...
ip6_fragment()
icmp6_send()
	treats skb->cb as struct inet6_skb_parm.

icmp6_send() will try to use original input interface in IP6CB for
selecting a more meaningful source address, we should keep the old IP6CB
rather than overwrite it.

This patch try remove the memset and make the struct qdisc_skb_cb
contains inet_skb_parm/inet6_skb_parm, which makes it possible to read
the correct IP6CB in icmp6_send() without any changes in that case.

One problem is that the size of struct qdisc_skb_cb has increased,
the original skb->cb cannnot fit struct bpf_skb_data_end anymore, which
requires increase the size of skb->cb. This patch try increase it by the
union size of inet_skb_parm/inet6_skb_parm, which is 24.

struct bpf_skb_data_end {
	struct qdisc_skb_cb qdisc_cb;
	void *data_meta;
	void *data_end;
};

Fixes: ee201011c1e1 ("vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit")
Fixes: 35402e313663 ("net: Add IPv6 support to VRF device")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 drivers/net/vrf.c         | 2 --
 include/linux/skbuff.h    | 2 +-
 include/net/sch_generic.h | 7 +++++++
 3 files changed, 8 insertions(+), 3 deletions(-)

Comments

David Ahern Feb. 17, 2022, 3:50 a.m. UTC | #1
On 2/14/22 6:10 AM, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> There is a failed test case in vrf scenario, it can be reproduced
> with script:
> 
> ./vrf_route_leaking.sh -t ipv6_ping_frag
> 
> Which output is similar to following:
> 
> TEST: Basic IPv6 connectivity			[ OK ]
> TEST: Ping received ICMP Packet too big		[FAIL]
> 
> The direct cause of this issue is because the Packet too big in ICMPv6
> is sent by the message whose source address is IPv6 loopback address and
> is discarded dues to its input interface is not the local loopback device.
> But the root cause is there's a bug that affects the correct source
> address selection.

That is the correct problem, but your solution is not.

> 
> In the above case the calling stack is as follows:
> 
>     icmp6_send+1
>     ip6_fragment+543
>     ip6_output+98
>     vrf_ip6_local_out+78
>     vrf_xmit+424
>     dev_hard_start_xmit+221
>     __dev_queue_xmit+2792
>     ip6_finish_output2+705
>     ip6_output+98
>     ip6_forward+1464
>     ipv6_rcv+67
> 
> To be more specific:
> 
> ipv6_rcv()
> 	init skb->cb as struct inet6_skb_parm
> ...
> __dev_queue_xmit()
> 	qdisc_pkt_len_init()
> 	init skb->cb as struct qdisc_skb_cb
> ...
> vrf_xmit
> 	fill skb->cb to zero.
> ...
> ip6_fragment()
> icmp6_send()
> 	treats skb->cb as struct inet6_skb_parm.
> 
> icmp6_send() will try to use original input interface in IP6CB for
> selecting a more meaningful source address, we should keep the old IP6CB
> rather than overwrite it.

The packet is recirculated through the VRF device so the cb data should
be overwritten. The VRF driver does another route lookup within the VRF.
Address selection should only consider addresses in the domain.
Something is off in that logic for VRF route leaking. I looked into a
bit when Michael posted the tests, but never came back to it.
D. Wythe Feb. 17, 2022, 5:37 a.m. UTC | #2
在 2022/2/17 上午11:50, David Ahern 写道:
> The packet is recirculated through the VRF device so the cb data should
> be overwritten. The VRF driver does another route lookup within the VRF.
> Address selection should only consider addresses in the domain.
> Something is off in that logic for VRF route leaking. I looked into a
> bit when Michael posted the tests, but never came back to it.

Got your point, this patch is not really appropriate considering
that. Another way to complete the test may be to modify the IP address 
of vrf blue in test script,the default local loopback address is the 
reason for this failure. What do you think ?

Thanks for you comments.

Best Wishes.
David Ahern Feb. 18, 2022, 3:53 a.m. UTC | #3
On 2/16/22 10:37 PM, D. Wythe wrote:
> Got your point, this patch is not really appropriate considering
> that. Another way to complete the test may be to modify the IP address
> of vrf blue in test script,the default local loopback address is the
> reason for this failure. What do you think ?

Someone needs to dive into the source address selection code and see why
it fails when crossing vrfs. I found a reminder note:
ipv6_chk_acast_addr needs l3mdev check. Can you take a look?
D. Wythe Feb. 21, 2022, 6:27 a.m. UTC | #4
在 2022/2/18 上午11:53, David Ahern 写道:
> On 2/16/22 10:37 PM, D. Wythe wrote:
>> Got your point, this patch is not really appropriate considering
>> that. Another way to complete the test may be to modify the IP address
>> of vrf blue in test script,the default local loopback address is the
>> reason for this failure. What do you think ?

I am trying to work on rfc6724 and its implementation, which is source 
address selection code. I will reply as soon as I have a conclusion.

> Someone needs to dive into the source address selection code and see why
> it fails when crossing vrfs. I found a reminder note:
> ipv6_chk_acast_addr needs l3mdev check. Can you take a look?

Very valuable tips, I'll look it.

Best Wishes.
diff mbox series

Patch

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index e0b1ab9..a3601d0 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -498,7 +498,6 @@  static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 	/* strip the ethernet header added for pass through VRF device */
 	__skb_pull(skb, skb_network_offset(skb));
 
-	memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
 	ret = vrf_ip6_local_out(net, skb->sk, skb);
 	if (unlikely(net_xmit_eval(ret)))
 		dev->stats.tx_errors++;
@@ -581,7 +580,6 @@  static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 					       RT_SCOPE_LINK);
 	}
 
-	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 	ret = vrf_ip_local_out(dev_net(skb_dst(skb)->dev), skb->sk, skb);
 	if (unlikely(net_xmit_eval(ret)))
 		vrf_dev->stats.tx_errors++;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a5adbf6..0328d05 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -811,7 +811,7 @@  struct sk_buff {
 	 * want to keep them across layers you have to do a skb_clone()
 	 * first. This is owned by whoever has the skb queued ATM.
 	 */
-	char			cb[48] __aligned(8);
+	char			cb[72] __aligned(8);
 
 	union {
 		struct {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9bab396..64cb250 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -19,6 +19,7 @@ 
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 #include <net/flow_offload.h>
+#include <net/ip.h>
 
 struct Qdisc_ops;
 struct qdisc_walker;
@@ -440,6 +441,12 @@  struct tcf_proto {
 };
 
 struct qdisc_skb_cb {
+	union {
+		struct inet_skb_parm	h4;
+#if IS_ENABLED(CONFIG_IPV6)
+		struct inet6_skb_parm	h6;
+#endif
+	} header;
 	struct {
 		unsigned int		pkt_len;
 		u16			slave_dev_queue_mapping;