diff mbox series

[net] ipvs: Always clear ipvs_property flag in skb_scrub_packet()

Message ID 20250221013648.35716-1-lulie@linux.alibaba.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [net] ipvs: Always clear ipvs_property flag in skb_scrub_packet() | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 12 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 54 this patch: 54
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2025-02-21--15-00 (tests: 893)

Commit Message

Philo Lu Feb. 21, 2025, 1:36 a.m. UTC
We found an issue when using bpf_redirect with ipvs NAT mode after
commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
the same name space"). Particularly, we use bpf_redirect to return
the skb directly back to the netif it comes from, i.e., xnet is
false in skb_scrub_packet(), and then ipvs_property is preserved
and SNAT is skipped in the rx path.

ipvs_property has been already cleared when netns is changed in
commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
SKB net namespace changed"). This patch just clears it in spite of
netns.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
---
This is in fact a fix patch, and the issue was found after commit
ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
the same name space"). But I'm not sure if a "Fixes" tag should be
added to that commit.
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Nicolas Dichtel Feb. 21, 2025, 10:13 a.m. UTC | #1
Le 21/02/2025 à 02:36, Philo Lu a écrit :
> We found an issue when using bpf_redirect with ipvs NAT mode after
> commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). Particularly, we use bpf_redirect to return
> the skb directly back to the netif it comes from, i.e., xnet is
> false in skb_scrub_packet(), and then ipvs_property is preserved
> and SNAT is skipped in the rx path.
> 
> ipvs_property has been already cleared when netns is changed in
> commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
> SKB net namespace changed"). This patch just clears it in spite of
> netns.
> 
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> This is in fact a fix patch, and the issue was found after commit
> ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). But I'm not sure if a "Fixes" tag should be
> added to that commit.
> ---
>  net/core/skbuff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7b03b64fdcb2..b1c81687e9d8 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -6033,11 +6033,11 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>  	skb->offload_fwd_mark = 0;
>  	skb->offload_l3_fwd_mark = 0;
>  #endif
> +	ipvs_reset(skb);
>  
>  	if (!xnet)
>  		return;
>  
> -	ipvs_reset(skb);
I don't know IPVS, but I wonder if this patch will not introduce a regression
for other users. skb_scrub_packet() is used by a lot of tunnels, it's not
specific to bpf_redirect().


Regards,
Nicolas
Julian Anastasov Feb. 21, 2025, 11:42 a.m. UTC | #2
Hello,

On Fri, 21 Feb 2025, Philo Lu wrote:

> We found an issue when using bpf_redirect with ipvs NAT mode after
> commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). Particularly, we use bpf_redirect to return
> the skb directly back to the netif it comes from, i.e., xnet is
> false in skb_scrub_packet(), and then ipvs_property is preserved
> and SNAT is skipped in the rx path.
> 
> ipvs_property has been already cleared when netns is changed in
> commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
> SKB net namespace changed"). This patch just clears it in spite of
> netns.
> 
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> This is in fact a fix patch, and the issue was found after commit
> ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). But I'm not sure if a "Fixes" tag should be
> added to that commit.

	You can add 2b5ec1a5f973 as a Fixes tag in v2 and I'll ack it.

	Nowadays, ipvs_property prevents unneeded connection
lookups while the packet traverses the hooks (different IPVS
hook handlers can see the same packet in same or next hook).
This includes the case where packet can be routed to local server,
eg. via loopback_xmit(), for example:

Packet from device:
	LOCAL_IN: set ipvs_property=1
	Reroute to local server ?
		Then to hit the local server continue the NF hook
	Otherwise, route via device:
		LOCAL_OUT: use ipvs_property to avoid conn lookup
			and rerouting
		LOCAL_OUT:
			Unexpected divert to local server by someone else?
			=> LOCAL_IN: use ipvs_property to avoid conn
			lookup and rerouting

Packet from local client:
	LOCAL_OUT: set ipvs_property=1
	Reroute to local server?
		Then continue via loopback_xmit() without 
			skb_scrub_packet()
		LOCAL_IN: use ipvs_property to avoid conn lookup and
			rerouting
		Hit the local server
	Otherwise, route via device:
		Others at LOCAL_OUT: Unexpected divert to local server by 
			someone else ?
			=> LOCAL_IN: use ipvs_property to avoid conn 
			lookup and rerouting

	So, we need ipvs_property set only during the normal
path where packet is received and sent, possibly to local
server via local route. We do not care if the packet leaves
this path. Currently, ipvs_property was cleared if netns
changes but it is a safe option. What matters is if packet
is routed via local route. Other paths can reset the flag
safely.

> ---
>  net/core/skbuff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7b03b64fdcb2..b1c81687e9d8 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -6033,11 +6033,11 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>  	skb->offload_fwd_mark = 0;
>  	skb->offload_l3_fwd_mark = 0;
>  #endif
> +	ipvs_reset(skb);
>  
>  	if (!xnet)
>  		return;
>  
> -	ipvs_reset(skb);
>  	skb->mark = 0;
>  	skb_clear_tstamp(skb);
>  }
> -- 
> 2.32.0.3.g01195cf9f

Regards

--
Julian Anastasov <ja@ssi.bg>
Philo Lu Feb. 21, 2025, 11:55 a.m. UTC | #3
On 2025/2/21 19:42, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Fri, 21 Feb 2025, Philo Lu wrote:
> 
>> We found an issue when using bpf_redirect with ipvs NAT mode after
>> commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
>> the same name space"). Particularly, we use bpf_redirect to return
>> the skb directly back to the netif it comes from, i.e., xnet is
>> false in skb_scrub_packet(), and then ipvs_property is preserved
>> and SNAT is skipped in the rx path.
>>
>> ipvs_property has been already cleared when netns is changed in
>> commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
>> SKB net namespace changed"). This patch just clears it in spite of
>> netns.
>>
>> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
>> ---
>> This is in fact a fix patch, and the issue was found after commit
>> ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
>> the same name space"). But I'm not sure if a "Fixes" tag should be
>> added to that commit.
> 
> 	You can add 2b5ec1a5f973 as a Fixes tag in v2 and I'll ack it.

Thank you, Julian. You also solve my worries. I'll post v2 soon.
Julian Anastasov Feb. 21, 2025, 3:59 p.m. UTC | #4
Hello,

On Fri, 21 Feb 2025, Nicolas Dichtel wrote:

> Le 21/02/2025 à 02:36, Philo Lu a écrit :
> > We found an issue when using bpf_redirect with ipvs NAT mode after
> > commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> > the same name space"). Particularly, we use bpf_redirect to return
> > the skb directly back to the netif it comes from, i.e., xnet is
> > false in skb_scrub_packet(), and then ipvs_property is preserved
> > and SNAT is skipped in the rx path.
> > 
> > ipvs_property has been already cleared when netns is changed in
> > commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
> > SKB net namespace changed"). This patch just clears it in spite of
> > netns.
> > 
> > Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> > ---
> > This is in fact a fix patch, and the issue was found after commit
> > ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> > the same name space"). But I'm not sure if a "Fixes" tag should be
> > added to that commit.
> > ---
> >  net/core/skbuff.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 7b03b64fdcb2..b1c81687e9d8 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -6033,11 +6033,11 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> >  	skb->offload_fwd_mark = 0;
> >  	skb->offload_l3_fwd_mark = 0;
> >  #endif
> > +	ipvs_reset(skb);
> >  
> >  	if (!xnet)
> >  		return;
> >  
> > -	ipvs_reset(skb);
> I don't know IPVS, but I wonder if this patch will not introduce a regression
> for other users. skb_scrub_packet() is used by a lot of tunnels, it's not
> specific to bpf_redirect().

	Indeed, now we will start to clear the flag for tunnels
and it can cause IPVS to attempt rerouting for UDP tunnels, i.e.
once the packet is routed by IPVS to tunnel and second time later 
after tunneling again when ip_local_out() is called and we see 
ipvs_property=0 for the outer UDP header. Before such patch
ipvs_property remains 1 and we do not try to balance the UDP
traffic. So, for now, this patch may be can spend more cycles for
the traffic via UDP tunnels but this looks like a rare IPVS setup,
i.e. real servers reachable via UDP tunnels. Note that IPVS
has own support for UDP tunnels where it is set as forwarding
method + tunnel type GUE for the real server. It is not affected
by this patch.

Regards

--
Julian Anastasov <ja@ssi.bg>
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7b03b64fdcb2..b1c81687e9d8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6033,11 +6033,11 @@  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
 	skb->offload_fwd_mark = 0;
 	skb->offload_l3_fwd_mark = 0;
 #endif
+	ipvs_reset(skb);
 
 	if (!xnet)
 		return;
 
-	ipvs_reset(skb);
 	skb->mark = 0;
 	skb_clear_tstamp(skb);
 }