Message ID | 20250219105248.226962-1-cratiu@nvidia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [net] xfrm_output: Force software GSO only in tunnel mode | expand |
On Wed, Feb 19, 2025 at 12:52:48PM +0200, Cosmin Ratiu wrote: > The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel > mode. Unfortunately, it is slightly broader than necessary, as it also > severely affects performance for Geneve + IPSec transport mode over a > device capable of both HW GSO and IPSec crypto offload. In this case, > xfrm_output unnecessarily triggers software GSO instead of letting the > HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over > a back-2-back pair of NICs with MTU 1500, the performance was observed > to be up to 6x worse when doing software GSO compared to leaving it to > the hardware. > > This commit makes xfrm_output only trigger software GSO in crypto > offload cases for already encapsulated packets in tunnel mode, as not > doing so would then cause the inner tunnel skb->inner_networking_header > to be overwritten and break software GSO for that packet later if the > device turns out to not be capable of HW GSO. > > Taking a closer look at the conditions for the original bug, to better > understand the reasons for this change: > - vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and > inner network header. > - then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and > network headers. > - later in the xmit path, xfrm_output -> xfrm_outer_mode_output -> > xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner > network header with the one set in ip_tunnel_xmit before adding the > second outer header. > - __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation > needs to happen based on dev features. In the original bug, the hw > couldn't segment the packets, so skb_gso_segment was invoked. > - deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment > tries to use the wrong inner network header, expecting the one set in > iptunnel_handle_offloads but getting the one set by xfrm instead. > - a bit later, ipv6_gso_segment accesses the wrong memory based on that > wrong inner network header. > > With the new change, the original bug (or similar ones) cannot happen > again, as xfrm will now trigger software GSO before applying a tunnel. > This concern doesn't exist in packet offload mode, when the HW adds > encapsulation headers. For the non-offloaded packets (crypto in SW), > software GSO is still done unconditionally in the else branch. > > Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> > Reviewed-by: Yael Chemla <ychemla@nvidia.com> > Fixes: a204aef9fd77 ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output") > Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> > --- > net/xfrm/xfrm_output.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index f7abd42c077d..42f1ca513879 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -758,7 +758,7 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb) skb->encapsulation = 1; if (skb_is_gso(skb)) { - if (skb->inner_protocol) + if (skb->inner_protocol && x->props.mode == XFRM_MODE_TUNNEL) return xfrm_output_gso(net, sk, skb); skb_shinfo(skb)->gso_type |= SKB_GSO_ESP;