Message ID | 9cd9ca4ac2c19be288cb8734a86eb30e4d9e2050.1649715555.git.peilin.ye@bytedance.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,1/2] ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() | expand |
On Mon, 11 Apr 2022 15:33:00 -0700 Peilin Ye wrote: > The following sequence of events caused the BUG: > > 1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is > calculated based on old flags (see ip6gre_calc_hlen()); > 2. packet_snd() reserves header room for skb A, assuming > tunnel->tun_hlen is 4; > 3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for > skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel()); > 4. __gre6_xmit() detects the new tunnel key, and recalculates > "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and > TUNNEL_SEQ); > 5. gre_build_header() calls skb_push() with insufficient reserved header > room, triggering the BUG. > > As sugguested by Cong, fix it by moving the call to skb_cow_head() after > the recalculation of tun_hlen. > > Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com> > Co-developed-by: Cong Wang <cong.wang@bytedance.com> > Signed-off-by: Cong Wang <cong.wang@bytedance.com> > Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> > --- > Hi all, > > I couldn't find a proper Fixes: tag for this fix; please comment if you > have any sugguestions. Thanks! What's wrong with Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode") ? > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c > index b43a46449130..976236736146 100644 > --- a/net/ipv6/ip6_gre.c > +++ b/net/ipv6/ip6_gre.c > @@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, > else > fl6->daddr = tunnel->parms.raddr; > > - if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen)) > - return -ENOMEM; > - > /* Push GRE header. */ > protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto; > > @@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, > (TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ); We should also reject using SEQ with collect_md, but that's a separate issue. > tun_hlen = gre_calc_hlen(flags); > > + if (skb_cow_head(skb, dev->needed_headroom ?: tun_hlen + tunnel->encap_hlen)) > + return -ENOMEM; > + > gre_build_header(skb, tun_hlen, > flags, protocol, > tunnel_id_to_key32(tun_info->key.tun_id),
Hi Jakub, On Thu, Apr 14, 2022 at 01:14:24PM +0200, Jakub Kicinski wrote: > On Mon, 11 Apr 2022 15:33:00 -0700 Peilin Ye wrote: > > I couldn't find a proper Fixes: tag for this fix; please comment if you > > have any sugguestions. Thanks! > > What's wrong with > > Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode") > > ? Thanks! I will add this in v2 soon. > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c > > index b43a46449130..976236736146 100644 > > --- a/net/ipv6/ip6_gre.c > > +++ b/net/ipv6/ip6_gre.c > > @@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, > > else > > fl6->daddr = tunnel->parms.raddr; > > > > - if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen)) > > - return -ENOMEM; > > - > > /* Push GRE header. */ > > protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto; > > > > @@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, > > (TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ); > > We should also reject using SEQ with collect_md, but that's a separate > issue. Could you explain this a bit more? It seems that commit 77a5196a804e ("gre: add sequence number for collect md mode.") added this intentionally. Thanks, Peilin Ye
On Thu, 14 Apr 2022 13:08:54 -0700 Peilin Ye wrote: > > We should also reject using SEQ with collect_md, but that's a separate > > issue. > > Could you explain this a bit more? It seems that commit 77a5196a804e > ("gre: add sequence number for collect md mode.") added this > intentionally. Interesting. Maybe a better way of dealing with the problem would be rejecting SEQ if it's not set on the device itself. When the device is set up without the SEQ bit enabled it disables Tx locking (look for LLTX). This means that multiple CPUs can try to do the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.
On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote: > On Thu, 14 Apr 2022 13:08:54 -0700 Peilin Ye wrote: > > > We should also reject using SEQ with collect_md, but that's a separate > > > issue. > > > > Could you explain this a bit more? It seems that commit 77a5196a804e > > ("gre: add sequence number for collect md mode.") added this > > intentionally. > > Interesting. Maybe a better way of dealing with the problem would be > rejecting SEQ if it's not set on the device itself. According to ip-link(8), the 'external' option is mutually exclusive with the '[o]seq' option. In other words, a collect_md mode IP6GRETAP device should always have the TUNNEL_SEQ flag off in its 'tunnel->parms.o_flags'. (However, I just tried: $ ip link add dev ip6gretap11 type ip6gretap oseq external ^^^^ ^^^^^^^^ ...and my 'ip' executed it with no error. I will take a closer look at iproute2 later; maybe it's undefined behavior...) How about: 1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so it's okay to set TUNNEL_SEQ in e.g. eBPF"; 2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then return -EINVAL. ? > When the device is set up without the SEQ bit enabled it disables Tx > locking (look for LLTX). This means that multiple CPUs can try to do > the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure. Thanks for the explanation! At first glance, I was wondering why don't we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre: lockless xmit"). I quote: """ Even using an atomic_t o_seq, we would increase chance for packets being out of order at receiver. """ I don't fully understand this out-of-order yet, but it seems that making 'o_seqno' atomic is not an option? Thanks, Peilin Ye
On Fri, 15 Apr 2022 23:56:33 -0700 Peilin Ye wrote: > On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote: > > > Could you explain this a bit more? It seems that commit 77a5196a804e > > > ("gre: add sequence number for collect md mode.") added this > > > intentionally. > > > > Interesting. Maybe a better way of dealing with the problem would be > > rejecting SEQ if it's not set on the device itself. > > According to ip-link(8), the 'external' option is mutually exclusive > with the '[o]seq' option. In other words, a collect_md mode IP6GRETAP > device should always have the TUNNEL_SEQ flag off in its > 'tunnel->parms.o_flags'. > > (However, I just tried: > > $ ip link add dev ip6gretap11 type ip6gretap oseq external > ^^^^ ^^^^^^^^ > ...and my 'ip' executed it with no error. I will take a closer look at > iproute2 later; maybe it's undefined behavior...) > > How about: > > 1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so > it's okay to set TUNNEL_SEQ in e.g. eBPF"; > > 2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a > TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then > return -EINVAL. Maybe pr_warn_once(), no need for a stacktrace. > > When the device is set up without the SEQ bit enabled it disables Tx > > locking (look for LLTX). This means that multiple CPUs can try to do > > the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure. > > Thanks for the explanation! At first glance, I was wondering why don't > we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre: > lockless xmit"). I quote: > > """ > Even using an atomic_t o_seq, we would increase chance for packets being > out of order at receiver. > """ > > I don't fully understand this out-of-order yet, but it seems that making > 'o_seqno' atomic is not an option? atomic_t would also work (if it has enough bits). Whatever is simplest TBH. It's just about correctness, I don't think seq is widely used. Thanks!
On Sat, Apr 16, 2022 at 09:33:20AM +0200, Jakub Kicinski wrote: > On Fri, 15 Apr 2022 23:56:33 -0700 Peilin Ye wrote: > > On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote: > > > > Could you explain this a bit more? It seems that commit 77a5196a804e > > > > ("gre: add sequence number for collect md mode.") added this > > > > intentionally. > > > > > > Interesting. Maybe a better way of dealing with the problem would be > > > rejecting SEQ if it's not set on the device itself. > > > > According to ip-link(8), the 'external' option is mutually exclusive > > with the '[o]seq' option. In other words, a collect_md mode IP6GRETAP > > device should always have the TUNNEL_SEQ flag off in its > > 'tunnel->parms.o_flags'. > > > > (However, I just tried: > > > > $ ip link add dev ip6gretap11 type ip6gretap oseq external > > ^^^^ ^^^^^^^^ > > ...and my 'ip' executed it with no error. I will take a closer look at > > iproute2 later; maybe it's undefined behavior...) > > > > How about: > > > > 1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so > > it's okay to set TUNNEL_SEQ in e.g. eBPF"; > > > > 2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a > > TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then > > return -EINVAL. > > Maybe pr_warn_once(), no need for a stacktrace. Ah, thanks, coffee needed... > > > When the device is set up without the SEQ bit enabled it disables Tx > > > locking (look for LLTX). This means that multiple CPUs can try to do > > > the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure. > > > > Thanks for the explanation! At first glance, I was wondering why don't > > we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre: > > lockless xmit"). I quote: > > > > """ > > Even using an atomic_t o_seq, we would increase chance for packets being > > out of order at receiver. > > """ > > > > I don't fully understand this out-of-order yet, but it seems that making > > 'o_seqno' atomic is not an option? > > atomic_t would also work (if it has enough bits). Whatever is simplest > TBH. It's just about correctness, I don't think seq is widely used. I see, I will work on this, thanks! Peilin Ye
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index b43a46449130..976236736146 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, else fl6->daddr = tunnel->parms.raddr; - if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen)) - return -ENOMEM; - /* Push GRE header. */ protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto; @@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, (TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ); tun_hlen = gre_calc_hlen(flags); + if (skb_cow_head(skb, dev->needed_headroom ?: tun_hlen + tunnel->encap_hlen)) + return -ENOMEM; + gre_build_header(skb, tun_hlen, flags, protocol, tunnel_id_to_key32(tun_info->key.tun_id), @@ -773,6 +773,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb, if (tunnel->parms.o_flags & TUNNEL_SEQ) tunnel->o_seqno++; + if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen)) + return -ENOMEM; + gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags, protocol, tunnel->parms.o_key, htonl(tunnel->o_seqno));