Message ID | 26109603287b4d21545bec125e43b218b545b746.1640111022.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] veth: ensure skb entering GRO are not cloned. | expand |
On Tue, Dec 21, 2021 at 1:34 PM Paolo Abeni <pabeni@redhat.com> wrote: > > After commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP"), > if GRO is enabled on a veth device and TSO is disabled on the peer > device, TCP skbs will go through the NAPI callback. If there is no XDP > program attached, the veth code does not perform any share check, and > shared/cloned skbs could enter the GRO engine. > > ... > Address the issue checking for cloned skbs even in the GRO-without-XDP > input path. > > Reported-and-tested-by: Ignat Korchagin <ignat@cloudflare.com> > Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > drivers/net/veth.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index b78894c38933..abd1f949b2f5 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -718,6 +718,14 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, > rcu_read_lock(); > xdp_prog = rcu_dereference(rq->xdp_prog); > if (unlikely(!xdp_prog)) { > + if (unlikely(skb_shared(skb) || skb_head_is_locked(skb))) { Why skb_head_is_locked() needed here ? I would think skb_cloned() is enough for the problem we want to address. > + struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC | __GFP_NOWARN); > + > + if (!nskb) > + goto drop; > + consume_skb(skb); > + skb = nskb; > + } > rcu_read_unlock(); > goto out; > } > -- > 2.33.1 > - It seems adding yet memory alloc/free and copies is defeating GRO purpose. - After skb_copy(), GRO is forced to use the expensive frag_list way for aggregation anyway. - veth mtu could be set to 64KB, so we could have order-4 allocation attempts here. Would the following fix [1] be better maybe, in terms of efficiency, and keeping around skb EDT/tstamp information (see recent thread with Martin and Daniel ) I think it also focuses more on the problem (GRO is not capable of dealing with cloned skb yet). Who knows, maybe in the future we will _have_ to add more checks in GRO fast path for some other reason, since it is becoming the Swiss army knife of networking :) Although I guess this whole case (disabling TSO) is moot, I have no idea why anyone would do that :) [1] diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 50eb43e5bf459bb998e264d399bc85d4e9d73594..fe7a4d2f7bfc834ea56d1da185c0f53bfbd22ad0 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -879,8 +879,12 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, stats->xdp_bytes += skb->len; skb = veth_xdp_rcv_skb(rq, skb, bq, stats); - if (skb) - napi_gro_receive(&rq->xdp_napi, skb); + if (skb) { + if (skb_shared(skb) || skb_cloned(skb)) + netif_receive_skb(skb); + else + napi_gro_receive(&rq->xdp_napi, skb); + } } done++; }
Hello, On Tue, 2021-12-21 at 20:31 -0800, Eric Dumazet wrote: > On Tue, Dec 21, 2021 at 1:34 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > After commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP"), > > if GRO is enabled on a veth device and TSO is disabled on the peer > > device, TCP skbs will go through the NAPI callback. If there is no XDP > > program attached, the veth code does not perform any share check, and > > shared/cloned skbs could enter the GRO engine. > > > > > > ... > > > Address the issue checking for cloned skbs even in the GRO-without-XDP > > input path. > > > > Reported-and-tested-by: Ignat Korchagin <ignat@cloudflare.com> > > Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > drivers/net/veth.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > > index b78894c38933..abd1f949b2f5 100644 > > --- a/drivers/net/veth.c > > +++ b/drivers/net/veth.c > > @@ -718,6 +718,14 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, > > rcu_read_lock(); > > xdp_prog = rcu_dereference(rq->xdp_prog); > > if (unlikely(!xdp_prog)) { > > + if (unlikely(skb_shared(skb) || skb_head_is_locked(skb))) { > > Why skb_head_is_locked() needed here ? > I would think skb_cloned() is enough for the problem we want to address. Thank you for the feedback. I double checked the above: in my test even skb_cloned() suffice. > > + struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC | __GFP_NOWARN); > > + > > + if (!nskb) > > + goto drop; > > + consume_skb(skb); > > + skb = nskb; > > + } > > rcu_read_unlock(); > > goto out; > > } > > -- > > 2.33.1 > > > > - It seems adding yet memory alloc/free and copies is defeating GRO purpose. > - After skb_copy(), GRO is forced to use the expensive frag_list way > for aggregation anyway. > - veth mtu could be set to 64KB, so we could have order-4 allocation > attempts here. > > Would the following fix [1] be better maybe, in terms of efficiency, > and keeping around skb EDT/tstamp > information (see recent thread with Martin and Daniel ) > > I think it also focuses more on the problem (GRO is not capable of > dealing with cloned skb yet). > Who knows, maybe in the future we will _have_ to add more checks in > GRO fast path for some other reason, > since it is becoming the Swiss army knife of networking :) Only vaguely related: I have a bunch of micro optimizations for the GRO engine. I did not submit the patches because I can observe the gain only in micro-benchmarks, but I'm wondering if that could be visible with very high speed TCP stream? I can share the code if that could be of general interest (after some rebasing, the patches predates gro.c) > Although I guess this whole case (disabling TSO) is moot, I have no > idea why anyone would do that :) > > [1] > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index 50eb43e5bf459bb998e264d399bc85d4e9d73594..fe7a4d2f7bfc834ea56d1da185c0f53bfbd22ad0 > 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -879,8 +879,12 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, > > stats->xdp_bytes += skb->len; > skb = veth_xdp_rcv_skb(rq, skb, bq, stats); > - if (skb) > - napi_gro_receive(&rq->xdp_napi, skb); > + if (skb) { > + if (skb_shared(skb) || skb_cloned(skb)) > + netif_receive_skb(skb); > + else > + napi_gro_receive(&rq->xdp_napi, skb); > + } > } > done++; > } I tested the above, and it works, too. I thought about something similar, but I overlooked possible OoO or behaviour changes when a packet socket is attached to the paired device (as it would disable GRO). It looks like tcpdump should have not ill-effects (the mmap rx-path releases the skb clone before the orig packet reaches the other end), so I guess the above is fine (and sure is better to avoid more timestamp related problem). Do you prefer to submit it formally, or do you prefer I'll send a v2 with the latter code? Thanks! Paolo
On Wed, Dec 22, 2021 at 3:06 AM Paolo Abeni <pabeni@redhat.com> wrote: > > Hello, > > On Tue, 2021-12-21 at 20:31 -0800, Eric Dumazet wrote: > > On Tue, Dec 21, 2021 at 1:34 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > After commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP"), > > > if GRO is enabled on a veth device and TSO is disabled on the peer > > > device, TCP skbs will go through the NAPI callback. If there is no XDP > > > program attached, the veth code does not perform any share check, and > > > shared/cloned skbs could enter the GRO engine. > > > > > > > > > > ... > > > > > Address the issue checking for cloned skbs even in the GRO-without-XDP > > > input path. > > > > > > Reported-and-tested-by: Ignat Korchagin <ignat@cloudflare.com> > > > Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > --- > > > drivers/net/veth.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > > > index b78894c38933..abd1f949b2f5 100644 > > > --- a/drivers/net/veth.c > > > +++ b/drivers/net/veth.c > > > @@ -718,6 +718,14 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, > > > rcu_read_lock(); > > > xdp_prog = rcu_dereference(rq->xdp_prog); > > > if (unlikely(!xdp_prog)) { > > > + if (unlikely(skb_shared(skb) || skb_head_is_locked(skb))) { > > > > Why skb_head_is_locked() needed here ? > > I would think skb_cloned() is enough for the problem we want to address. > > Thank you for the feedback. > > I double checked the above: in my test even skb_cloned() suffice. > > > > + struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC | __GFP_NOWARN); > > > + > > > + if (!nskb) > > > + goto drop; > > > + consume_skb(skb); > > > + skb = nskb; > > > + } > > > rcu_read_unlock(); > > > goto out; > > > } > > > -- > > > 2.33.1 > > > > > > > - It seems adding yet memory alloc/free and copies is defeating GRO purpose. > > - After skb_copy(), GRO is forced to use the expensive frag_list way > > for aggregation anyway. > > - veth mtu could be set to 64KB, so we could have order-4 allocation > > attempts here. > > > > Would the following fix [1] be better maybe, in terms of efficiency, > > and keeping around skb EDT/tstamp > > information (see recent thread with Martin and Daniel ) > > > > I think it also focuses more on the problem (GRO is not capable of > > dealing with cloned skb yet). > > Who knows, maybe in the future we will _have_ to add more checks in > > GRO fast path for some other reason, > > since it is becoming the Swiss army knife of networking :) > > Only vaguely related: I have a bunch of micro optimizations for the GRO > engine. I did not submit the patches because I can observe the gain > only in micro-benchmarks, but I'm wondering if that could be visible > with very high speed TCP stream? I can share the code if that could be > of general interest (after some rebasing, the patches predates gro.c) > > > Although I guess this whole case (disabling TSO) is moot, I have no > > idea why anyone would do that :) > > > > [1] > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > > index 50eb43e5bf459bb998e264d399bc85d4e9d73594..fe7a4d2f7bfc834ea56d1da185c0f53bfbd22ad0 > > 100644 > > --- a/drivers/net/veth.c > > +++ b/drivers/net/veth.c > > @@ -879,8 +879,12 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, > > > > stats->xdp_bytes += skb->len; > > skb = veth_xdp_rcv_skb(rq, skb, bq, stats); > > - if (skb) > > - napi_gro_receive(&rq->xdp_napi, skb); > > + if (skb) { > > + if (skb_shared(skb) || skb_cloned(skb)) > > + netif_receive_skb(skb); > > + else > > + napi_gro_receive(&rq->xdp_napi, skb); > > + } > > } > > done++; > > } > > I tested the above, and it works, too. > > I thought about something similar, but I overlooked possible OoO or > behaviour changes when a packet socket is attached to the paired device > (as it would disable GRO). Have you tried a pskb_expand_head() instead of a full copy ? Perhaps that would be enough, and keep all packets going through GRO to make sure OOO is covered. > > It looks like tcpdump should have not ill-effects (the mmap rx-path > releases the skb clone before the orig packet reaches the other end), > so I guess the above is fine (and sure is better to avoid more > timestamp related problem). > > Do you prefer to submit it formally, or do you prefer I'll send a v2 > with the latter code? Sure, please submit a V2. > > Thanks! > > Paolo > >
On Wed, 2021-12-22 at 03:58 -0800, Eric Dumazet wrote: > On Wed, Dec 22, 2021 at 3:06 AM Paolo Abeni <pabeni@redhat.com> wrote: > > I thought about something similar, but I overlooked possible OoO or > > behaviour changes when a packet socket is attached to the paired device > > (as it would disable GRO). > > Have you tried a pskb_expand_head() instead of a full copy ? > Perhaps that would be enough, and keep all packets going through GRO to > make sure OOO is covered. Indeed it looks like it's enough. I'll do some more testing and I'll send a v2 using pskb_expand_head(). Many thanks! Paolo
On Wed, Dec 22, 2021 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Dec 21, 2021 at 1:34 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > After commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP"), > > if GRO is enabled on a veth device and TSO is disabled on the peer > > device, TCP skbs will go through the NAPI callback. If there is no XDP > > program attached, the veth code does not perform any share check, and > > shared/cloned skbs could enter the GRO engine. > > > > > > ... > > > Address the issue checking for cloned skbs even in the GRO-without-XDP > > input path. > > > > Reported-and-tested-by: Ignat Korchagin <ignat@cloudflare.com> > > Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > drivers/net/veth.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > > index b78894c38933..abd1f949b2f5 100644 > > --- a/drivers/net/veth.c > > +++ b/drivers/net/veth.c > > @@ -718,6 +718,14 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, > > rcu_read_lock(); > > xdp_prog = rcu_dereference(rq->xdp_prog); > > if (unlikely(!xdp_prog)) { > > + if (unlikely(skb_shared(skb) || skb_head_is_locked(skb))) { > > Why skb_head_is_locked() needed here ? > I would think skb_cloned() is enough for the problem we want to address. > > > + struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC | __GFP_NOWARN); > > + > > + if (!nskb) > > + goto drop; > > + consume_skb(skb); > > + skb = nskb; > > + } > > rcu_read_unlock(); > > goto out; > > } > > -- > > 2.33.1 > > > > - It seems adding yet memory alloc/free and copies is defeating GRO purpose. > - After skb_copy(), GRO is forced to use the expensive frag_list way > for aggregation anyway. > - veth mtu could be set to 64KB, so we could have order-4 allocation > attempts here. > > Would the following fix [1] be better maybe, in terms of efficiency, > and keeping around skb EDT/tstamp > information (see recent thread with Martin and Daniel ) I've always liked the idea of being able to coherently timestamp from packet ingress to egress. > I think it also focuses more on the problem (GRO is not capable of > dealing with cloned skb yet). > Who knows, maybe in the future we will _have_ to add more checks in > GRO fast path for some other reason, > since it is becoming the Swiss army knife of networking :) GRO is the bane of my sub-gbit existence. I've been wishing we had a compile time option to always split it up or a righter path forward using veth interfaces here: https://github.com/rchac/LibreQoS > Although I guess this whole case (disabling TSO) is moot, I have no > idea why anyone would do that :) > > [1] > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index 50eb43e5bf459bb998e264d399bc85d4e9d73594..fe7a4d2f7bfc834ea56d1da185c0f53bfbd22ad0 > 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -879,8 +879,12 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, > > stats->xdp_bytes += skb->len; > skb = veth_xdp_rcv_skb(rq, skb, bq, stats); > - if (skb) > - napi_gro_receive(&rq->xdp_napi, skb); > + if (skb) { > + if (skb_shared(skb) || skb_cloned(skb)) > + netif_receive_skb(skb); > + else > + napi_gro_receive(&rq->xdp_napi, skb); > + } > } > done++; > }
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index b78894c38933..abd1f949b2f5 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -718,6 +718,14 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, rcu_read_lock(); xdp_prog = rcu_dereference(rq->xdp_prog); if (unlikely(!xdp_prog)) { + if (unlikely(skb_shared(skb) || skb_head_is_locked(skb))) { + struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC | __GFP_NOWARN); + + if (!nskb) + goto drop; + consume_skb(skb); + skb = nskb; + } rcu_read_unlock(); goto out; }
After commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP"), if GRO is enabled on a veth device and TSO is disabled on the peer device, TCP skbs will go through the NAPI callback. If there is no XDP program attached, the veth code does not perform any share check, and shared/cloned skbs could enter the GRO engine. Ignat reported a BUG triggered later-on due to the above condition: [ 53.970529][ C1] kernel BUG at net/core/skbuff.c:3574! [ 53.981755][ C1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 53.982634][ C1] CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc5+ #25 [ 53.982634][ C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 53.982634][ C1] RIP: 0010:skb_shift+0x13ef/0x23b0 [ 53.982634][ C1] Code: ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 41 0c 00 00 41 80 7f 02 00 4d 8d b5 d0 00 00 00 0f 85 74 f5 ff ff <0f> 0b 4d 8d 77 20 be 04 00 00 00 4c 89 44 24 78 4c 89 f7 4c 89 8c [ 53.982634][ C1] RSP: 0018:ffff8881008f7008 EFLAGS: 00010246 [ 53.982634][ C1] RAX: 0000000000000000 RBX: ffff8881180b4c80 RCX: 0000000000000000 [ 53.982634][ C1] RDX: 0000000000000002 RSI: ffff8881180b4d3c RDI: ffff88810bc9cac2 [ 53.982634][ C1] RBP: ffff8881008f70b8 R08: ffff8881180b4cf4 R09: ffff8881180b4cf0 [ 53.982634][ C1] R10: ffffed1022999e5c R11: 0000000000000002 R12: 0000000000000590 [ 53.982634][ C1] R13: ffff88810f940c80 R14: ffff88810f940d50 R15: ffff88810bc9cac0 [ 53.982634][ C1] FS: 0000000000000000(0000) GS:ffff888235880000(0000) knlGS:0000000000000000 [ 53.982634][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.982634][ C1] CR2: 00007ff5f9b86680 CR3: 0000000108ce8004 CR4: 0000000000170ee0 [ 53.982634][ C1] Call Trace: [ 53.982634][ C1] <TASK> [ 53.982634][ C1] tcp_sacktag_walk+0xaba/0x18e0 [ 53.982634][ C1] tcp_sacktag_write_queue+0xe7b/0x3460 [ 53.982634][ C1] tcp_ack+0x2666/0x54b0 [ 53.982634][ C1] tcp_rcv_established+0x4d9/0x20f0 [ 53.982634][ C1] tcp_v4_do_rcv+0x551/0x810 [ 53.982634][ C1] tcp_v4_rcv+0x22ed/0x2ed0 [ 53.982634][ C1] ip_protocol_deliver_rcu+0x96/0xaf0 [ 53.982634][ C1] ip_local_deliver_finish+0x1e0/0x2f0 [ 53.982634][ C1] ip_sublist_rcv_finish+0x211/0x440 [ 53.982634][ C1] ip_list_rcv_finish.constprop.0+0x424/0x660 [ 53.982634][ C1] ip_list_rcv+0x2c8/0x410 [ 53.982634][ C1] __netif_receive_skb_list_core+0x65c/0x910 [ 53.982634][ C1] netif_receive_skb_list_internal+0x5f9/0xcb0 [ 53.982634][ C1] napi_complete_done+0x188/0x6e0 [ 53.982634][ C1] gro_cell_poll+0x10c/0x1d0 [ 53.982634][ C1] __napi_poll+0xa1/0x530 [ 53.982634][ C1] net_rx_action+0x567/0x1270 [ 53.982634][ C1] __do_softirq+0x28a/0x9ba [ 53.982634][ C1] run_ksoftirqd+0x32/0x60 [ 53.982634][ C1] smpboot_thread_fn+0x559/0x8c0 [ 53.982634][ C1] kthread+0x3b9/0x490 [ 53.982634][ C1] ret_from_fork+0x22/0x30 [ 53.982634][ C1] </TASK> Address the issue checking for cloned skbs even in the GRO-without-XDP input path. Reported-and-tested-by: Ignat Korchagin <ignat@cloudflare.com> Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- drivers/net/veth.c | 8 ++++++++ 1 file changed, 8 insertions(+)