Message ID | e5d4bacef76ef439b6eb8e7f4973161ca131dfee.1620223174.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | udp: more FRAGLIST fixes | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 4 maintainers not CCed: alobakin@pm.me jonathan.lemon@gmail.com gnault@redhat.com cong.wang@bytedance.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 1 this patch: 1 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 81 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1 this patch: 1 |
netdev/header_inline | success | Link |
On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote: > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor > callback is available, the skb destructor is invoked on each > aggregated packet via skb_release_head_state(). > > Such field (and the pairer skb->sk) is left untouched, so the same > destructor is invoked again when the segmented skbs are freed, leading > to double-free/UaF of the relevant socket. Similar to skb_segment, should the destructor be swapped with the last segment and callback delayed, instead of called immediately as part of segmentation? /* Following permits correct backpressure, for protocols * using skb_set_owner_w(). * Idea is to tranfert ownership from head_skb to last segment. */ if (head_skb->destructor == sock_wfree) { swap(tail->truesize, head_skb->truesize); swap(tail->destructor, head_skb->destructor); swap(tail->sk, head_skb->sk); }
On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote: > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote: > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor > > callback is available, the skb destructor is invoked on each > > aggregated packet via skb_release_head_state(). > > > > Such field (and the pairer skb->sk) is left untouched, so the same > > destructor is invoked again when the segmented skbs are freed, leading > > to double-free/UaF of the relevant socket. > > Similar to skb_segment, should the destructor be swapped with the last > segment and callback delayed, instead of called immediately as part of > segmentation? > > /* Following permits correct backpressure, for protocols > * using skb_set_owner_w(). > * Idea is to tranfert ownership from head_skb to last segment. > */ > if (head_skb->destructor == sock_wfree) { > swap(tail->truesize, head_skb->truesize); > swap(tail->destructor, head_skb->destructor); > swap(tail->sk, head_skb->sk); > } My understanding is that one assumption in the original SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not owned by any socket. AFAICS the above assumption was true until: commit c75fb320d482a5ce6e522378d137fd2c3bf79225 Author: Paolo Abeni <pabeni@redhat.com> Date: Fri Apr 9 13:04:37 2021 +0200 veth: use skb_orphan_partial instead of skb_orphan after that, if the skb is owned, skb->destructor is sock_efree(), so the above code should not trigger. More importantly SKB_GSO_FRAGLIST can only be applied if the inner- most protocol is UDP, so commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475 and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. Thanks! Paolo
On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote: > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor > > > callback is available, the skb destructor is invoked on each > > > aggregated packet via skb_release_head_state(). > > > > > > Such field (and the pairer skb->sk) is left untouched, so the same > > > destructor is invoked again when the segmented skbs are freed, leading > > > to double-free/UaF of the relevant socket. > > > > Similar to skb_segment, should the destructor be swapped with the last > > segment and callback delayed, instead of called immediately as part of > > segmentation? > > > > /* Following permits correct backpressure, for protocols > > * using skb_set_owner_w(). > > * Idea is to tranfert ownership from head_skb to last segment. > > */ > > if (head_skb->destructor == sock_wfree) { > > swap(tail->truesize, head_skb->truesize); > > swap(tail->destructor, head_skb->destructor); > > swap(tail->sk, head_skb->sk); > > } > > My understanding is that one assumption in the original > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not > owned by any socket. > > AFAICS the above assumption was true until: > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225 > Author: Paolo Abeni <pabeni@redhat.com> > Date: Fri Apr 9 13:04:37 2021 +0200 > > veth: use skb_orphan_partial instead of skb_orphan > > after that, if the skb is owned, skb->destructor is sock_efree(), so > the above code should not trigger. Okay, great. > More importantly SKB_GSO_FRAGLIST can only be applied if the inner- > most protocol is UDP, so > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475 > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. I think the first does apply, as it applies to any protocol that uses sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.
On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote: > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote: > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor > > > > callback is available, the skb destructor is invoked on each > > > > aggregated packet via skb_release_head_state(). > > > > > > > > Such field (and the pairer skb->sk) is left untouched, so the same > > > > destructor is invoked again when the segmented skbs are freed, leading > > > > to double-free/UaF of the relevant socket. > > > > > > Similar to skb_segment, should the destructor be swapped with the last > > > segment and callback delayed, instead of called immediately as part of > > > segmentation? > > > > > > /* Following permits correct backpressure, for protocols > > > * using skb_set_owner_w(). > > > * Idea is to tranfert ownership from head_skb to last segment. > > > */ > > > if (head_skb->destructor == sock_wfree) { > > > swap(tail->truesize, head_skb->truesize); > > > swap(tail->destructor, head_skb->destructor); > > > swap(tail->sk, head_skb->sk); > > > } > > > > My understanding is that one assumption in the original > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not > > owned by any socket. > > > > AFAICS the above assumption was true until: > > > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225 > > Author: Paolo Abeni <pabeni@redhat.com> > > Date: Fri Apr 9 13:04:37 2021 +0200 > > > > veth: use skb_orphan_partial instead of skb_orphan > > > > after that, if the skb is owned, skb->destructor is sock_efree(), so > > the above code should not trigger. > > Okay, great. > > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner- > > most protocol is UDP, so > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475 > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. > > I think the first does apply, as it applies to any protocol that uses > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed. If we want to be safe about future possible sock_wfree users, I think the approach here should be different: in skb_segment(), tail- >destructor is expected to be NULL, while skb_segment_list(), all the list skbs can be owned by the same socket. Possibly we could open- code skb_release_head_state(), omitting the skb orphaning part for sock_wfree() destructor. Note that the this is not currently needed - sock_wfree destructor can't reach there. Given all the above, I'm unsure if you are fine with (or at least do not oppose to) the code proposed in this patch? Thanks, Paolo
On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote: > > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote: > > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor > > > > > callback is available, the skb destructor is invoked on each > > > > > aggregated packet via skb_release_head_state(). > > > > > > > > > > Such field (and the pairer skb->sk) is left untouched, so the same > > > > > destructor is invoked again when the segmented skbs are freed, leading > > > > > to double-free/UaF of the relevant socket. > > > > > > > > Similar to skb_segment, should the destructor be swapped with the last > > > > segment and callback delayed, instead of called immediately as part of > > > > segmentation? > > > > > > > > /* Following permits correct backpressure, for protocols > > > > * using skb_set_owner_w(). > > > > * Idea is to tranfert ownership from head_skb to last segment. > > > > */ > > > > if (head_skb->destructor == sock_wfree) { > > > > swap(tail->truesize, head_skb->truesize); > > > > swap(tail->destructor, head_skb->destructor); > > > > swap(tail->sk, head_skb->sk); > > > > } > > > > > > My understanding is that one assumption in the original > > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not > > > owned by any socket. > > > > > > AFAICS the above assumption was true until: > > > > > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225 > > > Author: Paolo Abeni <pabeni@redhat.com> > > > Date: Fri Apr 9 13:04:37 2021 +0200 > > > > > > veth: use skb_orphan_partial instead of skb_orphan > > > > > > after that, if the skb is owned, skb->destructor is sock_efree(), so > > > the above code should not trigger. > > > > Okay, great. > > > > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner- > > > most protocol is UDP, so > > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475 > > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. > > > > I think the first does apply, as it applies to any protocol that uses > > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed. > > If we want to be safe about future possible sock_wfree users, I think > the approach here should be different: in skb_segment(), tail- > >destructor is expected to be NULL, while skb_segment_list(), all the > list skbs can be owned by the same socket. Possibly we could open- > code skb_release_head_state(), omitting the skb orphaning part > for sock_wfree() destructor. > > Note that the this is not currently needed - sock_wfree destructor > can't reach there. > > Given all the above, I'm unsure if you are fine with (or at least do > not oppose to) the code proposed in this patch? Yes. Thanks for clarifying, Paolo.
On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote: > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote: > > > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote: > > > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor > > > > > > callback is available, the skb destructor is invoked on each > > > > > > aggregated packet via skb_release_head_state(). > > > > > > > > > > > > Such field (and the pairer skb->sk) is left untouched, so the same > > > > > > destructor is invoked again when the segmented skbs are freed, leading > > > > > > to double-free/UaF of the relevant socket. > > > > > > > > > > Similar to skb_segment, should the destructor be swapped with the last > > > > > segment and callback delayed, instead of called immediately as part of > > > > > segmentation? > > > > > > > > > > /* Following permits correct backpressure, for protocols > > > > > * using skb_set_owner_w(). > > > > > * Idea is to tranfert ownership from head_skb to last segment. > > > > > */ > > > > > if (head_skb->destructor == sock_wfree) { > > > > > swap(tail->truesize, head_skb->truesize); > > > > > swap(tail->destructor, head_skb->destructor); > > > > > swap(tail->sk, head_skb->sk); > > > > > } > > > > > > > > My understanding is that one assumption in the original > > > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not > > > > owned by any socket. > > > > > > > > AFAICS the above assumption was true until: > > > > > > > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225 > > > > Author: Paolo Abeni <pabeni@redhat.com> > > > > Date: Fri Apr 9 13:04:37 2021 +0200 > > > > > > > > veth: use skb_orphan_partial instead of skb_orphan > > > > > > > > after that, if the skb is owned, skb->destructor is sock_efree(), so > > > > the above code should not trigger. > > > > > > Okay, great. > > > > > > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner- > > > > most protocol is UDP, so > > > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475 > > > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. > > > > > > I think the first does apply, as it applies to any protocol that uses > > > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed. > > > > If we want to be safe about future possible sock_wfree users, I think > > the approach here should be different: in skb_segment(), tail- > > > destructor is expected to be NULL, while skb_segment_list(), all the > > list skbs can be owned by the same socket. Possibly we could open- > > code skb_release_head_state(), omitting the skb orphaning part > > for sock_wfree() destructor. > > > > Note that the this is not currently needed - sock_wfree destructor > > can't reach there. > > > > Given all the above, I'm unsure if you are fine with (or at least do > > not oppose to) the code proposed in this patch? > > Yes. Thanks for clarifying, Paolo. Thank you for reviewing! @David, @Jakub: I see this series is already archived as "change requested", should I repost? Thanks! Paolo
On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote: > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote: > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > If we want to be safe about future possible sock_wfree users, I think > > > the approach here should be different: in skb_segment(), tail- > > > > destructor is expected to be NULL, while skb_segment_list(), all the > > > list skbs can be owned by the same socket. Possibly we could open- > > > code skb_release_head_state(), omitting the skb orphaning part > > > for sock_wfree() destructor. > > > > > > Note that the this is not currently needed - sock_wfree destructor > > > can't reach there. > > > > > > Given all the above, I'm unsure if you are fine with (or at least do > > > not oppose to) the code proposed in this patch? > > > > Yes. Thanks for clarifying, Paolo. > > Thank you for reviewing! > > @David, @Jakub: I see this series is already archived as "change > requested", should I repost? Yes, please. Patch 2 adds two new sparse warnings. I think you need csum_unfold() to go from __sum16 to __wsum.
On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote: > On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote: > > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote: > > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > If we want to be safe about future possible sock_wfree users, I think > > > > the approach here should be different: in skb_segment(), tail- > > > > > destructor is expected to be NULL, while skb_segment_list(), all the > > > > list skbs can be owned by the same socket. Possibly we could open- > > > > code skb_release_head_state(), omitting the skb orphaning part > > > > for sock_wfree() destructor. > > > > > > > > Note that the this is not currently needed - sock_wfree destructor > > > > can't reach there. > > > > > > > > Given all the above, I'm unsure if you are fine with (or at least do > > > > not oppose to) the code proposed in this patch? > > > > > > Yes. Thanks for clarifying, Paolo. > > > > Thank you for reviewing! > > > > @David, @Jakub: I see this series is already archived as "change > > requested", should I repost? > > Yes, please. Patch 2 adds two new sparse warnings. > > I think you need csum_unfold() to go from __sum16 to __wsum. Yes, indeed. I'll send a v2 with such change, thanks! Paolo >
On Fri, 2021-05-07 at 10:46 +0200, Paolo Abeni wrote: > On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote: > > On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote: > > > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote: > > > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > If we want to be safe about future possible sock_wfree users, I think > > > > > the approach here should be different: in skb_segment(), tail- > > > > > > destructor is expected to be NULL, while skb_segment_list(), all the > > > > > list skbs can be owned by the same socket. Possibly we could open- > > > > > code skb_release_head_state(), omitting the skb orphaning part > > > > > for sock_wfree() destructor. > > > > > > > > > > Note that the this is not currently needed - sock_wfree destructor > > > > > can't reach there. > > > > > > > > > > Given all the above, I'm unsure if you are fine with (or at least do > > > > > not oppose to) the code proposed in this patch? > > > > > > > > Yes. Thanks for clarifying, Paolo. > > > > > > Thank you for reviewing! > > > > > > @David, @Jakub: I see this series is already archived as "change > > > requested", should I repost? > > > > Yes, please. Patch 2 adds two new sparse warnings. > > > > I think you need csum_unfold() to go from __sum16 to __wsum. > > Yes, indeed. I'll send a v2 with such change, thanks! It's taking [much] more than expected, as it turned out that thare are still a number of case where the tx csum is uncorrect. If the traffic comes from a veth we don't have a valid th->csum value at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current code does - looks wrong. @Steffen: I see in the original discussion about GRO_FRAGLIST introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY to avoid csum modification in fwd path. I guess that choice was mostily due performance reasons, to avoid touching the aggregated pkts header at gso_segment_list time, but it looks like it's quite bug prone. If so, I'm unsure the performance gain is worty. I propose to switch to CHECKSUM_PARTIAL. Would you be ok with that? Thanks, Paolo
On Mon, May 10, 2021 at 05:37:58PM +0200, Paolo Abeni wrote: > > It's taking [much] more than expected, as it turned out that thare are > still a number of case where the tx csum is uncorrect. > > If the traffic comes from a veth we don't have a valid th->csum value > at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current > code does - looks wrong. > @Steffen: I see in the original discussion about GRO_FRAGLIST > introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY > to avoid csum modification in fwd path. I guess that choice was mostily > due performance reasons, to avoid touching the aggregated pkts header > at gso_segment_list time, but it looks like it's quite bug prone. If > so, I'm unsure the performance gain is worty. Yes, that was for performance reasons. We don't mangle the packets with fraglist GRO, so the checksum should be still correct when doing GSO. > I propose to switch to > CHECKSUM_PARTIAL. Would you be ok with that? If there are cases where CHECKSUM_UNNECESSARY is problematic, then yes, let's switch to CHECKSUM_PARTIAL. Thanks for doing this Paolo!
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 3ad22870298c..75f3e70f8cd5 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3818,6 +3818,12 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, skb_release_head_state(nskb); __copy_skb_header(nskb, skb); + /* __copy_skb_header() does not initialize the sk-related fields, + * and skb_release_head_state() already orphaned nskb + */ + nskb->sk = NULL; + nskb->destructor = NULL; + skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb)); skb_copy_from_linear_data_offset(skb, -tnl_hlen, nskb->data - tnl_hlen,
While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor callback is available, the skb destructor is invoked on each aggregated packet via skb_release_head_state(). Such field (and the pairer skb->sk) is left untouched, so the same destructor is invoked again when the segmented skbs are freed, leading to double-free/UaF of the relevant socket. The problem is observable since commit c75fb320d482 ("veth: use skb_orphan_partial instead of skb_orphan"), which allowed non orphaned skbs to enter the GRO engine, but the root cause is there since the introduction of GSO fraglist This change addresses the above issue explixitly clearing the segmented skb destructor after the skb_release_head_state() call. Fixes: c75fb320d482 ("veth: use skb_orphan_partial instead of skb_orphan") Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/core/skbuff.c | 6 ++++++ 1 file changed, 6 insertions(+)