diff mbox series

[net,1/4] net: fix double-free on fraglist GSO skbs

Message ID e5d4bacef76ef439b6eb8e7f4973161ca131dfee.1620223174.git.pabeni@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series udp: more FRAGLIST fixes | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers warning 4 maintainers not CCed: alobakin@pm.me jonathan.lemon@gmail.com gnault@redhat.com cong.wang@bytedance.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success Link

Commit Message

Paolo Abeni May 5, 2021, 3:35 p.m. UTC
While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
callback is available, the skb destructor is invoked on each
aggregated packet via skb_release_head_state().

Such field (and the pairer skb->sk) is left untouched, so the same
destructor is invoked again when the segmented skbs are freed, leading
to double-free/UaF of the relevant socket.

The problem is observable since commit c75fb320d482 ("veth: use
skb_orphan_partial instead of skb_orphan"), which allowed non orphaned
skbs to enter the GRO engine, but the root cause is there since the
introduction of GSO fraglist

This change addresses the above issue explixitly clearing the segmented
skb destructor after the skb_release_head_state() call.

Fixes: c75fb320d482 ("veth: use skb_orphan_partial instead of skb_orphan")
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/skbuff.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Willem de Bruijn May 5, 2021, 4:13 p.m. UTC | #1
On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> callback is available, the skb destructor is invoked on each
> aggregated packet via skb_release_head_state().
>
> Such field (and the pairer skb->sk) is left untouched, so the same
> destructor is invoked again when the segmented skbs are freed, leading
> to double-free/UaF of the relevant socket.

Similar to skb_segment, should the destructor be swapped with the last
segment and callback delayed, instead of called immediately as part of
segmentation?

        /* Following permits correct backpressure, for protocols
         * using skb_set_owner_w().
         * Idea is to tranfert ownership from head_skb to last segment.
         */
        if (head_skb->destructor == sock_wfree) {
                swap(tail->truesize, head_skb->truesize);
                swap(tail->destructor, head_skb->destructor);
                swap(tail->sk, head_skb->sk);
        }
Paolo Abeni May 5, 2021, 5:28 p.m. UTC | #2
On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > callback is available, the skb destructor is invoked on each
> > aggregated packet via skb_release_head_state().
> > 
> > Such field (and the pairer skb->sk) is left untouched, so the same
> > destructor is invoked again when the segmented skbs are freed, leading
> > to double-free/UaF of the relevant socket.
> 
> Similar to skb_segment, should the destructor be swapped with the last
> segment and callback delayed, instead of called immediately as part of
> segmentation?
> 
>         /* Following permits correct backpressure, for protocols
>          * using skb_set_owner_w().
>          * Idea is to tranfert ownership from head_skb to last segment.
>          */
>         if (head_skb->destructor == sock_wfree) {
>                 swap(tail->truesize, head_skb->truesize);
>                 swap(tail->destructor, head_skb->destructor);
>                 swap(tail->sk, head_skb->sk);
>         }

My understanding is that one assumption in the original
SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
owned by any socket. 

AFAICS the above assumption was true until:

commit c75fb320d482a5ce6e522378d137fd2c3bf79225
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Apr 9 13:04:37 2021 +0200

    veth: use skb_orphan_partial instead of skb_orphan

after that, if the skb is owned, skb->destructor is sock_efree(), so
the above code should not trigger.

More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
most protocol is UDP, so
commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. 

Thanks!

Paolo
Willem de Bruijn May 5, 2021, 5:30 p.m. UTC | #3
On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > callback is available, the skb destructor is invoked on each
> > > aggregated packet via skb_release_head_state().
> > >
> > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > destructor is invoked again when the segmented skbs are freed, leading
> > > to double-free/UaF of the relevant socket.
> >
> > Similar to skb_segment, should the destructor be swapped with the last
> > segment and callback delayed, instead of called immediately as part of
> > segmentation?
> >
> >         /* Following permits correct backpressure, for protocols
> >          * using skb_set_owner_w().
> >          * Idea is to tranfert ownership from head_skb to last segment.
> >          */
> >         if (head_skb->destructor == sock_wfree) {
> >                 swap(tail->truesize, head_skb->truesize);
> >                 swap(tail->destructor, head_skb->destructor);
> >                 swap(tail->sk, head_skb->sk);
> >         }
>
> My understanding is that one assumption in the original
> SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> owned by any socket.
>
> AFAICS the above assumption was true until:
>
> commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> Author: Paolo Abeni <pabeni@redhat.com>
> Date:   Fri Apr 9 13:04:37 2021 +0200
>
>     veth: use skb_orphan_partial instead of skb_orphan
>
> after that, if the skb is owned, skb->destructor is sock_efree(), so
> the above code should not trigger.

Okay, great.

> More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> most protocol is UDP, so
> commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.

I think the first does apply, as it applies to any protocol that uses
sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.
Paolo Abeni May 6, 2021, 11:06 a.m. UTC | #4
On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > > callback is available, the skb destructor is invoked on each
> > > > aggregated packet via skb_release_head_state().
> > > > 
> > > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > > destructor is invoked again when the segmented skbs are freed, leading
> > > > to double-free/UaF of the relevant socket.
> > > 
> > > Similar to skb_segment, should the destructor be swapped with the last
> > > segment and callback delayed, instead of called immediately as part of
> > > segmentation?
> > > 
> > >         /* Following permits correct backpressure, for protocols
> > >          * using skb_set_owner_w().
> > >          * Idea is to tranfert ownership from head_skb to last segment.
> > >          */
> > >         if (head_skb->destructor == sock_wfree) {
> > >                 swap(tail->truesize, head_skb->truesize);
> > >                 swap(tail->destructor, head_skb->destructor);
> > >                 swap(tail->sk, head_skb->sk);
> > >         }
> > 
> > My understanding is that one assumption in the original
> > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> > owned by any socket.
> > 
> > AFAICS the above assumption was true until:
> > 
> > commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> > Author: Paolo Abeni <pabeni@redhat.com>
> > Date:   Fri Apr 9 13:04:37 2021 +0200
> > 
> >     veth: use skb_orphan_partial instead of skb_orphan
> > 
> > after that, if the skb is owned, skb->destructor is sock_efree(), so
> > the above code should not trigger.
> 
> Okay, great.
> 
> > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> > most protocol is UDP, so
> > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.
> 
> I think the first does apply, as it applies to any protocol that uses
> sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.

If we want to be safe about future possible sock_wfree users, I think
the approach here should be different: in skb_segment(), tail-
>destructor is expected to be NULL, while skb_segment_list(), all the
list skbs can be owned by the same socket. Possibly we could open-
code skb_release_head_state(), omitting the skb orphaning part
for sock_wfree() destructor.

Note that the this is not currently needed - sock_wfree destructor
can't reach there.

Given all the above, I'm unsure if you are fine with (or at least do
not oppose to) the code proposed in this patch?

Thanks,

Paolo
Willem de Bruijn May 6, 2021, 2:32 p.m. UTC | #5
On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:
> > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > > > callback is available, the skb destructor is invoked on each
> > > > > aggregated packet via skb_release_head_state().
> > > > >
> > > > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > > > destructor is invoked again when the segmented skbs are freed, leading
> > > > > to double-free/UaF of the relevant socket.
> > > >
> > > > Similar to skb_segment, should the destructor be swapped with the last
> > > > segment and callback delayed, instead of called immediately as part of
> > > > segmentation?
> > > >
> > > >         /* Following permits correct backpressure, for protocols
> > > >          * using skb_set_owner_w().
> > > >          * Idea is to tranfert ownership from head_skb to last segment.
> > > >          */
> > > >         if (head_skb->destructor == sock_wfree) {
> > > >                 swap(tail->truesize, head_skb->truesize);
> > > >                 swap(tail->destructor, head_skb->destructor);
> > > >                 swap(tail->sk, head_skb->sk);
> > > >         }
> > >
> > > My understanding is that one assumption in the original
> > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> > > owned by any socket.
> > >
> > > AFAICS the above assumption was true until:
> > >
> > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> > > Author: Paolo Abeni <pabeni@redhat.com>
> > > Date:   Fri Apr 9 13:04:37 2021 +0200
> > >
> > >     veth: use skb_orphan_partial instead of skb_orphan
> > >
> > > after that, if the skb is owned, skb->destructor is sock_efree(), so
> > > the above code should not trigger.
> >
> > Okay, great.
> >
> > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> > > most protocol is UDP, so
> > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.
> >
> > I think the first does apply, as it applies to any protocol that uses
> > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.
>
> If we want to be safe about future possible sock_wfree users, I think
> the approach here should be different: in skb_segment(), tail-
> >destructor is expected to be NULL, while skb_segment_list(), all the
> list skbs can be owned by the same socket. Possibly we could open-
> code skb_release_head_state(), omitting the skb orphaning part
> for sock_wfree() destructor.
>
> Note that the this is not currently needed - sock_wfree destructor
> can't reach there.
>
> Given all the above, I'm unsure if you are fine with (or at least do
> not oppose to) the code proposed in this patch?

Yes. Thanks for clarifying, Paolo.
Paolo Abeni May 6, 2021, 3:55 p.m. UTC | #6
On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:
> > > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > > > > callback is available, the skb destructor is invoked on each
> > > > > > aggregated packet via skb_release_head_state().
> > > > > > 
> > > > > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > > > > destructor is invoked again when the segmented skbs are freed, leading
> > > > > > to double-free/UaF of the relevant socket.
> > > > > 
> > > > > Similar to skb_segment, should the destructor be swapped with the last
> > > > > segment and callback delayed, instead of called immediately as part of
> > > > > segmentation?
> > > > > 
> > > > >         /* Following permits correct backpressure, for protocols
> > > > >          * using skb_set_owner_w().
> > > > >          * Idea is to tranfert ownership from head_skb to last segment.
> > > > >          */
> > > > >         if (head_skb->destructor == sock_wfree) {
> > > > >                 swap(tail->truesize, head_skb->truesize);
> > > > >                 swap(tail->destructor, head_skb->destructor);
> > > > >                 swap(tail->sk, head_skb->sk);
> > > > >         }
> > > > 
> > > > My understanding is that one assumption in the original
> > > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> > > > owned by any socket.
> > > > 
> > > > AFAICS the above assumption was true until:
> > > > 
> > > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> > > > Author: Paolo Abeni <pabeni@redhat.com>
> > > > Date:   Fri Apr 9 13:04:37 2021 +0200
> > > > 
> > > >     veth: use skb_orphan_partial instead of skb_orphan
> > > > 
> > > > after that, if the skb is owned, skb->destructor is sock_efree(), so
> > > > the above code should not trigger.
> > > 
> > > Okay, great.
> > > 
> > > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> > > > most protocol is UDP, so
> > > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> > > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.
> > > 
> > > I think the first does apply, as it applies to any protocol that uses
> > > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.
> > 
> > If we want to be safe about future possible sock_wfree users, I think
> > the approach here should be different: in skb_segment(), tail-
> > > destructor is expected to be NULL, while skb_segment_list(), all the
> > list skbs can be owned by the same socket. Possibly we could open-
> > code skb_release_head_state(), omitting the skb orphaning part
> > for sock_wfree() destructor.
> > 
> > Note that the this is not currently needed - sock_wfree destructor
> > can't reach there.
> > 
> > Given all the above, I'm unsure if you are fine with (or at least do
> > not oppose to) the code proposed in this patch?
> 
> Yes. Thanks for clarifying, Paolo.

Thank you for reviewing!

@David, @Jakub: I see this series is already archived as "change
requested", should I repost?

Thanks!

Paolo
Jakub Kicinski May 6, 2021, 9:17 p.m. UTC | #7
On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:
> On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  
> > > If we want to be safe about future possible sock_wfree users, I think
> > > the approach here should be different: in skb_segment(), tail-  
> > > > destructor is expected to be NULL, while skb_segment_list(), all the  
> > > list skbs can be owned by the same socket. Possibly we could open-
> > > code skb_release_head_state(), omitting the skb orphaning part
> > > for sock_wfree() destructor.
> > > 
> > > Note that the this is not currently needed - sock_wfree destructor
> > > can't reach there.
> > > 
> > > Given all the above, I'm unsure if you are fine with (or at least do
> > > not oppose to) the code proposed in this patch?  
> > 
> > Yes. Thanks for clarifying, Paolo.  
> 
> Thank you for reviewing!
> 
> @David, @Jakub: I see this series is already archived as "change
> requested", should I repost?

Yes, please. Patch 2 adds two new sparse warnings. 

I think you need csum_unfold() to go from __sum16 to __wsum.
Paolo Abeni May 7, 2021, 8:46 a.m. UTC | #8
On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote:
> On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:
> > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  
> > > > If we want to be safe about future possible sock_wfree users, I think
> > > > the approach here should be different: in skb_segment(), tail-  
> > > > > destructor is expected to be NULL, while skb_segment_list(), all the  
> > > > list skbs can be owned by the same socket. Possibly we could open-
> > > > code skb_release_head_state(), omitting the skb orphaning part
> > > > for sock_wfree() destructor.
> > > > 
> > > > Note that the this is not currently needed - sock_wfree destructor
> > > > can't reach there.
> > > > 
> > > > Given all the above, I'm unsure if you are fine with (or at least do
> > > > not oppose to) the code proposed in this patch?  
> > > 
> > > Yes. Thanks for clarifying, Paolo.  
> > 
> > Thank you for reviewing!
> > 
> > @David, @Jakub: I see this series is already archived as "change
> > requested", should I repost?
> 
> Yes, please. Patch 2 adds two new sparse warnings. 
> 
> I think you need csum_unfold() to go from __sum16 to __wsum.

Yes, indeed. I'll send a v2 with such change, thanks!

Paolo
>
Paolo Abeni May 10, 2021, 3:37 p.m. UTC | #9
On Fri, 2021-05-07 at 10:46 +0200, Paolo Abeni wrote:
> On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote:
> > On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:
> > > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> > > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  
> > > > > If we want to be safe about future possible sock_wfree users, I think
> > > > > the approach here should be different: in skb_segment(), tail-  
> > > > > > destructor is expected to be NULL, while skb_segment_list(), all the  
> > > > > list skbs can be owned by the same socket. Possibly we could open-
> > > > > code skb_release_head_state(), omitting the skb orphaning part
> > > > > for sock_wfree() destructor.
> > > > > 
> > > > > Note that the this is not currently needed - sock_wfree destructor
> > > > > can't reach there.
> > > > > 
> > > > > Given all the above, I'm unsure if you are fine with (or at least do
> > > > > not oppose to) the code proposed in this patch?  
> > > > 
> > > > Yes. Thanks for clarifying, Paolo.  
> > > 
> > > Thank you for reviewing!
> > > 
> > > @David, @Jakub: I see this series is already archived as "change
> > > requested", should I repost?
> > 
> > Yes, please. Patch 2 adds two new sparse warnings. 
> > 
> > I think you need csum_unfold() to go from __sum16 to __wsum.
> 
> Yes, indeed. I'll send a v2 with such change, thanks!

It's taking [much] more than expected, as it turned out that thare are
still a number of case where the tx csum is uncorrect.

If the traffic comes from a veth we don't have a valid th->csum value
at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current
code does - looks wrong.
@Steffen: I see in the original discussion about GRO_FRAGLIST
introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY
to avoid csum modification in fwd path. I guess that choice was mostily
due performance reasons, to avoid touching the aggregated pkts header
at gso_segment_list time, but it looks like it's quite bug prone. If
so, I'm unsure the performance gain is worty. I propose to switch to
CHECKSUM_PARTIAL. Would you be ok with that?

Thanks,

Paolo
Steffen Klassert May 11, 2021, 9:39 a.m. UTC | #10
On Mon, May 10, 2021 at 05:37:58PM +0200, Paolo Abeni wrote:
> 
> It's taking [much] more than expected, as it turned out that thare are
> still a number of case where the tx csum is uncorrect.
> 
> If the traffic comes from a veth we don't have a valid th->csum value
> at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current
> code does - looks wrong.
> @Steffen: I see in the original discussion about GRO_FRAGLIST
> introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY
> to avoid csum modification in fwd path. I guess that choice was mostily
> due performance reasons, to avoid touching the aggregated pkts header
> at gso_segment_list time, but it looks like it's quite bug prone. If
> so, I'm unsure the performance gain is worty.

Yes, that was for performance reasons. We don't mangle the packets
with fraglist GRO, so the checksum should be still correct when
doing GSO.

> I propose to switch to
> CHECKSUM_PARTIAL. Would you be ok with that?

If there are cases where CHECKSUM_UNNECESSARY is problematic,
then yes, let's switch to CHECKSUM_PARTIAL.

Thanks for doing this Paolo!
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3ad22870298c..75f3e70f8cd5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3818,6 +3818,12 @@  struct sk_buff *skb_segment_list(struct sk_buff *skb,
 		skb_release_head_state(nskb);
 		 __copy_skb_header(nskb, skb);
 
+		/* __copy_skb_header() does not initialize the sk-related fields,
+		 * and skb_release_head_state() already orphaned nskb
+		 */
+		nskb->sk = NULL;
+		nskb->destructor = NULL;
+
 		skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
 		skb_copy_from_linear_data_offset(skb, -tnl_hlen,
 						 nskb->data - tnl_hlen,