
[net] tcp: fix page frag corruption on page fault

Message ID d77eb546e29ce24be9ab88c47a66b70c52ce8109.1637923678.git.pabeni@redhat.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net] tcp: fix page frag corruption on page fault

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers fail 1 blamed authors not CCed: davem@davemloft.net; 4 maintainers not CCed: yoshfuji@linux-ipv6.org davem@davemloft.net dsahern@kernel.org kuba@kernel.org
netdev/build_clang success Errors and warnings before: 22 this patch: 22
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 6 this patch: 6
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 38 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Paolo Abeni Nov. 26, 2021, noon UTC
Steffen reported a TCP stream corruption for HTTP requests
served by the apache web-server using a cifs mount-point
and memory mapping the relevant file.

The root cause is quite similar to the one addressed by
commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
memory reclaim"). Here the nested access to the task page frag
is caused by a page fault on the (mmapped) user-space memory
buffer coming from the cifs file.

The page fault handler performs an SMB transaction on a different
socket, inside the same process context. Since sk->sk_allocation
for that socket does not prevent the use of the task frag,
the nested allocation modifies "under the hood" the page frag
in use by the outer sendmsg call, corrupting the stream.

The overall relevant stack trace looks like the following:

httpd 78268 [001] 3461630.850950:      probe:tcp_sendmsg_locked:
        ffffffff91461d91 tcp_sendmsg_locked+0x1
        ffffffff91462b57 tcp_sendmsg+0x27
        ffffffff9139814e sock_sendmsg+0x3e
        ffffffffc06dfe1d smb_send_kvec+0x28
        [...]
        ffffffffc06cfaf8 cifs_readpages+0x213
        ffffffff90e83c4b read_pages+0x6b
        ffffffff90e83f31 __do_page_cache_readahead+0x1c1
        ffffffff90e79e98 filemap_fault+0x788
        ffffffff90eb0458 __do_fault+0x38
        ffffffff90eb5280 do_fault+0x1a0
        ffffffff90eb7c84 __handle_mm_fault+0x4d4
        ffffffff90eb8093 handle_mm_fault+0xc3
        ffffffff90c74f6d __do_page_fault+0x1ed
        ffffffff90c75277 do_page_fault+0x37
        ffffffff9160111e page_fault+0x1e
        ffffffff9109e7b5 copyin+0x25
        ffffffff9109eb40 _copy_from_iter_full+0xe0
        ffffffff91462370 tcp_sendmsg_locked+0x5e0
        ffffffff91462370 tcp_sendmsg_locked+0x5e0
        ffffffff91462b57 tcp_sendmsg+0x27
        ffffffff9139815c sock_sendmsg+0x4c
        ffffffff913981f7 sock_write_iter+0x97
        ffffffff90f2cc56 do_iter_readv_writev+0x156
        ffffffff90f2dff0 do_iter_write+0x80
        ffffffff90f2e1c3 vfs_writev+0xa3
        ffffffff90f2e27c do_writev+0x5c
        ffffffff90c042bb do_syscall_64+0x5b
        ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65

A possible solution would be adding the __GFP_MEMALLOC flag
to the cifs allocations. That looks dangerous: the memory
allocated by the cifs fs will not be freed soon, and such an
allocation will not allow more memory to be freed.

Instead, this patch changes the tcp_sendmsg() code to avoid
touching the page frag after performing the copy from the
user-space buffer. Any page fault or memory reclaim operation
there is now free to touch the task page fragment without
corrupting the state used by the outer sendmsg().

As a downside, if the user-space copy fails, there will be
some additional atomic operations due to the reference counting
on the faulty fragment, but that looks acceptable for a slow
error path.

Reported-by: Steffen Froemer <sfroemer@redhat.com>
Fixes: 5640f7685831 ("net: use a per task frag allocator")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/tcp.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Comments

Eric Dumazet Nov. 26, 2021, 2:18 p.m. UTC | #1
On Fri, Nov 26, 2021 at 4:00 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> [...]

This patch is completely wrong, you just horribly broke TCP.

Please investigate CIFS and gfpflags_normal_context() tandem to fix
this issue instead.

CIFS needs to make sure TCP will use the private socket sk->sk_frag.
Paolo Abeni Nov. 26, 2021, 3:05 p.m. UTC | #2
Hello,

On Fri, 2021-11-26 at 06:18 -0800, Eric Dumazet wrote:
> On Fri, Nov 26, 2021 at 4:00 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > [...]
> 
> This patch is completely wrong, you just horribly broke TCP.

Double checking I understood correctly: the problem is that in case of
a skb_copy_to_page_nocache() error, if the skb is not empty, random data
will be introduced into the TCP stream - or something else/more? I
obviously did not see that before the submission, nor did my tests catch
it, sorry.

> Please investigate CIFS and gfpflags_normal_context() tandem to fix
> this issue instead.

Do you mean changing the gfpflags_normal_context() definition so that cifs
allocations are excluded? Something like the following should do:

---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b976c4177299..f9286aeeded5 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -379,8 +379,8 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
  */
 static inline bool gfpflags_normal_context(const gfp_t gfp_flags)
 {
-       return (gfp_flags & (__GFP_DIRECT_RECLAIM | __GFP_MEMALLOC)) ==
-               __GFP_DIRECT_RECLAIM;
+       return (gfp_flags & (__GFP_DIRECT_RECLAIM | __GFP_MEMALLOC | __GFP_FS)) ==
+               (__GFP_DIRECT_RECLAIM | __GFP_FS);
 }
---
If so, there is a caveat: dlm currently uses
gfpflags_normal_context() - apparently to check for a non-blocking
context. If I change the gfpflags_normal_context() definition, I will
likely have to replace gfpflags_normal_context() with
gfpflags_allow_blocking() in dlm.

In that case the relevant patch would touch both the mm and the fs
subsystems; I guess I should go via the fs tree first and then via mm?

Thanks!

Paolo
Eric Dumazet Nov. 26, 2021, 3:27 p.m. UTC | #3
On Fri, Nov 26, 2021 at 7:05 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
...

> Double checking I understood correctly: the problem is that in case of
> skb_copy_to_page_nocache() error, if the skb is not empty, random data
> will be introduced into the TCP stream, or something else/more? I
> obviously did not see that before the submission, nor tests cached it,
> sorry.

If there is a copy error, the skb is left with an extended fragment
(if a merge happened) or a new frag with random data in it, as the copy
operation failed.

We are thus going to send garbage on the network on the next sendmsg(),
which will append more stuff to the frag array.

>
> > Please investigate CIFS and gfpflags_normal_context() tandem to fix
> > this issue instead.
>
> Do you mean changing gfpflags_normal_context() definition so that cifs
> allocations are excluded? Something like the following should do:

We need to find one flag that can ask gfpflags_normal_context() to
return false for this case.

Or, if that is too difficult, stop using only sk->sk_allocation to decide
between the per-thread frag and the per-socket one.

I presume there are few active CIFS sockets on a host. They should use
a per-socket sk_frag.



Eric Dumazet Nov. 26, 2021, 3:32 p.m. UTC | #4
On Fri, Nov 26, 2021 at 7:27 AM Eric Dumazet <edumazet@google.com> wrote:

> We need to find one flag that can ask  gfpflags_normal_context() to
> return false for this case.
>
> Or if too difficult, stop only using sk->sk_allocation to decide
> between the per-thread frag, or the per socket one.
>
> I presume there are few active CIFS sockets on a host. They should use
> a per socket sk_frag.
>

A pure networking change could be to use a new MSG_ flag to force sendmsg()
to use sk->sk_frag, but that would add yet another test in the TCP fast path.

About gfpflags_normal_context() being used by dlm: we can simply add our own
helper, with a new name describing what we want:

both being in process context and not coming from a page fault handler.
Paolo Abeni Nov. 26, 2021, 4:04 p.m. UTC | #5
Hello,

On Fri, 2021-11-26 at 07:32 -0800, Eric Dumazet wrote:
> On Fri, Nov 26, 2021 at 7:27 AM Eric Dumazet <edumazet@google.com> wrote:
> 
> > We need to find one flag that can ask  gfpflags_normal_context() to
> > return false for this case.
> > 
> > Or if too difficult, stop only using sk->sk_allocation to decide
> > between the per-thread frag, or the per socket one.
> > 
> > I presume there are few active CIFS sockets on a host. They should use
> > a per socket sk_frag.
> > 
> 
> A pure networking change could be to use a new MSG_  flag to force sendmsg()
> to use the sk->sk_frag, but that would add yet another test in TCP fast path.
> 
> About gfpflags_normal_context() being used by dlm: we can simply add our own
> helper, with a new name describing what we want:
> 
> both being in process context and not coming from a page fault handler.

Thanks for the hints! I'm testing the following (replacing the helper
with a direct check of the relevant flags), plus some adjustments to
the sk_page_frag() description.

Paolo
---
diff --git a/include/net/sock.h b/include/net/sock.h
index b32906e1ab55..8ae6fecfa18c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2442,7 +2442,8 @@ static inline void sk_stream_moderate_sndbuf(struct sock *sk)
  */
 static inline struct page_frag *sk_page_frag(struct sock *sk)
 {
-       if (gfpflags_normal_context(sk->sk_allocation))
+       if ((sk->sk_allocation & (__GFP_DIRECT_RECLAIM | __GFP_MEMALLOC | __GFP_FS)) ==
+           (__GFP_DIRECT_RECLAIM | __GFP_FS))
                return &current->task_frag;
 
        return &sk->sk_frag;

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bbb3d39c69af..2d85636c1577 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1304,6 +1304,7 @@  int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			bool merge = true;
 			int i = skb_shinfo(skb)->nr_frags;
 			struct page_frag *pfrag = sk_page_frag(sk);
+			unsigned int offset;
 
 			if (!sk_page_frag_refill(sk, pfrag))
 				goto wait_for_space;
@@ -1331,14 +1332,11 @@  int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			if (!sk_wmem_schedule(sk, copy))
 				goto wait_for_space;
 
-			err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb,
-						       pfrag->page,
-						       pfrag->offset,
-						       copy);
-			if (err)
-				goto do_error;
-
-			/* Update the skb. */
+			/* Update the skb before accessing the user space buffer
+			 * so that we leave the task frag in a consistent state.
 +			 * Just in case the page fault handler needs to use it.
+			 */
+			offset = pfrag->offset;
 			if (merge) {
 				skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
 			} else {
@@ -1347,6 +1345,12 @@  int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				page_ref_inc(pfrag->page);
 			}
 			pfrag->offset += copy;
+
+			err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb,
+						       pfrag->page,
+						       offset, copy);
+			if (err)
+				goto do_error;
 		} else {
 			/* First append to a fragless skb builds initial
 			 * pure zerocopy skb