diff mbox series

[bpf-next,v1,1/4] tcp: introduce tcp_read_skb()

Message ID 20220410161042.183540-2-xiyou.wangcong@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series sockmap: some performance optimizations

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1106 this patch: 1106
netdev/cc_maintainers warning 12 maintainers not CCed: songliubraving@fb.com davem@davemloft.net andrii@kernel.org kuba@kernel.org pabeni@redhat.com yoshfuji@linux-ipv6.org dsahern@kernel.org kafai@fb.com yhs@fb.com bpf@vger.kernel.org kpsingh@kernel.org ast@kernel.org
netdev/build_clang success Errors and warnings before: 133 this patch: 133
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1111 this patch: 1111
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 fail Logs for Kernel LATEST on ubuntu-latest + selftests
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for Kernel LATEST on z15 + selftests

Commit Message

Cong Wang April 10, 2022, 4:10 p.m. UTC
From: Cong Wang <cong.wang@bytedance.com>

This patch introduces tcp_read_skb(), based on tcp_read_sock(),
in preparation for the next patch, which actually introduces
a new sock op.

TCP is special here because it already has tcp_read_sock(), which
is mainly used by splice(). tcp_read_sock() supports partial reads
and arbitrary offsets, neither of which is needed by sockmap.

Cc: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/net/tcp.h |  2 ++
 net/ipv4/tcp.c    | 72 +++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 66 insertions(+), 8 deletions(-)
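
For context, sockmap consumes each skb whole through an sk_read_actor_t
callback, which is why the partial reads and offsets of tcp_read_sock()
go unused here. A minimal sketch of that calling pattern (hypothetical
names, not code from this series):

static int sockmap_style_actor(read_descriptor_t *desc, struct sk_buff *skb,
                               unsigned int offset, size_t len)
{
        /* With tcp_read_skb() the skb arrives already unlinked from
         * sk_receive_queue, with offset expected to be 0 and
         * len == skb->len; hand it to the verdict path whole.
         * verdict_run() is a stand-in, not a real kernel helper.
         */
        verdict_run(desc->arg.data, skb);
        return len;     /* report the entire skb as consumed */
}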

Comments

John Fastabend April 12, 2022, 8 p.m. UTC | #1
Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> This patch introduces tcp_read_skb(), based on tcp_read_sock(),
> in preparation for the next patch, which actually introduces
> a new sock op.
> 
> TCP is special here because it already has tcp_read_sock(), which
> is mainly used by splice(). tcp_read_sock() supports partial reads
> and arbitrary offsets, neither of which is needed by sockmap.
> 
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---

Thanks for doing this, Cong. Comments and questions inline.

[...]

> +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> +		 sk_read_actor_t recv_actor)
> +{
> +	struct sk_buff *skb;
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	u32 seq = tp->copied_seq;
> +	u32 offset;
> +	int copied = 0;
> +
> +	if (sk->sk_state == TCP_LISTEN)
> +		return -ENOTCONN;
> +	while ((skb = tcp_recv_skb(sk, seq, &offset, true)) != NULL) {

I'm trying to see why we might have an offset here if we always
consume the entire skb. There is a comment in tcp_recv_skb() around
GRO packets, but it is not yet clear to me how that applies here,
if it does at all. Will read a bit more, I guess.

If the offset can be >0, then we also need to fix the recv_actor to
account for the extra offset in the skb. As is, the BPF prog might
see duplicate data. This is a problem in the stream parser now.

Another fallout is that if the offset is always zero, then we could
just do a skb_dequeue() here and skip the tcp_recv_skb() bool flag
addition and update.

I'll continue reading after a few other things I need to get
sorted this afternoon, but maybe you have the answer on hand.

> +		if (offset < skb->len) {
> +			int used;
> +			size_t len;
> +
> +			len = skb->len - offset;
> +			used = recv_actor(desc, skb, offset, len);
> +			if (used <= 0) {
> +				if (!copied)
> +					copied = used;
> +				break;
> +			}
> +			if (WARN_ON_ONCE(used > len))
> +				used = len;
> +			seq += used;
> +			copied += used;
> +			offset += used;
> +
> +			if (offset != skb->len)
> +				continue;
> +		}
> +		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
> +			kfree_skb(skb);
> +			++seq;
> +			break;
> +		}
> +		kfree_skb(skb);
> +		if (!desc->count)
> +			break;
> +		WRITE_ONCE(tp->copied_seq, seq);
> +	}
> +	WRITE_ONCE(tp->copied_seq, seq);
> +
> +	tcp_rcv_space_adjust(sk);
> +
> +	/* Clean up data we have read: This will do ACK frames. */
> +	if (copied > 0)
> +		tcp_cleanup_rbuf(sk, copied);
> +
> +	return copied;
> +}
> +EXPORT_SYMBOL(tcp_read_skb);
> +
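
If the offset really is always zero on this path, the skb_dequeue()
simplification floated above might look roughly like this (a sketch of
the idea only, untested and not part of the patch):

        /* Dequeue directly instead of threading a bool through
         * tcp_recv_skb(); assumes offset is always zero here.
         */
        while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
                int used = recv_actor(desc, skb, 0, skb->len);

                if (used <= 0) {
                        /* the error path would need to requeue or
                         * free the already-dequeued skb
                         */
                        if (!copied)
                                copied = used;
                        break;
                }
                seq += used;
                copied += used;
                if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
                        kfree_skb(skb);
                        ++seq;
                        break;
                }
                kfree_skb(skb);
                if (!desc->count)
                        break;
        }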
Cong Wang April 21, 2022, 7 p.m. UTC | #2
On Tue, Apr 12, 2022 at 1:00 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > This patch introduces tcp_read_skb(), based on tcp_read_sock(),
> > in preparation for the next patch, which actually introduces
> > a new sock op.
> >
> > TCP is special here because it already has tcp_read_sock(), which
> > is mainly used by splice(). tcp_read_sock() supports partial reads
> > and arbitrary offsets, neither of which is needed by sockmap.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
>
> Thanks for doing this, Cong. Comments and questions inline.
>
> [...]
>
> > +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> > +              sk_read_actor_t recv_actor)
> > +{
> > +     struct sk_buff *skb;
> > +     struct tcp_sock *tp = tcp_sk(sk);
> > +     u32 seq = tp->copied_seq;
> > +     u32 offset;
> > +     int copied = 0;
> > +
> > +     if (sk->sk_state == TCP_LISTEN)
> > +             return -ENOTCONN;
> > +     while ((skb = tcp_recv_skb(sk, seq, &offset, true)) != NULL) {
>
> I'm trying to see why we might have an offset here if we always
> consume the entire skb. There is a comment in tcp_recv_skb() around
> GRO packets, but it is not yet clear to me how that applies here,
> if it does at all. Will read a bit more, I guess.
>
> If the offset can be >0, then we also need to fix the recv_actor to
> account for the extra offset in the skb. As is, the BPF prog might
> see duplicate data. This is a problem in the stream parser now.
>
> Another fallout is that if the offset is always zero, then we could
> just do a skb_dequeue() here and skip the tcp_recv_skb() bool flag
> addition and update.

I think it is mainly for splice(), and of course strparser, but neither
of them is touched by my patchset.

>
> I'll continue reading after a few other things I need to get
> sorted this afternoon, but maybe you have the answer on hand.
>

Please let me know if you have any other comments.

Thanks.
Jakub Kicinski April 25, 2022, 7:07 p.m. UTC | #3
On Sun, 10 Apr 2022 09:10:39 -0700 Cong Wang wrote:
> +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> +		 sk_read_actor_t recv_actor)
> +{
> +	struct sk_buff *skb;
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	u32 seq = tp->copied_seq;
> +	u32 offset;
> +	int copied = 0;
> +
> +	if (sk->sk_state == TCP_LISTEN)
> +		return -ENOTCONN;
> +	while ((skb = tcp_recv_skb(sk, seq, &offset, true)) != NULL) {
> +		if (offset < skb->len) {
> +			int used;
> +			size_t len;
> +
> +			len = skb->len - offset;
> +			used = recv_actor(desc, skb, offset, len);
> +			if (used <= 0) {
> +				if (!copied)
> +					copied = used;
> +				break;
> +			}
> +			if (WARN_ON_ONCE(used > len))
> +				used = len;
> +			seq += used;
> +			copied += used;
> +			offset += used;
> +
> +			if (offset != skb->len)
> +				continue;
> +		}
> +		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
> +			kfree_skb(skb);
> +			++seq;
> +			break;
> +		}
> +		kfree_skb(skb);
> +		if (!desc->count)
> +			break;
> +		WRITE_ONCE(tp->copied_seq, seq);
> +	}
> +	WRITE_ONCE(tp->copied_seq, seq);
> +
> +	tcp_rcv_space_adjust(sk);
> +
> +	/* Clean up data we have read: This will do ACK frames. */
> +	if (copied > 0)
> +		tcp_cleanup_rbuf(sk, copied);
> +
> +	return copied;
> +}
> +EXPORT_SYMBOL(tcp_read_skb);

I started prototyping a similar patch for TLS a while back, but I have
two functions - one to get the skb and another to consume it. I thought
that's better for TLS; otherwise skbs stuck in the middle layer are not
counted towards the rbuf. Any thoughts on structuring the API that way?
I guess we can refactor that later; since TLS is TCP-only, we don't need
proto_ops plumbing there.

Overall 
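
A rough shape of the two-call API described above, for illustration
(hypothetical names; nothing like this exists in the tree):

/* Take the next skb without advancing copied_seq, then report
 * consumption separately once the middle layer (e.g. TLS) is done
 * with it, so receive-buffer accounting stays accurate even while
 * the skb sits outside the socket's queue.
 */
struct sk_buff *tcp_read_skb_peek(struct sock *sk);
void tcp_read_skb_done(struct sock *sk, u32 len);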
John Fastabend April 26, 2022, 6:27 a.m. UTC | #4
Cong Wang wrote:
> On Tue, Apr 12, 2022 at 1:00 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > This patch introduces tcp_read_skb(), based on tcp_read_sock(),
> > > in preparation for the next patch, which actually introduces
> > > a new sock op.
> > >
> > > TCP is special here because it already has tcp_read_sock(), which
> > > is mainly used by splice(). tcp_read_sock() supports partial reads
> > > and arbitrary offsets, neither of which is needed by sockmap.
> > >
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> >
> > Thanks for doing this, Cong. Comments and questions inline.
> >
> > [...]
> >
> > > +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> > > +              sk_read_actor_t recv_actor)
> > > +{
> > > +     struct sk_buff *skb;
> > > +     struct tcp_sock *tp = tcp_sk(sk);
> > > +     u32 seq = tp->copied_seq;
> > > +     u32 offset;
> > > +     int copied = 0;
> > > +
> > > +     if (sk->sk_state == TCP_LISTEN)
> > > +             return -ENOTCONN;
> > > +     while ((skb = tcp_recv_skb(sk, seq, &offset, true)) != NULL) {
> >
> > I'm trying to see why we might have an offset here if we always
> > consume the entire skb. There is a comment in tcp_recv_skb() around
> > GRO packets, but it is not yet clear to me how that applies here,
> > if it does at all. Will read a bit more, I guess.
> >
> > If the offset can be >0, then we also need to fix the recv_actor to
> > account for the extra offset in the skb. As is, the BPF prog might
> > see duplicate data. This is a problem in the stream parser now.
> >
> > Another fallout is that if the offset is always zero, then we could
> > just do a skb_dequeue() here and skip the tcp_recv_skb() bool flag
> > addition and update.
> 
> I think it is mainly for splice(), and of course strparser, but neither
> of them is touched by my patchset.
> 
> >
> > I'll continue reading after a few other things I need to get
> > sorted this afternoon, but maybe you have the answer on hand.
> >
> 
> Please let me know if you have any other comments.

For now, no more comments. If it's not used, then we can drop the offset
logic in this patch and the code looks much simpler.

> 
> Thanks.
Jakub Sitnicki April 26, 2022, 9:11 a.m. UTC | #5
On Sun, Apr 10, 2022 at 09:10 AM -07, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> This patch introduces tcp_read_skb(), based on tcp_read_sock(),
> in preparation for the next patch, which actually introduces
> a new sock op.
>
> TCP is special here because it already has tcp_read_sock(), which
> is mainly used by splice(). tcp_read_sock() supports partial reads
> and arbitrary offsets, neither of which is needed by sockmap.
>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  include/net/tcp.h |  2 ++
>  net/ipv4/tcp.c    | 72 +++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 66 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 6d50a662bf89..f0d4ce6855e1 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -667,6 +667,8 @@ void tcp_get_info(struct sock *, struct tcp_info *);
>  /* Read 'sendfile()'-style from a TCP socket */
>  int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
>  		  sk_read_actor_t recv_actor);
> +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> +		 sk_read_actor_t recv_actor);

Do you think it would be worth adding docs for the newly added function?
Why it exists, and how is it different from tcp_read_sock(), which has
the same interface?

[...]
Cong Wang April 30, 2022, 5:17 p.m. UTC | #6
On Mon, Apr 25, 2022 at 11:27 PM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> For now, no more comments. If it's not used, then we can drop the offset
> logic in this patch and the code looks much simpler.

Good point, it is left there mainly for tcp_recv_skb(), but it can be dropped
for the rest.

Thanks.
Cong Wang April 30, 2022, 5:18 p.m. UTC | #7
On Tue, Apr 26, 2022 at 2:12 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Sun, Apr 10, 2022 at 09:10 AM -07, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > This patch introduces tcp_read_skb(), based on tcp_read_sock(),
> > in preparation for the next patch, which actually introduces
> > a new sock op.
> >
> > TCP is special here because it already has tcp_read_sock(), which
> > is mainly used by splice(). tcp_read_sock() supports partial reads
> > and arbitrary offsets, neither of which is needed by sockmap.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  include/net/tcp.h |  2 ++
> >  net/ipv4/tcp.c    | 72 +++++++++++++++++++++++++++++++++++++++++------
> >  2 files changed, 66 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 6d50a662bf89..f0d4ce6855e1 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -667,6 +667,8 @@ void tcp_get_info(struct sock *, struct tcp_info *);
> >  /* Read 'sendfile()'-style from a TCP socket */
> >  int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> >                 sk_read_actor_t recv_actor);
> > +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> > +              sk_read_actor_t recv_actor);
>
> Do you think it would be worth adding docs for the newly added function?
> Why it exists, and how is it different from tcp_read_sock(), which has
> the same interface?

Yeah, I will add some comments to explain this in V2.

Thanks.
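
For illustration, such a kernel-doc comment might take roughly this
shape (a sketch only, not the actual text that went into v2):

/**
 * tcp_read_skb - read full skbs off a TCP socket, for sockmap
 * @sk: socket to read from; caller must hold the socket lock
 * @desc: read descriptor; desc->count acts as a "keep reading" flag
 * @recv_actor: callback that consumes each skb in full
 *
 * Like tcp_read_sock(), but each skb is unlinked from the receive
 * queue and handed to @recv_actor whole: no partial reads and no
 * arbitrary starting offset, since sockmap needs neither.
 */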
Cong Wang April 30, 2022, 5:22 p.m. UTC | #8
On Mon, Apr 25, 2022 at 12:07 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 10 Apr 2022 09:10:39 -0700 Cong Wang wrote:
> > +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> > +              sk_read_actor_t recv_actor)
> > +{
> > +     struct sk_buff *skb;
> > +     struct tcp_sock *tp = tcp_sk(sk);
> > +     u32 seq = tp->copied_seq;
> > +     u32 offset;
> > +     int copied = 0;
> > +
> > +     if (sk->sk_state == TCP_LISTEN)
> > +             return -ENOTCONN;
> > +     while ((skb = tcp_recv_skb(sk, seq, &offset, true)) != NULL) {
> > +             if (offset < skb->len) {
> > +                     int used;
> > +                     size_t len;
> > +
> > +                     len = skb->len - offset;
> > +                     used = recv_actor(desc, skb, offset, len);
> > +                     if (used <= 0) {
> > +                             if (!copied)
> > +                                     copied = used;
> > +                             break;
> > +                     }
> > +                     if (WARN_ON_ONCE(used > len))
> > +                             used = len;
> > +                     seq += used;
> > +                     copied += used;
> > +                     offset += used;
> > +
> > +                     if (offset != skb->len)
> > +                             continue;
> > +             }
> > +             if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
> > +                     kfree_skb(skb);
> > +                     ++seq;
> > +                     break;
> > +             }
> > +             kfree_skb(skb);
> > +             if (!desc->count)
> > +                     break;
> > +             WRITE_ONCE(tp->copied_seq, seq);
> > +     }
> > +     WRITE_ONCE(tp->copied_seq, seq);
> > +
> > +     tcp_rcv_space_adjust(sk);
> > +
> > +     /* Clean up data we have read: This will do ACK frames. */
> > +     if (copied > 0)
> > +             tcp_cleanup_rbuf(sk, copied);
> > +
> > +     return copied;
> > +}
> > +EXPORT_SYMBOL(tcp_read_skb);
>
> I started prototyping a similar patch for TLS a while back, but I have
> two functions - one to get the skb and another to consume it. I thought
> that's better for TLS; otherwise skbs stuck in the middle layer are not
> counted towards the rbuf. Any thoughts on structuring the API that way?
> I guess we can refactor that later; since TLS is TCP-only, we don't need
> proto_ops plumbing there.

Do you have a pointer to the source code? I am not sure how TLS uses
->read_sock() (or which interface is relevant).

>
> Overall 
Jakub Kicinski May 2, 2022, 4:13 p.m. UTC | #9
On Sat, 30 Apr 2022 10:22:33 -0700 Cong Wang wrote:
> > I started prototyping a similar patch for TLS a while back, but I have
> > two functions - one to get the skb and another to consume it. I thought
> > that's better for TLS; otherwise skbs stuck in the middle layer are not
> > counted towards the rbuf. Any thoughts on structuring the API that way?
> > I guess we can refactor that later; since TLS is TCP-only, we don't need
> > proto_ops plumbing there.
> 
> Do you have a pointer to the source code? I am not sure how TLS uses
> ->read_sock() (or which interface is relevant).  

Nothing useful; I started hacking on removing strparser but then got
distracted by functional optimizations. TLS calls ->read_sock() through
strparser.

With a little bit of code duplication, TLS should be able to avoid
strparser's heavy machinery and the cloning of each skb.
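
For reference, strparser drives ->read_sock() roughly like this
(paraphrased from net/strparser/strparser.c as of that time; details
may differ):

static int strp_read_sock(struct strparser *strp)
{
        struct socket *sock = strp->sk->sk_socket;
        read_descriptor_t desc;

        if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
                return -EBUSY;

        desc.arg.data = strp;
        desc.error = 0;
        desc.count = 1; /* give more than one packet per call */

        /* strp_recv() clones each skb before parsing - the cloning
         * cost mentioned above.
         */
        sock->ops->read_sock(strp->sk, &desc, strp_recv);
        return desc.error;
}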
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d50a662bf89..f0d4ce6855e1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -667,6 +667,8 @@  void tcp_get_info(struct sock *, struct tcp_info *);
 /* Read 'sendfile()'-style from a TCP socket */
 int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		  sk_read_actor_t recv_actor);
+int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
+		 sk_read_actor_t recv_actor);
 
 void tcp_initialize_rcv_mss(struct sock *sk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e31cf137c614..8b054bcc6849 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1619,7 +1619,7 @@  static void tcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
-static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
+static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off, bool unlink)
 {
 	struct sk_buff *skb;
 	u32 offset;
@@ -1632,6 +1632,8 @@  static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 		}
 		if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) {
 			*off = offset;
+			if (unlink)
+				__skb_unlink(skb, &sk->sk_receive_queue);
 			return skb;
 		}
 		/* This looks weird, but this can happen if TCP collapsing
@@ -1665,7 +1667,7 @@  int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 
 	if (sk->sk_state == TCP_LISTEN)
 		return -ENOTCONN;
-	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
+	while ((skb = tcp_recv_skb(sk, seq, &offset, false)) != NULL) {
 		if (offset < skb->len) {
 			int used;
 			size_t len;
@@ -1696,7 +1698,7 @@  int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 			 * getting here: tcp_collapse might have deleted it
 			 * while aggregating skbs from the socket queue.
 			 */
-			skb = tcp_recv_skb(sk, seq - 1, &offset);
+			skb = tcp_recv_skb(sk, seq - 1, &offset, false);
 			if (!skb)
 				break;
 			/* TCP coalescing might have appended data to the skb.
@@ -1721,13 +1723,67 @@  int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 
 	/* Clean up data we have read: This will do ACK frames. */
 	if (copied > 0) {
-		tcp_recv_skb(sk, seq, &offset);
+		tcp_recv_skb(sk, seq, &offset, false);
 		tcp_cleanup_rbuf(sk, copied);
 	}
 	return copied;
 }
 EXPORT_SYMBOL(tcp_read_sock);
 
+int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
+		 sk_read_actor_t recv_actor)
+{
+	struct sk_buff *skb;
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 seq = tp->copied_seq;
+	u32 offset;
+	int copied = 0;
+
+	if (sk->sk_state == TCP_LISTEN)
+		return -ENOTCONN;
+	while ((skb = tcp_recv_skb(sk, seq, &offset, true)) != NULL) {
+		if (offset < skb->len) {
+			int used;
+			size_t len;
+
+			len = skb->len - offset;
+			used = recv_actor(desc, skb, offset, len);
+			if (used <= 0) {
+				if (!copied)
+					copied = used;
+				break;
+			}
+			if (WARN_ON_ONCE(used > len))
+				used = len;
+			seq += used;
+			copied += used;
+			offset += used;
+
+			if (offset != skb->len)
+				continue;
+		}
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
+			kfree_skb(skb);
+			++seq;
+			break;
+		}
+		kfree_skb(skb);
+		if (!desc->count)
+			break;
+		WRITE_ONCE(tp->copied_seq, seq);
+	}
+	WRITE_ONCE(tp->copied_seq, seq);
+
+	tcp_rcv_space_adjust(sk);
+
+	/* Clean up data we have read: This will do ACK frames. */
+	if (copied > 0)
+		tcp_cleanup_rbuf(sk, copied);
+
+	return copied;
+}
+EXPORT_SYMBOL(tcp_read_skb);
+
 int tcp_peek_len(struct socket *sock)
 {
 	return tcp_inq(sock->sk);
@@ -1910,7 +1966,7 @@  static int receive_fallback_to_copy(struct sock *sk,
 		struct sk_buff *skb;
 		u32 offset;
 
-		skb = tcp_recv_skb(sk, tcp_sk(sk)->copied_seq, &offset);
+		skb = tcp_recv_skb(sk, tcp_sk(sk)->copied_seq, &offset, false);
 		if (skb)
 			tcp_zerocopy_set_hint_for_skb(sk, zc, skb, offset);
 	}
@@ -1957,7 +2013,7 @@  static int tcp_zc_handle_leftover(struct tcp_zerocopy_receive *zc,
 	if (skb) {
 		offset = *seq - TCP_SKB_CB(skb)->seq;
 	} else {
-		skb = tcp_recv_skb(sk, *seq, &offset);
+		skb = tcp_recv_skb(sk, *seq, &offset, false);
 		if (TCP_SKB_CB(skb)->has_rxtstamp) {
 			tcp_update_recv_tstamps(skb, tss);
 			zc->msg_flags |= TCP_CMSG_TS;
@@ -2150,7 +2206,7 @@  static int tcp_zerocopy_receive(struct sock *sk,
 				skb = skb->next;
 				offset = seq - TCP_SKB_CB(skb)->seq;
 			} else {
-				skb = tcp_recv_skb(sk, seq, &offset);
+				skb = tcp_recv_skb(sk, seq, &offset, false);
 			}
 
 			if (TCP_SKB_CB(skb)->has_rxtstamp) {
@@ -2206,7 +2262,7 @@  static int tcp_zerocopy_receive(struct sock *sk,
 		tcp_rcv_space_adjust(sk);
 
 		/* Clean up data we have read: This will do ACK frames. */
-		tcp_recv_skb(sk, seq, &offset);
+		tcp_recv_skb(sk, seq, &offset, false);
 		tcp_cleanup_rbuf(sk, length + copylen);
 		ret = 0;
 		if (length == zc->length)