From patchwork Fri May 10 23:21:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 13662025 Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 26443177991 for ; Fri, 10 May 2024 23:21:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715383328; cv=none; b=G2aAfrsI+RSIKuz/M0qMFjNw51CYFAsrxK7ORjBIQA9ec72+SHZqO3q4VfRmNjgOS3QuP7HlpjYfZTqUJ4U+Fh00wT8GiUgTtW4Av3IUn1SSAwLFBu0XQ3+wnaO6k57pcvIy6c61B+aro4quyVzNtlqF8QtU40r3lfRGPCn8mKE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715383328; c=relaxed/simple; bh=lbfvUeR97uGiv/DJUWchChmhJ1feFUZov6MuqG579sI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=EKFI3fRRvi1bs667Z0+Qx+URHh5QRpmRxZ3u8DnrrVJeXUCbJUreg78eydMk8e9ita9nLLzuLABFeXOlLw2t+Q9o+h0SglbiSBnMCWmrwY3MUT0/snZIAJBbXO7gbTRIcDO78BF1FV1NfBl2ILoPEuDYFpucXdQk1Aj+xvADCn8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=g2Z0msXM; arc=none smtp.client-ip=209.85.128.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="g2Z0msXM" Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-61bd64c9eadso39329537b3.3 for ; Fri, 10 May 2024 16:21:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1715383313; x=1715988113; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=mL94qVPkrTUdaF+8HXmC1ljx/QrLgs/xorgHKpLbE1g=; b=g2Z0msXMAq5hlT3vUWYXjEbSYAFTgxgfx2kvEX9Yz0yGSXObbVZDQUVn3A8TW0hLXh 3lXF9OfiOCo87yTaFN/NPeBWPM6mgo5xmc0YneiQcA4XfMdYt/wUq3rqTr/dd3GObBRs 93CeXItzkYJ3Gge7nAX0Xp9PbpK8J/L2W7m/pvO5x7GBVDwUYpvPR7Xc3V9OG1hlOaE9 mtsI/+d5lJVmD/pc2CTMgvQaobhRmjvwiLkJVoYTIpSfvZ2zfDOFnSQqOhqtfhX7htvL nVSZi1eIrbFPvNzeDWp0EHSJn100jemg2tScWXwdjHoj1ofpZaIkAAnjZb9B+hY3xTzY hTKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715383313; x=1715988113; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=mL94qVPkrTUdaF+8HXmC1ljx/QrLgs/xorgHKpLbE1g=; b=SLyoeKHq878zZR1Fx+ZxwFxZ3YVfQgrhO21xYzHgA9K137D7NTqjdc/HJCRMyYMawq /sB0x7DTjRE4hDfnkwEuF9KwNhSPn7nPSaQsYmRYfxQ4jwOGRjOAC811A26YdGUZUd10 M5RQDA3bxTFX07BtpezmFLdsn/vzFXoEyXf9E0eu1/DwMcRaxri6zfAyr60E6zm/YgXx PBrIZ/dzsNDGlYt9mz+noqUEgBnqHp7b0l2T2J4EZmUOVQ3zbSQCAd85jukyCbjarxYe pwr6QD1FK2D/owBmlNvkuYST+qlir2teGNQxdoWVJbfvi9ioHrmR3BnYVxv7cRy261Dc 1TBA== X-Forwarded-Encrypted: i=1; AJvYcCVgnOX+CdavS7KMIZj/xbZJyRpvR0oatnjiNeLDGxnVNwWnrQJKf3ib1IqQe7DJtnA7TChBdU2o3gKJpIRafCZy9FTukBSNA409qw== X-Gm-Message-State: AOJu0Yy4lMki2H5tKfeXa+kWSqGYtPSjOR8zM6DTMpypRtvXv5SLeRHj 4RnjsPob4aeTbudFCXr3dFzspVMBxJOfp30/yXuqWgwNGY5O9u0Eh3YsOO0ZCazfLxWd4ZUWEgy QsVdy5KbtnmPoPNaHkcuOCQ== X-Google-Smtp-Source: AGHT+IF4+UIlqOUZdsHp8JxRT3YAbz5SwADgnP4RkaJQk+U6jFOlJktlCidriw0ZC0BT9bxnT8tke87hhHXiA8eXyg== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2c4:200:3428:1b4e:e75a:d31b]) (user=almasrymina job=sendgmr) by 2002:a05:6902:154e:b0:dee:6866:98c9 with SMTP id 3f1490d57ef6-dee68669b8cmr44814276.1.1715383313482; Fri, 10 May 2024 16:21:53 -0700 (PDT) Date: Fri, 10 May 2024 16:21:21 -0700 In-Reply-To: <20240510232128.1105145-1-almasrymina@google.com> Precedence: bulk X-Mailing-List: linux-mips@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240510232128.1105145-1-almasrymina@google.com> X-Mailer: git-send-email 2.45.0.118.g7fe29c98d7-goog Message-ID: <20240510232128.1105145-11-almasrymina@google.com> Subject: [PATCH net-next v9 10/14] net: add support for skbs with unreadable frags From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-arch@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Jonathan Corbet , Richard Henderson , Ivan Kokshaysky , Matt Turner , Thomas Bogendoerfer , "James E.J. Bottomley" , Helge Deller , Andreas Larsson , Jesper Dangaard Brouer , Ilias Apalodimas , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Arnd Bergmann , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Steffen Klassert , Herbert Xu , David Ahern , Willem de Bruijn , Shuah Khan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Pavel Begunkov , David Wei , Jason Gunthorpe , Yunsheng Lin , Shailend Chand , Harshitha Ramamurthy , Shakeel Butt , Jeroen de Borst , Praveen Kaligineedi , Willem de Bruijn , Kaiyuan Zhang For device memory TCP, we expect the skb headers to be available in host memory for access, and we expect the skb frags to be in device memory and unaccessible to the host. We expect there to be no mixing and matching of device memory frags (unaccessible) with host memory frags (accessible) in the same skb. Add a skb->devmem flag which indicates whether the frags in this skb are device memory frags or not. __skb_fill_netmem_desc() now checks frags added to skbs for net_iov, and marks the skb as skb->devmem accordingly. Add checks through the network stack to avoid accessing the frags of devmem skbs and avoid coalescing devmem skbs with non devmem skbs. Signed-off-by: Willem de Bruijn Signed-off-by: Kaiyuan Zhang Signed-off-by: Mina Almasry --- v9: https://lore.kernel.org/netdev/20240403002053.2376017-11-almasrymina@google.com/ - change skb->readable to skb->unreadable (Pavel/David). skb->readable was very complicated, because by default skbs are readable so the flag needed to be set to true in all code paths where new skbs were created or cloned. Forgetting to set skb->readable=true in some paths caused crashes. Flip it to skb->unreadable so that the default 0 value works well, and we only need to set it to true when we add unreadable frags. v6 - skb->dmabuf -> skb->readable (Pavel). Pavel's original suggestion was to remove the skb->dmabuf flag entirely, but when I looked into it closely, I found the issue that if we remove the flag we have to dereference the shinfo(skb) pointer to obtain the first frag, which can cause a performance regression if it dirties the cache line when the shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf flag into a generic skb->readable flag which can be re-used by io_uring. Changes in v1: - Rename devmem -> dmabuf (David). - Flip skb_frags_not_readable (Jakub). --- include/linux/skbuff.h | 19 +++++++++++++++-- include/net/tcp.h | 5 +++-- net/core/datagram.c | 6 ++++++ net/core/gro.c | 5 ++++- net/core/skbuff.c | 48 ++++++++++++++++++++++++++++++++++++++++-- net/ipv4/tcp.c | 3 +++ net/ipv4/tcp_input.c | 13 +++++++++--- net/ipv4/tcp_output.c | 5 ++++- net/packet/af_packet.c | 4 ++-- 9 files changed, 95 insertions(+), 13 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0dcf1c5824415..e5eb49602c290 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -820,6 +820,8 @@ typedef unsigned char *sk_buff_data_t; * @csum_level: indicates the number of consecutive checksums found in * the packet minus one that have been verified as * CHECKSUM_UNNECESSARY (max 3) + * @unreadable: indicates that at least 1 of the fragments in this skb is + * unreadable. * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB * @slow_gro: state present at GRO time, slower prepare step required @@ -1003,7 +1005,7 @@ struct sk_buff { #if IS_ENABLED(CONFIG_IP_SCTP) __u8 csum_not_inet:1; #endif - + __u8 unreadable:1; #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS) __u16 tc_index; /* traffic control index */ #endif @@ -1794,6 +1796,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb) __skb_zcopy_downgrade_managed(skb); } +/* Return true if frags in this skb are readable by the host. */ +static inline bool skb_frags_readable(const struct sk_buff *skb) +{ + return !skb->unreadable; +} + static inline void skb_mark_not_on_list(struct sk_buff *skb) { skb->next = NULL; @@ -2510,10 +2518,17 @@ static inline void skb_len_add(struct sk_buff *skb, int delta) static inline void __skb_fill_netmem_desc(struct sk_buff *skb, int i, netmem_ref netmem, int off, int size) { - struct page *page = netmem_to_page(netmem); + struct page *page; __skb_fill_netmem_desc_noacc(skb_shinfo(skb), i, netmem, off, size); + if (netmem_is_net_iov(netmem)) { + skb->unreadable = true; + return; + } + + page = netmem_to_page(netmem); + /* Propagate page pfmemalloc to the skb if we can. The problem is * that not all callers have unique ownership of the page but rely * on page_is_pfmemalloc doing the right thing(tm). diff --git a/include/net/tcp.h b/include/net/tcp.h index 8f63a163c7de9..4389e6a525801 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1059,7 +1059,7 @@ static inline int tcp_skb_mss(const struct sk_buff *skb) static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb) { - return likely(!TCP_SKB_CB(skb)->eor); + return likely(!TCP_SKB_CB(skb)->eor && skb_frags_readable(skb)); } static inline bool tcp_skb_can_collapse(const struct sk_buff *to, @@ -1067,7 +1067,8 @@ static inline bool tcp_skb_can_collapse(const struct sk_buff *to, { return likely(tcp_skb_can_collapse_to(to) && mptcp_skb_can_collapse(to, from) && - skb_pure_zcopy_same(to, from)); + skb_pure_zcopy_same(to, from) && + skb_frags_readable(to) == skb_frags_readable(from)); } /* Events passed to congestion control interface */ diff --git a/net/core/datagram.c b/net/core/datagram.c index e614cfd8e14a5..b29f881df0e83 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -407,6 +407,9 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset, return 0; } + if (!skb_frags_readable(skb)) + goto short_copy; + /* Copy paged appendix. Hmm... why does this look so complicated? */ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; @@ -619,6 +622,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, if (msg && msg->msg_ubuf && msg->sg_from_iter) return msg->sg_from_iter(sk, skb, from, length); + if (!skb_frags_readable(skb)) + return -EFAULT; + frag = skb_shinfo(skb)->nr_frags; while (length && iov_iter_count(from)) { diff --git a/net/core/gro.c b/net/core/gro.c index 10fb2145bcb2b..e216e0faeb665 100644 --- a/net/core/gro.c +++ b/net/core/gro.c @@ -424,6 +424,9 @@ static void gro_pull_from_frag0(struct sk_buff *skb, int grow) { struct skb_shared_info *pinfo = skb_shinfo(skb); + if (WARN_ON_ONCE(!skb_frags_readable(skb))) + return; + BUG_ON(skb->end - skb->tail < grow); memcpy(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow); @@ -445,7 +448,7 @@ static void gro_try_pull_from_frag0(struct sk_buff *skb) { int grow = skb_gro_offset(skb) - skb_headlen(skb); - if (grow > 0) + if (grow > 0 && skb_frags_readable(skb)) gro_pull_from_frag0(skb, grow); } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 99cf1ee73836d..7257bfbb8d2c2 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1951,6 +1951,9 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) if (skb_shared(skb) || skb_unclone(skb, gfp_mask)) return -EINVAL; + if (!skb_frags_readable(skb)) + return -EFAULT; + if (!num_frags) goto release; @@ -2124,6 +2127,9 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask) unsigned int size; int headerlen; + if (!skb_frags_readable(skb)) + return NULL; + if (WARN_ON_ONCE(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)) return NULL; @@ -2462,6 +2468,9 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb, struct sk_buff *n; int oldheadroom; + if (!skb_frags_readable(skb)) + return NULL; + if (WARN_ON_ONCE(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)) return NULL; @@ -2806,6 +2815,9 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta) */ int i, k, eat = (skb->tail + delta) - skb->end; + if (!skb_frags_readable(skb)) + return NULL; + if (eat > 0 || skb_cloned(skb)) { if (pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0, GFP_ATOMIC)) @@ -2959,6 +2971,9 @@ int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len) to += copy; } + if (!skb_frags_readable(skb)) + goto fault; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; skb_frag_t *f = &skb_shinfo(skb)->frags[i]; @@ -3147,6 +3162,9 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe, /* * then map the fragments */ + if (!skb_frags_readable(skb)) + return false; + for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) { const skb_frag_t *f = &skb_shinfo(skb)->frags[seg]; @@ -3370,6 +3388,9 @@ int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len) from += copy; } + if (!skb_frags_readable(skb)) + goto fault; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; int end; @@ -3449,6 +3470,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len, pos = copy; } + if (!skb_frags_readable(skb)) + return 0; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; @@ -3549,6 +3573,9 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, pos = copy; } + if (!skb_frags_readable(skb)) + return 0; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { int end; @@ -4040,6 +4067,7 @@ static inline void skb_split_inside_header(struct sk_buff *skb, skb_shinfo(skb1)->frags[i] = skb_shinfo(skb)->frags[i]; skb_shinfo(skb1)->nr_frags = skb_shinfo(skb)->nr_frags; + skb1->unreadable = skb->unreadable; skb_shinfo(skb)->nr_frags = 0; skb1->data_len = skb->data_len; skb1->len += skb1->data_len; @@ -4054,6 +4082,7 @@ static inline void skb_split_no_header(struct sk_buff *skb, { int i, k = 0; const int nfrags = skb_shinfo(skb)->nr_frags; + const int unreadable = skb->unreadable; skb_shinfo(skb)->nr_frags = 0; skb1->len = skb1->data_len = skb->len - len; @@ -4087,6 +4116,12 @@ static inline void skb_split_no_header(struct sk_buff *skb, pos += size; } skb_shinfo(skb1)->nr_frags = k; + + if (skb_shinfo(skb)->nr_frags) + skb->unreadable = unreadable; + + if (skb_shinfo(skb1)->nr_frags) + skb1->unreadable = unreadable; } /** @@ -4322,6 +4357,9 @@ unsigned int skb_seq_read(unsigned int consumed, const u8 **data, return block_limit - abs_offset; } + if (!skb_frags_readable(st->cur_skb)) + return 0; + if (st->frag_idx == 0 && !st->frag_data) st->stepped_offset += skb_headlen(st->cur_skb); @@ -5934,7 +5972,10 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, if (to->pp_recycle != from->pp_recycle) return false; - if (len <= skb_tailroom(to)) { + if (skb_frags_readable(from) != skb_frags_readable(to)) + return false; + + if (len <= skb_tailroom(to) && skb_frags_readable(from)) { if (len) BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len)); *delta_truesize = 0; @@ -6111,6 +6152,9 @@ int skb_ensure_writable(struct sk_buff *skb, unsigned int write_len) if (!pskb_may_pull(skb, write_len)) return -ENOMEM; + if (!skb_frags_readable(skb)) + return -EFAULT; + if (!skb_cloned(skb) || skb_clone_writable(skb, write_len)) return 0; @@ -6790,7 +6834,7 @@ void skb_condense(struct sk_buff *skb) { if (skb->data_len) { if (skb->data_len > skb->end - skb->tail || - skb_cloned(skb)) + skb_cloned(skb) || !skb_frags_readable(skb)) return; /* Nice, we can free page frag(s) right now */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 7e7f32f95c684..843197c6ea3ea 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2154,6 +2154,9 @@ static int tcp_zerocopy_receive(struct sock *sk, skb = tcp_recv_skb(sk, seq, &offset); } + if (!skb_frags_readable(skb)) + break; + if (TCP_SKB_CB(skb)->has_rxtstamp) { tcp_update_recv_tstamps(skb, tss); zc->msg_flags |= TCP_CMSG_TS; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4ccfec4bb6248..3529a079e32d8 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5342,6 +5342,9 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, for (end_of_skbs = true; skb != NULL && skb != tail; skb = n) { n = tcp_skb_next(skb, list); + if (!skb_frags_readable(skb)) + goto skip_this; + /* No new bits? It is possible on ofo queue. */ if (!before(start, TCP_SKB_CB(skb)->end_seq)) { skb = tcp_collapse_one(sk, skb, list, root); @@ -5362,17 +5365,20 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, break; } - if (n && n != tail && mptcp_skb_can_collapse(skb, n) && + if (n && n != tail && skb_frags_readable(n) && + mptcp_skb_can_collapse(skb, n) && TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(n)->seq) { end_of_skbs = false; break; } +skip_this: /* Decided to skip this, advance start seq. */ start = TCP_SKB_CB(skb)->end_seq; } if (end_of_skbs || - (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) + (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) || + !skb_frags_readable(skb)) return; __skb_queue_head_init(&tmp); @@ -5414,7 +5420,8 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, if (!skb || skb == tail || !mptcp_skb_can_collapse(nskb, skb) || - (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) + (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) || + !skb_frags_readable(skb)) goto end; if (skb_cmp_decrypted(skb, nskb)) goto end; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 95caf8aaa8bea..24afab8e4d0e5 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2344,7 +2344,8 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) if (unlikely(TCP_SKB_CB(skb)->eor) || tcp_has_tx_tstamp(skb) || - !skb_pure_zcopy_same(skb, next)) + !skb_pure_zcopy_same(skb, next) || + skb_frags_readable(skb) != skb_frags_readable(next)) return false; len -= skb->len; @@ -3264,6 +3265,8 @@ static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb) return false; if (skb_cloned(skb)) return false; + if (!skb_frags_readable(skb)) + return false; /* Some heuristics for collapsing over SACK'd could be invented */ if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) return false; diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8c6d3fbb4ed87..2dc1b18d3b7dc 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2156,7 +2156,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, } } - snaplen = skb->len; + snaplen = skb_frags_readable(skb) ? skb->len : skb_headlen(skb); res = run_filter(skb, sk, snaplen); if (!res) @@ -2276,7 +2276,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, } } - snaplen = skb->len; + snaplen = skb_frags_readable(skb) ? skb->len : skb_headlen(skb); res = run_filter(skb, sk, snaplen); if (!res)