From patchwork Wed Sep 29 17:25:11 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 12526403
X-Patchwork-Delegate: kuba@kernel.org
Date: Wed, 29 Sep 2021 10:25:11 -0700
In-Reply-To: <20210929172513.3930074-1-weiwan@google.com>
Message-Id: <20210929172513.3930074-2-weiwan@google.com>
References: <20210929172513.3930074-1-weiwan@google.com>
Subject: [PATCH v3 net-next 1/3] net: add new socket option SO_RESERVE_MEM
From: Wei Wang
To: "David S. Miller", Jakub Kicinski, netdev@vger.kernel.org
Cc: Eric Dumazet, Shakeel Butt

This socket option provides a mechanism for users to reserve a certain
amount of memory for the socket to use. When this option is set, the
kernel charges the user-specified amount of memory to memcg, as well as
to sk_forward_alloc. This memory is not reclaimable and remains
available in sk_forward_alloc for this socket. With this socket option
set, the networking stack spends fewer cycles doing forward alloc and
reclaim, which should lead to better system performance, at the cost of
a fixed amount of pre-allocated and unreclaimable memory, even under
memory pressure.

Note: This socket option is only available when memory cgroup is
enabled, and we require the reserved memory to be charged to the user's
memcg. We hope this keeps misbehaving users from abusing the feature to
reserve a large amount of memory on certain sockets and causing
unfairness to others.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
 include/net/sock.h                | 44 +++++++++++++++++---
 include/uapi/asm-generic/socket.h |  2 +
 net/core/sock.c                   | 69 +++++++++++++++++++++++++++++++
 net/core/stream.c                 |  2 +-
 net/ipv4/af_inet.c                |  2 +-
 5 files changed, 112 insertions(+), 7 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 879980de3dcd..447fddb384a4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -269,6 +269,7 @@ struct bpf_local_storage;
  *	@sk_omem_alloc: "o" is "option" or "other"
  *	@sk_wmem_queued: persistent queue size
  *	@sk_forward_alloc: space allocated forward
+ *	@sk_reserved_mem: space reserved and non-reclaimable for the socket
  *	@sk_napi_id: id of the last napi context to receive data for sk
  *	@sk_ll_usec: usecs to busypoll when there is no data
  *	@sk_allocation: allocation mode
@@ -409,6 +410,7 @@ struct sock {
 #define sk_rmem_alloc sk_backlog.rmem_alloc

 	int			sk_forward_alloc;
+	u32			sk_reserved_mem;
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	unsigned int		sk_ll_usec;
 	/* ===== mostly read cache line ===== */
@@ -1511,20 +1513,49 @@ sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
 	       skb_pfmemalloc(skb);
 }

+static inline int sk_unused_reserved_mem(const struct sock *sk)
+{
+	int unused_mem;
+
+	if (likely(!sk->sk_reserved_mem))
+		return 0;
+
+	unused_mem = sk->sk_reserved_mem - sk->sk_wmem_queued -
+		     atomic_read(&sk->sk_rmem_alloc);
+
+	return unused_mem > 0 ? unused_mem : 0;
+}
+
 static inline void sk_mem_reclaim(struct sock *sk)
 {
+	int reclaimable;
+
 	if (!sk_has_account(sk))
 		return;
-	if (sk->sk_forward_alloc >= SK_MEM_QUANTUM)
-		__sk_mem_reclaim(sk, sk->sk_forward_alloc);
+
+	reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk);
+
+	if (reclaimable >= SK_MEM_QUANTUM)
+		__sk_mem_reclaim(sk, reclaimable);
+}
+
+static inline void sk_mem_reclaim_final(struct sock *sk)
+{
+	sk->sk_reserved_mem = 0;
+	sk_mem_reclaim(sk);
 }

 static inline void sk_mem_reclaim_partial(struct sock *sk)
 {
+	int reclaimable;
+
 	if (!sk_has_account(sk))
 		return;
-	if (sk->sk_forward_alloc > SK_MEM_QUANTUM)
-		__sk_mem_reclaim(sk, sk->sk_forward_alloc - 1);
+
+	reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk);
+
+	if (reclaimable > SK_MEM_QUANTUM)
+		__sk_mem_reclaim(sk, reclaimable - 1);
 }

 static inline void sk_mem_charge(struct sock *sk, int size)
@@ -1536,9 +1567,12 @@ static inline void sk_mem_charge(struct sock *sk, int size)

 static inline void sk_mem_uncharge(struct sock *sk, int size)
 {
+	int reclaimable;
+
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
+	reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk);

 	/* Avoid a possible overflow.
 	 * TCP send queues can make this happen, if sk_mem_reclaim()
@@ -1547,7 +1581,7 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 	 * If we reach 2 MBytes, reclaim 1 MBytes right now, there is
 	 * no need to hold that much forward allocation anyway.
 	 */
-	if (unlikely(sk->sk_forward_alloc >= 1 << 21))
+	if (unlikely(reclaimable >= 1 << 21))
 		__sk_mem_reclaim(sk, 1 << 20);
 }

diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 1f0a2b4864e4..c77a1313b3b0 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -126,6 +126,8 @@

 #define SO_BUF_LOCK		72

+#define SO_RESERVE_MEM		73
+
 #if !defined(__KERNEL__)

 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index 512e629f9780..0ecb8590e043 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -947,6 +947,53 @@ void sock_set_mark(struct sock *sk, u32 val)
 }
 EXPORT_SYMBOL(sock_set_mark);

+static void sock_release_reserved_memory(struct sock *sk, int bytes)
+{
+	/* Round down bytes to multiple of pages */
+	bytes &= ~(SK_MEM_QUANTUM - 1);
+
+	WARN_ON(bytes > sk->sk_reserved_mem);
+	sk->sk_reserved_mem -= bytes;
+	sk_mem_reclaim(sk);
+}
+
+static int sock_reserve_memory(struct sock *sk, int bytes)
+{
+	long allocated;
+	bool charged;
+	int pages;
+
+	if (!mem_cgroup_sockets_enabled || !sk->sk_memcg)
+		return -EOPNOTSUPP;
+
+	if (!bytes)
+		return 0;
+
+	pages = sk_mem_pages(bytes);
+
+	/* pre-charge to memcg */
+	charged = mem_cgroup_charge_skmem(sk->sk_memcg, pages,
+					  GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+	if (!charged)
+		return -ENOMEM;
+
+	/* pre-charge to forward_alloc */
+	allocated = sk_memory_allocated_add(sk, pages);
+	/* If the system goes into memory pressure with this
+	 * precharge, give up and return error.
+	 */
+	if (allocated > sk_prot_mem_limits(sk, 1)) {
+		sk_memory_allocated_sub(sk, pages);
+		mem_cgroup_uncharge_skmem(sk->sk_memcg, pages);
+		return -ENOMEM;
+	}
+	sk->sk_forward_alloc += pages << SK_MEM_QUANTUM_SHIFT;
+
+	sk->sk_reserved_mem += pages << SK_MEM_QUANTUM_SHIFT;
+
+	return 0;
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -1367,6 +1414,23 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 					  ~SOCK_BUF_LOCK_MASK);
 		break;

+	case SO_RESERVE_MEM:
+	{
+		int delta;
+
+		if (val < 0) {
+			ret = -EINVAL;
+			break;
+		}
+
+		delta = val - sk->sk_reserved_mem;
+		if (delta < 0)
+			sock_release_reserved_memory(sk, -delta);
+		else
+			ret = sock_reserve_memory(sk, delta);
+		break;
+	}
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1733,6 +1797,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_userlocks & SOCK_BUF_LOCK_MASK;
 		break;

+	case SO_RESERVE_MEM:
+		v.val = sk->sk_reserved_mem;
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
@@ -2045,6 +2113,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 	newsk->sk_dst_pending_confirm = 0;
 	newsk->sk_wmem_queued	= 0;
 	newsk->sk_forward_alloc = 0;
+	newsk->sk_reserved_mem  = 0;
 	atomic_set(&newsk->sk_drops, 0);
 	newsk->sk_send_head	= NULL;
 	newsk->sk_userlocks = sk->sk_userlocks & ~SOCK_BINDPORT_LOCK;
diff --git a/net/core/stream.c b/net/core/stream.c
index 4f1d4aa5fb38..e09ffd410685 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -202,7 +202,7 @@ void sk_stream_kill_queues(struct sock *sk)
 	WARN_ON(!skb_queue_empty(&sk->sk_write_queue));

 	/* Account for returned memory. */
-	sk_mem_reclaim(sk);
+	sk_mem_reclaim_final(sk);

 	WARN_ON(sk->sk_wmem_queued);
 	WARN_ON(sk->sk_forward_alloc);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 40558033f857..2fc6074583a4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -135,7 +135,7 @@ void inet_sock_destruct(struct sock *sk)
 	__skb_queue_purge(&sk->sk_receive_queue);
 	__skb_queue_purge(&sk->sk_error_queue);

-	sk_mem_reclaim(sk);
+	sk_mem_reclaim_final(sk);

 	if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
 		pr_err("Attempt to release TCP socket in state %d %p\n",

From patchwork Wed Sep 29 17:25:12 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 12526405
X-Patchwork-Delegate: kuba@kernel.org
Date: Wed, 29 Sep 2021 10:25:12 -0700
In-Reply-To: <20210929172513.3930074-1-weiwan@google.com>
Message-Id: <20210929172513.3930074-3-weiwan@google.com>
References: <20210929172513.3930074-1-weiwan@google.com>
Subject: [PATCH v3 net-next 2/3] tcp: adjust sndbuf according to sk_reserved_mem
From: Wei Wang
To: "David S. Miller", Jakub Kicinski, netdev@vger.kernel.org
Cc: Eric Dumazet, Shakeel Butt

If the user sets the SO_RESERVE_MEM socket option, then in order to
fully utilize the reserved memory on the tx path while under memory
pressure, modify the logic in sk_stream_moderate_sndbuf() to set
sk_sndbuf according to the available reserved memory instead of
SOCK_MIN_SNDBUF, and adjust it when new data is acked.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
 include/net/sock.h   |  1 +
 net/ipv4/tcp_input.c | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 447fddb384a4..c3af696258fe 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2378,6 +2378,7 @@ static inline void sk_stream_moderate_sndbuf(struct sock *sk)
 		return;

 	val = min(sk->sk_sndbuf, sk->sk_wmem_queued >> 1);
+	val = max_t(u32, val, sk_unused_reserved_mem(sk));

 	WRITE_ONCE(sk->sk_sndbuf, max_t(u32, val, SOCK_MIN_SNDBUF));
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 53675e284841..06020395cc8d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5380,7 +5380,7 @@ static int tcp_prune_queue(struct sock *sk)
 	return -1;
 }

-static bool tcp_should_expand_sndbuf(const struct sock *sk)
+static bool tcp_should_expand_sndbuf(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);

@@ -5391,8 +5391,18 @@ static bool tcp_should_expand_sndbuf(const struct sock *sk)
 		return false;

 	/* If we are under global TCP memory pressure, do not expand.  */
-	if (tcp_under_memory_pressure(sk))
+	if (tcp_under_memory_pressure(sk)) {
+		int unused_mem = sk_unused_reserved_mem(sk);
+
+		/* Adjust sndbuf according to reserved mem. But make sure
+		 * it never goes below SOCK_MIN_SNDBUF.
+		 * See sk_stream_moderate_sndbuf() for more details.
+		 */
+		if (unused_mem > SOCK_MIN_SNDBUF)
+			WRITE_ONCE(sk->sk_sndbuf, unused_mem);
+
 		return false;
+	}

 	/* If we are under soft global TCP memory pressure, do not expand.  */
 	if (sk_memory_allocated(sk) >= sk_prot_mem_limits(sk, 0))

From patchwork Wed Sep 29 17:25:13 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 12526407
X-Patchwork-Delegate: kuba@kernel.org
Date: Wed, 29 Sep 2021 10:25:13 -0700
In-Reply-To: <20210929172513.3930074-1-weiwan@google.com>
Message-Id: <20210929172513.3930074-4-weiwan@google.com>
References: <20210929172513.3930074-1-weiwan@google.com>
Subject: [PATCH v3 net-next 3/3] tcp: adjust rcv_ssthresh according to sk_reserved_mem
From: Wei Wang
To: "David S. Miller", Jakub Kicinski, netdev@vger.kernel.org
Cc: Eric Dumazet, Shakeel Butt

When the user sets the SO_RESERVE_MEM socket option, then in order to
utilize the reserved memory while under memory pressure, adjust
rcv_ssthresh according to the available reserved memory for the socket,
instead of always using 4 * advmss.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
 include/net/tcp.h     | 11 +++++++++++
 net/ipv4/tcp_input.c  | 12 ++++++++++--
 net/ipv4/tcp_output.c |  3 +--
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 32cf6c01f403..4c2898ac6569 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1421,6 +1421,17 @@ static inline int tcp_full_space(const struct sock *sk)
 	return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf));
 }

+static inline void tcp_adjust_rcv_ssthresh(struct sock *sk)
+{
+	int unused_mem = sk_unused_reserved_mem(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);
+	if (unused_mem)
+		tp->rcv_ssthresh = max_t(u32, tp->rcv_ssthresh,
+					 tcp_win_from_space(sk, unused_mem));
+}
+
 void tcp_cleanup_rbuf(struct sock *sk, int copied);

 /* We provision sk_rcvbuf around 200% of sk_rcvlowat.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 06020395cc8d..246ab7b5e857 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -500,8 +500,11 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb,

 	room = min_t(int, tp->window_clamp, tcp_space(sk)) - tp->rcv_ssthresh;

+	if (room <= 0)
+		return;
+
 	/* Check #1 */
-	if (room > 0 && !tcp_under_memory_pressure(sk)) {
+	if (!tcp_under_memory_pressure(sk)) {
 		unsigned int truesize = truesize_adjust(adjust, skb);
 		int incr;

@@ -518,6 +521,11 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb,
 			tp->rcv_ssthresh += min(room, incr);
 			inet_csk(sk)->icsk_ack.quick |= 1;
 		}
+	} else {
+		/* Under pressure:
+		 * Adjust rcv_ssthresh according to reserved mem
+		 */
+		tcp_adjust_rcv_ssthresh(sk);
 	}
 }

@@ -5345,7 +5353,7 @@ static int tcp_prune_queue(struct sock *sk)
 	if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
 		tcp_clamp_window(sk);
 	else if (tcp_under_memory_pressure(sk))
-		tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);
+		tcp_adjust_rcv_ssthresh(sk);

 	if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
 		return 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fdc39b4fbbfa..3a01e5593a17 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2967,8 +2967,7 @@ u32 __tcp_select_window(struct sock *sk)
 		icsk->icsk_ack.quick = 0;

 		if (tcp_under_memory_pressure(sk))
-			tp->rcv_ssthresh = min(tp->rcv_ssthresh,
-					       4U * tp->advmss);
+			tcp_adjust_rcv_ssthresh(sk);

 		/* free_space might become our new window, make sure we don't
 		 * increase it due to wscale.
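
For context, here is a minimal userspace sketch (not part of the patches
above) of how SO_RESERVE_MEM is intended to be used. It assumes a kernel
with this series applied and memory cgroups enabled; the option value 73
matches the uapi addition in patch 1 and is re-defined here only because
older libc headers do not carry it.

/* Hypothetical example, not part of the series: reserve ~1MB of
 * forward-allocated memory on a TCP socket via SO_RESERVE_MEM.
 * setsockopt() fails with EOPNOTSUPP when memory cgroups are disabled,
 * and the kernel rounds the reservation up to whole SK_MEM_QUANTUM
 * (page-sized) units, so getsockopt() may report a slightly larger value.
 */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SO_RESERVE_MEM
#define SO_RESERVE_MEM 73
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int reserve = 1 << 20;		/* request 1MB */
	socklen_t len = sizeof(reserve);

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	if (setsockopt(fd, SOL_SOCKET, SO_RESERVE_MEM,
		       &reserve, sizeof(reserve)) < 0)
		perror("setsockopt(SO_RESERVE_MEM)");

	/* Read back the amount actually reserved (page-rounded). */
	if (getsockopt(fd, SOL_SOCKET, SO_RESERVE_MEM, &reserve, &len) == 0)
		printf("reserved %d bytes\n", reserve);

	return 0;
}

Per the sock_setsockopt() logic in patch 1, setting a smaller value later
releases the difference (rounded down to whole pages), and setting 0
releases the entire reservation.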