From patchwork Thu Apr 18 21:45:58 2024
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 13635463
X-Patchwork-Delegate: kuba@kernel.org
Date: Thu, 18 Apr 2024 21:45:58 +0000
In-Reply-To: <20240418214600.1291486-1-edumazet@google.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
References: <20240418214600.1291486-1-edumazet@google.com>
X-Mailer: git-send-email 2.44.0.769.g3c40516874-goog
Message-ID: <20240418214600.1291486-2-edumazet@google.com>
Subject: [PATCH net-next 1/3] tcp: remove dubious FIN exception from tcp_cwnd_test()
From: Eric Dumazet
To: "David S. Miller", Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, Neal Cardwell, Kevin Yang, eric.dumazet@gmail.com, Eric Dumazet

tcp_cwnd_test() has special handling for the last packet in the
write queue, if it is smaller than one MSS and has the FIN flag set.

This is in violation of the TCP RFC, and seems quite dubious.

With this change, such a packet can be sent only if the current CWND
is bigger than the number of packets in flight.

Making the tcp_cwnd_test() result independent of the first skb
in the write queue is needed for the last patch of the series.

Signed-off-by: Eric Dumazet
---
 net/ipv4/tcp_output.c | 18 +++++------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 61119d42b0fd27a3736e136b1e81f6fc2d4cb44b..acbc76ca3e640354880c62c2423cfe4ba99f0be3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2073,16 +2073,10 @@ static unsigned int tcp_mss_split_point(const struct sock *sk,
 /* Can at least one segment of SKB be sent right now, according to the
  * congestion window rules? If so, return how many segments are allowed.
  */
-static inline unsigned int tcp_cwnd_test(const struct tcp_sock *tp,
-					 const struct sk_buff *skb)
+static u32 tcp_cwnd_test(const struct tcp_sock *tp)
 {
 	u32 in_flight, cwnd, halfcwnd;

-	/* Don't be strict about the congestion window for the final FIN.
-	 */
-	if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) &&
-	    tcp_skb_pcount(skb) == 1)
-		return 1;
-
 	in_flight = tcp_packets_in_flight(tp);
 	cwnd = tcp_snd_cwnd(tp);
 	if (in_flight >= cwnd)
@@ -2706,10 +2700,9 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	unsigned int tso_segs, sent_pkts;
-	int cwnd_quota;
+	u32 cwnd_quota, max_segs;
 	int result;
 	bool is_cwnd_limited = false, is_rwnd_limited = false;
-	u32 max_segs;

 	sent_pkts = 0;

@@ -2743,7 +2736,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		tso_segs = tcp_init_tso_segs(skb, mss_now);
 		BUG_ON(!tso_segs);

-		cwnd_quota = tcp_cwnd_test(tp, skb);
+		cwnd_quota = tcp_cwnd_test(tp);
 		if (!cwnd_quota) {
 			if (push_one == 2)
 				/* Force out a loss probe pkt. */
@@ -2772,9 +2765,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			limit = mss_now;
 			if (tso_segs > 1 && !tcp_urg_mode(tp))
 				limit = tcp_mss_split_point(sk, skb, mss_now,
-							    min_t(unsigned int,
-								  cwnd_quota,
-								  max_segs),
+							    min(cwnd_quota,
+								max_segs),
 							    nonagle);

 			if (skb->len > limit &&
From patchwork Thu Apr 18 21:45:59 2024
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 13635464
X-Patchwork-Delegate: kuba@kernel.org
Date: Thu, 18 Apr 2024 21:45:59 +0000
In-Reply-To: <20240418214600.1291486-1-edumazet@google.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
References: <20240418214600.1291486-1-edumazet@google.com>
X-Mailer: git-send-email 2.44.0.769.g3c40516874-goog
Message-ID: <20240418214600.1291486-3-edumazet@google.com>
Subject: [PATCH net-next 2/3] tcp: call tcp_set_skb_tso_segs() from tcp_write_xmit()
From: Eric Dumazet
To: "David S. Miller", Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, Neal Cardwell, Kevin Yang, eric.dumazet@gmail.com, Eric Dumazet

tcp_write_xmit() calls tcp_init_tso_segs() to set gso_size and gso_segs
on the packet.

tcp_init_tso_segs() requires the stack to maintain an up-to-date
tcp_skb_pcount(). This makes sense for packets in the rtx queue,
not so much for packets still in the write queue.

In the following patch, we don't want to deal with tcp_skb_pcount()
when moving payload from the 2nd skb to the 1st skb in the write queue.

Signed-off-by: Eric Dumazet
---
 net/ipv4/tcp_output.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index acbc76ca3e640354880c62c2423cfe4ba99f0be3..5e8665241f9345f38ce56afffe473948aef66786 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1502,18 +1502,22 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff *skb)
 }

 /* Initialize TSO segments for a packet. */
-static void tcp_set_skb_tso_segs(struct sk_buff *skb, unsigned int mss_now)
+static int tcp_set_skb_tso_segs(struct sk_buff *skb, unsigned int mss_now)
 {
+	int tso_segs;
+
 	if (skb->len <= mss_now) {
 		/* Avoid the costly divide in the normal
 		 * non-TSO case.
		 */
-		tcp_skb_pcount_set(skb, 1);
 		TCP_SKB_CB(skb)->tcp_gso_size = 0;
-	} else {
-		tcp_skb_pcount_set(skb, DIV_ROUND_UP(skb->len, mss_now));
-		TCP_SKB_CB(skb)->tcp_gso_size = mss_now;
+		tcp_skb_pcount_set(skb, 1);
+		return 1;
 	}
+	TCP_SKB_CB(skb)->tcp_gso_size = mss_now;
+	tso_segs = DIV_ROUND_UP(skb->len, mss_now);
+	tcp_skb_pcount_set(skb, tso_segs);
+	return tso_segs;
 }

 /* Pcount in the middle of the write queue got changed, we need to do various
@@ -2097,10 +2101,9 @@ static int tcp_init_tso_segs(struct sk_buff *skb, unsigned int mss_now)
 {
 	int tso_segs = tcp_skb_pcount(skb);

-	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
-		tcp_set_skb_tso_segs(skb, mss_now);
-		tso_segs = tcp_skb_pcount(skb);
-	}
+	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now))
+		return tcp_set_skb_tso_segs(skb, mss_now);
+
 	return tso_segs;
 }

@@ -2733,9 +2736,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		if (tcp_pacing_check(sk))
 			break;

-		tso_segs = tcp_init_tso_segs(skb, mss_now);
-		BUG_ON(!tso_segs);
-
 		cwnd_quota = tcp_cwnd_test(tp);
 		if (!cwnd_quota) {
 			if (push_one == 2)
@@ -2745,6 +2745,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			break;
 		}

+		tso_segs = tcp_set_skb_tso_segs(skb, mss_now);
+
 		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) {
 			is_rwnd_limited = true;
 			break;
From patchwork Thu Apr 18 21:46:00 2024
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 13635465
X-Patchwork-Delegate: kuba@kernel.org
Date: Thu, 18 Apr 2024 21:46:00 +0000
In-Reply-To: <20240418214600.1291486-1-edumazet@google.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
References: <20240418214600.1291486-1-edumazet@google.com>
X-Mailer: git-send-email 2.44.0.769.g3c40516874-goog
Message-ID: <20240418214600.1291486-4-edumazet@google.com>
Subject: [PATCH net-next 3/3] tcp: try to send bigger TSO packets
From: Eric Dumazet
To: "David S. Miller", Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, Neal Cardwell, Kevin Yang, eric.dumazet@gmail.com, Eric Dumazet

While investigating TCP performance, I found that TCP would sometimes
send big skbs followed by a single MSS skb, in a 'locked' pattern.

For instance, BIG TCP is enabled, MSS is set to have 4096 bytes
of payload per segment. gso_max_size is set to 181000.

This means that an optimal TCP packet size should contain
44 * 4096 = 180224 bytes of payload.

However, I was seeing packet sizes interleaved in this pattern:

172032, 8192, 172032, 8192, 172032, 8192, ...

The tcp_tso_should_defer() heuristic is defeated, because after a split
of a packet in the write queue for whatever reason (this might be a too
small CWND or a small enough pacing_rate), the leftover packet in the
queue is smaller than the optimal size.

It is time to try to make 'leftover packets' bigger so that
tcp_tso_should_defer() can give its full potential.

After this patch, we can see the following output:

14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980
14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980
14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980
14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980
14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980
14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980
14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980
14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980
14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980
14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980
14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980

With 200 concurrent flows on a 100Gbit NIC, we can see a reduction
of TSO packets (and ACK packets) of about 30%.

Signed-off-by: Eric Dumazet
---
 net/ipv4/tcp_output.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5e8665241f9345f38ce56afffe473948aef66786..99a1d88f7f47b9ef0334efe62f8fd34c0d693ced 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2683,6 +2683,36 @@ void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
 		tcp_chrono_set(tp, TCP_CHRONO_BUSY);
 }

+/* First skb in the write queue is smaller than ideal packet size.
+ * Check if we can move payload from the second skb in the queue.
+ */
+static void tcp_grow_skb(struct sock *sk, struct sk_buff *skb, int amount)
+{
+	struct sk_buff *next_skb = skb->next;
+	unsigned int nlen;
+
+	if (tcp_skb_is_last(sk, skb))
+		return;
+
+	if (!tcp_skb_can_collapse(skb, next_skb))
+		return;
+
+	nlen = min_t(u32, amount, next_skb->len);
+	if (!nlen || !skb_shift(skb, next_skb, nlen))
+		return;
+
+	TCP_SKB_CB(skb)->end_seq += nlen;
+	TCP_SKB_CB(next_skb)->seq += nlen;
+
+	if (!next_skb->len) {
+		TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
+		TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
+		TCP_SKB_CB(skb)->tcp_flags |= TCP_SKB_CB(next_skb)->tcp_flags;
+		tcp_unlink_write_queue(next_skb, sk);
+		tcp_wmem_free_skb(sk, next_skb);
+	}
+}
+
 /* This routine writes packets to the network. It advances the
  * send_head. This happens as incoming acks open up the remote
  * window for us.
@@ -2723,6 +2753,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	max_segs = tcp_tso_segs(sk, mss_now);
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
+		int missing_bytes;

 		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
 			/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
@@ -2744,6 +2775,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			else
 				break;
 		}
+		cwnd_quota = min(cwnd_quota, max_segs);
+		missing_bytes = cwnd_quota * mss_now - skb->len;
+		if (missing_bytes > 0)
+			tcp_grow_skb(sk, skb, missing_bytes);

 		tso_segs = tcp_set_skb_tso_segs(skb, mss_now);

@@ -2767,8 +2802,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			limit = mss_now;
 			if (tso_segs > 1 && !tcp_urg_mode(tp))
 				limit = tcp_mss_split_point(sk, skb, mss_now,
-							    min(cwnd_quota,
-								max_segs),
+							    cwnd_quota,
 							    nonagle);

 			if (skb->len > limit &&