Message ID | 20220309015757.2532973-1-eric.dumazet@gmail.com (mailing list archive)
---|---
State | Accepted |
Commit | 65466904b015f6eeb9225b51aeb29b01a1d4b59c |
Delegated to: | Netdev Maintainers |
Series | [net-next] tcp: adjust TSO packet sizes based on min_rtt
On Tue, Mar 8, 2022 at 8:58 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> Back when tcp_tso_autosize() and TCP pacing were introduced,
> our focus was really to reduce burst sizes for long distance
> flows.
>
> The simple heuristic of using sk_pacing_rate/1024 has worked
> well, but can lead to too small packets for hosts in the same
> rack/cluster, when thousands of flows compete for the bottleneck.
>
> Neal Cardwell had the idea of making the TSO burst size
> a function of both sk_pacing_rate and tcp_min_rtt().
>
> Indeed, for local flows, sending bigger bursts is better
> to reduce cpu costs, as occasional losses can be repaired
> quite fast.
>
> This patch is based on Neal Cardwell's implementation
> done more than two years ago.
> bbr is adjusting max_pacing_rate based on measured bandwidth,
> while cubic would overestimate max_pacing_rate.
>
> /proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
> this new feature, in logarithmic steps.
>
> Tested:
>
> 100Gbit NIC, two hosts in the same rack, 4K MTU.
> 600 flows rate-limited to 20000000 bytes per second.
>
> Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)
>
> ~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
> ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
> 96005
>
>  Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
>
>          65,945.29 msec task-clock                #    2.845 CPUs utilized
>          1,314,632      context-switches          # 19935.279 M/sec
>              5,292      cpu-migrations            #    80.249 M/sec
>            940,641      page-faults               # 14264.023 M/sec
>    201,117,030,926      cycles                    # 3049769.216 GHz                  (83.45%)
>     17,699,435,405      stalled-cycles-frontend   #    8.80% frontend cycles idle    (83.48%)
>    136,584,015,071      stalled-cycles-backend    #   67.91% backend cycles idle     (83.44%)
>     53,809,530,436      instructions              #    0.27  insn per cycle
>                                                   #    2.54  stalled cycles per insn (83.36%)
>      9,062,315,523      branches                  # 137422329.563 M/sec              (83.22%)
>        153,008,621      branch-misses             #    1.69% of all branches         (83.32%)
>
>       23.182970846 seconds time elapsed
>
> TcpInSegs                       15648792           0.0
> TcpOutSegs                      58659110           0.0  # Average of 3.7 4K segments per TSO packet
> TcpExtTCPDelivered              58654791           0.0
> TcpExtTCPDeliveredCE            19                 0.0
>
> After patch:
>
> ~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
> ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
> 96046
>
>  Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
>
>          48,982.58 msec task-clock                #    2.104 CPUs utilized
>            186,014      context-switches          #  3797.599 M/sec
>              3,109      cpu-migrations            #    63.472 M/sec
>            941,180      page-faults               # 19214.814 M/sec
>    153,459,763,868      cycles                    # 3132982.807 GHz                  (83.56%)
>     12,069,861,356      stalled-cycles-frontend   #    7.87% frontend cycles idle    (83.32%)
>    120,485,917,953      stalled-cycles-backend    #   78.51% backend cycles idle     (83.24%)
>     36,803,672,106      instructions              #    0.24  insn per cycle
>                                                   #    3.27  stalled cycles per insn (83.18%)
>      5,947,266,275      branches                  # 121417383.427 M/sec              (83.64%)
>         87,984,616      branch-misses             #    1.48% of all branches         (83.43%)
>
>       23.281200256 seconds time elapsed
>
> TcpInSegs                       1434706            0.0
> TcpOutSegs                      58883378           0.0  # Average of 41 4K segments per TSO packet
> TcpExtTCPDelivered              58878971           0.0
> TcpExtTCPDeliveredCE            9664               0.0
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Thanks, Eric!
Reviewed-by: Neal Cardwell <ncardwell@google.com>

neal
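To make the sizing math from the commit message concrete, here is a minimal user-space model of the heuristic. This is a sketch only: `model_tso_segs()`, its parameters, and the `main()` scenario are invented for illustration; the kernel reads these quantities from `struct sock`/`struct tcp_sock`, and the `min_tso_segs` floor it applies is omitted here for brevity.

```c
#include <stdio.h>

/* Minimal user-space model of the new TSO autosizing math
 * (illustrative only, not the kernel code itself).
 */
static unsigned int model_tso_segs(unsigned long pacing_rate, /* bytes/sec */
                                   unsigned int min_rtt_usec,
                                   unsigned int gso_max_size, /* e.g. 65536 */
                                   unsigned int tso_rtt_log,  /* sysctl, default 9 */
                                   unsigned int mss)
{
	/* Old budget: split the pacing rate into ~1024 TSO packets
	 * per second (sk_pacing_shift defaults to 10).
	 */
	unsigned long bytes = pacing_rate >> 10;
	unsigned int r = min_rtt_usec >> tso_rtt_log;

	/* New RTT-based allowance: gso_max_size, halved for every
	 * 2^tso_rtt_log usec of min RTT; skipped entirely when r
	 * reaches the type width (mirrors the BITS_PER_TYPE() check).
	 */
	if (r < 8 * sizeof(gso_max_size))
		bytes += gso_max_size >> r;

	if (bytes > gso_max_size)
		bytes = gso_max_size;
	return bytes / mss;
}

int main(void)
{
	unsigned int rtts[] = { 40, 600, 5000 };  /* min RTT in usec */

	/* 20 MB/s pacing and 4K MSS, matching the commit's test setup. */
	for (int i = 0; i < 3; i++)
		printf("min_rtt = %4u usec -> %2u segments per TSO packet\n",
		       rtts[i],
		       model_tso_segs(20000000, rtts[i], 65536, 9, 4096));
	return 0;
}
```

With the default `tcp_tso_rtt_log` of 9, this model yields 16 segments at 40 usec, 12 at 600 usec, and 4 at 5 ms of min RTT, i.e. the old pacing-only sizing is restored once min_rtt grows past a few milliseconds.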
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 8 Mar 2022 17:57:57 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Back when tcp_tso_autosize() and TCP pacing were introduced,
> our focus was really to reduce burst sizes for long distance
> flows.
>
> The simple heuristic of using sk_pacing_rate/1024 has worked
> well, but can lead to too small packets for hosts in the same
> rack/cluster, when thousands of flows compete for the bottleneck.
>
> [...]

Here is the summary with links:
  - [net-next] tcp: adjust TSO packet sizes based on min_rtt
    https://git.kernel.org/netdev/net-next/c/65466904b015

You are awesome, thank you!
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 2572eecc3e86a1eb8cbb96fb47c94a014186e7af..b0024aa7b0514f7174ebf7512e2c7da256b494d1 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -878,6 +878,29 @@ tcp_min_tso_segs - INTEGER
 
 	Default: 2
 
+tcp_tso_rtt_log - INTEGER
+	Adjustment of TSO packet sizes based on min_rtt
+
+	Starting from linux-5.18, TCP autosizing can be tweaked
+	for flows having small RTT.
+
+	Old autosizing was splitting the pacing budget to send 1024 TSO
+	per second.
+
+	tso_packet_size = sk->sk_pacing_rate / 1024;
+
+	With the new mechanism, we increase this TSO sizing using:
+
+	distance = min_rtt_usec / (2^tcp_tso_rtt_log)
+	tso_packet_size += gso_max_size >> distance;
+
+	This means that flows between very close hosts can use bigger
+	TSO packets, reducing their cpu costs.
+
+	If you want to use the old autosizing, set this sysctl to 0.
+
+	Default: 9 (2^9 = 512 usec)
+
 tcp_pacing_ss_ratio - INTEGER
 	sk->sk_pacing_rate is set by TCP stack using a ratio applied
 	to current rate. (current_rate = cwnd * mss / srtt)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index f0687867b5cd39d88a70fc19137a5fcaa2433758..ce0cc4e8d8c73f4b903108921281c734c98c38b3 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
 	u8 sysctl_tcp_synack_retries;
 	u8 sysctl_tcp_syncookies;
 	u8 sysctl_tcp_migrate_req;
+	u8 sysctl_tcp_comp_sack_nr;
 	int sysctl_tcp_reordering;
 	u8 sysctl_tcp_retries1;
 	u8 sysctl_tcp_retries2;
@@ -160,9 +161,9 @@ struct netns_ipv4 {
 	int sysctl_tcp_challenge_ack_limit;
 	int sysctl_tcp_min_rtt_wlen;
 	u8 sysctl_tcp_min_tso_segs;
+	u8 sysctl_tcp_tso_rtt_log;
 	u8 sysctl_tcp_autocorking;
 	u8 sysctl_tcp_reflect_tos;
-	u8 sysctl_tcp_comp_sack_nr;
 	int sysctl_tcp_invalid_ratelimit;
 	int sysctl_tcp_pacing_ss_ratio;
 	int sysctl_tcp_pacing_ca_ratio;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1cae27b5dcd836f1c5ae1ba1b0d4bae5899b3e6f..ad80d180b60b52bdb9edea979a76dfc6466eefd2 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1271,6 +1271,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dou8vec_minmax,
 		.extra1		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "tcp_tso_rtt_log",
+		.data		= &init_net.ipv4.sysctl_tcp_tso_rtt_log,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+	},
 	{
 		.procname	= "tcp_min_rtt_wlen",
 		.data		= &init_net.ipv4.sysctl_tcp_min_rtt_wlen,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 411357ad9757585ccdd20a1a4756eff057795644..4d8f67a73571488090190902d89a08b633c3093c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3135,6 +3135,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	/* rfc5961 challenge ack rate limiting */
 	net->ipv4.sysctl_tcp_challenge_ack_limit = 1000;
 	net->ipv4.sysctl_tcp_min_tso_segs = 2;
+	net->ipv4.sysctl_tcp_tso_rtt_log = 9; /* 2^9 = 512 usec */
 	net->ipv4.sysctl_tcp_min_rtt_wlen = 300;
 	net->ipv4.sysctl_tcp_autocorking = 1;
 	net->ipv4.sysctl_tcp_invalid_ratelimit = HZ/2;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2319531267c6830b633768dea7f0b40a46633ee1..81aaa7da3e8c0fa5e6cd3c4fd18543b75314e4a2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1951,25 +1951,34 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
 }
 
 /* Return how many segs we'd like on a TSO packet,
- * to send one TSO packet per ms
+ * depending on current pacing rate, and how close the peer is.
+ *
+ * Rationale is:
+ * - For close peers, we rather send bigger packets to reduce
+ *   cpu costs, because occasional losses will be repaired fast.
+ * - For long distance/rtt flows, we would like to get ACK clocking
+ *   with 1 ACK per ms.
+ *
+ * Use min_rtt to help adapt TSO burst size, with smaller min_rtt resulting
+ * in bigger TSO bursts. We cut the RTT-based allowance in half
+ * for every 2^9 usec (aka 512 us) of RTT, so that the RTT-based allowance
+ * is below 1500 bytes after 6 * ~500 usec = 3ms.
  */
 static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 			    int min_tso_segs)
 {
-	u32 bytes, segs;
+	unsigned long bytes;
+	u32 r;
 
-	bytes = min_t(unsigned long,
-		      sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift),
-		      sk->sk_gso_max_size);
+	bytes = sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift);
 
-	/* Goal is to send at least one packet per ms,
-	 * not one big TSO packet every 100 ms.
-	 * This preserves ACK clocking and is consistent
-	 * with tcp_tso_should_defer() heuristic.
-	 */
-	segs = max_t(u32, bytes / mss_now, min_tso_segs);
+	r = tcp_min_rtt(tcp_sk(sk)) >> sock_net(sk)->ipv4.sysctl_tcp_tso_rtt_log;
+	if (r < BITS_PER_TYPE(sk->sk_gso_max_size))
+		bytes += sk->sk_gso_max_size >> r;
 
-	return segs;
+	bytes = min_t(unsigned long, bytes, sk->sk_gso_max_size);
+
+	return max_t(u32, bytes / mss_now, min_tso_segs);
 }
 
 /* Return the number of segments we want in the skb we are transmitting.
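One detail worth noting in the tcp_tso_autosize() hunk: the `r < BITS_PER_TYPE(sk->sk_gso_max_size)` test is a correctness guard, not just a fast path. In C, shifting a value by an amount greater than or equal to its type width is undefined behavior, so for large min_rtt values the RTT-based term must be skipped outright rather than "shifted down to zero"; this also naturally makes the allowance vanish on long-RTT paths. A sketch of the same guard in plain C (the helper name is invented for illustration):

```c
#include <limits.h>

/* Add the RTT-based allowance only when the shift is well defined:
 * for a 32-bit gso_max_size, a shift count of 32 or more is UB in C,
 * so large r (i.e. long min_rtt) must skip the term entirely.
 */
static unsigned long add_rtt_allowance(unsigned long bytes,
                                       unsigned int gso_max_size,
                                       unsigned int r)
{
	if (r < sizeof(gso_max_size) * CHAR_BIT) /* kernel: BITS_PER_TYPE() */
		bytes += gso_max_size >> r;
	return bytes;
}
```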