From patchwork Thu Jun 23 23:42:39 2022
X-Patchwork-Submitter: Joanne Koong
X-Patchwork-Id: 12893248
X-Patchwork-Delegate: kuba@kernel.org
From: Joanne Koong
To: netdev@vger.kernel.org
Cc: edumazet@google.com, kafai@fb.com, kuba@kernel.org,
    davem@davemloft.net, pabeni@redhat.com, Joanne Koong
Subject: [PATCH net-next v1 0/3] Add a second bind table hashed by port + address
Date: Thu, 23 Jun 2022 16:42:39 -0700
Message-Id: <20220623234242.2083895-1-joannelkoong@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

Currently, there is one bind hashtable (bhash) that hashes by port only.
This patchset adds a second bind table (bhash2) that hashes by port and
address.

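As a rough illustration of the difference between the two keys, here is a
userspace toy model (not kernel code, and not taken from the patches). All
names in it (toy_sock, toy_table, hash_port, hash_port_addr,
toy_entries_scanned), the table size, and the hash mixing function are
assumptions made up for this sketch. It also ignores wildcard addresses,
SO_REUSEADDR/SO_REUSEPORT, and locking, all of which the real conflict
check has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKET_BITS 10u                  /* hypothetical table size: 1024 buckets */
#define NBUCKETS     (1u << NBUCKET_BITS)

/* Toy stand-in for a bound socket: IPv4 address and port only. */
struct toy_sock {
	uint32_t addr;
	uint16_t port;
	struct toy_sock *next;
};

struct toy_table {
	struct toy_sock *bucket[NBUCKETS];
};

/* bhash-style key: port only, so every socket on a port shares a bucket. */
static unsigned int hash_port(uint16_t port)
{
	return port & (NBUCKETS - 1);
}

/* bhash2-style key: port and address. The mixing step is a placeholder
 * multiplicative hash, not the hash the kernel uses. */
static unsigned int hash_port_addr(uint32_t addr, uint16_t port)
{
	uint32_t h = (addr ^ port) * 2654435761u;

	return h >> (32 - NBUCKET_BITS);
}

/* Push a new entry onto the head of bucket idx. */
static void toy_insert(struct toy_table *t, unsigned int idx,
		       uint32_t addr, uint16_t port)
{
	struct toy_sock *sk = malloc(sizeof(*sk));

	assert(sk != NULL);
	sk->addr = addr;
	sk->port = port;
	sk->next = t->bucket[idx];
	t->bucket[idx] = sk;
}

/* Walk one bucket looking for an exact addr + port match and report how
 * many entries were visited. Returns 1 on conflict, 0 otherwise. */
static int toy_conflict(const struct toy_table *t, unsigned int idx,
			uint32_t addr, uint16_t port, size_t *visited)
{
	const struct toy_sock *sk;

	*visited = 0;
	for (sk = t->bucket[idx]; sk; sk = sk->next) {
		(*visited)++;
		if (sk->port == port && sk->addr == addr)
			return 1;
	}
	return 0;
}

/* Bind n sockets to distinct addresses on port 443, then return how many
 * entries a conflict check for one more, unused address has to visit.
 * use_addr selects the port + address key instead of the port-only key. */
static size_t toy_entries_scanned(size_t n, int use_addr)
{
	struct toy_table *t = calloc(1, sizeof(*t));
	const uint32_t base = 0x0a000001u;    /* 10.0.0.1 */
	const uint16_t port = 443;
	uint32_t new_addr = base + (uint32_t)n;
	size_t i, visited;
	unsigned int idx;

	assert(t != NULL);
	for (i = 0; i < n; i++) {
		uint32_t addr = base + (uint32_t)i;

		idx = use_addr ? hash_port_addr(addr, port) : hash_port(port);
		toy_insert(t, idx, addr, port);
	}

	idx = use_addr ? hash_port_addr(new_addr, port) : hash_port(port);
	(void)toy_conflict(t, idx, new_addr, port, &visited);

	/* Release every chain, then the table itself. */
	for (i = 0; i < NBUCKETS; i++) {
		while (t->bucket[i]) {
			struct toy_sock *sk = t->bucket[i];

			t->bucket[i] = sk->next;
			free(sk);
		}
	}
	free(t);
	return visited;
}
```

In this model, toy_entries_scanned(24000, 0) walks all 24000 entries
because every socket on port 443 lands in the same bucket, while
toy_entries_scanned(24000, 1) only walks the entries that happen to share
one bucket with the new address.
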
The motivation for adding bhash2 is to expedite bind requests in
situations where the port has many sockets in its bhash table entry
(e.g. a large number of sockets bound to different addresses on the same
port). In that case, checking for bind conflicts is costly, especially
because we hold the table entry spinlock while doing so, which can cause
softirq cpu lockups and can prevent new tcp connections.

We ran into this problem at Meta, where the traffic team binds a large
number of IPs to port 443. The bind() call took a significant amount of
time, which led to cpu softirq lockups, which in turn caused packet drops
and other failures on the machine.

When experimentally testing this on a local server with ~24k sockets
bound to the port, the results were:

ipv4:
before - 0.002317 seconds
with bhash2 - 0.000020 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds

The additions to the initial bhash2 submission [1] are:
* Updating bhash2 in the cases where a socket's rcv saddr changes after
  it has been bound
* Adding locks for bhash2 hashbuckets

[1] https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/

Joanne Koong (3):
  net: Add a second bind table hashed by port + address
  selftests/net: Add test for timing a bind request to a port with a
    populated bhash entry
  selftests/net: Add sk_bind_sendto_listen test

 include/net/inet_connection_sock.h        |   3 +
 include/net/inet_hashtables.h             |  80 ++++-
 include/net/sock.h                        |  17 +-
 net/dccp/ipv4.c                           |  24 +-
 net/dccp/ipv6.c                           |  12 +
 net/dccp/proto.c                          |  34 ++-
 net/ipv4/af_inet.c                        |  31 +-
 net/ipv4/inet_connection_sock.c           | 279 ++++++++++++++----
 net/ipv4/inet_hashtables.c                | 277 +++++++++++++++--
 net/ipv4/tcp.c                            |  11 +-
 net/ipv4/tcp_ipv4.c                       |  21 +-
 net/ipv6/tcp_ipv6.c                       |  12 +
 tools/testing/selftests/net/.gitignore    |   2 +
 tools/testing/selftests/net/Makefile      |   4 +
 tools/testing/selftests/net/bind_bhash.c  | 119 ++++++++
 tools/testing/selftests/net/bind_bhash.sh |  23 ++
 .../selftests/net/sk_bind_sendto_listen.c |  80 +++++
 17 files changed, 924 insertions(+), 105 deletions(-)
 create mode 100644 tools/testing/selftests/net/bind_bhash.c
 create mode 100755 tools/testing/selftests/net/bind_bhash.sh
 create mode 100644 tools/testing/selftests/net/sk_bind_sendto_listen.c