From patchwork Mon Jan 24 20:24:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 12722974
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet
Subject: [PATCH net-next 1/6] tcp/dccp: add tw->tw_bslot
Date: Mon, 24 Jan 2022 12:24:52 -0800
Message-Id: <20220124202457.3450198-2-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.35.0.rc0.227.g00780c9af4-goog
In-Reply-To: <20220124202457.3450198-1-eric.dumazet@gmail.com>
References: <20220124202457.3450198-1-eric.dumazet@gmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

We want inet_twsk_kill() to work even if the netns has been
dismantled/freed, so that we can get rid of inet_twsk_purge().

This patch adds tw->tw_bslot to cache the bind bucket slot
so that inet_twsk_kill() no longer needs to dereference twsk_net(tw).

Signed-off-by: Eric Dumazet
---
 include/net/inet_timewait_sock.h |  1 +
 net/ipv4/inet_timewait_sock.c    | 11 +++++++----
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index dfd919b3119e8efcbc436a67e3e6fbd02091db10..c221fe2b77dd24d8e0d13db9819cdf3ac13fe742 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -72,6 +72,7 @@ struct inet_timewait_sock {
 				tw_tos		: 8;
 	u32			tw_txhash;
 	u32			tw_priority;
+	u32			tw_bslot; /* bind bucket slot */
 	struct timer_list	tw_timer;
 	struct inet_bind_bucket	*tw_tb;
 };
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 437afe392e667c7c54509920d1e624f759f9215b..6e8f4a6cd222e89b1c7f9fdd73b10d336ff026a1 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -52,8 +52,7 @@ static void inet_twsk_kill(struct inet_timewait_sock *tw)
 	spin_unlock(lock);
 
 	/* Disassociate with bind bucket. */
-	bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), tw->tw_num,
-			hashinfo->bhash_size)];
+	bhead = &hashinfo->bhash[tw->tw_bslot];
 
 	spin_lock(&bhead->lock);
 	inet_twsk_bind_unhash(tw, hashinfo);
@@ -110,8 +109,12 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 	   Note, that any socket with inet->num != 0 MUST be bound in
 	   binding cache, even if it is closed.
 	 */
-	bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), inet->inet_num,
-			hashinfo->bhash_size)];
+	/* Cache inet_bhashfn(), because 'struct net' might be no longer
+	 * available later in inet_twsk_kill().
+	 */
+	tw->tw_bslot = inet_bhashfn(twsk_net(tw), inet->inet_num,
+				    hashinfo->bhash_size);
+	bhead = &hashinfo->bhash[tw->tw_bslot];
 	spin_lock(&bhead->lock);
 	tw->tw_tb = icsk->icsk_bind_hash;
 	WARN_ON(!icsk->icsk_bind_hash);

From patchwork Mon Jan 24 20:24:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 12722973
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet
Subject: [PATCH net-next 2/6] tcp/dccp: no longer use twsk_net(tw) from tw_timer_handler()
Date: Mon, 24 Jan 2022 12:24:53 -0800
Message-Id: <20220124202457.3450198-3-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.35.0.rc0.227.g00780c9af4-goog
In-Reply-To: <20220124202457.3450198-1-eric.dumazet@gmail.com>
References: <20220124202457.3450198-1-eric.dumazet@gmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

We will soon get rid of inet_twsk_purge().

This means that tw_timer_handler() might fire after
a netns has been dismantled/freed.

Instead of adding a function (and data structure) to find a netns
from tw->tw_net_cookie, just update the SNMP counters
a bit earlier, when the netns is known to be alive.

Signed-off-by: Eric Dumazet
---
 include/net/inet_timewait_sock.h | 5 ++---
 net/ipv4/inet_timewait_sock.c    | 9 ++++-----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index c221fe2b77dd24d8e0d13db9819cdf3ac13fe742..b323db969b8b6df98ad84a9bb9aad646b4a8730c 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -65,10 +65,9 @@ struct inet_timewait_sock {
 	/* these three are in inet_sock */
 	__be16			tw_sport;
 	/* And these are ours. */
-	unsigned int		tw_kill		: 1,
-				tw_transparent  : 1,
+	unsigned int		tw_transparent  : 1,
 				tw_flowlabel	: 20,
-				tw_pad		: 2,	/* 2 bits hole */
+				tw_pad		: 3,	/* 3 bits hole */
 				tw_tos		: 8;
 	u32			tw_txhash;
 	u32			tw_priority;
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 6e8f4a6cd222e89b1c7f9fdd73b10d336ff026a1..e37e4852711c52bf8f6d01877297266cd19294ed 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -148,10 +148,6 @@ static void tw_timer_handler(struct timer_list *t)
 {
 	struct inet_timewait_sock *tw = from_timer(tw, t, tw_timer);
 
-	if (tw->tw_kill)
-		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED);
-	else
-		__NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITED);
 	inet_twsk_kill(tw);
 }
 
@@ -247,8 +243,11 @@ void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
 	 * of PAWS.
 	 */
 
-	tw->tw_kill = timeo <= 4*HZ;
 	if (!rearm) {
+		bool kill = timeo <= 4*HZ;
+
+		__NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
+						     LINUX_MIB_TIMEWAITED);
 		BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
 		atomic_inc(&tw->tw_dr->tw_count);
 	} else {

From patchwork Mon Jan 24 20:24:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 12722976
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet
Subject: [PATCH net-next 3/6] tcp/dccp: get rid of inet_twsk_purge()
Date: Mon, 24 Jan 2022 12:24:54 -0800
Message-Id: <20220124202457.3450198-4-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.35.0.rc0.227.g00780c9af4-goog
In-Reply-To: <20220124202457.3450198-1-eric.dumazet@gmail.com>
References: <20220124202457.3450198-1-eric.dumazet@gmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

Prior patches in the series made sure tw_timer_handler()
can be fired after netns has been dismantled/freed.

We no longer have to scan a potentially big TCP ehash
table at netns dismantle.

Signed-off-by: Eric Dumazet
Reported-by: kernel test robot
---
 include/net/inet_timewait_sock.h |  2 --
 net/dccp/ipv4.c                  |  6 ----
 net/dccp/ipv6.c                  |  6 ----
 net/ipv4/inet_timewait_sock.c    | 47 --------------------------------
 net/ipv4/tcp_ipv4.c              |  2 --
 net/ipv6/tcp_ipv6.c              |  6 ----
 6 files changed, 69 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index b323db969b8b6df98ad84a9bb9aad646b4a8730c..463ae5d33eb09c40caeb4d039af268609b5e563b 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -110,8 +110,6 @@ static inline void inet_twsk_reschedule(struct inet_timewait_sock *tw, int timeo
 
 void inet_twsk_deschedule_put(struct inet_timewait_sock *tw);
 
-void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family);
-
 static inline
 struct net *twsk_net(const struct inet_timewait_sock *twsk)
 {
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 0ea29270d7e53730d14ec43654be8f956f891552..ae662567a6cb6a440c79a9805a2cd6d146ac5a29 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -1030,15 +1030,9 @@ static void __net_exit dccp_v4_exit_net(struct net *net)
 	inet_ctl_sock_destroy(pn->v4_ctl_sk);
 }
 
-static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list)
-{
-	inet_twsk_purge(&dccp_hashinfo, AF_INET);
-}
-
 static struct pernet_operations dccp_v4_ops = {
 	.init	= dccp_v4_init_net,
 	.exit	= dccp_v4_exit_net,
-	.exit_batch = dccp_v4_exit_batch,
 	.id	= &dccp_v4_pernet_id,
 	.size   = sizeof(struct dccp_v4_pernet),
 };
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index fa663518fa0e465458b7486ad0cd0672425f08b0..eab3bd1ee9a0a0064c04ff97fd8363e60daa0079 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1115,15 +1115,9 @@ static void __net_exit dccp_v6_exit_net(struct net *net)
 	inet_ctl_sock_destroy(pn->v6_ctl_sk);
 }
 
-static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list)
-{
-	inet_twsk_purge(&dccp_hashinfo, AF_INET6);
-}
-
 static struct pernet_operations dccp_v6_ops = {
 	.init   = dccp_v6_init_net,
 	.exit   = dccp_v6_exit_net,
-	.exit_batch = dccp_v6_exit_batch,
 	.id	= &dccp_v6_pernet_id,
 	.size   = sizeof(struct dccp_v6_pernet),
 };
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index e37e4852711c52bf8f6d01877297266cd19294ed..71808c7a7025c0a3a811b629a7796ad148152a4c 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -255,50 +255,3 @@ void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
 	}
 }
 EXPORT_SYMBOL_GPL(__inet_twsk_schedule);
-
-void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family)
-{
-	struct inet_timewait_sock *tw;
-	struct sock *sk;
-	struct hlist_nulls_node *node;
-	unsigned int slot;
-
-	for (slot = 0; slot <= hashinfo->ehash_mask; slot++) {
-		struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
-restart_rcu:
-		cond_resched();
-		rcu_read_lock();
-restart:
-		sk_nulls_for_each_rcu(sk, node, &head->chain) {
-			if (sk->sk_state != TCP_TIME_WAIT)
-				continue;
-			tw = inet_twsk(sk);
-			if ((tw->tw_family != family) ||
-				refcount_read(&twsk_net(tw)->ns.count))
-				continue;
-
-			if (unlikely(!refcount_inc_not_zero(&tw->tw_refcnt)))
-				continue;
-
-			if (unlikely((tw->tw_family != family) ||
-				     refcount_read(&twsk_net(tw)->ns.count))) {
-				inet_twsk_put(tw);
-				goto restart;
-			}
-
-			rcu_read_unlock();
-			local_bh_disable();
-			inet_twsk_deschedule_put(tw);
-			local_bh_enable();
-			goto restart_rcu;
-		}
-		/* If the nulls value we got at the end of this lookup is
-		 * not the expected one, we must restart lookup.
-		 * We probably met an item that was moved to another chain.
-		 */
-		if (get_nulls_value(node) != slot)
-			goto restart;
-		rcu_read_unlock();
-	}
-}
-EXPORT_SYMBOL_GPL(inet_twsk_purge);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index b3f34e366b27f7f1aece164aa485b1e9a7248d93..8e94b99882044d3d9927d83512d18f34dc2f5b43 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3239,8 +3239,6 @@ static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list)
 {
 	struct net *net;
 
-	inet_twsk_purge(&tcp_hashinfo, AF_INET);
-
 	list_for_each_entry(net, net_exit_list, exit_list)
 		tcp_fastopen_ctx_destroy(net);
 }
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 075ee8a2df3b7f3759f69f1b1256f2e8c9c700c1..1e55ee98dfedac67a591a8a04ce98f334a4b8b7c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2237,15 +2237,9 @@ static void __net_exit tcpv6_net_exit(struct net *net)
 	inet_ctl_sock_destroy(net->ipv6.tcp_sk);
 }
 
-static void __net_exit tcpv6_net_exit_batch(struct list_head *net_exit_list)
-{
-	inet_twsk_purge(&tcp_hashinfo, AF_INET6);
-}
-
 static struct pernet_operations tcpv6_net_ops = {
 	.init	    = tcpv6_net_init,
 	.exit	    = tcpv6_net_exit,
-	.exit_batch = tcpv6_net_exit_batch,
 };
 
 int __init tcpv6_init(void)

From patchwork Mon Jan 24 20:24:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 12722978
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet
Subject: [PATCH net-next 4/6] ipv4: do not use per netns icmp sockets
Date: Mon, 24 Jan 2022 12:24:55 -0800
Message-Id: <20220124202457.3450198-5-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.35.0.rc0.227.g00780c9af4-goog
In-Reply-To: <20220124202457.3450198-1-eric.dumazet@gmail.com>
References: <20220124202457.3450198-1-eric.dumazet@gmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

Back in linux-2.6.25 (commit 4a6ad7a141cb "[NETNS]: Make icmp_sk per namespace."),
we added private per-cpu/per-netns ipv4 icmp sockets.

This adds memory and cpu costs, which do not seem needed.
Now that typical servers have 256 or more cores, this adds a considerable
tax on netns users.

icmp sockets are used from BH context, are not receiving packets,
and do not store any persistent state but the 'struct net' pointer.

icmp_xmit_lock() already makes sure to lock the chosen per-cpu socket.

Signed-off-by: Eric Dumazet
---
 include/net/netns/ipv4.h |  1 -
 net/ipv4/icmp.c          | 91 ++++++++++++++--------------------------
 2 files changed, 31 insertions(+), 61 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 78557643526e23985af5695b54076a399e5c9548..639a31638159b23e7ec1d16f621a7953b885729c 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -70,7 +70,6 @@ struct netns_ipv4 {
 	struct hlist_head	*fib_table_hash;
 	struct sock		*fibnl;
 
-	struct sock  * __percpu	*icmp_sk;
 	struct sock		*mc_autojoin_sk;
 
 	struct inet_peer_base	*peers;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index b7e277d8a84d224cb9c034321e688d765d01c07f..72a375c7f4172d92af61eb9b5eb7da29b551b663 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -192,24 +192,14 @@ struct icmp_control {
 
 static const struct icmp_control icmp_pointers[NR_ICMP_TYPES+1];
 
-/*
- *	The ICMP socket(s). This is the most convenient way to flow control
- *	our ICMP output as well as maintain a clean interface throughout
- *	all layers. All Socketless IP sends will soon be gone.
- *
- *	On SMP we have one ICMP socket per-cpu.
- */
-static struct sock *icmp_sk(struct net *net)
-{
-	return this_cpu_read(*net->ipv4.icmp_sk);
-}
+static DEFINE_PER_CPU(struct sock *, ipv4_icmp_sk);
 
 /* Called with BH disabled */
 static inline struct sock *icmp_xmit_lock(struct net *net)
 {
 	struct sock *sk;
 
-	sk = icmp_sk(net);
+	sk = this_cpu_read(ipv4_icmp_sk);
 
 	if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
 		/* This can happen if the output path signals a
@@ -217,11 +207,13 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
 		 */
 		return NULL;
 	}
+	sock_net_set(sk, net);
 	return sk;
 }
 
 static inline void icmp_xmit_unlock(struct sock *sk)
 {
+	sock_net_set(sk, &init_net);
 	spin_unlock(&sk->sk_lock.slock);
 }
 
@@ -363,14 +355,13 @@ static int icmp_glue_bits(void *from, char *to, int offset, int len, int odd,
 	return 0;
 }
 
-static void icmp_push_reply(struct icmp_bxm *icmp_param,
+static void icmp_push_reply(struct sock *sk,
+			    struct icmp_bxm *icmp_param,
 			    struct flowi4 *fl4,
 			    struct ipcm_cookie *ipc, struct rtable **rt)
 {
-	struct sock *sk;
 	struct sk_buff *skb;
 
-	sk = icmp_sk(dev_net((*rt)->dst.dev));
 	if (ip_append_data(sk, fl4, icmp_glue_bits, icmp_param,
 			   icmp_param->data_len+icmp_param->head_len,
 			   icmp_param->head_len,
@@ -452,7 +443,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	if (IS_ERR(rt))
 		goto out_unlock;
 	if (icmpv4_xrlim_allow(net, rt, &fl4, type, code))
-		icmp_push_reply(icmp_param, &fl4, &ipc, &rt);
+		icmp_push_reply(sk, icmp_param, &fl4, &ipc, &rt);
 	ip_rt_put(rt);
 out_unlock:
 	icmp_xmit_unlock(sk);
@@ -766,7 +757,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
 	if (!fl4.saddr)
 		fl4.saddr = htonl(INADDR_DUMMY);
 
-	icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
+	icmp_push_reply(sk, &icmp_param, &fl4, &ipc, &rt);
 ende:
 	ip_rt_put(rt);
 out_unlock:
@@ -1434,46 +1425,8 @@ static const struct icmp_control icmp_pointers[NR_ICMP_TYPES + 1] = {
 	},
 };
 
-static void __net_exit icmp_sk_exit(struct net *net)
-{
-	int i;
-
-	for_each_possible_cpu(i)
-		inet_ctl_sock_destroy(*per_cpu_ptr(net->ipv4.icmp_sk, i));
-	free_percpu(net->ipv4.icmp_sk);
-	net->ipv4.icmp_sk = NULL;
-}
-
 static int __net_init icmp_sk_init(struct net *net)
 {
-	int i, err;
-
-	net->ipv4.icmp_sk = alloc_percpu(struct sock *);
-	if (!net->ipv4.icmp_sk)
-		return -ENOMEM;
-
-	for_each_possible_cpu(i) {
-		struct sock *sk;
-
-		err = inet_ctl_sock_create(&sk, PF_INET,
-					   SOCK_RAW, IPPROTO_ICMP, net);
-		if (err < 0)
-			goto fail;
-
-		*per_cpu_ptr(net->ipv4.icmp_sk, i) = sk;
-
-		/* Enough space for 2 64K ICMP packets, including
-		 * sk_buff/skb_shared_info struct overhead.
-		 */
-		sk->sk_sndbuf =	2 * SKB_TRUESIZE(64 * 1024);
-
-		/*
-		 * Speedup sock_wfree()
-		 */
-		sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
-		inet_sk(sk)->pmtudisc = IP_PMTUDISC_DONT;
-	}
-
 	/* Control parameters for ECHO replies. */
 	net->ipv4.sysctl_icmp_echo_ignore_all = 0;
 	net->ipv4.sysctl_icmp_echo_enable_probe = 0;
@@ -1499,18 +1452,36 @@ static int __net_init icmp_sk_init(struct net *net)
 	net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr = 0;
 
 	return 0;
-
-fail:
-	icmp_sk_exit(net);
-	return err;
 }
 
 static struct pernet_operations __net_initdata icmp_sk_ops = {
 	.init = icmp_sk_init,
-	.exit = icmp_sk_exit,
 };
 
 int __init icmp_init(void)
 {
+	int err, i;
+
+	for_each_possible_cpu(i) {
+		struct sock *sk;
+
+		err = inet_ctl_sock_create(&sk, PF_INET,
+					   SOCK_RAW, IPPROTO_ICMP, &init_net);
+		if (err < 0)
+			return err;
+
+		per_cpu(ipv4_icmp_sk, i) = sk;
+
+		/* Enough space for 2 64K ICMP packets, including
+		 * sk_buff/skb_shared_info struct overhead.
+		 */
+		sk->sk_sndbuf =	2 * SKB_TRUESIZE(64 * 1024);
+
+		/*
+		 * Speedup sock_wfree()
+		 */
+		sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
+		inet_sk(sk)->pmtudisc = IP_PMTUDISC_DONT;
+	}
 	return register_pernet_subsys(&icmp_sk_ops);
 }

From patchwork Mon Jan 24 20:24:56 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 12722977
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet
Subject: [PATCH net-next 5/6] ipv6: do not use per netns icmp sockets
Date: Mon, 24 Jan 2022 12:24:56 -0800
Message-Id: <20220124202457.3450198-6-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.35.0.rc0.227.g00780c9af4-goog
In-Reply-To: <20220124202457.3450198-1-eric.dumazet@gmail.com>
References: <20220124202457.3450198-1-eric.dumazet@gmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

Back in linux-2.6.25 (commit 98c6d1b261e7 "[NETNS]: Make icmpv6_sk per namespace."),
we added private per-cpu/per-netns ipv6 icmp sockets.

This adds memory and cpu costs, which do not seem needed.
Now that typical servers have 256 or more cores, this adds a considerable
tax on netns users.

icmp sockets are used from BH context, do not receive packets,
and store no persistent state other than the 'struct net' pointer.

icmpv6_xmit_lock() already makes sure to lock the chosen per-cpu socket.

This patch has a considerable impact on the number of netns
that the worker thread in cleanup_net() can dismantle per second,
because ip6mr_sk_done() is no longer called, meaning we no longer
acquire the rtnl mutex and no longer compete with other threads
adding new netns.

Signed-off-by: Eric Dumazet
---
 include/net/netns/ipv6.h |  1 -
 net/ipv6/icmp.c          | 62 +++++++---------------------------------
 2 files changed, 10 insertions(+), 53 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index a4b55038031652601444e46006c489a4e23b0ab7..30cdfc4e1615424b1c691b53499a1987d7fd0496 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -88,7 +88,6 @@ struct netns_ipv6 {
 	struct fib6_table       *fib6_local_tbl;
 	struct fib_rules_ops    *fib6_rules_ops;
 #endif
-	struct sock * __percpu	*icmp_sk;
 	struct sock             *ndisc_sk;
 	struct sock             *tcp_sk;
 	struct sock             *igmp_sk;
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 96c5cc0f30cebb02280a8384e34234701989f0d6..e6b978ea0e87fe595121a977d2030a308437eff3 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -69,17 +69,7 @@

 #include

-/*
- *	The ICMP socket(s). This is the most convenient way to flow control
- *	our ICMP output as well as maintain a clean interface throughout
- *	all layers. All Socketless IP sends will soon be gone.
- *
- *	On SMP we have one ICMP socket per-cpu.
- */
-static struct sock *icmpv6_sk(struct net *net)
-{
-	return this_cpu_read(*net->ipv6.icmp_sk);
-}
+static DEFINE_PER_CPU(struct sock *, ipv6_icmp_sk);

 static int icmpv6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		       u8 type, u8 code, int offset, __be32 info)
@@ -110,11 +100,11 @@ static const struct inet6_protocol icmpv6_protocol = {
 };

 /* Called with BH disabled */
-static __inline__ struct sock *icmpv6_xmit_lock(struct net *net)
+static struct sock *icmpv6_xmit_lock(struct net *net)
 {
 	struct sock *sk;

-	sk = icmpv6_sk(net);
+	sk = this_cpu_read(ipv6_icmp_sk);
 	if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
 		/* This can happen if the output path (f.e. SIT or
 		 * ip6ip6 tunnel) signals dst_link_failure() for an
@@ -122,11 +112,13 @@ static __inline__ struct sock *icmpv6_xmit_lock(struct net *net)
 		 */
 		return NULL;
 	}
+	sock_net_set(sk, net);
 	return sk;
 }

-static __inline__ void icmpv6_xmit_unlock(struct sock *sk)
+static void icmpv6_xmit_unlock(struct sock *sk)
 {
+	sock_net_set(sk, &init_net);
 	spin_unlock(&sk->sk_lock.slock);
 }

@@ -1034,59 +1026,27 @@ void icmpv6_flow_init(struct sock *sk, struct flowi6 *fl6,
 	security_sk_classify_flow(sk, flowi6_to_flowi_common(fl6));
 }

-static void __net_exit icmpv6_sk_exit(struct net *net)
-{
-	int i;
-
-	for_each_possible_cpu(i)
-		inet_ctl_sock_destroy(*per_cpu_ptr(net->ipv6.icmp_sk, i));
-	free_percpu(net->ipv6.icmp_sk);
-}
-
-static int __net_init icmpv6_sk_init(struct net *net)
+int __init icmpv6_init(void)
 {
 	struct sock *sk;
 	int err, i;

-	net->ipv6.icmp_sk = alloc_percpu(struct sock *);
-	if (!net->ipv6.icmp_sk)
-		return -ENOMEM;
-
 	for_each_possible_cpu(i) {
 		err = inet_ctl_sock_create(&sk, PF_INET6,
-					   SOCK_RAW, IPPROTO_ICMPV6, net);
+					   SOCK_RAW, IPPROTO_ICMPV6, &init_net);
 		if (err < 0) {
 			pr_err("Failed to initialize the ICMP6 control socket (err %d)\n",
 			       err);
-			goto fail;
+			return err;
 		}

-		*per_cpu_ptr(net->ipv6.icmp_sk, i) = sk;
+		per_cpu(ipv6_icmp_sk, i) = sk;

 		/* Enough space for 2 64K ICMP packets, including
 		 *
 		   sk_buff struct overhead.
 		 */
 		sk->sk_sndbuf = 2 * SKB_TRUESIZE(64 * 1024);
 	}
-	return 0;
-
- fail:
-	icmpv6_sk_exit(net);
-	return err;
-}
-
-static struct pernet_operations icmpv6_sk_ops = {
-	.init = icmpv6_sk_init,
-	.exit = icmpv6_sk_exit,
-};
-
-int __init icmpv6_init(void)
-{
-	int err;
-
-	err = register_pernet_subsys(&icmpv6_sk_ops);
-	if (err < 0)
-		return err;

 	err = -EAGAIN;
 	if (inet6_add_protocol(&icmpv6_protocol, IPPROTO_ICMPV6) < 0)
@@ -1101,14 +1061,12 @@ int __init icmpv6_init(void)
 	inet6_del_protocol(&icmpv6_protocol, IPPROTO_ICMPV6);
 fail:
 	pr_err("Failed to register ICMP6 protocol\n");
-	unregister_pernet_subsys(&icmpv6_sk_ops);
 	return err;
 }

 void icmpv6_cleanup(void)
 {
 	inet6_unregister_icmp_sender(icmp6_send);
-	unregister_pernet_subsys(&icmpv6_sk_ops);
 	inet6_del_protocol(&icmpv6_protocol, IPPROTO_ICMPV6);
 }

From patchwork Mon Jan 24 20:24:57 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 12722979
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S .
 Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet
Subject: [PATCH net-next 6/6] ipv4/tcp: do not use per netns ctl sockets
Date: Mon, 24 Jan 2022 12:24:57 -0800
Message-Id: <20220124202457.3450198-7-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.35.0.rc0.227.g00780c9af4-goog
In-Reply-To: <20220124202457.3450198-1-eric.dumazet@gmail.com>
References: <20220124202457.3450198-1-eric.dumazet@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

From: Eric Dumazet

TCP ipv4 uses per-cpu/per-netns ctl sockets in order to send
RST and some ACK packets (on behalf of TIMEWAIT sockets).

This adds memory and cpu costs, which do not seem needed.
Now that typical servers have 256 or more cores, this adds a considerable
tax on netns users.

tcp sockets are used from BH context, do not receive packets,
and store no persistent state other than the 'struct net' pointer,
which they need in order to use IPv4 output functions.

Note that I attempted a related change in the past, which had
to be hot-fixed in commit bdbbb8527b6f ("ipv4: tcp: get rid of ugly unicast_sock").

This patch could very well surface old bugs in layers that do not
take care of sk->sk_kern_sock properly.

Signed-off-by: Eric Dumazet
---
 include/net/netns/ipv4.h |  1 -
 net/ipv4/tcp_ipv4.c      | 61 ++++++++++++++++++----------------------
 2 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 639a31638159b23e7ec1d16f621a7953b885729c..22b4c6df1d2b383cd10dd3dc11cf8c39388c50bf 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -73,7 +73,6 @@ struct netns_ipv4 {
 	struct sock		*mc_autojoin_sk;

 	struct inet_peer_base	*peers;
-	struct sock * __percpu	*tcp_sk;
 	struct fqdir		*fqdir;

 	u8 sysctl_icmp_echo_ignore_all;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8e94b99882044d3d9927d83512d18f34dc2f5b43..a7d83ceea42076e89862619f4b0cd7ae9277e7de 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -91,6 +91,8 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 struct inet_hashinfo tcp_hashinfo;
 EXPORT_SYMBOL(tcp_hashinfo);

+static DEFINE_PER_CPU(struct sock *, ipv4_tcp_sk);
+
 static u32 tcp_v4_init_seq(const struct sk_buff *skb)
 {
 	return secure_tcp_seq(ip_hdr(skb)->daddr,
@@ -810,7 +812,8 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 	arg.tos = ip_hdr(skb)->tos;
 	arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
 	local_bh_disable();
-	ctl_sk = this_cpu_read(*net->ipv4.tcp_sk);
+	ctl_sk = this_cpu_read(ipv4_tcp_sk);
+	sock_net_set(ctl_sk, net);
 	if (sk) {
 		ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
 				   inet_twsk(sk)->tw_mark : sk->sk_mark;
@@ -825,6 +828,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 			      transmit_time);

 	ctl_sk->sk_mark = 0;
+	sock_net_set(ctl_sk, &init_net);
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 	__TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
 	local_bh_enable();
@@ -908,7 +912,8 @@ static void tcp_v4_send_ack(const struct sock *sk,
 	arg.tos = tos;
 	arg.uid = sock_net_uid(net, sk_fullsock(sk) ?
 				    sk : NULL);
 	local_bh_disable();
-	ctl_sk = this_cpu_read(*net->ipv4.tcp_sk);
+	ctl_sk = this_cpu_read(ipv4_tcp_sk);
+	sock_net_set(ctl_sk, net);
 	ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
 			   inet_twsk(sk)->tw_mark : sk->sk_mark;
 	ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ?
@@ -921,6 +926,7 @@ static void tcp_v4_send_ack(const struct sock *sk,
 			      transmit_time);

 	ctl_sk->sk_mark = 0;
+	sock_net_set(ctl_sk, &init_net);
 	__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 	local_bh_enable();
 }
@@ -3111,41 +3117,14 @@ EXPORT_SYMBOL(tcp_prot);

 static void __net_exit tcp_sk_exit(struct net *net)
 {
-	int cpu;
-
 	if (net->ipv4.tcp_congestion_control)
 		bpf_module_put(net->ipv4.tcp_congestion_control,
 			       net->ipv4.tcp_congestion_control->owner);
-
-	for_each_possible_cpu(cpu)
-		inet_ctl_sock_destroy(*per_cpu_ptr(net->ipv4.tcp_sk, cpu));
-	free_percpu(net->ipv4.tcp_sk);
 }

 static int __net_init tcp_sk_init(struct net *net)
 {
-	int res, cpu, cnt;
-
-	net->ipv4.tcp_sk = alloc_percpu(struct sock *);
-	if (!net->ipv4.tcp_sk)
-		return -ENOMEM;
-
-	for_each_possible_cpu(cpu) {
-		struct sock *sk;
-
-		res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
-					   IPPROTO_TCP, net);
-		if (res)
-			goto fail;
-		sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
-
-		/* Please enforce IP_DF and IPID==0 for RST and
-		 * ACK sent in SYN-RECV and TIME-WAIT state.
-		 */
-		inet_sk(sk)->pmtudisc = IP_PMTUDISC_DO;
-
-		*per_cpu_ptr(net->ipv4.tcp_sk, cpu) = sk;
-	}
+	int cnt;

 	net->ipv4.sysctl_tcp_ecn = 2;
 	net->ipv4.sysctl_tcp_ecn_fallback = 1;
@@ -3229,10 +3208,6 @@ static int __net_init tcp_sk_init(struct net *net)
 		net->ipv4.tcp_congestion_control = &tcp_reno;

 	return 0;
-fail:
-	tcp_sk_exit(net);
-
-	return res;
 }

 static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list)
@@ -3324,6 +3299,24 @@ static void __init bpf_iter_register(void)

 void __init tcp_v4_init(void)
 {
+	int cpu, res;
+
+	for_each_possible_cpu(cpu) {
+		struct sock *sk;
+
+		res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
+					   IPPROTO_TCP, &init_net);
+		if (res)
+			panic("Failed to create the TCP control socket.\n");
+		sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
+
+		/* Please enforce IP_DF and IPID==0 for RST and
+		 * ACK sent in SYN-RECV and TIME-WAIT state.
+		 */
+		inet_sk(sk)->pmtudisc = IP_PMTUDISC_DO;
+
+		per_cpu(ipv4_tcp_sk, cpu) = sk;
+	}
 	if (register_pernet_subsys(&tcp_sk_ops))
 		panic("Failed to create the TCP control socket.\n");