From patchwork Tue Feb 27 01:10:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 13573043 Received: from smtp-fw-80007.amazon.com (smtp-fw-80007.amazon.com [99.78.197.218]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6EDCA4A21; Tue, 27 Feb 2024 01:11:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.218 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996286; cv=none; b=J1hDSNefUpOXu66mK4Vq6we8SakPu5h6zUrvOqRNSGpmpI0qrnVepiBiLLOrdLqzQNzQke7RmfZder30l4HnLzWOp3C1ncSd322GTA7QfPtMZxucGwzak1JCb4b73syYbJv6PlHobKa4R0kDpwENPStez73rCzx2DPC4mP4HoZg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996286; c=relaxed/simple; bh=9fky4NXpr+3ykSW2uH/qdFPPJQ8A2KdjRO58GhtpncM=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Xa2vlMYYDUmD6mt+jsEFkEJ5aElqfSg9ll8R4iOQyIIODssAYFN3En1HFt+fe3ESw0mG5lDbVvkKe7ffrUwwiLBHsz7T4NyubvqWlypgQlkNg3p76Wwh++8bOnwyyQeIUtnaw2w/o11Dql+QMdP2SCx0q/6ptIs5kveOpcMe7KY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=e1RxINpv; arc=none smtp.client-ip=99.78.197.218 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="e1RxINpv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1708996286; 
x=1740532286; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=veGEC1zoLAkeldJklL1xVz9Rq+F2PDiRn3qM0xZssAs=; b=e1RxINpvXnaODEgtmBJkGBWAYSG1OB4Ujtl7GsAlds8r2UEA14UeG5cu uUa45BeiBsPkYEh5YMpeN5k1lCAcC/jWxdV2Sak287FNxMWe4W6NcLtsy nIRkBhrV9IHyWqGdgQ53z98s0rXC+iKvNs1DXpM8NdIZq6zycst7GFjca o=; X-IronPort-AV: E=Sophos;i="6.06,187,1705363200"; d="scan'208";a="276917896" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-80007.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2024 01:11:23 +0000 Received: from EX19MTAUWA002.ant.amazon.com [10.0.38.20:19212] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.49.110:2525] with esmtp (Farcaster) id 0b049094-5800-4107-a40d-082085f99593; Tue, 27 Feb 2024 01:11:22 +0000 (UTC) X-Farcaster-Flow-ID: 0b049094-5800-4107-a40d-082085f99593 Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWA002.ant.amazon.com (10.250.64.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Tue, 27 Feb 2024 01:11:22 +0000 Received: from 88665a182662.ant.amazon.com.com (10.106.101.48) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1258.28; Tue, 27 Feb 2024 01:11:19 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Allison Henderson CC: Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 net 1/5] tcp: Restart iteration after removing reqsk in inet_twsk_purge(). 
Date: Mon, 26 Feb 2024 17:10:37 -0800
Message-ID: <20240227011041.97375-2-kuniyu@amazon.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240227011041.97375-1-kuniyu@amazon.com>
References: <20240227011041.97375-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D043UWC002.ant.amazon.com (10.13.139.222) To
 EX19D004ANA001.ant.amazon.com (10.37.240.138)

Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
inet_twsk_purge()") added changes in inet_twsk_purge() to purge
reqsk in per-netns ehash during netns dismantle.

inet_csk_reqsk_queue_drop_and_put() will remove reqsk from per-netns
ehash, but the iteration uses sk_nulls_for_each_rcu(), which is not
safe against removal of the entry being visited.  After removing
reqsk, we need to restart iteration.

Note that we need not check net->ns.count here because per-netns
ehash does not have reqsk in other live netns.

This change will be removed by the following patch.
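
The restart-after-removal pattern described above can be sketched in
user space.  This is only an illustration under simplifying
assumptions: a plain singly linked list stands in for the ehash
chain, there is no RCU or locking, and every name (struct node,
chain_push(), chain_remove(), chain_purge(), chain_len(), the ST_*
constants) is hypothetical and not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket linked into one hash chain. */
struct node {
	int state;
	struct node *next;
};

/* Hypothetical states, loosely modelled on TCP_TIME_WAIT / TCP_NEW_SYN_RECV. */
enum { ST_TIME_WAIT = 1, ST_NEW_SYN_RECV = 2 };

/* Push a new node at the head of the chain; returns NULL if malloc() fails. */
static struct node *chain_push(struct node **head, int state)
{
	struct node *n;

	assert(head != NULL);
	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	n->state = state;
	n->next = *head;
	*head = n;
	return n;
}

/* Unlink @victim from the chain and free it. */
static void chain_remove(struct node **head, struct node *victim)
{
	struct node **pp = head;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = victim->next;
		free(victim);
	}
}

/* Drop every node in @state.  The iterator is not removal-safe, so the
 * walk restarts from the head after each removal instead of reading
 * n->next from a node that has just been freed.
 */
static int chain_purge(struct node **head, int state)
{
	struct node *n;
	int dropped = 0;

restart:
	for (n = *head; n; n = n->next) {
		if (n->state != state)
			continue;

		chain_remove(head, n);
		dropped++;
		goto restart;
	}
	return dropped;
}

/* Count the nodes left on the chain. */
static int chain_len(const struct node *head)
{
	int len = 0;

	for (; head; head = head->next)
		len++;
	return len;
}
```

Restarting costs a rescan of the chain per removal, which is
acceptable here only because removals are expected to be rare; a
removal-safe iterator would be the usual alternative where one
exists.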
Fixes: 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()") Reported-by: Eric Dumazet Signed-off-by: Kuniyuki Iwashima --- net/ipv4/inet_timewait_sock.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 5befa4de5b24..00cbebaa2c68 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -287,6 +287,8 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family) struct request_sock *req = inet_reqsk(sk); inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req); + + goto restart; } continue; From patchwork Tue Feb 27 01:10:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 13573044 Received: from smtp-fw-52003.amazon.com (smtp-fw-52003.amazon.com [52.119.213.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C8134431; Tue, 27 Feb 2024 01:11:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996318; cv=none; b=geW07KZ6o8PElMKWU5jBhQ5jrEMBHbawpaDXQueae6njxbkfa1fSON/KOud74fok5KDub/6JevB8ejEywtKHue/6xPWQHSB7e5irtTUoxFXsWlCkLa2PdH85htFsOvwEpAIMC+QWOFTuNNymG8LotDVbrk6aqfPcfYXm8++Y30k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996318; c=relaxed/simple; bh=vay2VEAyTqvCkGswPS4QOIoBT2b0IYWYIZGp/yNIJnw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=B4A+jPhFsPTm2nk/xF9qd4QuEULUtCI//sgc5YWEMPpEcUQyfy+3CL315shw+bsBlSTEBF9LypZIFpsymsb5or0R3A/r4AHmlfO/8P+4IYB7Z+zmlGMzWO2WyohJ4CzD7WbSriMg3avvaHKtbsGb1pjEBoKG5i2HKJWqeuK2U9g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) 
header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=dSqDarq5; arc=none smtp.client-ip=52.119.213.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="dSqDarq5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1708996314; x=1740532314; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vLWlDHaOzS3uEVenflKacTWEx1B/0qcpvGK1bRwXcjQ=; b=dSqDarq57tWc9Hf0a4kJohfn0Fj4OI7ABn/RUEzosWWPWTNjlV74uh/p xcme/4BqkZjrBTFd3oM1VXKkeHHgdavOGTika6GHAGt5yCFn0RV/wLVR3 +SMauOoZKC4dnGYYXjwBLSurL3+2wNERdbcjxGcHM1A++CkWvRC8Y39zT k=; X-IronPort-AV: E=Sophos;i="6.06,187,1705363200"; d="scan'208";a="640634277" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52003.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2024 01:11:51 +0000 Received: from EX19MTAUWC001.ant.amazon.com [10.0.21.151:55504] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.29.27:2525] with esmtp (Farcaster) id 8f1355e5-1eae-4fb8-a862-6aa5c5b499ba; Tue, 27 Feb 2024 01:11:50 +0000 (UTC) X-Farcaster-Flow-ID: 8f1355e5-1eae-4fb8-a862-6aa5c5b499ba Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC001.ant.amazon.com (10.250.64.174) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Tue, 27 Feb 2024 01:11:47 +0000 Received: from 88665a182662.ant.amazon.com.com (10.106.101.48) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, 
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1258.28; Tue, 27 Feb 2024 01:11:44 +0000
From: Kuniyuki Iwashima
To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Allison Henderson
CC: Kuniyuki Iwashima , Kuniyuki Iwashima , , ,
Subject: [PATCH v2 net 2/5] Revert "tcp: Clean up kernel listener's reqsk in inet_twsk_purge()"
Date: Mon, 26 Feb 2024 17:10:38 -0800
Message-ID: <20240227011041.97375-3-kuniyu@amazon.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240227011041.97375-1-kuniyu@amazon.com>
References: <20240227011041.97375-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D039UWB004.ant.amazon.com (10.13.138.57) To
 EX19D004ANA001.ant.amazon.com (10.37.240.138)

This reverts commit 740ea3c4a0b2e326b23d7cdf05472a0e92aa39bc.

The change actually fixed a use-after-free of struct net by kernel
listener's reqsk in per-netns ehash.

However, the fix was incomplete, as the same issue exists for the
global ehash.

We should have fixed it on the RDS side without slowing down netns
dismantle for the normal TCP use case.

The following patches will fix the issue on the RDS side.
Fixes: 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()") Signed-off-by: Kuniyuki Iwashima --- net/ipv4/inet_timewait_sock.c | 16 +--------------- net/ipv4/tcp_minisocks.c | 9 ++++----- 2 files changed, 5 insertions(+), 20 deletions(-) diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 00cbebaa2c68..6b65f5f97478 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -277,22 +277,8 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family) rcu_read_lock(); restart: sk_nulls_for_each_rcu(sk, node, &head->chain) { - if (sk->sk_state != TCP_TIME_WAIT) { - /* A kernel listener socket might not hold refcnt for net, - * so reqsk_timer_handler() could be fired after net is - * freed. Userspace listener and reqsk never exist here. - */ - if (unlikely(sk->sk_state == TCP_NEW_SYN_RECV && - hashinfo->pernet)) { - struct request_sock *req = inet_reqsk(sk); - - inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req); - - goto restart; - } - + if (sk->sk_state != TCP_TIME_WAIT) continue; - } tw = inet_twsk(sk); if ((tw->tw_family != family) || diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 9e85f2a0bddd..baecfa4c70ef 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -394,14 +394,13 @@ void tcp_twsk_purge(struct list_head *net_exit_list, int family) struct net *net; list_for_each_entry(net, net_exit_list, exit_list) { + /* The last refcount is decremented in tcp_sk_exit_batch() */ + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1) + continue; + if (net->ipv4.tcp_death_row.hashinfo->pernet) { - /* Even if tw_refcount == 1, we must clean up kernel reqsk */ inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo, family); } else if (!purged_once) { - /* The last refcount is decremented in tcp_sk_exit_batch() */ - if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1) - continue; - inet_twsk_purge(&tcp_hashinfo, family); purged_once = 
true; } From patchwork Tue Feb 27 01:10:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 13573045 Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 03D784A24; Tue, 27 Feb 2024 01:12:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996339; cv=none; b=khsGHNApgKqM4IdA7/3Ak8OiYvbXgJeURkCqkeDJS2kZzs/J+OQdj+ZxiVWK5BsjGqnJIaBh9SlnwvQL+JQbmjODMBIVj+V0wK5u6xMWuBP92pVaGOL0d25W515Oapew9J/DQyyr46Qxar3C8rmHKtYAJAM6V7jF8sbBcgM+gd0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996339; c=relaxed/simple; bh=faYedEqqzFxcdz9VkTACgPVlcakLjAUUrrr2nYoBtKE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=N30RG/HUAirR08NGuWlvulHTxY+xXoIrcX5lIf0vfY8GdFKn3CWlqX41ncGAznwyVPInzE2re13V4r0nHdLPHfUCpHrY05vTmvgnXo4Xl1QRdjNzx/4MBqKLBJYo8KSvre0IHIXioMp7TF08KsoFZiV7K0DwFQYNc9WZ0BRSkyQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=TgL8J3Wf; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="TgL8J3Wf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; 
t=1708996338; x=1740532338; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ll8HhLh8iJugOBzG9xD1stZ58xcqI4hDqWg8P+gNWp8=; b=TgL8J3Wf7CZd+1lhaii/up4SCe2Qu5mEJI7hJvkH540FeUW505Pr6JCJ Ojj0bY4L9JQEBW3kMZfy2q6fl0cF8mjJF0ygFK+OK/v2FsE0+ywqDws2L LK6H5llHLbevUMhzGauCGBGkXGVWjhcPGTdaEUKPh8qk6BzTxTR9pj8S8 w=; X-IronPort-AV: E=Sophos;i="6.06,187,1705363200"; d="scan'208";a="187574065" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2024 01:12:15 +0000 Received: from EX19MTAUWA002.ant.amazon.com [10.0.38.20:21469] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.60.172:2525] with esmtp (Farcaster) id 8c984869-3474-471e-8ea3-57733d147b9c; Tue, 27 Feb 2024 01:12:13 +0000 (UTC) X-Farcaster-Flow-ID: 8c984869-3474-471e-8ea3-57733d147b9c Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWA002.ant.amazon.com (10.250.64.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Tue, 27 Feb 2024 01:12:13 +0000 Received: from 88665a182662.ant.amazon.com.com (10.106.101.48) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1258.28; Tue, 27 Feb 2024 01:12:10 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Allison Henderson CC: Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 net 3/5] net: Convert @kern of __sock_create() to enum. 
Date: Mon, 26 Feb 2024 17:10:39 -0800
Message-ID: <20240227011041.97375-4-kuniyu@amazon.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240227011041.97375-1-kuniyu@amazon.com>
References: <20240227011041.97375-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D044UWA004.ant.amazon.com (10.13.139.7) To
 EX19D004ANA001.ant.amazon.com (10.37.240.138)

Historically, syzbot has reported many use-after-free bugs of struct
net caused by kernel sockets.  In most cases, the root cause was a
timer kicked by a kernel socket that neither holds a netns refcount
nor cleans the timer up during netns dismantle.

This patch converts the @kern argument of __sock_create() to an enum
so that we can pass SOCKET_KERN_NET_REF and later sk_alloc() can hold
a refcount of net for kernel sockets.

We pass !!kern to security_socket(_post)?_create() but kern as is to
pf->create() because 3 functions (atalk_create(), inet_create(),
inet6_create()) use it for the following check:

  if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))

The conversion for the rest of the callers of __sock_create() and
sk_alloc() will be completed in net-next.git as the change is too
large to backport.
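
As a rough user-space illustration of how the three enum values map
onto the values derived from @kern, the sketch below copies the enum
from this patch and wraps the three expressions it relies on.  The
helper names (lsm_kern_flag(), holds_net_ref(),
raw_needs_cap_check()) are hypothetical and exist only for this
example; they are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Same enumerators, in the same order, as the enum added to include/linux/net.h. */
enum socket_user {
	SOCKET_USER,
	SOCKET_KERN,
	SOCKET_KERN_NET_REF,
};

/* The !!kern and !kern expressions below only work if SOCKET_USER is 0. */
static_assert(SOCKET_USER == 0, "SOCKET_USER must stay the zero value");

/* Hypothetical helper: the boolean handed to the security hooks (!!kern). */
static int lsm_kern_flag(enum socket_user kern)
{
	return !!kern;
}

/* Hypothetical helper: the value sk_alloc() stores in sk_net_refcnt
 * after this patch (kern != SOCKET_KERN).
 */
static bool holds_net_ref(enum socket_user kern)
{
	return kern != SOCKET_KERN;
}

/* Hypothetical helper: the "!kern" part of the SOCK_RAW check quoted above,
 * i.e. whether CAP_NET_RAW still has to be checked.
 */
static bool raw_needs_cap_check(enum socket_user kern)
{
	return !kern;
}
```

Under these assumptions SOCKET_KERN_NET_REF is treated as a kernel
socket by the security hooks and the SOCK_RAW check, yet it holds a
netns refcount like a userspace socket does.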
Signed-off-by: Kuniyuki Iwashima --- include/linux/net.h | 6 ++++++ net/core/sock.c | 2 +- net/socket.c | 11 ++++++----- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/include/linux/net.h b/include/linux/net.h index c9b4a63791a4..62ef0954be75 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -245,6 +245,12 @@ enum { SOCK_WAKE_URG, }; +enum socket_user { + SOCKET_USER, + SOCKET_KERN, + SOCKET_KERN_NET_REF, +}; + int sock_wake_async(struct socket_wq *sk_wq, int how, int band); int sock_register(const struct net_proto_family *fam); void sock_unregister(int family); diff --git a/net/core/sock.c b/net/core/sock.c index 5e78798456fd..6f417cdbcf50 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2138,7 +2138,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority, sk->sk_prot = sk->sk_prot_creator = prot; sk->sk_kern_sock = kern; sock_lock_init(sk); - sk->sk_net_refcnt = kern ? 0 : 1; + sk->sk_net_refcnt = kern != SOCKET_KERN; if (likely(sk->sk_net_refcnt)) { get_net_track(net, &sk->ns_tracker, priority); sock_inuse_add(net, 1); diff --git a/net/socket.c b/net/socket.c index ed3df2f749bf..f5ec613d9e3b 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1489,7 +1489,7 @@ EXPORT_SYMBOL(sock_wake_async); * @type: communication type (SOCK_STREAM, ...) * @protocol: protocol (0, ...) * @res: new socket - * @kern: boolean for kernel space sockets + * @kern: enum for kernel space sockets * * Creates a new socket and assigns it to @res, passing through LSM. * Returns 0 or an error. On failure @res is set to %NULL. 
@kern must @@ -1523,7 +1523,7 @@ int __sock_create(struct net *net, int family, int type, int protocol, family = PF_PACKET; } - err = security_socket_create(family, type, protocol, kern); + err = security_socket_create(family, type, protocol, !!kern); if (err) return err; @@ -1584,7 +1584,7 @@ int __sock_create(struct net *net, int family, int type, int protocol, * module can have its refcnt decremented */ module_put(pf->owner); - err = security_socket_post_create(sock, family, type, protocol, kern); + err = security_socket_post_create(sock, family, type, protocol, !!kern); if (err) goto out_sock_release; *res = sock; @@ -1619,7 +1619,8 @@ EXPORT_SYMBOL(__sock_create); int sock_create(int family, int type, int protocol, struct socket **res) { - return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0); + return __sock_create(current->nsproxy->net_ns, family, type, protocol, + res, SOCKET_USER); } EXPORT_SYMBOL(sock_create); @@ -1637,7 +1638,7 @@ EXPORT_SYMBOL(sock_create); int sock_create_kern(struct net *net, int family, int type, int protocol, struct socket **res) { - return __sock_create(net, family, type, protocol, res, 1); + return __sock_create(net, family, type, protocol, res, SOCKET_KERN); } EXPORT_SYMBOL(sock_create_kern); From patchwork Tue Feb 27 01:10:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 13573046 Received: from smtp-fw-6002.amazon.com (smtp-fw-6002.amazon.com [52.95.49.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C65A22916; Tue, 27 Feb 2024 01:12:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.95.49.90 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996365; cv=none; 
b=oPt45iJEckVfpE/BbFHtWzg3KQCQLC5FEcR4pnZ+cUtEcm+UyzDPHGFLo9uSqPEslaEleJYzMcH8EQyJrdkUKfmp5/q4gTXmdl3lX9eTv/Gfq1BFYPP/mENrCB4udFnyh093C74oftnL2rJPtcnWZ4ZW6gKmdMcMsu2E+gnm0HQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996365; c=relaxed/simple; bh=hEhfX717neP/XbOXoFNKl1HdG+lCiHfzvSniaM1QvvA=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hcOnj2AZ1Q8FkCeejYme4y0dDdXJJWwK9N+kpFNRjdIC5ZzgdwQhPATMIc3engEhId7LfkJlWrzkwEFb/sgsDLlgJqX7bkR1GxJ/fFLuNTsFM33FDrmWQ7wz/n0pKsQUxI6UgFl6Qh6ceW9RZKAr3VcuLfzQ+W6ELphiJQd7e2U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=d9kEY3Eo; arc=none smtp.client-ip=52.95.49.90 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="d9kEY3Eo" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1708996365; x=1740532365; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TM92/DPMiES4drf9veDP2tNXPn/X3QYPrxvKLaPyVFs=; b=d9kEY3Eoavc1ZdzYQiZuN0F8p6UPTrg9BtwPOozG+vMJwiSOShGOn8c/ kFZECqbQhlbD2X39PHtKHj8R94FD/tSt6j6LOVqoQM3pWC1e4PuH7Ry6a 0qie2VSWpYff6u6ukCjn+Kmz0o2LDzcxj5AglvU9RafLRWTqaGNa8Px7a 0=; X-IronPort-AV: E=Sophos;i="6.06,187,1705363200"; d="scan'208";a="389307606" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-6002.iad6.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2024 01:12:41 +0000 Received: 
 from EX19MTAUWC002.ant.amazon.com [10.0.21.151:63924] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.30.125:2525] with esmtp (Farcaster) id 83209d15-66dd-448b-b863-dd8208889c82; Tue, 27 Feb 2024 01:12:40 +0000 (UTC)
X-Farcaster-Flow-ID: 83209d15-66dd-448b-b863-dd8208889c82
Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC002.ant.amazon.com (10.250.64.143) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Tue, 27 Feb 2024 01:12:38 +0000
Received: from 88665a182662.ant.amazon.com.com (10.106.101.48) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.28; Tue, 27 Feb 2024 01:12:35 +0000
From: Kuniyuki Iwashima
To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Allison Henderson
CC: Kuniyuki Iwashima , Kuniyuki Iwashima , , , , syzkaller
Subject: [PATCH v2 net 4/5] rds: tcp: Fix use-after-free of net in reqsk_timer_handler().
Date: Mon, 26 Feb 2024 17:10:40 -0800
Message-ID: <20240227011041.97375-5-kuniyu@amazon.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240227011041.97375-1-kuniyu@amazon.com>
References: <20240227011041.97375-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D037UWC004.ant.amazon.com (10.13.139.254) To
 EX19D004ANA001.ant.amazon.com (10.37.240.138)

syzkaller reported a warning of netns tracker [0] followed by KASAN
splat [1] and another ref tracker warning [2].

syzkaller could not find a repro, but in the log, the only suspicious
sequence was as follows:

  18:26:22 executing program 1:
  r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
  ...
  connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async)

The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT.

So, the scenario would be:

  1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
      rds_tcp_listen_init().
  2. syz-executor connect()s to it and creates a reqsk.
  3. syz-executor exit()s immediately.
  4. netns is dismantled.  [0]
  5. reqsk timer is fired, and UAF happens while freeing reqsk.  [1]
  6. listener is freed after RCU grace period.  [2]

Basically, reqsk assumes that the listener guarantees netns safety
until all reqsk timers are expired by holding the listener's refcount.
However, this was not the case for kernel sockets.

Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
inet_twsk_purge()") fixed this issue only for per-netns ehash, but
the issue still exists for the global ehash.

We can apply the same fix, but this issue is specific to RDS.

Instead of iterating ehash and purging reqsk during netns dismantle,
let's hold netns refcount for the kernel listener.

[0]:
ref_tracker: net notrefcnt@0000000065449cc3 has 1/1 users at
     sk_alloc (./include/net/net_namespace.h:337 net/core/sock.c:2146)
     inet6_create (net/ipv6/af_inet6.c:192 net/ipv6/af_inet6.c:119)
     __sock_create (net/socket.c:1572)
     rds_tcp_listen_init (net/rds/tcp_listen.c:279)
     rds_tcp_init_net (net/rds/tcp.c:577)
     ops_init (net/core/net_namespace.c:137)
     setup_net (net/core/net_namespace.c:340)
     copy_net_ns (net/core/net_namespace.c:497)
     create_new_namespaces (kernel/nsproxy.c:110)
     unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
     ksys_unshare (kernel/fork.c:3429)
     __x64_sys_unshare (kernel/fork.c:3496)
     do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
     entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
...
WARNING: CPU: 0 PID: 27 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) [1]: BUG: KASAN: slab-use-after-free in inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966) Read of size 8 at addr ffff88801b370400 by task swapper/0/0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) print_report (mm/kasan/report.c:378 mm/kasan/report.c:488) kasan_report (mm/kasan/report.c:603) inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966) reqsk_timer_handler (net/ipv4/inet_connection_sock.c:979 net/ipv4/inet_connection_sock.c:1092) call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701) __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2038) run_timer_softirq (kernel/time/timer.c:2053) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14)) Allocated by task 258 on cpu 0 at 83.612050s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) __kasan_slab_alloc (mm/kasan/common.c:343) kmem_cache_alloc (mm/slub.c:3813 mm/slub.c:3860 mm/slub.c:3867) copy_net_ns (./include/linux/slab.h:701 net/core/net_namespace.c:421 net/core/net_namespace.c:480) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4)) ksys_unshare (kernel/fork.c:3429) __x64_sys_unshare (kernel/fork.c:3496) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) 
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) Freed by task 27 on cpu 0 at 329.158864s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) kasan_save_free_info (mm/kasan/generic.c:643) __kasan_slab_free (mm/kasan/common.c:265) kmem_cache_free (mm/slub.c:4299 mm/slub.c:4363) cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:639) process_one_work (kernel/workqueue.c:2638) worker_thread (kernel/workqueue.c:2700 kernel/workqueue.c:2787) kthread (kernel/kthread.c:388) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_64.S:250) The buggy address belongs to the object at ffff88801b370000 which belongs to the cache net_namespace of size 4352 The buggy address is located 1024 bytes inside of freed 4352-byte region [ffff88801b370000, ffff88801b371100) [2]: WARNING: CPU: 0 PID: 95 at lib/ref_tracker.c:228 ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1)) Modules linked in: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1)) ... 
Call Trace: __sk_destruct (./include/net/net_namespace.h:353 net/core/sock.c:2204) rcu_core (./arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2165 kernel/rcu/tree.c:2433) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14)) Reported-by: syzkaller Suggested-by: Eric Dumazet Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Signed-off-by: Kuniyuki Iwashima --- net/rds/tcp_listen.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c index 05008ce5c421..2d40e523322c 100644 --- a/net/rds/tcp_listen.c +++ b/net/rds/tcp_listen.c @@ -274,8 +274,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6) int addr_len; int ret; - ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, - IPPROTO_TCP, &sock); + ret = __sock_create(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, + IPPROTO_TCP, &sock, SOCKET_KERN_NET_REF); if (ret < 0) { rdsdebug("could not create %s listener socket: %d\n", isv6 ? 
"IPv6" : "IPv4", ret); From patchwork Tue Feb 27 01:10:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 13573047 Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A13D4A21; Tue, 27 Feb 2024 01:13:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996388; cv=none; b=LuzFxFVP/+4AYKf8ottazhgzbBe5LtRXCCG/sxqIWY2pqSHznsoyuBYoZzqqEE/MgIgfqEIexxtxW3FKoySOej0/V9E5u9Y57XyUNHZDKuk3sxsTQH4SnhSLtyYsYhuNWydhALdvkJ4h0gpszlcWiOoV9vZktdquPqmdgj8uRHk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708996388; c=relaxed/simple; bh=rjDwHqLTMJJ+cqWoqsmMZdFyD7O0WVCXFw75kbvv13g=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=m3ATAnIUAQNxjKMOHW3lm60TzJYpREtgzSVhJsRjvCvukBPMOP+NGfrKrodWqoVnf4NotOnKwq0v+Uy0C8Fq++CwT2srkLQHmEuF2mg5VZD9U3mugeRxE7ifWXuYQLrrAfQ16qVbQHRoDA+BDIgQspstd/74eDXS0QDPKpCne/U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=IVAFF0Hz; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="IVAFF0Hz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; 
s=amazon201209; t=1708996387; x=1740532387; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cYQeMcglbfCsqO+n2FbujJNC86HX4oA1mykhbxRwasM=; b=IVAFF0Hzp94AWhzx/Wwn/E8sxk1m6QkSFeeEqDktB5JxBylP9iyG+d0i JXFDu4VkH3O9Yq+sNMrvRtXweIYw64JXoqDr2YZb4MVlsUG5YVHIWoP6x zoNZvXso7x6sW3S4aWBNTSRL2GlXIZjPolIQAGeH7XBE6LsDByKLmVik4 8=; X-IronPort-AV: E=Sophos;i="6.06,187,1705363200"; d="scan'208";a="187574180" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2024 01:13:06 +0000 Received: from EX19MTAUWC002.ant.amazon.com [10.0.21.151:20027] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.29.27:2525] with esmtp (Farcaster) id ec880b31-c8da-499e-ae0d-5b40799d14aa; Tue, 27 Feb 2024 01:13:05 +0000 (UTC) X-Farcaster-Flow-ID: ec880b31-c8da-499e-ae0d-5b40799d14aa Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC002.ant.amazon.com (10.250.64.143) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Tue, 27 Feb 2024 01:13:03 +0000 Received: from 88665a182662.ant.amazon.com.com (10.106.101.48) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1258.28; Tue, 27 Feb 2024 01:13:00 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Allison Henderson CC: Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v2 net 5/5] tcp: Add assertion for reqsk->rsk_listener->sk_net_refcnt. 
Date: Mon, 26 Feb 2024 17:10:41 -0800
Message-ID: <20240227011041.97375-6-kuniyu@amazon.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240227011041.97375-1-kuniyu@amazon.com>
References: <20240227011041.97375-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D033UWA004.ant.amazon.com (10.13.139.85) To
 EX19D004ANA001.ant.amazon.com (10.37.240.138)

syzbot demonstrated that a reqsk timer could be fired after netns
dismantle if the timer was kicked by a kernel TCP listener.

Regardless of the owner of the socket, a TCP listener always has to
hold a netns refcount.

Let's make sure that a new user will not create a kernel TCP listener
without holding a netns refcount.

Suggested-by: Eric Dumazet
Signed-off-by: Kuniyuki Iwashima
---
 net/ipv4/tcp_input.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index df7b13f0e5e0..341dd5bb3fd1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6972,6 +6972,8 @@ struct request_sock *inet_reqsk_alloc(const struct request_sock_ops *ops,
 	if (req) {
 		struct inet_request_sock *ireq = inet_rsk(req);
 
+		DEBUG_NET_WARN_ON_ONCE(!sk_listener->sk_net_refcnt);
+
 		ireq->ireq_opt = NULL;
 #if IS_ENABLED(CONFIG_IPV6)
 		ireq->pktopts = NULL;
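
The intent of the assertion above can be modelled in user space as a
warn-once check at request allocation time.  This is a sketch under
stated assumptions, not the kernel implementation:
MODEL_WARN_ON_ONCE(), warnings_emitted, struct model_listener,
struct model_reqsk and model_reqsk_alloc() are all hypothetical names
invented for this example, and the real macro's behaviour depends on
kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter so that a caller can observe how often a warning fired. */
static int warnings_emitted;

/* Hypothetical stand-in for a warn-once debug assertion: it reports the
 * first time @cond is true at this call site and stays silent afterwards.
 */
#define MODEL_WARN_ON_ONCE(cond)					\
	do {								\
		static bool warned_;					\
		if ((cond) && !warned_) {				\
			warned_ = true;					\
			warnings_emitted++;				\
			fprintf(stderr, "WARNING: %s\n", #cond);	\
		}							\
	} while (0)

/* Hypothetical listener: only tracks whether it holds a netns reference. */
struct model_listener {
	bool net_refcnt;
};

/* Hypothetical request socket that points back to its listener. */
struct model_reqsk {
	const struct model_listener *listener;
};

/* Allocate a request for @l, warning once if @l holds no netns reference.
 * The allocation still succeeds; the check only flags the misuse.
 */
static struct model_reqsk *model_reqsk_alloc(const struct model_listener *l)
{
	struct model_reqsk *req;

	assert(l != NULL);
	req = malloc(sizeof(*req));
	if (req) {
		MODEL_WARN_ON_ONCE(!l->net_refcnt);
		req->listener = l;
	}
	return req;
}
```

Checking at allocation time means a misbehaving listener is reported
on its first incoming request rather than much later when a timer
fires against a freed namespace.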