From patchwork Fri Jan 10 13:21:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michal Luczaj X-Patchwork-Id: 13934628 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mailtransmit05.runbox.com (mailtransmit05.runbox.com [185.226.149.38]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74A6A20D4FB; Fri, 10 Jan 2025 13:24:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.226.149.38 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736515450; cv=none; b=jpceA2vAY6sZOcotjfyPcpuqZgR9gl9njBzJvrBz2cdaGgLVhu+x0Ls+zoej5tBn36H0dljopS2vWpRv6zoa7ojcZHjSBSwFHSZgEICmsFdRkACw9zT8Nu86rPQuNxMGscP953NzwZjGIYOBo/Wlrm5gblXFyw4OgNv1dRQB8dM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736515450; c=relaxed/simple; bh=clT/8RxDgjuG9ypKds+RfpuFO31oP95DIkCEURjcrmM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:Cc; b=j+Nxa499LV4UV1+BdVxPFyC3RSfYa85XuMgPYtfwaud1KKp+zxYhOsACD40jURq9ZmTRvrj/tb5iKqRsnTpgP/cUVYImTWBLjIUUIk3YPzwdPYlWK1tG2j0WnHnNE3BsoogMfzpXYgncCkXi1kQ2KJCN0czue982Pp8kYgrwE38= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=rbox.co; spf=pass smtp.mailfrom=rbox.co; dkim=pass (2048-bit key) header.d=rbox.co header.i=@rbox.co header.b=d4EPS6jg; arc=none smtp.client-ip=185.226.149.38 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=rbox.co Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=rbox.co Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=rbox.co header.i=@rbox.co header.b="d4EPS6jg" Received: from mailtransmit03.runbox ([10.9.9.163] helo=aibo.runbox.com) by mailtransmit05.runbox.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from ) id 1tWEzO-005xC4-Oy; Fri, 10 Jan 2025 14:23:50 +0100 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=rbox.co; s=selector2; h=Cc:To:Message-Id:Content-Transfer-Encoding:Content-Type: MIME-Version:Subject:Date:From; bh=FcyrAET8neQquEGrAHiIKKnyo+Au3cIC0L4fmdBd8AA=; b=d4EPS6jgaiE0cjc69P/6q7NbF9 qdUPQjNwjYcwfS55oe3+sQer3uTb9Obq80zKg9vNB8MfhF0ANiSSba9dQO2Rwfxb5xRbKRgwoFswS 7aWxe+Et/w4sMGWkFObjP0jWsOFFx++zN/b7h3yje2y/4G10i3TyF8rKl6mwoS/8DSWhXpBS5DxAO 9ob3fvmMSCdzTXZjkgFmrnj5QRsLwaeWiSnqXbhIG9rVj5nmneoXKU3c5/7d3COswiiuA2HSBRtf5 Q7Bc67Rw5DNBJTV2TEkD+h28bL6pS8LdccKOKioTRCGu0mzM8gNQaJiotOrvPC+lRNfYG/BfbW0Ep OiHGcwOQ==; Received: from [10.9.9.74] (helo=submission03.runbox) by mailtransmit03.runbox with esmtp (Exim 4.86_2) (envelope-from ) id 1tWEzN-0002wu-MB; Fri, 10 Jan 2025 14:23:49 +0100 Received: by submission03.runbox with esmtpsa [Authenticated ID (604044)] (TLS1.2:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.93) id 1tWEzF-00DQf8-6f; Fri, 10 Jan 2025 14:23:41 +0100 From: Michal Luczaj Date: Fri, 10 Jan 2025 14:21:55 +0100 Subject: [PATCH bpf] bpf: Fix bpf_sk_select_reuseport() memory leak Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250110-reuseport-memleak-v1-1-fa1ddab0adfe@rbox.co> X-B4-Tracking: v=1; b=H4sIAPIegWcC/x3MSwqEMBBF0a1IjQ0kgp/0VhoHUZ9a+AsVFUHcu 8HhGdx7U4AwAv2SmwQnB97WCJMm1I5uHaC4i6ZMZ7k2RivBEeA32dWCZYablHVtkVdl01nrKHZ e0PP1Pf/U+J7q53kBfo1BTmgAAAA= X-Change-ID: 20250110-reuseport-memleak-9ac6587bd99a To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , Jakub Sitnicki Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Michal Luczaj X-Mailer: b4 0.14.2 X-Patchwork-Delegate: bpf@iogearbox.net As pointed out in the original comment, lookup in sockmap can return a TCP ESTABLISHED socket. Such TCP socket may have had SO_ATTACH_REUSEPORT_EBPF set before it was ESTABLISHED. In other words, a non-NULL sk_reuseport_cb does not imply a non-refcounted socket. Drop sk's reference in both error paths. unreferenced object 0xffff888101911800 (size 2048): comm "test_progs", pid 44109, jiffies 4297131437 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 9336483b): __kmalloc_noprof+0x3bf/0x560 __reuseport_alloc+0x1d/0x40 reuseport_alloc+0xca/0x150 reuseport_attach_prog+0x87/0x140 sk_reuseport_attach_bpf+0xc8/0x100 sk_setsockopt+0x1181/0x1990 do_sock_setsockopt+0x12b/0x160 __sys_setsockopt+0x7b/0xc0 __x64_sys_setsockopt+0x1b/0x30 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 64d85290d79c ("bpf: Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH") Signed-off-by: Michal Luczaj --- net/core/filter.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) --- base-commit: 1f6ff8756091d99b7e5fa556abc0465328302b8b change-id: 20250110-reuseport-memleak-9ac6587bd99a Best regards, diff --git a/net/core/filter.c b/net/core/filter.c index 834614071727ab92cee759dc788ec2ee6f92284b..2fb45a86f3ddbffa9fd55885d4c4c0d8c3a6c381 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -11251,6 +11251,7 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern, bool is_sockarray = map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY; struct sock_reuseport *reuse; struct sock *selected_sk; + int err; selected_sk = map->ops->map_lookup_elem(map, key); if (!selected_sk) @@ -11258,10 +11259,6 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern, reuse = rcu_dereference(selected_sk->sk_reuseport_cb); if (!reuse) { - /* Lookup in sock_map can return TCP ESTABLISHED sockets. */ - if (sk_is_refcounted(selected_sk)) - sock_put(selected_sk); - /* reuseport_array has only sk with non NULL sk_reuseport_cb. * The only (!reuse) case here is - the sk has already been * unhashed (e.g. by close()), so treat it as -ENOENT. @@ -11269,24 +11266,33 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern, * Other maps (e.g. sock_map) do not provide this guarantee and * the sk may never be in the reuseport group to begin with. */ - return is_sockarray ? -ENOENT : -EINVAL; + err = is_sockarray ? -ENOENT : -EINVAL; + goto error; } if (unlikely(reuse->reuseport_id != reuse_kern->reuseport_id)) { struct sock *sk = reuse_kern->sk; - if (sk->sk_protocol != selected_sk->sk_protocol) - return -EPROTOTYPE; - else if (sk->sk_family != selected_sk->sk_family) - return -EAFNOSUPPORT; - - /* Catch all. Likely bound to a different sockaddr. */ - return -EBADFD; + if (sk->sk_protocol != selected_sk->sk_protocol) { + err = -EPROTOTYPE; + } else if (sk->sk_family != selected_sk->sk_family) { + err = -EAFNOSUPPORT; + } else { + /* Catch all. Likely bound to a different sockaddr. */ + err = -EBADFD; + } + goto error; } reuse_kern->selected_sk = selected_sk; return 0; +error: + /* Lookup in sock_map can return TCP ESTABLISHED sockets. */ + if (sk_is_refcounted(selected_sk)) + sock_put(selected_sk); + + return err; } static const struct bpf_func_proto sk_select_reuseport_proto = {