diff mbox series

[net-next,v3,2/4] net: socket: add sockopts blacklist for BPF cgroup hook

Message ID 20230411104231.160837-3-aleksandr.mikhalitsyn@canonical.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Add SCM_PIDFD and SO_PEERPIDFD | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 22 this patch: 22
netdev/cc_maintainers warning 1 maintainers not CCed: bpf@vger.kernel.org
netdev/build_clang fail Errors and warnings before: 22 this patch: 22
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 22 this patch: 22
netdev/checkpatch warning WARNING: line length of 89 exceeds 80 columns WARNING: networking block comments don't use an empty /* line, use /* Comment...
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline fail Was 0 now: 1

Commit Message

Alexander Mikhalitsyn April 11, 2023, 10:42 a.m. UTC
During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
that bpf cgroup hook can cause FD leaks when used with sockopts which
install FDs into the process fdtable.

After some offlist discussion it was proposed to add a blacklist of
socket options those can cause troubles when BPF cgroup hook is enabled.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 net/socket.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

Comments

Christian Brauner April 11, 2023, 3:53 p.m. UTC | #1
On Tue, Apr 11, 2023 at 12:42:29PM +0200, Alexander Mikhalitsyn wrote:
> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> that bpf cgroup hook can cause FD leaks when used with sockopts which
> install FDs into the process fdtable.
> 
> After some offlist discussion it was proposed to add a blacklist of
> socket options those can cause troubles when BPF cgroup hook is enabled.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Suggested-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---

Just some background for kicks

A crucial point for SO_PEERPIDFD is the allocation of a pidfd and to
place it into the optval buffer for userspace to retrieve.

The way we orginally envisioned this working is by splitting this into
an fd and file allocation phase, then get past the point of failure in
sockopt processing and then call fd_install(fd, pidfd_file).

While looking at this I realized that there's a generic problem in this
code:

	if (level == SOL_SOCKET)
		err = sock_getsockopt(sock, level, optname, optval, optlen);
	else if (unlikely(!sock->ops->getsockopt))
	        err = -EOPNOTSUPP;
	else
	        err = sock->ops->getsockopt(sock, level, optname, optval, optlen);
	
	if (!in_compat_syscall())
	        err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, optval, optlen, max_optlen, err);

That BPF_CGROUP_RUN_PROG_GETSOCKOPT hook can fail after getsockopt
itself succeeded. So anything that places an fd into optval risks
leaking an fd into the caller's fdtable if the bpf hook fails.

If we do a pidfd_create() and place the pidfd into the optval buffer the
the bpf hook could reasonably interact with the fd but if it fails the
fd would be leaked. It could clean this up calling close_fd() but it
would be ugly since the fd has already been visible in the caller's
fdtable. Someone might've snatched it already even.

If we delay installing the fd and file past the bpf hook then the fd is
meaningless for the bpf hook but we wouldn't risk leaking the fd.

It should also be noted that the hook does a copy_from_user() on the
optval right after the prior getsockopt did a copy_to_user() into that
optval. This is not just racy it's also a bit wasteful. Userspace could
try to retrieve the optval and then copy another value over it so that
the bpf hook operates on another value than getsockopt originally placed
into optval. If the bpf hook wants to care about fd resources in the
future it should probably be passed the allocated struct file.

This should be addressed separately though. The solution here works for
me,

Acked-by: Christian Brauner <brauner@kernel.org>

>  net/socket.c | 38 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/net/socket.c b/net/socket.c
> index 73e493da4589..9c1ef11de23f 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -108,6 +108,8 @@
>  #include <linux/ptp_clock_kernel.h>
>  #include <trace/events/sock.h>
>  
> +#include <linux/sctp.h>
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  unsigned int sysctl_net_busy_read __read_mostly;
>  unsigned int sysctl_net_busy_poll __read_mostly;
> @@ -2227,6 +2229,36 @@ static bool sock_use_custom_sol_socket(const struct socket *sock)
>  	return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
>  }
>  
> +#ifdef CONFIG_CGROUP_BPF
> +static bool sockopt_installs_fd(int level, int optname)
> +{
> +	/*
> +	 * These options do fd_install(), and if BPF_CGROUP_RUN_PROG_GETSOCKOPT
> +	 * hook returns an error after success of the original handler
> +	 * sctp_getsockopt(...), userspace will receive an error from getsockopt
> +	 * syscall and will be not aware that fd was successfully installed into fdtable.
> +	 *
> +	 * Let's prevent bpf cgroup hook from running on them.
> +	 */
> +	if (level == SOL_SCTP) {
> +		switch (optname) {
> +		case SCTP_SOCKOPT_PEELOFF:
> +		case SCTP_SOCKOPT_PEELOFF_FLAGS:
> +			return true;
> +		default:
> +			return false;
> +		}
> +	}
> +
> +	return false;
> +}
> +#else /* CONFIG_CGROUP_BPF */
> +static inline bool sockopt_installs_fd(int level, int optname)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_CGROUP_BPF */
> +
>  /*
>   *	Set a socket option. Because we don't know the option lengths we have
>   *	to pass the user mode parameter for the protocols to sort out.
> @@ -2250,7 +2282,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
>  	if (err)
>  		goto out_put;
>  
> -	if (!in_compat_syscall())
> +	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
>  		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
>  						     user_optval, &optlen,
>  						     &kernel_optval);
> @@ -2304,7 +2336,7 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
>  	if (err)
>  		goto out_put;
>  
> -	if (!in_compat_syscall())
> +	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
>  		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
>  
>  	if (level == SOL_SOCKET)
> @@ -2315,7 +2347,7 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
>  		err = sock->ops->getsockopt(sock, level, optname, optval,
>  					    optlen);
>  
> -	if (!in_compat_syscall())
> +	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
>  		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
>  						     optval, optlen, max_optlen,
>  						     err);
> -- 
> 2.34.1
>
diff mbox series

Patch

diff --git a/net/socket.c b/net/socket.c
index 73e493da4589..9c1ef11de23f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -108,6 +108,8 @@ 
 #include <linux/ptp_clock_kernel.h>
 #include <trace/events/sock.h>
 
+#include <linux/sctp.h>
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 unsigned int sysctl_net_busy_read __read_mostly;
 unsigned int sysctl_net_busy_poll __read_mostly;
@@ -2227,6 +2229,36 @@  static bool sock_use_custom_sol_socket(const struct socket *sock)
 	return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
 }
 
+#ifdef CONFIG_CGROUP_BPF
+static bool sockopt_installs_fd(int level, int optname)
+{
+	/*
+	 * These options do fd_install(), and if BPF_CGROUP_RUN_PROG_GETSOCKOPT
+	 * hook returns an error after success of the original handler
+	 * sctp_getsockopt(...), userspace will receive an error from getsockopt
+	 * syscall and will be not aware that fd was successfully installed into fdtable.
+	 *
+	 * Let's prevent bpf cgroup hook from running on them.
+	 */
+	if (level == SOL_SCTP) {
+		switch (optname) {
+		case SCTP_SOCKOPT_PEELOFF:
+		case SCTP_SOCKOPT_PEELOFF_FLAGS:
+			return true;
+		default:
+			return false;
+		}
+	}
+
+	return false;
+}
+#else /* CONFIG_CGROUP_BPF */
+static inline bool sockopt_installs_fd(int level, int optname)
+{
+	return false;
+}
+#endif /* CONFIG_CGROUP_BPF */
+
 /*
  *	Set a socket option. Because we don't know the option lengths we have
  *	to pass the user mode parameter for the protocols to sort out.
@@ -2250,7 +2282,7 @@  int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 	if (err)
 		goto out_put;
 
-	if (!in_compat_syscall())
+	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
 		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
 						     user_optval, &optlen,
 						     &kernel_optval);
@@ -2304,7 +2336,7 @@  int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 	if (err)
 		goto out_put;
 
-	if (!in_compat_syscall())
+	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
 		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
 
 	if (level == SOL_SOCKET)
@@ -2315,7 +2347,7 @@  int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		err = sock->ops->getsockopt(sock, level, optname, optval,
 					    optlen);
 
-	if (!in_compat_syscall())
+	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
 		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
 						     optval, optlen, max_optlen,
 						     err);