
[v4,1/2] net/handshake: Create a NETLINK service for handling handshake requests

Message ID 167648899461.5586.1581702417186195077.stgit@91.116.238.104.host.secureserver.net
State Changes Requested
Delegated to: Netdev Maintainers
Series Another crack at a handshake upcall mechanism

Checks

Context                         Check    Description
netdev/tree_selection           success  Guessed tree name to be net-next, async
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           warning  Target tree name not specified in the subject
netdev/cover_letter             success  Series has a cover letter
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 4553 this patch: 4553
netdev/cc_maintainers           warning  1 maintainers not CCed: davem@davemloft.net
netdev/build_clang              success  Errors and warnings before: 1074 this patch: 1074
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 4763 this patch: 4763
netdev/checkpatch               warning  CHECK: Please use a blank line after function/struct/union/enum declarations
                                         CHECK: extern prototypes should be avoided in .h files
                                         WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
                                         WARNING: line length of 81 exceeds 80 columns
                                         WARNING: networking block comments don't use an empty /* line, use /* Comment...
netdev/kdoc                     fail     Errors and warnings before: 0 this patch: 1
netdev/source_inline            success  Was 0 now: 0

Commit Message

Chuck Lever Feb. 15, 2023, 7:23 p.m. UTC
When a kernel consumer needs a transport layer security session, it
first needs a handshake to negotiate and establish a session. This
negotiation can be done in user space via one of the several
existing library implementations, or it can be done in the kernel.

No in-kernel handshake implementations yet exist. In their absence,
we add a netlink service that can:

a. Notify a user space daemon that a handshake is needed.

b. Once notified, the daemon calls the kernel back via this
   netlink service to get the handshake parameters, including an
   open socket on which to establish the session.

c. Once the handshake is complete, the daemon reports the
   session status and other information via a second netlink
   operation. This operation marks that it is safe for the
   kernel to use the open socket and the security session
   established there.

The notification service uses a multicast group. Each handshake
protocol (e.g., TLSv1.3, PSP) adopts its own group number so that
the user space daemons for performing handshakes are completely
independent of one another. The kernel can then tell via
netlink_has_listeners() whether a user space daemon is active and
can handle a handshake request for the desired security layer
protocol.
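The listener check can be sketched in user space as a per-group subscriber count consulted before anything is sent. This is only a model: `model_listeners`, `model_join()`, `model_leave()`, `model_notify()` and `MODEL_NR_GROUPS` are hypothetical stand-ins for the kernel's netlink_has_listeners()/genl_has_listeners() bookkeeping. The -ESRCH return mirrors the early exit in the patch's handshake_genl_notify().

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NR_GROUPS 4	/* hypothetical number of protocol groups */

/* Subscriber count per multicast group (one group per protocol). */
static int model_listeners[MODEL_NR_GROUPS];

static void model_join(int grp)  { model_listeners[grp]++; }
static void model_leave(int grp) { model_listeners[grp]--; }

/* With nobody subscribed to this protocol's group there is no daemon
 * to do the handshake, so fail fast instead of queuing forever. */
static int model_notify(int grp)
{
	assert(grp >= 0 && grp < MODEL_NR_GROUPS);
	if (model_listeners[grp] == 0)
		return -ESRCH;
	return 0;	/* a READY multicast would be sent here */
}
```

Because each protocol has its own group, a TLS daemon subscribing does not make a request for a different protocol look serviceable.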

A new netlink operation, ACCEPT, acts like accept(2) in that it
instantiates a file descriptor in the user space daemon's fd table.
If this operation is successful, the reply carries the fd number,
which can be treated as an open and ready file descriptor.
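A loose user-space analogy for what ACCEPT does on the daemon's behalf: give the process a new, close-on-exec descriptor referring to an existing socket. In the patch the kernel uses get_unused_fd_flags(O_CLOEXEC), sock_alloc_file() and fd_install(); the sketch below assumes POSIX `fcntl(F_DUPFD_CLOEXEC)` as a stand-in, and `model_install_cloexec()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return a new descriptor for the same open socket, with FD_CLOEXEC
 * already set (as O_CLOEXEC does in the kernel path), or -1 on error.
 * The returned fd is immediately usable, like the fd in the ACCEPT reply. */
static int model_install_cloexec(int sockfd)
{
	return fcntl(sockfd, F_DUPFD_CLOEXEC, 0);
}
```

Setting close-on-exec atomically at install time avoids leaking the socket into any helper the daemon might exec while the handshake is in progress.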

While user space is performing the handshake, the kernel keeps its
muddy paws off the open socket. A second new netlink operation,
DONE, indicates that the user space daemon is finished with the
socket and it is safe for the kernel to use again. The operation
also indicates whether a session was established successfully.
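A DONE message carries the socket fd and an optional session status as u32 netlink attributes. The sketch below encodes and walks such attributes in user space, assuming the attribute numbers from enum handshake_genl_attrs in the patch (SESS_STATUS = 2, SOCKFD = 3) and a header laid out like struct nlattr. `struct model_nla`, `model_put_u32()` and `model_parse_done()` are hypothetical helpers, not libnl or kernel API. As in handshake_genl_cmd_done(), the fd is mandatory and the status defaults to -EIO when absent.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Attribute numbers copied from the patch's uapi enum. */
enum { ATTR_SESS_STATUS = 2, ATTR_SOCKFD = 3 };

/* Same layout as struct nlattr: 16-bit length, 16-bit type. */
struct model_nla {
	uint16_t len;
	uint16_t type;
};
#define MODEL_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)
#define MODEL_HDRLEN	MODEL_ALIGN(sizeof(struct model_nla))

/* Append a u32 attribute at buf + off; returns the new offset. */
static size_t model_put_u32(uint8_t *buf, size_t off, uint16_t type,
			    uint32_t val)
{
	struct model_nla h = { .len = MODEL_HDRLEN + sizeof(val), .type = type };

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + MODEL_HDRLEN, &val, sizeof(val));
	return off + MODEL_ALIGN(h.len);
}

/* Returns 0 with *fd and *status filled in, or -EINVAL. Every
 * attribute is assumed to be a u32 in this model. */
static int model_parse_done(const uint8_t *buf, size_t len,
			    int *fd, int *status)
{
	size_t off = 0;
	int have_fd = 0;

	*status = -EIO;		/* default when SESS_STATUS is absent */
	while (off + MODEL_HDRLEN <= len) {
		struct model_nla h;
		uint32_t val;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < MODEL_HDRLEN + sizeof(val) || off + h.len > len)
			return -EINVAL;		/* malformed attribute */
		memcpy(&val, buf + off + MODEL_HDRLEN, sizeof(val));
		if (h.type == ATTR_SOCKFD) {
			*fd = (int)val;
			have_fd = 1;
		} else if (h.type == ATTR_SESS_STATUS) {
			*status = (int)val;
		}
		off += MODEL_ALIGN(h.len);
	}
	return have_fd ? 0 : -EINVAL;
}
```

Defaulting to a failure status means a daemon that forgets to report a result cannot accidentally mark the session as established.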

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/net/handshake.h        |   46 +++++
 include/net/net_namespace.h    |    5 +
 include/net/sock.h             |    1 
 include/uapi/linux/handshake.h |   56 ++++++
 net/Makefile                   |    1 
 net/handshake/Makefile         |   11 +
 net/handshake/handshake.h      |   43 +++++
 net/handshake/netlink.c        |  370 ++++++++++++++++++++++++++++++++++++++++
 net/handshake/request.c        |  160 +++++++++++++++++
 9 files changed, 693 insertions(+)
 create mode 100644 include/net/handshake.h
 create mode 100644 include/uapi/linux/handshake.h
 create mode 100644 net/handshake/Makefile
 create mode 100644 net/handshake/handshake.h
 create mode 100644 net/handshake/netlink.c
 create mode 100644 net/handshake/request.c

Comments

Hannes Reinecke Feb. 16, 2023, 10:47 a.m. UTC | #1
On 2/15/23 20:23, Chuck Lever wrote:
> When a kernel consumer needs a transport layer security session, it
> first needs a handshake to negotiate and establish a session. This
> negotiation can be done in user space via one of the several
> existing library implementations, or it can be done in the kernel.
> 
> No in-kernel handshake implementations yet exist. In their absence,
> we add a netlink service that can:
> 
> a. Notify a user space daemon that a handshake is needed.
> 
> b. Once notified, the daemon calls the kernel back via this
>     netlink service to get the handshake parameters, including an
>     open socket on which to establish the session.
> 
> c. Once the handshake is complete, the daemon reports the
>     session status and other information via a second netlink
>     operation. This operation marks that it is safe for the
>     kernel to use the open socket and the security session
>     established there.
> 
> The notification service uses a multicast group. Each handshake
> protocol (eg, TLSv1.3, PSP, etc) adopts its own group number so that
> the user space daemons for performing handshakes are completely
> independent of one another. The kernel can then tell via
> netlink_has_listeners() whether a user space daemon is active and
> can handle a handshake request for the desired security layer
> protocol.
> 
> A new netlink operation, ACCEPT, acts like accept(2) in that it
> instantiates a file descriptor in the user space daemon's fd table.
> If this operation is successful, the reply carries the fd number,
> which can be treated as an open and ready file descriptor.
> 
> While user space is performing the handshake, the kernel keeps its
> muddy paws off the open socket. A second new netlink operation,
> DONE, indicates that the user space daemon is finished with the
> socket and it is safe for the kernel to use again. The operation
> also indicates whether a session was established successfully.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   include/net/handshake.h        |   46 +++++
>   include/net/net_namespace.h    |    5 +
>   include/net/sock.h             |    1
>   include/uapi/linux/handshake.h |   56 ++++++
>   net/Makefile                   |    1
>   net/handshake/Makefile         |   11 +
>   net/handshake/handshake.h      |   43 +++++
>   net/handshake/netlink.c        |  370 ++++++++++++++++++++++++++++++++++++++++
>   net/handshake/request.c        |  160 +++++++++++++++++
>   9 files changed, 693 insertions(+)
>   create mode 100644 include/net/handshake.h
>   create mode 100644 include/uapi/linux/handshake.h
>   create mode 100644 net/handshake/Makefile
>   create mode 100644 net/handshake/handshake.h
>   create mode 100644 net/handshake/netlink.c
>   create mode 100644 net/handshake/request.c
> 
> diff --git a/include/net/handshake.h b/include/net/handshake.h
> new file mode 100644
> index 000000000000..ca401c08c541
> --- /dev/null
> +++ b/include/net/handshake.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Generic HANDSHAKE service.
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +/*
> + * Data structures and functions that are visible only within the
> + * kernel are declared here.
> + */
> +
> +#ifndef _NET_HANDSHAKE_H
> +#define _NET_HANDSHAKE_H
> +
> +struct handshake_req;
> +
> +/*
> + * Invariants for all handshake requests for one transport layer
> + * security protocol
> + */
> +struct handshake_proto {
> +	int			hp_protocol;
> +	int			hp_mcgrp;
> +	size_t			hp_privsize;
> +
> +	int			(*hp_accept)(struct handshake_req *req,
> +					     struct genl_info *gi, int fd);
> +	void			(*hp_done)(struct handshake_req *req,
> +					   int status, struct nlattr *args);
> +	void			(*hp_destroy)(struct handshake_req *req);
> +};
> +
> +extern struct handshake_req *
> +handshake_req_alloc(struct socket *sock, const struct handshake_proto *proto,
> +		    gfp_t flags);
> +extern void *handshake_req_private(struct handshake_req *req);
> +extern int handshake_req_submit(struct handshake_req *req, gfp_t flags);
> +extern void handshake_req_cancel(struct socket *sock);
> +
> +extern struct nlmsghdr *handshake_genl_put(struct sk_buff *msg,
> +					   struct genl_info *gi);
> +
> +#endif /* _NET_HANDSHAKE_H */
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index 8c3587d5c308..a66309789560 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -186,6 +186,11 @@ struct net {
>   #if IS_ENABLED(CONFIG_SMC)
>   	struct netns_smc	smc;
>   #endif
> +
> +	/* transport layer security handshake requests */
> +	spinlock_t		hs_lock;
> +	struct list_head	hs_requests;
> +	int			hs_pending;
>   } __randomize_layout;
>   
>   #include <linux/seq_file_net.h>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e0517ecc6531..e16e63ff61f2 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -515,6 +515,7 @@ struct sock {
>   
>   	struct socket		*sk_socket;
>   	void			*sk_user_data;
> +	void			*sk_handshake_req;
>   #ifdef CONFIG_SECURITY
>   	void			*sk_security;
>   #endif
> diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
> new file mode 100644
> index 000000000000..9544edeb181f
> --- /dev/null
> +++ b/include/uapi/linux/handshake.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * GENL HANDSHAKE service.
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +/*
> + * Data structures and functions that are visible to user space are
> + * declared here. This file constitutes an API contract between the
> + * Linux kernel and user space.
> + */
> +
> +#ifndef _UAPI_LINUX_HANDSHAKE_H
> +#define _UAPI_LINUX_HANDSHAKE_H
> +
> +#define HANDSHAKE_GENL_NAME	"handshake"
> +#define HANDSHAKE_GENL_VERSION	0x01
> +
> +enum handshake_genl_mcgrps {
> +	HANDSHAKE_GENL_MCGRP_NONE = 0,
> +};
> +
> +#define HANDSHAKE_GENL_MCGRP_NONE_NAME	"none"
> +
> +enum handshake_genl_cmds {
> +	HANDSHAKE_GENL_CMD_UNSPEC = 0,
> +	HANDSHAKE_GENL_CMD_READY,
> +	HANDSHAKE_GENL_CMD_ACCEPT,
> +	HANDSHAKE_GENL_CMD_DONE,
> +
> +	__HANDSHAKE_GENL_CMD_MAX
> +};
> +#define HANDSHAKE_GENL_CMD_MAX	(__HANDSHAKE_GENL_CMD_MAX - 1)
> +
> +enum handshake_genl_attrs {
> +	HANDSHAKE_GENL_ATTR_UNSPEC = 0,
> +	HANDSHAKE_GENL_ATTR_MSG_STATUS,
> +	HANDSHAKE_GENL_ATTR_SESS_STATUS,
> +	HANDSHAKE_GENL_ATTR_SOCKFD,
> +	HANDSHAKE_GENL_ATTR_PROTOCOL,
> +
> +	HANDSHAKE_GENL_ATTR_ACCEPT,
> +	HANDSHAKE_GENL_ATTR_DONE,
> +
> +	__HANDSHAKE_GENL_ATTR_MAX
> +};
> +#define HANDSHAKE_GENL_ATTR_MAX	(__HANDSHAKE_GENL_ATTR_MAX - 1)
> +
> +enum handshake_genl_protocol {
> +	HANDSHAKE_GENL_PROTO_UNSPEC = 0,
> +};
> +
> +#endif /* _UAPI_LINUX_HANDSHAKE_H */
> diff --git a/net/Makefile b/net/Makefile
> index 6a62e5b27378..c1bb53f00486 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -78,3 +78,4 @@ obj-$(CONFIG_NET_NCSI)		+= ncsi/
>   obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
>   obj-$(CONFIG_MPTCP)		+= mptcp/
>   obj-$(CONFIG_MCTP)		+= mctp/
> +obj-y				+= handshake/
> diff --git a/net/handshake/Makefile b/net/handshake/Makefile
> new file mode 100644
> index 000000000000..824e08c626af
> --- /dev/null
> +++ b/net/handshake/Makefile
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for the Generic HANDSHAKE service
> +#
> +# Author: Chuck Lever <chuck.lever@oracle.com>
> +#
> +# Copyright (c) 2023, Oracle and/or its affiliates.
> +#
> +
> +obj-y += handshake.o
> +handshake-y := netlink.o request.o
> diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
> new file mode 100644
> index 000000000000..1cbcfc632a24
> --- /dev/null
> +++ b/net/handshake/handshake.h
> @@ -0,0 +1,43 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Generic netlink handshake service
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +/*
> + * Data structures and functions that are visible only within the
> + * handshake module are declared here.
> + */
> +
> +#ifndef _INTERNAL_HANDSHAKE_H
> +#define _INTERNAL_HANDSHAKE_H
> +
> +/*
> + * One handshake request
> + */
> +struct handshake_req {
> +	refcount_t			hr_ref;
> +	struct list_head		hr_list;
> +	unsigned long			hr_flags;
> +	const struct handshake_proto	*hr_proto;
> +	struct socket			*hr_sock;
> +	int				hr_fd;
> +};
> +
> +#define HANDSHAKE_F_COMPLETED	BIT(0)
> +
> +int handshake_genl_notify(struct net *net, struct handshake_req *req,
> +			  gfp_t flags);
> +void handshake_complete(struct handshake_req *req, int status,
> +			struct nlattr *args);
> +
> +struct handshake_req *handshake_req_get(struct handshake_req *req);
> +void handshake_req_put(struct handshake_req *req);
> +
> +void add_pending_locked(struct net *net, struct handshake_req *req);
> +bool handshake_remove_pending(struct net *net, struct handshake_req *req);
> +
> +#endif /* _INTERNAL_HANDSHAKE_H */
> diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
> new file mode 100644
> index 000000000000..8d0bf11396a7
> --- /dev/null
> +++ b/net/handshake/netlink.c
> @@ -0,0 +1,370 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Generic netlink handshake service
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2023, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/socket.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/inet.h>
> +
> +#include <net/sock.h>
> +#include <net/genetlink.h>
> +#include <net/handshake.h>
> +
> +#include <uapi/linux/handshake.h>
> +#include "handshake.h"
> +
> +static struct genl_family __ro_after_init handshake_genl_family;
> +
> +void add_pending_locked(struct net *net, struct handshake_req *req)
> +{
> +	net->hs_pending++;
> +	list_add_tail(&req->hr_list, &net->hs_requests);
> +}
> +
> +static void remove_pending_locked(struct net *net, struct handshake_req *req)
> +{
> +	net->hs_pending--;
> +	list_del_init(&req->hr_list);
> +}
> +
> +/*
> + * Returns true if this req was on the pending list.
> + */
> +bool handshake_remove_pending(struct net *net, struct handshake_req *req)
> +{
> +	struct sock *sk = req->hr_sock->sk;
> +	bool ret;
> +
> +	ret = false;
> +
> +	spin_lock(&net->hs_lock);
> +	if (!list_empty(&req->hr_list)) {
> +		remove_pending_locked(net, req);
> +		ret = true;
> +	}
> +	sk->sk_handshake_req = NULL;
> +	spin_unlock(&net->hs_lock);
> +
> +	return ret;
> +}
> +
> +void handshake_complete(struct handshake_req *req, int status,
> +			struct nlattr *args)
> +{
> +	if (!test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
> +		req->hr_proto->hp_done(req, status, args);
> +		req->hr_sock->sk->sk_handshake_req = NULL;
> +	}
> +	handshake_req_put(req);
> +}
> +
> +int handshake_genl_notify(struct net *net, struct handshake_req *req,
> +			  gfp_t flags)
> +{
> +	struct sk_buff *skb;
> +	void *hdr;
> +
> +	if (!genl_has_listeners(&handshake_genl_family, net,
> +				req->hr_proto->hp_mcgrp))
> +		return -ESRCH;
> +
> +	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!skb)
> +		return -ENOMEM;
> +
> +	hdr = genlmsg_put(skb, 0, 0, &handshake_genl_family, 0,
> +			  HANDSHAKE_GENL_CMD_READY);
> +	if (!hdr) {
> +		nlmsg_free(skb);
> +		return -EMSGSIZE;
> +	}
> +
> +	genlmsg_end(skb, hdr);
> +	return genlmsg_multicast(&handshake_genl_family, skb, 0,
> +				 req->hr_proto->hp_mcgrp, flags);
> +}
> +
> +static int handshake_accept(struct handshake_req *req)
> +{
> +	struct socket *sock = req->hr_sock;
> +	int flags = O_CLOEXEC;
> +	struct file *file;
> +	int fd;
> +
> +	fd = get_unused_fd_flags(flags);
> +	if (fd < 0)
> +		return fd;
> +	file = sock_alloc_file(sock, flags, sock->sk->sk_prot_creator->name);
> +	if (IS_ERR(file)) {
> +		put_unused_fd(fd);
> +		return PTR_ERR(file);
> +	}
> +
> +	req->hr_fd = fd;
> +	fd_install(fd, file);
> +	return 0;
> +}
> +
> +static const struct nla_policy
> +handshake_genl_policy[HANDSHAKE_GENL_ATTR_MAX + 1] = {
> +	[HANDSHAKE_GENL_ATTR_MSG_STATUS] = {
> +		.type = NLA_U32
> +	},
> +	[HANDSHAKE_GENL_ATTR_SESS_STATUS] = {
> +		.type = NLA_U32
> +	},
> +	[HANDSHAKE_GENL_ATTR_SOCKFD] = {
> +		.type = NLA_U32
> +	},
> +	[HANDSHAKE_GENL_ATTR_PROTOCOL] = {
> +		.type = NLA_U32
> +	},
> +
> +	[HANDSHAKE_GENL_ATTR_ACCEPT] = {
> +		.type = NLA_NESTED,
> +	},
> +	[HANDSHAKE_GENL_ATTR_DONE] = {
> +		.type = NLA_NESTED,
> +	},
> +};
> +
> +/**
> + * handshake_genl_put - Create a generic netlink message header
> + * @msg: buffer in which to create the header
> + * @gi: generic netlink message context
> + *
> + * Returns a ready-to-use header, or NULL.
> + */
> +struct nlmsghdr *handshake_genl_put(struct sk_buff *msg, struct genl_info *gi)
> +{
> +	return genlmsg_put(msg, gi->snd_portid, gi->snd_seq,
> +			   &handshake_genl_family, 0, gi->genlhdr->cmd);
> +}
> +EXPORT_SYMBOL(handshake_genl_put);
> +
> +static int handshake_genl_status_reply(struct sk_buff *skb,
> +				       struct genl_info *gi, int status)
> +{
> +	struct nlmsghdr *hdr;
> +	struct sk_buff *msg;
> +	int ret;
> +
> +	ret = -ENOMEM;
> +	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		goto out;
> +	hdr = handshake_genl_put(msg, gi);
> +	if (!hdr)
> +		goto out_free;
> +
> +	ret = -EMSGSIZE;
> +	ret = nla_put_u32(msg, HANDSHAKE_GENL_ATTR_MSG_STATUS, status);
> +	if (ret < 0)
> +		goto out_free;
> +
> +	genlmsg_end(msg, hdr);
> +	return genlmsg_reply(msg, gi);
> +
> +out_free:
> +	genlmsg_cancel(msg, hdr);
> +out:
> +	return ret;
> +}
> +
> +static int handshake_genl_cmd_accept(struct sk_buff *skb, struct genl_info *gi)
> +{
> +	struct nlattr *tb[HANDSHAKE_GENL_ATTR_MAX + 1];
> +	struct net *net = sock_net(skb->sk);
> +	struct handshake_req *pos, *req;
> +	int err;
> +
> +	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
> +			    HANDSHAKE_GENL_ATTR_MAX, handshake_genl_policy,
> +			    NULL);
> +	if (err) {
> +		pr_err_ratelimited("%s: genlmsg_parse() returned %d\n",
> +				   __func__, err);
> +		return err;
> +	}
> +
> +	if (!tb[HANDSHAKE_GENL_ATTR_PROTOCOL])
> +		return handshake_genl_status_reply(skb, gi, -EINVAL);
> +
> +	req = NULL;
> +	spin_lock(&net->hs_lock);
> +	list_for_each_entry(pos, &net->hs_requests, hr_list) {
> +		if (pos->hr_proto->hp_protocol !=
> +		    nla_get_u32(tb[HANDSHAKE_GENL_ATTR_PROTOCOL]))
> +			continue;
> +		remove_pending_locked(net, pos);
> +		req = handshake_req_get(pos);
> +		break;
> +	}
> +	spin_unlock(&net->hs_lock);
> +	if (!req)
> +		return handshake_genl_status_reply(skb, gi, -EAGAIN);
> +
> +	err = handshake_accept(req);
> +	if (err < 0) {
> +		handshake_complete(req, -EIO, NULL);
> +		handshake_req_put(req);
> +		return handshake_genl_status_reply(skb, gi, err);
> +	}
> +	err = req->hr_proto->hp_accept(req, gi, req->hr_fd);
> +	if (err) {
> +		put_unused_fd(req->hr_fd);
> +		handshake_complete(req, -EIO, NULL);
> +		handshake_req_put(req);
> +		return err;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * This function is careful to not close the socket. It merely removes
> + * it from the file descriptor table so that it is no longer visible
> + * to the calling process.
> + */
> +static int handshake_genl_cmd_done(struct sk_buff *skb, struct genl_info *gi)
> +{
> +	struct nlattr *tb[HANDSHAKE_GENL_ATTR_MAX + 1];
> +	struct handshake_req *req;
> +	struct socket *sock;
> +	int fd, status, err;
> +
> +	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
> +			    HANDSHAKE_GENL_ATTR_MAX, handshake_genl_policy,
> +			    NULL);
> +	if (err) {
> +		pr_err_ratelimited("%s: genlmsg_parse() returned %d\n",
> +				   __func__, err);
> +		return err;
> +	}
> +
> +	if (!tb[HANDSHAKE_GENL_ATTR_SOCKFD])
> +		return handshake_genl_status_reply(skb, gi, -EINVAL);
> +	err = 0;
> +	fd = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SOCKFD]);
> +	sock = sockfd_lookup(fd, &err);
> +	if (err)
> +		return handshake_genl_status_reply(skb, gi, -EBADF);
> +
> +	req = sock->sk->sk_handshake_req;

And this will crash horribly if userspace released the socket in the 
meantime (as then sock->sk is NULL).
(Note: I probably show my complete ignorance of the network stack here, 
but ...)
Is there a good way of figuring out if 'sock->sk' is valid?
sock_hold() only makes sure that 'sock' is valid; it does nothing about
sock->sk.
Especially this bit in net/socket.c:__sock_release()

         if (!sock->file) {
                 iput(SOCK_INODE(sock));
                 return;
         }
         sock->file = NULL;

will always get you, as the _first_ caller to sock_release() does the 
right thing (by setting sock->file to NULL), but the second caller will
crash in iput().
There _must_ be a better way of checking...

Cheers,

Hannes
Paolo Abeni Feb. 16, 2023, 1:12 p.m. UTC | #2
[partial feedback /me is still a bit lost in the code ;]
On Wed, 2023-02-15 at 14:23 -0500, Chuck Lever wrote:
> +/*
> + * This function is careful to not close the socket. It merely removes
> + * it from the file descriptor table so that it is no longer visible
> + * to the calling process.
> + */
> +static int handshake_genl_cmd_done(struct sk_buff *skb, struct genl_info *gi)
> +{
> +	struct nlattr *tb[HANDSHAKE_GENL_ATTR_MAX + 1];
> +	struct handshake_req *req;
> +	struct socket *sock;
> +	int fd, status, err;
> +
> +	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
> +			    HANDSHAKE_GENL_ATTR_MAX, handshake_genl_policy,
> +			    NULL);
> +	if (err) {
> +		pr_err_ratelimited("%s: genlmsg_parse() returned %d\n",
> +				   __func__, err);
> +		return err;
> +	}
> +
> +	if (!tb[HANDSHAKE_GENL_ATTR_SOCKFD])
> +		return handshake_genl_status_reply(skb, gi, -EINVAL);
> +	err = 0;
> +	fd = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SOCKFD]);
> +	sock = sockfd_lookup(fd, &err);
> +	if (err)
> +		return handshake_genl_status_reply(skb, gi, -EBADF);
> +
> +	req = sock->sk->sk_handshake_req;
> +	if (req->hr_fd != fd)	/* sanity */
> +		return handshake_genl_status_reply(skb, gi, -EBADF);
> +
> +	status = -EIO;
> +	if (tb[HANDSHAKE_GENL_ATTR_SESS_STATUS])
> +		status = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SESS_STATUS]);
> +
> +	put_unused_fd(req->hr_fd);

If I read correctly, at this point user space is expected to have
already closed hr_fd, but that is not enforced, right? A buggy or
malicious user space could cause bad things by not closing such an fd.

Can we use sockfd_put(sock) instead? It would make the code more
readable, I think.

BTW, I don't think there is any problem with the sock->sk dereference
above: the fd reference count will prevent __sock_release() from being
called.
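The reference-count argument can be illustrated in user space: a second reference to the same open file keeps the socket alive even after the original descriptor is closed. Here dup() stands in for the file reference that sockfd_lookup() takes (and that sockfd_put() would drop); `model_hold_then_close()` is a hypothetical helper, and this is an analogy rather than the kernel path itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Take an extra reference to the open file behind fd, then drop the
 * original descriptor. The socket is not released because the extra
 * reference is still outstanding. Returns the surviving fd or -1. */
static int model_hold_then_close(int fd)
{
	int held = dup(fd);	/* two references to one open file */

	if (held < 0)
		return -1;
	close(fd);		/* one reference left; socket stays open */
	return held;
}
```

The socket is torn down only when the last reference goes away, which is why a lookup that holds a file reference for the duration of the handler can safely dereference sock->sk.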

[...]

> +static void __net_exit handshake_net_exit(struct net *net)
> +{
> +	struct handshake_req *req;
> +	LIST_HEAD(requests);
> +
> +	/*
> +	 * XXX: This drains the net's pending list, but does
> +	 *	nothing about requests that have been accepted
> +	 *	and are in progress.
> +	 */
> +	spin_lock(&net->hs_lock);
> +	list_splice_init(&requests, &net->hs_requests);
> +	spin_unlock(&net->hs_lock);

If I read correctly, accepted but uncompleted reqs are leaked. I think that
could be prevented by installing a custom sk_destructor in sock->sk
taking care of freeing sk->sk_handshake_req. The existing/old
sk_destructor, if any, could be stored in an additional
sk_handshake_req field and tail-called by the req's one.
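The chaining idea can be sketched with plain function pointers: save the socket's existing destructor in the request, install one that frees the request, and tail-call the saved one. In struct sock the hook is named sk_destruct; the types and names below (`model_sock`, `model_hs_req`, `model_req_attach()`, `model_req_destruct()`) are hypothetical user-space stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

struct model_sock;
typedef void (*model_destruct_t)(struct model_sock *sk);

/* Hypothetical stand-in for struct handshake_req. */
struct model_hs_req {
	model_destruct_t saved_destruct;	/* socket's previous destructor */
};

/* Hypothetical stand-in for struct sock. */
struct model_sock {
	model_destruct_t destruct;
	struct model_hs_req *handshake_req;
	int orig_destruct_calls;		/* for demonstration only */
};

static void model_orig_destruct(struct model_sock *sk)
{
	sk->orig_destruct_calls++;
}

/* Free the request first, then tail-call the destructor it replaced. */
static void model_req_destruct(struct model_sock *sk)
{
	struct model_hs_req *req = sk->handshake_req;
	model_destruct_t saved = req->saved_destruct;

	sk->handshake_req = NULL;
	free(req);
	if (saved)
		saved(sk);
}

/* Attach a request, chaining in front of any existing destructor. */
static int model_req_attach(struct model_sock *sk)
{
	struct model_hs_req *req = malloc(sizeof(*req));

	if (!req)
		return -1;
	req->saved_destruct = sk->destruct;
	sk->handshake_req = req;
	sk->destruct = model_req_destruct;
	return 0;
}
```

Note the timing this implies: the request lives until the socket itself is destroyed, not merely until the daemon closes its descriptor.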

[...]

> +/*
> + * This limit is to prevent slow remotes from causing denial of service.
> + * A ulimit-style tunable might be used instead.
> + */
> +#define HANDSHAKE_PENDING_MAX (10)

I liked the idea of a core mem based limit ;) not a big deal anyway ;)

> +
> +struct handshake_req *handshake_req_get(struct handshake_req *req)
> +{
> +	return likely(refcount_inc_not_zero(&req->hr_ref)) ? req : NULL;
> +}

It's unclear to me under which circumstances the refcount should be >
1: AFAICS the req should always have a single owner: initially the
creator, then the accept queue, and finally the user-space process
serving the request.
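What handshake_req_get() buys can be shown with a user-space model of refcount_inc_not_zero(): a second party may take a reference only while the object is still live, and learns otherwise instead of touching freed memory. `model_ref_get_not_zero()` and `model_ref_put()` are hypothetical names built on C11 atomics, not the kernel's refcount_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count has not already reached zero. */
static bool model_ref_get_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;	/* last reference already dropped */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* Returns true when the caller dropped the last reference and must
 * free the object. */
static bool model_ref_put(atomic_uint *ref)
{
	return atomic_fetch_sub(ref, 1) == 1;
}
```

With strictly single ownership this is unnecessary; it matters only if two paths (for example cancel and completion) can reach the same request concurrently.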

Cheers,

Paolo
Chuck Lever Feb. 16, 2023, 3:15 p.m. UTC | #3
> On Feb 16, 2023, at 8:12 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> 
> [partial feedback /me is still a bit lost in the code ;]

Thanks to you, Hannes, and Jakub for your review.

Responses/questions below.


> On Wed, 2023-02-15 at 14:23 -0500, Chuck Lever wrote:
>> +/*
>> + * This function is careful to not close the socket. It merely removes
>> + * it from the file descriptor table so that it is no longer visible
>> + * to the calling process.
>> + */
>> +static int handshake_genl_cmd_done(struct sk_buff *skb, struct genl_info *gi)
>> +{
>> +	struct nlattr *tb[HANDSHAKE_GENL_ATTR_MAX + 1];
>> +	struct handshake_req *req;
>> +	struct socket *sock;
>> +	int fd, status, err;
>> +
>> +	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
>> +			    HANDSHAKE_GENL_ATTR_MAX, handshake_genl_policy,
>> +			    NULL);
>> +	if (err) {
>> +		pr_err_ratelimited("%s: genlmsg_parse() returned %d\n",
>> +				   __func__, err);
>> +		return err;
>> +	}
>> +
>> +	if (!tb[HANDSHAKE_GENL_ATTR_SOCKFD])
>> +		return handshake_genl_status_reply(skb, gi, -EINVAL);
>> +	err = 0;
>> +	fd = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SOCKFD]);
>> +	sock = sockfd_lookup(fd, &err);
>> +	if (err)
>> +		return handshake_genl_status_reply(skb, gi, -EBADF);
>> +
>> +	req = sock->sk->sk_handshake_req;
>> +	if (req->hr_fd != fd)	/* sanity */
>> +		return handshake_genl_status_reply(skb, gi, -EBADF);
>> +
>> +	status = -EIO;
>> +	if (tb[HANDSHAKE_GENL_ATTR_SESS_STATUS])
>> +		status = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SESS_STATUS]);
>> +
>> +	put_unused_fd(req->hr_fd);
> 
> If I read correctly, at this point user space is expected to have
> already closed hr_fd, but that is not enforced, right? A buggy or
> malicious user space could cause bad things by not closing such an fd.

No, user space is no longer supposed to close the fd. The
CMD_DONE operation functions as "close" now. But maybe
that's a bad idea. More below.

The problem is what happens if user space /does/ close the fd
without calling DONE, for example if the daemon segfaults
and exits. (In other words, I'm not sure the upcall
mechanism as it is now handles that kind of behavior.)


> Can we use sockfd_put(sock) instead? It would make the code more
> readable, I think.

Not sure yet, I need more detail; and if we use an
sk_destructor function, maybe that won't be needed.


> BTW, I don't think there is any problem with the sock->sk dereference
> above: the fd reference count will prevent __sock_release() from being
> called.

Yes, I tried to ensure that socket reference counting
keeps it alive where it might be used or dereferenced.


> [...]
> 
>> +static void __net_exit handshake_net_exit(struct net *net)
>> +{
>> +	struct handshake_req *req;
>> +	LIST_HEAD(requests);
>> +
>> +	/*
>> +	 * XXX: This drains the net's pending list, but does
>> +	 *	nothing about requests that have been accepted
>> +	 *	and are in progress.
>> +	 */
>> +	spin_lock(&net->hs_lock);
>> +	list_splice_init(&requests, &net->hs_requests);
>> +	spin_unlock(&net->hs_lock);
> 
> If I read correctly, accepted but uncompleted reqs are leaked.

Yes, that's exactly right.


> I think that
> could be prevented by installing a custom sk_destructor in sock->sk
> taking care of freeing sk->sk_handshake_req. The existing/old
> sk_destructor, if any, could be stored in an additional
> sk_handshake_req field and tail-called by the req's one.

I've been looking for a way to modify socket close behavior
for these sockets, and this sounds like it's in the
neighborhood. I'll have a look.

So one thing we might do is have CMD_DONE act just as a way
to report handshake results, and have the handshake daemon
close the fd to signal it's finished with it. sk_destructor
would then fire handshake_complete and free the
handshake_req.

Might make things a little more robust?


> [...]
> 
>> +/*
>> + * This limit is to prevent slow remotes from causing denial of service.
>> + * A ulimit-style tunable might be used instead.
>> + */
>> +#define HANDSHAKE_PENDING_MAX (10)
> 
> I liked the idea of a core mem based limit ;) not a big deal anyway ;)

Well, this is a placeholder, carried over from the last
version of this series. It's based on the same concept for
the maximum length of a listener queue.
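The listener-queue analogy can be sketched as a per-namespace admission check against net->hs_pending. Only HANDSHAKE_PENDING_MAX comes from the patch; request.c is not quoted here, so where the check happens and which errno it returns are assumptions, and `model_net`, `model_submit()` and `model_accepted()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define HANDSHAKE_PENDING_MAX (10)	/* value from the patch */

struct model_net {
	int hs_pending;		/* mirrors net->hs_pending */
};

/* Admit a request only while the pending count is under the cap, the
 * way a listener refuses new connections once its backlog is full. */
static int model_submit(struct model_net *net)
{
	if (net->hs_pending >= HANDSHAKE_PENDING_MAX)
		return -EAGAIN;	/* hypothetical choice of errno */
	net->hs_pending++;
	return 0;
}

/* ACCEPT removes a request from the pending list, freeing a slot. */
static void model_accepted(struct model_net *net)
{
	assert(net->hs_pending > 0);
	net->hs_pending--;
}
```

A memory-based limit would replace the fixed count with an accounting of bytes held by pending requests, but the admission point would be the same.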

I'm not dropping your idea, but instead trying to get the
high order bits taken care of first. If you have some
sample code, I'm happy to integrate it sooner rather than
later!


>> +
>> +struct handshake_req *handshake_req_get(struct handshake_req *req)
>> +{
>> +	return likely(refcount_inc_not_zero(&req->hr_ref)) ? req : NULL;
>> +}
> 
> It's unclear to me under which circumstances the refcount should be >
> 1: AFAICS the req should have always a single owner: initially the
> creator, then the accept queue and finally the user-space serving the
> request.

I think during request cancelation there are some moments
where a race between cancel and complete might result in
one of those two ending up with a reference to a freed
handshake_req. So I added reference counting.
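The cancel/complete race has two halves: the reference count keeps the request's memory valid, and a single atomic flag decides which caller actually runs the completion. This sketch models the second half after handshake_complete()'s test_and_set_bit(HANDSHAKE_F_COMPLETED, ...); `model_cmpl_req` and `model_complete()` are hypothetical user-space names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a request with HANDSHAKE_F_COMPLETED. */
struct model_cmpl_req {
	atomic_flag completed;
	int done_calls;		/* how many times hp_done would run */
	int last_status;
};

/* Whichever caller sets the flag first runs the completion; later
 * callers (a cancel or timeout that lost the race) do nothing. */
static void model_complete(struct model_cmpl_req *req, int status)
{
	if (!atomic_flag_test_and_set(&req->completed)) {
		req->done_calls++;
		req->last_status = status;
	}
}
```

A timeout would simply be one more caller of the same path, which is why it inherits the same race and the same resolution.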

Hannes is concerned about handshakes taking too long, and
would like a timeout mechanism. A handshake timeout would
be the same as a call to handshake_cancel, and thus be
likewise racy.

However if we use close/sk_destructor to fire the
completion and free the handshake_req, then maybe cancel/
timeout could be done simply by killing the user space
process that is handling the handshake request.


--
Chuck Lever
Chuck Lever Feb. 16, 2023, 4:58 p.m. UTC | #4
> On Feb 16, 2023, at 10:15 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Feb 16, 2023, at 8:12 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> 
>> On Wed, 2023-02-15 at 14:23 -0500, Chuck Lever wrote:
>> 
>>> +static void __net_exit handshake_net_exit(struct net *net)
>>> +{
>>> +	struct handshake_req *req;
>>> +	LIST_HEAD(requests);
>>> +
>>> +	/*
>>> +	 * XXX: This drains the net's pending list, but does
>>> +	 *	nothing about requests that have been accepted
>>> +	 *	and are in progress.
>>> +	 */
>>> +	spin_lock(&net->hs_lock);
>>> +	list_splice_init(&requests, &net->hs_requests);
>>> +	spin_unlock(&net->hs_lock);
>> 
>> If I read correctly, accepted but uncompleted reqs are leaked.
> 
> Yes, that's exactly right.
> 
> 
>> I think that
>> could be prevented by installing a custom sk_destructor in sock->sk
>> taking care of freeing sk->sk_handshake_req. The existing/old
>> sk_destructor, if any, could be stored in an additional
>> sk_handshake_req field and tail-called by the req's one.
> 
> I've been looking for a way to modify socket close behavior
> for these sockets, and this sounds like it's in the
> neighborhood. I'll have a look.
> 
> So one thing we might do is have CMD_DONE act just as a way
> to report handshake results, and have the handshake daemon
> close the fd to signal it's finished with it. sk_destructor
> would then fire handshake_complete and free the
> handshake_req.
> 
> Might make things a little more robust?

->sk_destruct is the wrong place to hook, since we need the
socket itself to stay around after the handshake completes.
Better would be hooking the close of the user space file
descriptor.

Yes, we could use ->sk_destruct to tear down the handshake_req,
if we don't mind it sticking around until the socket is
finally closed.


--
Chuck Lever
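Paolo's destructor-chaining idea can be modeled in user space. The sketch below is a minimal, hypothetical model, not kernel code: `struct model_sock` stands in for `struct sock`, `struct model_req` for `struct handshake_req`, and `hs_sk_destruct()` and `attach_req()` are invented names. It saves the previous `sk_destruct` pointer inside the request, frees the request when the hook fires, and then tail-calls the saved destructor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock: only the destructor hook matters. */
struct model_sock {
	void (*sk_destruct)(struct model_sock *sk);
	void *sk_handshake_req;
};

/* Hypothetical stand-in for struct handshake_req, plus the saved hook. */
struct model_req {
	void (*saved_destruct)(struct model_sock *sk);
};

static int original_destruct_calls;	/* times the pre-existing hook ran */
static int reqs_freed;			/* times a request was freed */

static void original_destruct(struct model_sock *sk)
{
	(void)sk;
	original_destruct_calls++;
}

/*
 * Replacement hook: free the request, then tail-call the saved hook.
 * The saved pointer lives in the request, so the request must remain
 * attached until this runs, or the original destructor is never called.
 */
static void hs_sk_destruct(struct model_sock *sk)
{
	struct model_req *req = sk->sk_handshake_req;
	void (*saved)(struct model_sock *sk) = NULL;

	if (req) {
		saved = req->saved_destruct;
		sk->sk_handshake_req = NULL;
		free(req);
		reqs_freed++;
	}
	if (saved)
		saved(sk);
}

/* Attach a request to @sk and interpose the replacement hook. */
static int attach_req(struct model_sock *sk)
{
	struct model_req *req = calloc(1, sizeof(*req));

	if (!req)
		return -1;
	req->saved_destruct = sk->sk_destruct;
	sk->sk_handshake_req = req;
	sk->sk_destruct = hs_sk_destruct;
	return 0;
}
```

Calling `sk.sk_destruct(&sk)` after `attach_req()` frees the request once and then runs `original_destruct()` once. The model also shows the trade-off Chuck describes: the request has to stick around until the socket is finally released, because the saved destructor is stored in it.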

Patch

diff --git a/include/net/handshake.h b/include/net/handshake.h
new file mode 100644
index 000000000000..ca401c08c541
--- /dev/null
+++ b/include/net/handshake.h
@@ -0,0 +1,46 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic HANDSHAKE service.
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+/*
+ * Data structures and functions that are visible only within the
+ * kernel are declared here.
+ */
+
+#ifndef _NET_HANDSHAKE_H
+#define _NET_HANDSHAKE_H
+
+struct handshake_req;
+
+/*
+ * Invariants for all handshake requests for one transport layer
+ * security protocol
+ */
+struct handshake_proto {
+	int			hp_protocol;
+	int			hp_mcgrp;
+	size_t			hp_privsize;
+
+	int			(*hp_accept)(struct handshake_req *req,
+					     struct genl_info *gi, int fd);
+	void			(*hp_done)(struct handshake_req *req,
+					   int status, struct nlattr *args);
+	void			(*hp_destroy)(struct handshake_req *req);
+};
+
+extern struct handshake_req *
+handshake_req_alloc(struct socket *sock, const struct handshake_proto *proto,
+		    gfp_t flags);
+extern void *handshake_req_private(struct handshake_req *req);
+extern int handshake_req_submit(struct handshake_req *req, gfp_t flags);
+extern void handshake_req_cancel(struct socket *sock);
+
+extern struct nlmsghdr *handshake_genl_put(struct sk_buff *msg,
+					   struct genl_info *gi);
+
+#endif /* _NET_HANDSHAKE_H */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 8c3587d5c308..a66309789560 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -186,6 +186,11 @@  struct net {
 #if IS_ENABLED(CONFIG_SMC)
 	struct netns_smc	smc;
 #endif
+
+	/* transport layer security handshake requests */
+	spinlock_t		hs_lock;
+	struct list_head	hs_requests;
+	int			hs_pending;
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/include/net/sock.h b/include/net/sock.h
index e0517ecc6531..e16e63ff61f2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -515,6 +515,7 @@  struct sock {
 
 	struct socket		*sk_socket;
 	void			*sk_user_data;
+	void			*sk_handshake_req;
 #ifdef CONFIG_SECURITY
 	void			*sk_security;
 #endif
diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h
new file mode 100644
index 000000000000..9544edeb181f
--- /dev/null
+++ b/include/uapi/linux/handshake.h
@@ -0,0 +1,56 @@ 
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * GENL HANDSHAKE service.
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+/*
+ * Data structures and functions that are visible to user space are
+ * declared here. This file constitutes an API contract between the
+ * Linux kernel and user space.
+ */
+
+#ifndef _UAPI_LINUX_HANDSHAKE_H
+#define _UAPI_LINUX_HANDSHAKE_H
+
+#define HANDSHAKE_GENL_NAME	"handshake"
+#define HANDSHAKE_GENL_VERSION	0x01
+
+enum handshake_genl_mcgrps {
+	HANDSHAKE_GENL_MCGRP_NONE = 0,
+};
+
+#define HANDSHAKE_GENL_MCGRP_NONE_NAME	"none"
+
+enum handshake_genl_cmds {
+	HANDSHAKE_GENL_CMD_UNSPEC = 0,
+	HANDSHAKE_GENL_CMD_READY,
+	HANDSHAKE_GENL_CMD_ACCEPT,
+	HANDSHAKE_GENL_CMD_DONE,
+
+	__HANDSHAKE_GENL_CMD_MAX
+};
+#define HANDSHAKE_GENL_CMD_MAX	(__HANDSHAKE_GENL_CMD_MAX - 1)
+
+enum handshake_genl_attrs {
+	HANDSHAKE_GENL_ATTR_UNSPEC = 0,
+	HANDSHAKE_GENL_ATTR_MSG_STATUS,
+	HANDSHAKE_GENL_ATTR_SESS_STATUS,
+	HANDSHAKE_GENL_ATTR_SOCKFD,
+	HANDSHAKE_GENL_ATTR_PROTOCOL,
+
+	HANDSHAKE_GENL_ATTR_ACCEPT,
+	HANDSHAKE_GENL_ATTR_DONE,
+
+	__HANDSHAKE_GENL_ATTR_MAX
+};
+#define HANDSHAKE_GENL_ATTR_MAX	(__HANDSHAKE_GENL_ATTR_MAX - 1)
+
+enum handshake_genl_protocol {
+	HANDSHAKE_GENL_PROTO_UNSPEC = 0,
+};
+
+#endif /* _UAPI_LINUX_HANDSHAKE_H */
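To show how the uapi definitions above turn into bytes on the wire, here is a user-space sketch that encodes ACCEPT and DONE requests by hand with the standard `<linux/netlink.h>` and `<linux/genetlink.h>` macros. The enum values are copied from the header above. The family id argument is a placeholder: a real handshake agent would resolve the "handshake" family id at run time through the generic netlink controller (CTRL_CMD_GETFAMILY). `msg_start()` and `msg_put_u32()` are hypothetical helpers, and actually sending the buffer on a NETLINK_GENERIC socket is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/* Copied from the uapi enums above so that this sketch builds on its own. */
enum { MODEL_CMD_ACCEPT = 2, MODEL_CMD_DONE = 3 };
enum {
	MODEL_ATTR_SESS_STATUS = 2,
	MODEL_ATTR_SOCKFD = 3,
	MODEL_ATTR_PROTOCOL = 4,
};

/*
 * Write nlmsghdr + genlmsghdr at the start of @buf (which must be
 * 4-byte aligned). Returns the message length, or 0 if @cap is too small.
 */
static size_t msg_start(void *buf, size_t cap, uint16_t family_id,
			uint8_t cmd, uint32_t seq)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *gnlh;

	if (cap < NLMSG_HDRLEN + GENL_HDRLEN)
		return 0;
	memset(buf, 0, NLMSG_HDRLEN + GENL_HDRLEN);
	nlh->nlmsg_len = NLMSG_HDRLEN + GENL_HDRLEN;
	nlh->nlmsg_type = family_id;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_seq = seq;
	gnlh = NLMSG_DATA(nlh);
	gnlh->cmd = cmd;
	gnlh->version = 0x01;	/* HANDSHAKE_GENL_VERSION */
	return nlh->nlmsg_len;
}

/* Append one NLA_U32 attribute. Returns 0, or -1 if @cap is too small. */
static int msg_put_u32(void *buf, size_t cap, uint16_t type, uint32_t value)
{
	struct nlmsghdr *nlh = buf;
	size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
	size_t len = NLA_HDRLEN + sizeof(value);
	struct nlattr *nla;

	if (off + NLA_ALIGN(len) > cap)
		return -1;
	nla = (struct nlattr *)((char *)buf + off);
	nla->nla_type = type;
	nla->nla_len = (uint16_t)len;
	memcpy((char *)nla + NLA_HDRLEN, &value, sizeof(value));
	nlh->nlmsg_len = (uint32_t)(off + NLA_ALIGN(len));
	return 0;
}
```

An ACCEPT request carrying HANDSHAKE_GENL_ATTR_PROTOCOL comes to 28 bytes (16-byte netlink header, 4-byte genl header, one 8-byte attribute); a DONE request with SOCKFD and SESS_STATUS comes to 36. On the kernel side, genlmsg_parse() checks these attributes against handshake_genl_policy, which declares all four scalar attributes as NLA_U32.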
diff --git a/net/Makefile b/net/Makefile
index 6a62e5b27378..c1bb53f00486 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -78,3 +78,4 @@  obj-$(CONFIG_NET_NCSI)		+= ncsi/
 obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
 obj-$(CONFIG_MPTCP)		+= mptcp/
 obj-$(CONFIG_MCTP)		+= mctp/
+obj-y				+= handshake/
diff --git a/net/handshake/Makefile b/net/handshake/Makefile
new file mode 100644
index 000000000000..824e08c626af
--- /dev/null
+++ b/net/handshake/Makefile
@@ -0,0 +1,11 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the Generic HANDSHAKE service
+#
+# Author: Chuck Lever <chuck.lever@oracle.com>
+#
+# Copyright (c) 2023, Oracle and/or its affiliates.
+#
+
+obj-y += handshake.o
+handshake-y := netlink.o request.o
diff --git a/net/handshake/handshake.h b/net/handshake/handshake.h
new file mode 100644
index 000000000000..1cbcfc632a24
--- /dev/null
+++ b/net/handshake/handshake.h
@@ -0,0 +1,43 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic netlink handshake service
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+/*
+ * Data structures and functions that are visible only within the
+ * handshake module are declared here.
+ */
+
+#ifndef _INTERNAL_HANDSHAKE_H
+#define _INTERNAL_HANDSHAKE_H
+
+/*
+ * One handshake request
+ */
+struct handshake_req {
+	refcount_t			hr_ref;
+	struct list_head		hr_list;
+	unsigned long			hr_flags;
+	const struct handshake_proto	*hr_proto;
+	struct socket			*hr_sock;
+	int				hr_fd;
+};
+
+#define HANDSHAKE_F_COMPLETED	BIT(0)
+
+int handshake_genl_notify(struct net *net, struct handshake_req *req,
+			  gfp_t flags);
+void handshake_complete(struct handshake_req *req, int status,
+			struct nlattr *args);
+
+struct handshake_req *handshake_req_get(struct handshake_req *req);
+void handshake_req_put(struct handshake_req *req);
+
+void add_pending_locked(struct net *net, struct handshake_req *req);
+bool handshake_remove_pending(struct net *net, struct handshake_req *req);
+
+#endif /* _INTERNAL_HANDSHAKE_H */
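The HANDSHAKE_F_COMPLETED bit in hr_flags is what lets handshake_complete() invoke hp_done at most once, even if a DONE command, a cancel, and namespace exit race with each other. Below is a user-space model of that guard using C11 atomics, with atomic_fetch_or() standing in for the kernel's test_and_set_bit(); all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_F_COMPLETED (1UL << 0)	/* mirrors HANDSHAKE_F_COMPLETED */

struct model_req {
	atomic_ulong flags;
	int done_calls;		/* counts hp_done-style callbacks */
	int last_status;	/* status passed to the winning caller */
};

/* Returns nonzero if @bit was already set, like test_and_set_bit(). */
static int model_test_and_set(atomic_ulong *flags, unsigned long bit)
{
	return (atomic_fetch_or(flags, bit) & bit) != 0;
}

/* Only the first caller runs the completion callback. */
static void model_complete(struct model_req *req, int status)
{
	if (!model_test_and_set(&req->flags, MODEL_F_COMPLETED)) {
		/* Only the single winner touches these fields. */
		req->done_calls++;
		req->last_status = status;
	}
}
```

A second call, for example a cancel arriving after DONE, finds the bit already set and returns without running the callback, so the first status reported is the one the consumer sees.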
diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c
new file mode 100644
index 000000000000..8d0bf11396a7
--- /dev/null
+++ b/net/handshake/netlink.c
@@ -0,0 +1,370 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Generic netlink handshake service
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/inet.h>
+
+#include <net/sock.h>
+#include <net/genetlink.h>
+#include <net/handshake.h>
+
+#include <uapi/linux/handshake.h>
+#include "handshake.h"
+
+static struct genl_family __ro_after_init handshake_genl_family;
+
+void add_pending_locked(struct net *net, struct handshake_req *req)
+{
+	net->hs_pending++;
+	list_add_tail(&req->hr_list, &net->hs_requests);
+}
+
+static void remove_pending_locked(struct net *net, struct handshake_req *req)
+{
+	net->hs_pending--;
+	list_del_init(&req->hr_list);
+}
+
+/*
+ * Returns true if this req was on the pending list.
+ */
+bool handshake_remove_pending(struct net *net, struct handshake_req *req)
+{
+	struct sock *sk = req->hr_sock->sk;
+	bool ret;
+
+	ret = false;
+
+	spin_lock(&net->hs_lock);
+	if (!list_empty(&req->hr_list)) {
+		remove_pending_locked(net, req);
+		ret = true;
+	}
+	sk->sk_handshake_req = NULL;
+	spin_unlock(&net->hs_lock);
+
+	return ret;
+}
+
+void handshake_complete(struct handshake_req *req, int status,
+			struct nlattr *args)
+{
+	if (!test_and_set_bit(HANDSHAKE_F_COMPLETED, &req->hr_flags)) {
+		req->hr_proto->hp_done(req, status, args);
+		req->hr_sock->sk->sk_handshake_req = NULL;
+	}
+	handshake_req_put(req);
+}
+
+int handshake_genl_notify(struct net *net, struct handshake_req *req,
+			  gfp_t flags)
+{
+	struct sk_buff *skb;
+	void *hdr;
+
+	if (!genl_has_listeners(&handshake_genl_family, net,
+				req->hr_proto->hp_mcgrp))
+		return -ESRCH;
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, flags);
+	if (!skb)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(skb, 0, 0, &handshake_genl_family, 0,
+			  HANDSHAKE_GENL_CMD_READY);
+	if (!hdr) {
+		nlmsg_free(skb);
+		return -EMSGSIZE;
+	}
+
+	genlmsg_end(skb, hdr);
+	return genlmsg_multicast(&handshake_genl_family, skb, 0,
+				 req->hr_proto->hp_mcgrp, flags);
+}
+
+static int handshake_accept(struct handshake_req *req)
+{
+	struct socket *sock = req->hr_sock;
+	int flags = O_CLOEXEC;
+	struct file *file;
+	int fd;
+
+	fd = get_unused_fd_flags(flags);
+	if (fd < 0)
+		return fd;
+	file = sock_alloc_file(sock, flags, sock->sk->sk_prot_creator->name);
+	if (IS_ERR(file)) {
+		put_unused_fd(fd);
+		return PTR_ERR(file);
+	}
+
+	req->hr_fd = fd;
+	fd_install(fd, file);
+	return 0;
+}
+
+static const struct nla_policy
+handshake_genl_policy[HANDSHAKE_GENL_ATTR_MAX + 1] = {
+	[HANDSHAKE_GENL_ATTR_MSG_STATUS] = {
+		.type = NLA_U32
+	},
+	[HANDSHAKE_GENL_ATTR_SESS_STATUS] = {
+		.type = NLA_U32
+	},
+	[HANDSHAKE_GENL_ATTR_SOCKFD] = {
+		.type = NLA_U32
+	},
+	[HANDSHAKE_GENL_ATTR_PROTOCOL] = {
+		.type = NLA_U32
+	},
+
+	[HANDSHAKE_GENL_ATTR_ACCEPT] = {
+		.type = NLA_NESTED,
+	},
+	[HANDSHAKE_GENL_ATTR_DONE] = {
+		.type = NLA_NESTED,
+	},
+};
+
+/**
+ * handshake_genl_put - Create a generic netlink message header
+ * @msg: buffer in which to create the header
+ * @gi: generic netlink message context
+ *
+ * Returns a ready-to-use header, or NULL.
+ */
+struct nlmsghdr *handshake_genl_put(struct sk_buff *msg, struct genl_info *gi)
+{
+	return genlmsg_put(msg, gi->snd_portid, gi->snd_seq,
+			   &handshake_genl_family, 0, gi->genlhdr->cmd);
+}
+EXPORT_SYMBOL(handshake_genl_put);
+
+static int handshake_genl_status_reply(struct sk_buff *skb,
+				       struct genl_info *gi, int status)
+{
+	struct nlmsghdr *hdr;
+	struct sk_buff *msg;
+	int ret;
+
+	ret = -ENOMEM;
+	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		goto out;
+	hdr = handshake_genl_put(msg, gi);
+	if (!hdr)
+		goto out_free;
+
+	ret = -EMSGSIZE;
+	ret = nla_put_u32(msg, HANDSHAKE_GENL_ATTR_MSG_STATUS, status);
+	if (ret < 0)
+		goto out_free;
+
+	genlmsg_end(msg, hdr);
+	return genlmsg_reply(msg, gi);
+
+out_free:
+	nlmsg_free(msg);
+out:
+	return ret;
+}
+
+static int handshake_genl_cmd_accept(struct sk_buff *skb, struct genl_info *gi)
+{
+	struct nlattr *tb[HANDSHAKE_GENL_ATTR_MAX + 1];
+	struct net *net = sock_net(skb->sk);
+	struct handshake_req *pos, *req;
+	int err;
+
+	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
+			    HANDSHAKE_GENL_ATTR_MAX, handshake_genl_policy,
+			    NULL);
+	if (err) {
+		pr_err_ratelimited("%s: genlmsg_parse() returned %d\n",
+				   __func__, err);
+		return err;
+	}
+
+	if (!tb[HANDSHAKE_GENL_ATTR_PROTOCOL])
+		return handshake_genl_status_reply(skb, gi, -EINVAL);
+
+	req = NULL;
+	spin_lock(&net->hs_lock);
+	list_for_each_entry(pos, &net->hs_requests, hr_list) {
+		if (pos->hr_proto->hp_protocol !=
+		    nla_get_u32(tb[HANDSHAKE_GENL_ATTR_PROTOCOL]))
+			continue;
+		remove_pending_locked(net, pos);
+		req = handshake_req_get(pos);
+		break;
+	}
+	spin_unlock(&net->hs_lock);
+	if (!req)
+		return handshake_genl_status_reply(skb, gi, -EAGAIN);
+
+	err = handshake_accept(req);
+	if (err < 0) {
+		handshake_complete(req, -EIO, NULL);
+		handshake_req_put(req);
+		return handshake_genl_status_reply(skb, gi, err);
+	}
+	err = req->hr_proto->hp_accept(req, gi, req->hr_fd);
+	if (err) {
+		put_unused_fd(req->hr_fd);
+		handshake_complete(req, -EIO, NULL);
+		handshake_req_put(req);
+		return err;
+	}
+	return 0;
+}
+
+/*
+ * This function is careful to not close the socket. It merely removes
+ * it from the file descriptor table so that it is no longer visible
+ * to the calling process.
+ */
+static int handshake_genl_cmd_done(struct sk_buff *skb, struct genl_info *gi)
+{
+	struct nlattr *tb[HANDSHAKE_GENL_ATTR_MAX + 1];
+	struct handshake_req *req;
+	struct socket *sock;
+	int fd, status, err;
+
+	err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
+			    HANDSHAKE_GENL_ATTR_MAX, handshake_genl_policy,
+			    NULL);
+	if (err) {
+		pr_err_ratelimited("%s: genlmsg_parse() returned %d\n",
+				   __func__, err);
+		return err;
+	}
+
+	if (!tb[HANDSHAKE_GENL_ATTR_SOCKFD])
+		return handshake_genl_status_reply(skb, gi, -EINVAL);
+	err = 0;
+	fd = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SOCKFD]);
+	sock = sockfd_lookup(fd, &err);
+	if (err)
+		return handshake_genl_status_reply(skb, gi, -EBADF);
+
+	req = sock->sk->sk_handshake_req;
+	if (!req || req->hr_fd != fd)	/* sanity */
+		return handshake_genl_status_reply(skb, gi, -EBADF);
+
+	status = -EIO;
+	if (tb[HANDSHAKE_GENL_ATTR_SESS_STATUS])
+		status = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SESS_STATUS]);
+
+	put_unused_fd(req->hr_fd);
+	handshake_complete(req, status, tb[HANDSHAKE_GENL_ATTR_DONE]);
+	handshake_req_put(req);
+	return 0;
+}
+
+static const struct genl_ops handshake_genl_ops[] = {
+	{
+		.cmd		= HANDSHAKE_GENL_CMD_ACCEPT,
+		.doit		= handshake_genl_cmd_accept,
+		.flags		= GENL_ADMIN_PERM,
+	},
+	{
+		.cmd		= HANDSHAKE_GENL_CMD_DONE,
+		.doit		= handshake_genl_cmd_done,
+		.flags		= GENL_ADMIN_PERM,
+	},
+};
+
+static const struct genl_multicast_group handshake_genl_mcgrps[] = {
+	[HANDSHAKE_GENL_MCGRP_NONE] = {
+		.name		= HANDSHAKE_GENL_MCGRP_NONE_NAME,
+	},
+};
+
+static struct genl_family __ro_after_init handshake_genl_family = {
+	.hdrsize		= 0,
+	.name			= HANDSHAKE_GENL_NAME,
+	.version		= HANDSHAKE_GENL_VERSION,
+	.maxattr		= HANDSHAKE_GENL_ATTR_MAX,
+	.netnsok		= true,
+	.n_mcgrps		= ARRAY_SIZE(handshake_genl_mcgrps),
+	.n_ops			= ARRAY_SIZE(handshake_genl_ops),
+	.resv_start_op		= HANDSHAKE_GENL_CMD_MAX,
+	.policy			= handshake_genl_policy,
+	.ops			= handshake_genl_ops,
+	.mcgrps			= handshake_genl_mcgrps,
+	.module			= THIS_MODULE,
+};
+
+static int __net_init handshake_net_init(struct net *net)
+{
+	spin_lock_init(&net->hs_lock);
+	INIT_LIST_HEAD(&net->hs_requests);
+	net->hs_pending	= 0;
+	return 0;
+}
+
+static void __net_exit handshake_net_exit(struct net *net)
+{
+	struct handshake_req *req;
+	LIST_HEAD(requests);
+
+	/*
+	 * XXX: This drains the net's pending list, but does
+	 *	nothing about requests that have been accepted
+	 *	and are in progress.
+	 */
+	spin_lock(&net->hs_lock);
+	list_splice_init(&net->hs_requests, &requests);
+	spin_unlock(&net->hs_lock);
+
+	while (!list_empty(&requests)) {
+		req = list_first_entry(&requests, struct handshake_req, hr_list);
+		list_del(&req->hr_list);
+
+		/*
+		 * Requests on this list have not yet been
+		 * accepted, so they do not have an fd to put.
+		 */
+
+		handshake_complete(req, -ETIMEDOUT, NULL);
+	}
+}
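handshake_net_exit() is meant to move every pending request onto the local requests list before completing them. In the kernel list API, list_splice_init(list, head) moves the entries of list onto head and reinitializes list, so the argument order decides which list gets drained. The sketch below re-implements just enough of the include/linux/list.h semantics in user space to exercise both orders; it is a model for illustration, not the kernel header.

```c
#include <assert.h>

/* Minimal user-space copy of the kernel's circular list_head semantics. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *l)
{
	l->next = l;
	l->prev = l;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Move every entry of @list to the front of @head, then reinit @list. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (!list_empty(list)) {
		struct list_head *first = list->next, *last = list->prev;
		struct list_head *at = head->next;

		first->prev = head;
		head->next = first;
		last->next = at;
		at->prev = last;
		INIT_LIST_HEAD(list);
	}
}

static int list_count(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

With three requests on a pending list and an empty local list, list_splice_init(&local, &pending) changes nothing (the empty local list is spliced into pending), while list_splice_init(&pending, &local) leaves pending empty and all three entries, in their original order, on local.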
+
+static struct pernet_operations handshake_genl_net_ops = {
+	.init		= handshake_net_init,
+	.exit		= handshake_net_exit,
+};
+
+static int __init handshake_init(void)
+{
+	int ret;
+
+	ret = genl_register_family(&handshake_genl_family);
+	if (ret)
+		return ret;
+
+	ret = register_pernet_subsys(&handshake_genl_net_ops);
+	if (ret)
+		genl_unregister_family(&handshake_genl_family);
+
+	return ret;
+}
+
+static void __exit handshake_exit(void)
+{
+	unregister_pernet_subsys(&handshake_genl_net_ops);
+	genl_unregister_family(&handshake_genl_family);
+}
+
+module_init(handshake_init);
+module_exit(handshake_exit);
diff --git a/net/handshake/request.c b/net/handshake/request.c
new file mode 100644
index 000000000000..bf56703ea1f5
--- /dev/null
+++ b/net/handshake/request.c
@@ -0,0 +1,160 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Handshake request lifetime events
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2023, Oracle and/or its affiliates.
+ */
+
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/inet.h>
+#include <linux/fdtable.h>
+
+#include <net/sock.h>
+#include <net/genetlink.h>
+#include <net/handshake.h>
+
+#include <uapi/linux/handshake.h>
+#include "handshake.h"
+
+/*
+ * This limit is to prevent slow remotes from causing denial of service.
+ * A ulimit-style tunable might be used instead.
+ */
+#define HANDSHAKE_PENDING_MAX (10)
+
+struct handshake_req *handshake_req_get(struct handshake_req *req)
+{
+	return likely(refcount_inc_not_zero(&req->hr_ref)) ? req : NULL;
+}
+
+static void handshake_req_destroy(struct handshake_req *req)
+{
+	__sock_put(req->hr_sock->sk);
+	req->hr_proto->hp_destroy(req);
+	kfree(req);
+}
+
+void handshake_req_put(struct handshake_req *req)
+{
+	if (refcount_dec_and_test(&req->hr_ref))
+		handshake_req_destroy(req);
+}
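handshake_req_get() and handshake_req_put() follow the usual refcount_t pattern. In the error paths of handshake_genl_cmd_accept(), the accepted request holds two references (the submitter's original one and the one taken by handshake_req_get()); handshake_complete() drops one and the explicit handshake_req_put() drops the other. Here is a user-space model with C11 atomics; the names are hypothetical, and unlike the kernel's refcount_t it performs no saturation or underflow checking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_req {
	atomic_int ref;
};

static int destroyed;	/* counts model_req_destroy() calls */

/* Like refcount_inc_not_zero(): never revive an object whose count hit 0. */
static bool model_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, @old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

static struct model_req *model_req_get(struct model_req *req)
{
	return model_inc_not_zero(&req->ref) ? req : NULL;
}

static void model_req_destroy(struct model_req *req)
{
	destroyed++;
	free(req);
}

/* Like refcount_dec_and_test(): whoever drops the last reference frees. */
static void model_req_put(struct model_req *req)
{
	if (atomic_fetch_sub(&req->ref, 1) == 1)
		model_req_destroy(req);
}

static struct model_req *model_req_alloc(void)
{
	struct model_req *req = malloc(sizeof(*req));

	if (req)
		atomic_init(&req->ref, 1);
	return req;
}
```

Allocating, taking one extra reference, and then putting twice destroys the object exactly once, on the second put.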
+
+/**
+ * handshake_req_alloc - consumer API to allocate a request
+ * @sock: open socket on which to perform a handshake
+ * @proto: security protocol
+ * @flags: memory allocation flags
+ *
+ * Returns an initialized handshake_req or NULL.
+ */
+struct handshake_req *handshake_req_alloc(struct socket *sock,
+					  const struct handshake_proto *proto,
+					  gfp_t flags)
+{
+	struct handshake_req *req;
+
+	req = kzalloc(sizeof(*req) + proto->hp_privsize, flags);
+	if (!req)
+		return NULL;
+
+	sock_hold(sock->sk);
+
+	refcount_set(&req->hr_ref, 1);
+	INIT_LIST_HEAD(&req->hr_list);
+	req->hr_sock = sock;
+	req->hr_proto = proto;
+	return req;
+}
+EXPORT_SYMBOL(handshake_req_alloc);
+
+/**
+ * handshake_req_private - consumer API to return per-handshake private data
+ * @req: handshake arguments
+ *
+ */
+void *handshake_req_private(struct handshake_req *req)
+{
+	return (void *)(req + 1);
+}
+EXPORT_SYMBOL(handshake_req_private);
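handshake_req_alloc() sizes a single allocation as sizeof(*req) + hp_privsize, and handshake_req_private() returns req + 1, the first byte past the request header. A user-space model of that layout, with hypothetical names, follows. It assumes the private structure needs no stricter alignment than the header, which holds here because both contain only int and char members.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct model_req {
	int ref;
	/* the consumer's private area follows in the same allocation */
};

/* Hypothetical consumer private data, sized like hp_privsize. */
struct model_priv {
	int timeout_ms;
	char peer[16];
};

static struct model_req *model_req_alloc(size_t privsize)
{
	/* calloc() zeroes header and private area, as kzalloc() does. */
	struct model_req *req = calloc(1, sizeof(*req) + privsize);

	if (req)
		req->ref = 1;
	return req;
}

static void *model_req_private(struct model_req *req)
{
	return (void *)(req + 1);	/* first byte past the header */
}
```

Because both parts share one allocation, freeing the request also releases the private data; no separate destructor is needed for the memory itself.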
+
+/**
+ * handshake_req_submit - consumer API to submit a handshake request
+ * @req: handshake arguments
+ * @flags: memory allocation flags
+ *
+ * Return values:
+ *   %0: Request queued
+ *   %-EBUSY: A handshake is already under way for this socket
+ *   %-ESRCH: No handshake agent is available
+ *   %-EAGAIN: Too many pending handshake requests
+ *   %-ENOMEM: Failed to allocate memory
+ *   %-EMSGSIZE: Failed to construct notification message
+ *
+ * A zero return value from handshake_req_submit() means that
+ * exactly one subsequent completion callback is guaranteed.
+ *
+ * A negative return value from handshake_req_submit() means that
+ * no completion callback will be done and that @req is
+ * destroyed.
+ */
+int handshake_req_submit(struct handshake_req *req, gfp_t flags)
+{
+	struct sock *sk = req->hr_sock->sk;
+	struct net *net = sock_net(sk);
+	int ret;
+
+	ret = -EAGAIN;
+	if (READ_ONCE(net->hs_pending) >= HANDSHAKE_PENDING_MAX)
+		goto out_err;
+
+	ret = -EBUSY;
+	spin_lock(&net->hs_lock);
+	if (sk->sk_handshake_req || !list_empty(&req->hr_list)) {
+		spin_unlock(&net->hs_lock);
+		goto out_err;
+	}
+
+	add_pending_locked(net, req);
+	sk->sk_handshake_req = req;
+	spin_unlock(&net->hs_lock);
+
+	ret = handshake_genl_notify(net, req, flags);
+	if (ret)
+		if (handshake_remove_pending(net, req))
+			goto out_err;
+
+	return 0;
+
+out_err:
+	handshake_req_put(req);
+	return ret;
+}
+EXPORT_SYMBOL(handshake_req_submit);
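handshake_req_submit() applies two admission checks before queuing a request: a per-namespace cap on pending requests (-EAGAIN) and a one-request-per-socket rule (-EBUSY). The lock-free model below shows only that ordering; model_net, model_sock, and model_submit() are hypothetical, and the notification step is left out. One observation on the real code: it reads hs_pending with READ_ONCE() before taking hs_lock, so concurrent submitters appear able to push the count slightly past HANDSHAKE_PENDING_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_PENDING_MAX 10	/* mirrors HANDSHAKE_PENDING_MAX */

struct model_net {
	int pending;		/* stands in for net->hs_pending */
};

struct model_sock {
	bool has_req;		/* stands in for sk->sk_handshake_req != NULL */
};

static int model_submit(struct model_net *net, struct model_sock *sk)
{
	if (net->pending >= MODEL_PENDING_MAX)
		return -EAGAIN;	/* too many requests waiting for an agent */
	if (sk->has_req)
		return -EBUSY;	/* a handshake is already under way */
	net->pending++;
	sk->has_req = true;
	return 0;
}
```

The first submission on a socket succeeds, a second one on the same socket gets -EBUSY, and once the namespace reaches the cap any new socket gets -EAGAIN without the pending count changing.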
+
+/**
+ * handshake_req_cancel - consumer API to cancel an in-progress handshake
+ * @sock: socket on which there is an ongoing handshake
+ *
+ * The consumer must discard @sock immediately when this function
+ * returns.
+ */
+void handshake_req_cancel(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct handshake_req *req = sk->sk_handshake_req;
+
+	if (!req)
+		return;
+
+	handshake_remove_pending(sock_net(sk), req);
+	handshake_complete(req, -ERESTARTSYS, NULL);
+}
+EXPORT_SYMBOL(handshake_req_cancel);