Message ID | 167580607317.5328.2575913180270613320.stgit@91.116.238.104.host.secureserver.net (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Another crack at a handshake upcall mechanism | expand |
On 2/7/23 22:41, Chuck Lever wrote: > When a kernel consumer needs a transport layer security session, it > first needs a handshake to negotiate and establish a session. This > negotiation can be done in user space via one of the several > existing library implementations, or it can be done in the kernel. > > No in-kernel handshake implementations yet exist. In their absence, > we add a netlink service akin to NETLINK_ROUTE that can: > > a. Notify a user space daemon that a handshake is needed. > > b. Once notified, the daemon calls the kernel back via this > netlink service to get the handshake parameters, including an > open socket on which to establish the session. > > The notification service uses a multicast group. Each handshake > protocol (eg, TLSv1.3, PSP, etc) adopts its own group number so that > the user space daemons for performing the handshakes are completely > independent of one another. The kernel can then tell via > netlink_has_listeners() whether a user space daemon is active and > can handle a handshake request for the desired security layer > protocol. > > A new netlink operation, ACCEPT, acts like accept(2) in that it > instantiates a file descriptor in the user space daemon's fd table. > If this operation is successful, the reply carries the fd number, > which can be treated as an open and ready file descriptor. > > While user space is performing the handshake, the kernel keeps its > muddy paws off the open socket. The act of closing the user space > file descriptor alerts the kernel that the open socket is safe to > use again. When the user daemon completes a handshake, the kernel is > responsible for checking that a valid transport layer security > session has been established. > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > include/net/handshake.h | 37 ++++ > include/net/net_namespace.h | 1 > include/net/sock.h | 1 > include/uapi/linux/handshake.h | 65 +++++++ > include/uapi/linux/netlink.h | 1 > net/Makefile | 1 > net/handshake/Makefile | 11 + > net/handshake/netlink.c | 320 ++++++++++++++++++++++++++++++++++++ > tools/include/uapi/linux/netlink.h | 1 > 9 files changed, 438 insertions(+) > create mode 100644 include/net/handshake.h > create mode 100644 include/uapi/linux/handshake.h > create mode 100644 net/handshake/Makefile > create mode 100644 net/handshake/netlink.c > Looks good on first glance; I'll give it a go on my testbed. Cheers, Hannes
On Tue, 07 Feb 2023 16:41:13 -0500 Chuck Lever wrote: > diff --git a/tools/include/uapi/linux/netlink.h b/tools/include/uapi/linux/netlink.h > index 0a4d73317759..a269d356f358 100644 > --- a/tools/include/uapi/linux/netlink.h > +++ b/tools/include/uapi/linux/netlink.h > @@ -29,6 +29,7 @@ > #define NETLINK_RDMA 20 > #define NETLINK_CRYPTO 21 /* Crypto layer */ > #define NETLINK_SMC 22 /* SMC monitoring */ > +#define NETLINK_HANDSHAKE 23 /* transport layer sec handshake requests */ The extra indirection of genetlink introduces some complications?
> On Feb 9, 2023, at 1:00 AM, Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 07 Feb 2023 16:41:13 -0500 Chuck Lever wrote: >> diff --git a/tools/include/uapi/linux/netlink.h b/tools/include/uapi/linux/netlink.h >> index 0a4d73317759..a269d356f358 100644 >> --- a/tools/include/uapi/linux/netlink.h >> +++ b/tools/include/uapi/linux/netlink.h >> @@ -29,6 +29,7 @@ >> #define NETLINK_RDMA 20 >> #define NETLINK_CRYPTO 21 /* Crypto layer */ >> #define NETLINK_SMC 22 /* SMC monitoring */ >> +#define NETLINK_HANDSHAKE 23 /* transport layer sec handshake requests */ > > The extra indirection of genetlink introduces some complications? I don't think it does, necessarily. But neither does it seem to add any value (for this use case). <shrug> -- Chuck Lever
On Thu, 2023-02-09 at 15:43 +0000, Chuck Lever III wrote: > > On Feb 9, 2023, at 1:00 AM, Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Tue, 07 Feb 2023 16:41:13 -0500 Chuck Lever wrote: > > > diff --git a/tools/include/uapi/linux/netlink.h > > > b/tools/include/uapi/linux/netlink.h > > > index 0a4d73317759..a269d356f358 100644 > > > --- a/tools/include/uapi/linux/netlink.h > > > +++ b/tools/include/uapi/linux/netlink.h > > > @@ -29,6 +29,7 @@ > > > #define NETLINK_RDMA 20 > > > #define NETLINK_CRYPTO 21 /* Crypto layer */ > > > #define NETLINK_SMC 22 /* SMC monitoring */ > > > +#define NETLINK_HANDSHAKE 23 /* transport layer sec > > > handshake requests */ > > > > The extra indirection of genetlink introduces some complications? > > I don't think it does, necessarily. But neither does it seem > to add any value (for this use case). <shrug> To me it introduces a good separation between the handshake mechanism itself and the current subject (sock). IIRC the previous version allowed the user-space to create a socket of the HANDSHAKE family which in turn accept()ed tcp sockets. That kind of construct - assuming I interpreted it correctly - did not sound right to me. Back to these patches, they looks sane to me, even if the whole architecture is a bit hard to follow, given the non trivial cross references between the patches - I can likely have missed some relevant point. I'm wondering if this approach scales well enough with the number of concurrent handshakes: the single list looks like a potential bottle- neck. Cheers, Paolo
> On Feb 9, 2023, at 11:02 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > On Thu, 2023-02-09 at 15:43 +0000, Chuck Lever III wrote: >>> On Feb 9, 2023, at 1:00 AM, Jakub Kicinski <kuba@kernel.org> wrote: >>> >>> On Tue, 07 Feb 2023 16:41:13 -0500 Chuck Lever wrote: >>>> diff --git a/tools/include/uapi/linux/netlink.h >>>> b/tools/include/uapi/linux/netlink.h >>>> index 0a4d73317759..a269d356f358 100644 >>>> --- a/tools/include/uapi/linux/netlink.h >>>> +++ b/tools/include/uapi/linux/netlink.h >>>> @@ -29,6 +29,7 @@ >>>> #define NETLINK_RDMA 20 >>>> #define NETLINK_CRYPTO 21 /* Crypto layer */ >>>> #define NETLINK_SMC 22 /* SMC monitoring */ >>>> +#define NETLINK_HANDSHAKE 23 /* transport layer sec >>>> handshake requests */ >>> >>> The extra indirection of genetlink introduces some complications? >> >> I don't think it does, necessarily. But neither does it seem >> to add any value (for this use case). <shrug> > > To me it introduces a good separation between the handshake mechanism > itself and the current subject (sock). > > IIRC the previous version allowed the user-space to create a socket of > the HANDSHAKE family which in turn accept()ed tcp sockets. That kind of > construct - assuming I interpreted it correctly - did not sound right > to me. > > Back to these patches, they looks sane to me, even if the whole > architecture is a bit hard to follow, given the non trivial cross > references between the patches - I can likely have missed some relevant > point. One of the original goals was to support other security protocols besides TLS v1.3, which is why the code is split between two patches. I know that is cumbersome for some review workflows. Now is a good time to simplify, if we see a sensible opportunity to do so. > I'm wondering if this approach scales well enough with the number of > concurrent handshakes: the single list looks like a potential bottle- > neck. It's not clear how much scaling is needed. I don't have a strong sense of how frequently a busy storage server will need a handshake, for instance, but it seems like it would be relatively less frequent than, say, I/O. Network storage connections are typically long-lived, unlike http. In terms of scalability, I am a little more concerned about the handshake_mutex. Maybe that isn't needed since the pending list is spinlock protected? All that said, the single pending list can be replaced easily. It would be straightforward to move it into struct net, for example. -- Chuck Lever
On Thu, 9 Feb 2023 15:43:06 +0000 Chuck Lever III wrote: > >> @@ -29,6 +29,7 @@ > >> #define NETLINK_RDMA 20 > >> #define NETLINK_CRYPTO 21 /* Crypto layer */ > >> #define NETLINK_SMC 22 /* SMC monitoring */ > >> +#define NETLINK_HANDSHAKE 23 /* transport layer sec handshake requests */ > > > > The extra indirection of genetlink introduces some complications? > > I don't think it does, necessarily. But neither does it seem > to add any value (for this use case). <shrug> Our default is to go for generic netlink, it's where we invest most time in terms of infrastructure. It can take care of attribute parsing, and allows user space to dump information about the family (eg. the parsing policy). Most recently we added support for writing the policies in (simple) YAML: https://docs.kernel.org/next/userspace-api/netlink/intro-specs.html It takes care of a lot of the netlink tedium. If you're willing to make use of it I'm happy to help converting etc. We merged this stuff last month, so there are likely sharp edges.
On Thu, 2023-02-09 at 16:34 +0000, Chuck Lever III wrote: > > > On Feb 9, 2023, at 11:02 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > > > On Thu, 2023-02-09 at 15:43 +0000, Chuck Lever III wrote: > > > > On Feb 9, 2023, at 1:00 AM, Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > > On Tue, 07 Feb 2023 16:41:13 -0500 Chuck Lever wrote: > > > > > diff --git a/tools/include/uapi/linux/netlink.h > > > > > b/tools/include/uapi/linux/netlink.h > > > > > index 0a4d73317759..a269d356f358 100644 > > > > > --- a/tools/include/uapi/linux/netlink.h > > > > > +++ b/tools/include/uapi/linux/netlink.h > > > > > @@ -29,6 +29,7 @@ > > > > > #define NETLINK_RDMA 20 > > > > > #define NETLINK_CRYPTO 21 /* Crypto layer */ > > > > > #define NETLINK_SMC 22 /* SMC monitoring */ > > > > > +#define NETLINK_HANDSHAKE 23 /* transport layer sec > > > > > handshake requests */ > > > > > > > > The extra indirection of genetlink introduces some complications? > > > > > > I don't think it does, necessarily. But neither does it seem > > > to add any value (for this use case). <shrug> > > > > To me it introduces a good separation between the handshake mechanism > > itself and the current subject (sock). > > > > IIRC the previous version allowed the user-space to create a socket of > > the HANDSHAKE family which in turn accept()ed tcp sockets. That kind of > > construct - assuming I interpreted it correctly - did not sound right > > to me. > > > > Back to these patches, they looks sane to me, even if the whole > > architecture is a bit hard to follow, given the non trivial cross > > references between the patches - I can likely have missed some relevant > > point. > > One of the original goals was to support other security protocols > besides TLS v1.3, which is why the code is split between two > patches. I know that is cumbersome for some review workflows. > > Now is a good time to simplify, if we see a sensible opportunity > to do so. I think that adding a 'hi_free'/'hi_release' op inside the handshake_info struct - and moving the handshake info deallocation inside the 'core' could possibly simplify a bit the architecture. Since it looks like there is a reasonable agreement on this path (@Dave, @Eric, @Jakub: please educate me otherwise!), and no clear/immediate show stoppers, I suggested start hammering some documentation with an high level overview that will help also understanding/reviewing the code. > > I'm wondering if this approach scales well enough with the number of > > concurrent handshakes: the single list looks like a potential bottle- > > neck. > > It's not clear how much scaling is needed. I don't have a strong > sense of how frequently a busy storage server will need a handshake, > for instance, but it seems like it would be relatively less frequent > than, say, I/O. Network storage connections are typically long-lived, > unlike http. > > In terms of scalability, I am a little more concerned about the > handshake_mutex. Maybe that isn't needed since the pending list is > spinlock protected? Good point. Indeed it looks like that is not needed. > All that said, the single pending list can be replaced easily. It > would be straightforward to move it into struct net, for example. In the end I don't see a operations needing a full list traversal. handshake_nl_msg_accept walk that, but it stops at netns/proto matching which should be ~always /~very soon in the typical use-case. And as you said it should be easy to avoid even that. I think it could be useful limiting the number of pending handshake to some maximum, to avoid problems in pathological/malicious scenarios. Cheers, Paolo
> On Feb 9, 2023, at 9:07 PM, Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 9 Feb 2023 15:43:06 +0000 Chuck Lever III wrote: >>>> @@ -29,6 +29,7 @@ >>>> #define NETLINK_RDMA 20 >>>> #define NETLINK_CRYPTO 21 /* Crypto layer */ >>>> #define NETLINK_SMC 22 /* SMC monitoring */ >>>> +#define NETLINK_HANDSHAKE 23 /* transport layer sec handshake requests */ >>> >>> The extra indirection of genetlink introduces some complications? >> >> I don't think it does, necessarily. But neither does it seem >> to add any value (for this use case). <shrug> > > Our default is to go for generic netlink, it's where we invest most time > in terms of infrastructure. v2 of the series used generic netlink for the downcall piece. I can convert back to using generic netlink for v4 of the series. > It can take care of attribute parsing, and > allows user space to dump information about the family (eg. the parsing > policy). Most recently we added support for writing the policies in > (simple) YAML: > > https://docs.kernel.org/next/userspace-api/netlink/intro-specs.html > > It takes care of a lot of the netlink tedium. If you're willing to make > use of it I'm happy to help converting etc. We merged this stuff last > month, so there are likely sharp edges. -- Chuck Lever
> On Feb 10, 2023, at 6:41 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > On Thu, 2023-02-09 at 16:34 +0000, Chuck Lever III wrote: >> >>> On Feb 9, 2023, at 11:02 AM, Paolo Abeni <pabeni@redhat.com> wrote: >>> >>> On Thu, 2023-02-09 at 15:43 +0000, Chuck Lever III wrote: >>>>> On Feb 9, 2023, at 1:00 AM, Jakub Kicinski <kuba@kernel.org> wrote: >>>>> >>>>> On Tue, 07 Feb 2023 16:41:13 -0500 Chuck Lever wrote: >>>>>> diff --git a/tools/include/uapi/linux/netlink.h >>>>>> b/tools/include/uapi/linux/netlink.h >>>>>> index 0a4d73317759..a269d356f358 100644 >>>>>> --- a/tools/include/uapi/linux/netlink.h >>>>>> +++ b/tools/include/uapi/linux/netlink.h >>>>>> @@ -29,6 +29,7 @@ >>>>>> #define NETLINK_RDMA 20 >>>>>> #define NETLINK_CRYPTO 21 /* Crypto layer */ >>>>>> #define NETLINK_SMC 22 /* SMC monitoring */ >>>>>> +#define NETLINK_HANDSHAKE 23 /* transport layer sec >>>>>> handshake requests */ >>>>> >>>>> The extra indirection of genetlink introduces some complications? >>>> >>>> I don't think it does, necessarily. But neither does it seem >>>> to add any value (for this use case). <shrug> >>> >>> To me it introduces a good separation between the handshake mechanism >>> itself and the current subject (sock). >>> >>> IIRC the previous version allowed the user-space to create a socket of >>> the HANDSHAKE family which in turn accept()ed tcp sockets. That kind of >>> construct - assuming I interpreted it correctly - did not sound right >>> to me. >>> >>> Back to these patches, they looks sane to me, even if the whole >>> architecture is a bit hard to follow, given the non trivial cross >>> references between the patches - I can likely have missed some relevant >>> point. >> >> One of the original goals was to support other security protocols >> besides TLS v1.3, which is why the code is split between two >> patches. I know that is cumbersome for some review workflows. >> >> Now is a good time to simplify, if we see a sensible opportunity >> to do so. > > I think that adding a 'hi_free'/'hi_release' op inside the > handshake_info struct - and moving the handshake info deallocation > inside the 'core' could possibly simplify a bit the architecture. I'm concerned about lifetime issues for handshake_info. I was thinking maybe these objects need to be reference-counted as well. I'll experiment with adding a destructor method too. > Since it looks like there is a reasonable agreement on this path > (@Dave, @Eric, @Jakub: please educate me otherwise!), and no > clear/immediate show stoppers, I suggested start hammering some > documentation with an high level overview that will help also > understanding/reviewing the code. In previous generations of this series, there was an addition to Documentation/ that explained how kernel TLS consumers use the tls_ handshake API. I can add that back now that things are settling down. But maybe you are thinking of some other topics. I'm happy to write down whatever is needed, but I'd like suggestions about what particular areas would be most helpful. >>> I'm wondering if this approach scales well enough with the number of >>> concurrent handshakes: the single list looks like a potential bottle- >>> neck. >> >> It's not clear how much scaling is needed. I don't have a strong >> sense of how frequently a busy storage server will need a handshake, >> for instance, but it seems like it would be relatively less frequent >> than, say, I/O. Network storage connections are typically long-lived, >> unlike http. >> >> In terms of scalability, I am a little more concerned about the >> handshake_mutex. Maybe that isn't needed since the pending list is >> spinlock protected? > > Good point. Indeed it looks like that is not needed. I will remove the handshake_mutex in v4. >> All that said, the single pending list can be replaced easily. It >> would be straightforward to move it into struct net, for example. > > In the end I don't see a operations needing a full list traversal. > handshake_nl_msg_accept walk that, but it stops at netns/proto matching > which should be ~always /~very soon in the typical use-case. And as you > said it should be easy to avoid even that. > > I think it could be useful limiting the number of pending handshake to > some maximum, to avoid problems in pathological/malicious scenarios. Defending against DoS is sensible. Maybe having a per-net maximum of 5 or 10 pending handshakes? handshake_request() can return an error code if a handshake is requested while we're over that maximum. -- Chuck Lever
On 2/10/23 15:31, Chuck Lever III wrote: > > >> On Feb 10, 2023, at 6:41 AM, Paolo Abeni <pabeni@redhat.com> wrote: >> >> On Thu, 2023-02-09 at 16:34 +0000, Chuck Lever III wrote: >>> [ .. ] >>> All that said, the single pending list can be replaced easily. It >>> would be straightforward to move it into struct net, for example. >> >> In the end I don't see a operations needing a full list traversal. >> handshake_nl_msg_accept walk that, but it stops at netns/proto matching >> which should be ~always /~very soon in the typical use-case. And as you >> said it should be easy to avoid even that. >> >> I think it could be useful limiting the number of pending handshake to >> some maximum, to avoid problems in pathological/malicious scenarios. > > Defending against DoS is sensible. Maybe having a per-net > maximum of 5 or 10 pending handshakes? handshake_request() can > return an error code if a handshake is requested while we're > over that maximum. > Can we check the source settings? Having more than one handshake in the queue coming from the same SRC IP/SRC Port seems a bit pointless, doesn't it? Cheers, Hannes
On Fri, 2023-02-10 at 14:31 +0000, Chuck Lever III wrote: > In previous generations of this series, there was an addition > to Documentation/ that explained how kernel TLS consumers use > the tls_ handshake API. I can add that back now that things > are settling down. That would be useful, thank! > But maybe you are thinking of some other topics. I'm happy to > write down whatever is needed, but I'd like suggestions about > what particular areas would be most helpful. A reference user-space implementation would be very interesting, too. Even a completely "dummy" one for self-tests purpose only could be useful. Speaking of that, at some point we will need some self-tests ;) > > > > I'm wondering if this approach scales well enough with the number of > > > > concurrent handshakes: the single list looks like a potential bottle- > > > > neck. > > > > > > It's not clear how much scaling is needed. I don't have a strong > > > sense of how frequently a busy storage server will need a handshake, > > > for instance, but it seems like it would be relatively less frequent > > > than, say, I/O. Network storage connections are typically long-lived, > > > unlike http. > > > > > > In terms of scalability, I am a little more concerned about the > > > handshake_mutex. Maybe that isn't needed since the pending list is > > > spinlock protected? > > > > Good point. Indeed it looks like that is not needed. > > I will remove the handshake_mutex in v4. > > > > > All that said, the single pending list can be replaced easily. It > > > would be straightforward to move it into struct net, for example. > > > > In the end I don't see a operations needing a full list traversal. > > handshake_nl_msg_accept walk that, but it stops at netns/proto matching > > which should be ~always /~very soon in the typical use-case. And as you > > said it should be easy to avoid even that. > > > > I think it could be useful limiting the number of pending handshake to > > some maximum, to avoid problems in pathological/malicious scenarios. > > Defending against DoS is sensible. Maybe having a per-net > maximum of 5 or 10 pending handshakes? handshake_request() can > return an error code if a handshake is requested while we're > over that maximum. I'm wondering if we could use an {r,w}mem based limits, so that the user-space could eventually tune it as/if needed without any additional knob. Cheers, Paolo
> On Feb 10, 2023, at 10:21 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > On Fri, 2023-02-10 at 14:31 +0000, Chuck Lever III wrote: >> In previous generations of this series, there was an addition >> to Documentation/ that explained how kernel TLS consumers use >> the tls_ handshake API. I can add that back now that things >> are settling down. > > That would be useful, thank! > >> But maybe you are thinking of some other topics. I'm happy to >> write down whatever is needed, but I'd like suggestions about >> what particular areas would be most helpful. > > A reference user-space implementation would be very interesting, too. We've got one of those, specifically for TLSv1.3: https://github.com/oracle/ktls-utils netlink support is added on the "netlink" branch. The user space handshake agent for TLS is under src/tlshd. The netlink stuff is pretty fresh, so there's clean-up to be done. > Even a completely "dummy" one for self-tests purpose only could be > useful. > > Speaking of that, at some point we will need some self-tests ;) Jakub mentioned that during the first round of review last year. I've got some Kunit chops, so I can construct tests. But I'm coming up empty on exactly what would need to be tested. Right, maybe Kunit is the wrong tool for this job... >>>>> I'm wondering if this approach scales well enough with the number of >>>>> concurrent handshakes: the single list looks like a potential bottle- >>>>> neck. >>>> >>>> It's not clear how much scaling is needed. I don't have a strong >>>> sense of how frequently a busy storage server will need a handshake, >>>> for instance, but it seems like it would be relatively less frequent >>>> than, say, I/O. Network storage connections are typically long-lived, >>>> unlike http. >>>> >>>> In terms of scalability, I am a little more concerned about the >>>> handshake_mutex. Maybe that isn't needed since the pending list is >>>> spinlock protected? >>> >>> Good point. Indeed it looks like that is not needed. >> >> I will remove the handshake_mutex in v4. >> >> >>>> All that said, the single pending list can be replaced easily. It >>>> would be straightforward to move it into struct net, for example. >>> >>> In the end I don't see a operations needing a full list traversal. >>> handshake_nl_msg_accept walk that, but it stops at netns/proto matching >>> which should be ~always /~very soon in the typical use-case. And as you >>> said it should be easy to avoid even that. >>> >>> I think it could be useful limiting the number of pending handshake to >>> some maximum, to avoid problems in pathological/malicious scenarios. >> >> Defending against DoS is sensible. Maybe having a per-net >> maximum of 5 or 10 pending handshakes? handshake_request() can >> return an error code if a handshake is requested while we're >> over that maximum. > > I'm wondering if we could use an {r,w}mem based limits, so that the > user-space could eventually tune it as/if needed without any additional > knob. -- Chuck Lever
On Fri, 10 Feb 2023 14:17:28 +0000 Chuck Lever III wrote: > >> I don't think it does, necessarily. But neither does it seem > >> to add any value (for this use case). <shrug> > > > > Our default is to go for generic netlink, it's where we invest most time > > in terms of infrastructure. > > v2 of the series used generic netlink for the downcall piece. > I can convert back to using generic netlink for v4 of the > series. Would you be able to write the spec for it? I'm happy to help with that as I mentioned. Perhaps you have the user space already hand-written here but in case the mechanism/family gets reused it'd be sad if people had to hand write bindings for other programming languages.
> On Feb 10, 2023, at 1:09 PM, Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 10 Feb 2023 14:17:28 +0000 Chuck Lever III wrote: >>>> I don't think it does, necessarily. But neither does it seem >>>> to add any value (for this use case). <shrug> >>> >>> Our default is to go for generic netlink, it's where we invest most time >>> in terms of infrastructure. >> >> v2 of the series used generic netlink for the downcall piece. >> I can convert back to using generic netlink for v4 of the >> series. > > Would you be able to write the spec for it? I'm happy to help with that > as I mentioned. I'm coming from an RPC background, we usually do start from an XDR protocol specification. So, I'm used to that, and it might give us some new ideas about protocol correctness or simplification. Point me to a sample spec or maybe a language reference and we can discuss it further. > Perhaps you have the user space already hand-written > here but in case the mechanism/family gets reused it'd be sad if people > had to hand write bindings for other programming languages. Yes, the user space implementation is currently hand-written C, but it can easily be converted to machine-generated if you have a favorite tool to do that. -- Chuck Lever
On Fri, 10 Feb 2023 19:04:34 +0000 Chuck Lever III wrote: > >> v2 of the series used generic netlink for the downcall piece. > >> I can convert back to using generic netlink for v4 of the > >> series. > > > > Would you be able to write the spec for it? I'm happy to help with that > > as I mentioned. > > I'm coming from an RPC background, we usually do start from an > XDR protocol specification. So, I'm used to that, and it might > give us some new ideas about protocol correctness or > simplification. Nice, our thing is completely homegrown and unprofessional. Hopefully it won't make you run away. > Point me to a sample spec or maybe a language reference and we > can discuss it further. There are only two specs so far in net-next: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/netlink/specs Neither of these is great (fou is a bit legacy, and ethtool is not fully expressed), a better example may be this one which is pending in the bpf-next tree: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/netlink/specs/netdev.yaml There is a JSON schema spec (which may be useful for checking available fields quickly): https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/netlink/genetlink.yaml And (uncharacteristically?), docs: https://docs.kernel.org/next/userspace-api/netlink/index.html > > Perhaps you have the user space already hand-written > > here but in case the mechanism/family gets reused it'd be sad if people > > had to hand write bindings for other programming languages. > > Yes, the user space implementation is currently hand-written C, > but it can easily be converted to machine-generated if you have > a favorite tool to do that. I started hacking on a code generator for C in net-next in tools/net/ynl/ynl-gen-c.py but it's likely bitrotted already. I don't actually have a strong user in C to justify the time investment. All the cool kids these days want to use Rust or Go (and the less cool C++). For development I use Python (tools/net/ynl/cli.py tools/net/ynl/lib/). It should work fairly well for generating the kernel bits (uAPI header, policy and op tables).
On 2/10/23 19:09, Jakub Kicinski wrote: > On Fri, 10 Feb 2023 14:17:28 +0000 Chuck Lever III wrote: >>>> I don't think it does, necessarily. But neither does it seem >>>> to add any value (for this use case). <shrug> >>> >>> Our default is to go for generic netlink, it's where we invest most time >>> in terms of infrastructure. >> >> v2 of the series used generic netlink for the downcall piece. >> I can convert back to using generic netlink for v4 of the >> series. > > Would you be able to write the spec for it? I'm happy to help with that > as I mentioned. Perhaps you have the user space already hand-written > here but in case the mechanism/family gets reused it'd be sad if people > had to hand write bindings for other programming languages. Can you send me a pointer to the YAML specification (and parser)? I couldn't find anything in the linux sources; but maybe I'm looking in the wrong tree or somesuch. Cheers, Hannes
> On Feb 10, 2023, at 4:44 PM, Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 10 Feb 2023 19:04:34 +0000 Chuck Lever III wrote: >>>> v2 of the series used generic netlink for the downcall piece. >>>> I can convert back to using generic netlink for v4 of the >>>> series. >>> >>> Would you be able to write the spec for it? I'm happy to help with that >>> as I mentioned. >> >> I'm coming from an RPC background, we usually do start from an >> XDR protocol specification. So, I'm used to that, and it might >> give us some new ideas about protocol correctness or >> simplification. > > Nice, our thing is completely homegrown and unprofessional. > Hopefully it won't make you run away. > >> Point me to a sample spec or maybe a language reference and we >> can discuss it further. > > There are only two specs so far in net-next: > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/netlink/specs > > Neither of these is great (fou is a bit legacy, and ethtool is not > fully expressed), a better example may be this one which is pending > in the bpf-next tree: > > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/netlink/specs/netdev.yaml > > There is a JSON schema spec (which may be useful for checking available > fields quickly): > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/netlink/genetlink.yaml > > And (uncharacteristically?), docs: > > https://docs.kernel.org/next/userspace-api/netlink/index.html Based on this reply I was unsure whether you wanted an English spec (similar to an Internet Draft) or a machine-readable one. But now that I look at these, I think I get it: you'd like a YAML file that can be used with tools to either generate a parser or maybe do some correctness analysis. I think others will benefit as more security protocols come to this party, so it's a good thing to do for extensibility. I will look into this for v5 definitely and maybe v4. v4 already has significant churn... >>> Perhaps you have the user space already hand-written >>> here but in case the mechanism/family gets reused it'd be sad if people >>> had to hand write bindings for other programming languages. >> >> Yes, the user space implementation is currently hand-written C, >> but it can easily be converted to machine-generated if you have >> a favorite tool to do that. > > I started hacking on a code generator for C in net-next in > tools/net/ynl/ynl-gen-c.py but it's likely bitrotted already. > I don't actually have a strong user in C to justify the time > investment. All the cool kids these days want to use Rust or Go > (and the less cool C++). For development I use Python > (tools/net/ynl/cli.py tools/net/ynl/lib/). > > It should work fairly well for generating the kernel bits > (uAPI header, policy and op tables). Makes sense. -- Chuck Lever
On Thu, Feb 9, 2023 at 11:36 AM Chuck Lever III <chuck.lever@oracle.com> wrote: > > > > > On Feb 9, 2023, at 11:02 AM, Paolo Abeni <pabeni@redhat.com> wrote: [..] > > IIRC the previous version allowed the user-space to create a socket of > > the HANDSHAKE family which in turn accept()ed tcp sockets. That kind of > > construct - assuming I interpreted it correctly - did not sound right > > to me. > > > > Back to these patches, they looks sane to me, even if the whole > > architecture is a bit hard to follow, given the non trivial cross > > references between the patches - I can likely have missed some relevant > > point. > > One of the original goals was to support other security protocols > besides TLS v1.3, which is why the code is split between two > patches. I know that is cumbersome for some review workflows. > > Now is a good time to simplify, if we see a sensible opportunity > to do so. > > > > I'm wondering if this approach scales well enough with the number of > > concurrent handshakes: the single list looks like a potential bottle- > > neck. > > It's not clear how much scaling is needed. I don't have a strong > sense of how frequently a busy storage server will need a handshake, > for instance, but it seems like it would be relatively less frequent > than, say, I/O. Network storage connections are typically long-lived, > unlike http. > So this is for storage type traffic only? Assuming TCP/NVME probably. IOW, is it worth doing the handshake via the kernel for a short flow? We have done some analysis and TLS handshake certainly affects overall performance, so moving it into the kernel is a good idea. Question is how would this interact with a) KTLS b) KTLS offload c) user space type of crypto needs like quic, etc? cheers, jamal
Hello Jamal- > On Feb 12, 2023, at 10:40 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On Thu, Feb 9, 2023 at 11:36 AM Chuck Lever III <chuck.lever@oracle.com> wrote: >> >> >> >>> On Feb 9, 2023, at 11:02 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > [..] > >>> IIRC the previous version allowed the user-space to create a socket of >>> the HANDSHAKE family which in turn accept()ed tcp sockets. That kind of >>> construct - assuming I interpreted it correctly - did not sound right >>> to me. >>> >>> Back to these patches, they looks sane to me, even if the whole >>> architecture is a bit hard to follow, given the non trivial cross >>> references between the patches - I can likely have missed some relevant >>> point. >> >> One of the original goals was to support other security protocols >> besides TLS v1.3, which is why the code is split between two >> patches. I know that is cumbersome for some review workflows. >> >> Now is a good time to simplify, if we see a sensible opportunity >> to do so. >> >> >>> I'm wondering if this approach scales well enough with the number of >>> concurrent handshakes: the single list looks like a potential bottle- >>> neck. >> >> It's not clear how much scaling is needed. I don't have a strong >> sense of how frequently a busy storage server will need a handshake, >> for instance, but it seems like it would be relatively less frequent >> than, say, I/O. Network storage connections are typically long-lived, >> unlike http. >> > > So this is for storage type traffic only? Assuming TCP/NVME probably. > IOW, is it worth doing the handshake via the kernel for a short flow? > We have done some analysis and TLS handshake certainly affects overall > performance, so moving it into the kernel is a good idea. Question is > how would this interact with a) KTLS b) KTLS offload c) user space > type of crypto needs like quic, etc? I believe a summary of this thread is due. --- This is for any in-kernel transport security layer consumer. It is expressly not for the purpose of adding a handshake facility that can be used directly by user space consumers. Today we have several in-kernel consumers ready to use it, including SunRPC-with-TLS, NVMe/TCP with TLS, and SMB with QUIC (that one is less far along because the Linux kernel doesn't have a QUIC implementation yet). They all happen to be storage-related, but I believe that is coincidental. It is possible that once transport layer security is easily available to kernel consumers, other usage scenarios will appear. But networked storage connections are, on average, long- lived. We believe that session set-up overhead for such consumers will not be as significant as the increased latency and compute resources used by the security layer's record protocols. --- We would prefer an in-kernel handshake implementation, and for TLSv1.3 in particular, we believe everything is already available in-kernel (ciphers, certificate management, and so on) except for the TLS handshake protocol engine. So maybe not a heavy lift technically, especially because the TLSv1.3 handshake subprotocol is simpler than the ones in past versions. We know of at least one out-of-tree in-kernel handshake implementation, so it's definitely feasible. The purpose of that implementation is scalable handshake performance for user space consumers (like web front ends) so what you say about potential better performance is certainly plausible. But it's not the purpose of what I'm doing here. --- Most pertinently, however, the kernel security folks have stated that an in-kernel handshake is out of the question because it would be yet another handshake implementation to maintain and keep secure. They greatly prefer that these kinds of things be done in user space using a well-vetted library implementation. As an example, our upcall TLSv1.3 handshake prototype has chosen to use GnuTLS. Once the prototype handshake mechanism has established a TLS session, the user space library configures kTLS on the kernel's socket. Thus, our prototype enables in-kernel TLS consumers to take advantage of both software ciphers and NIC offload today via kTLS (thanks to Boris and his team). --- Lastly, there are potentially other transport security layer protocols aside from TLSv1.3 that will need similar treatment. Jakub is working on one now, I believe. I would expect an uphill battle for getting each one of those added to the kernel. --- So the solution we have derived is an upcall mechanism that will be at least a stop-gap, but may become a longer-term solution depending on the politics of getting handshake protocol implementations into kernel space. I believe that activity is also ongoing for TLSv1.3 at least, but it seems to me like a more distant solution. We have immediate cloud provider demand for both the RPC and NVMe usage scenarios with TLSv1.3, and the SMB service in Azure implements QUIC on Windows (which re-uses TLSv1.3's handshake protocol). Thus I'd like to see a handshake facility made available as soon as we can muster a sensible one. -- Chuck Lever
On Sat, 11 Feb 2023 20:55:58 +0000 Chuck Lever III wrote: > Based on this reply I was unsure whether you wanted an English > spec (similar to an Internet Draft) or a machine-readable one. I meant machine-readable. > But now that I look at these, I think I get it: you'd like a > YAML file that can be used with tools to either generate a > parser or maybe do some correctness analysis. > > I think others will benefit as more security protocols come > to this party, so it's a good thing to do for extensibility. Yup, it's great for parsers and we also plan to add syzbot metadata to it with semantics of the fields. > I will look into this for v5 definitely and maybe v4. v4 > already has significant churn...
On Sat, 11 Feb 2023 13:11:02 +0100 Hannes Reinecke wrote: > > Would you be able to write the spec for it? I'm happy to help with that > > as I mentioned. Perhaps you have the user space already hand-written > > here but in case the mechanism/family gets reused it'd be sad if people > > had to hand write bindings for other programming languages. > > Can you send me a pointer to the YAML specification (and parser)? > I couldn't find anything in the linux sources; but maybe I'm looking in > the wrong tree or somesuch. The ready-for-consumption specs are only in net-next, but the user space code gen did not end up there (yet?) I pushed some old branch where I had started typing up user space C code gen there: https://github.com/kuba-moo/ynl/tree/yaml-ynl-c-wip It's an old branch taken out of trash so there's a lot of unrelated garbage :( Only stuff of note would be the under tools/net/ynl/
diff --git a/include/net/handshake.h b/include/net/handshake.h new file mode 100644 index 000000000000..a439d823e828 --- /dev/null +++ b/include/net/handshake.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * NETLINK_HANDSHAKE service. + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2023, Oracle and/or its affiliates. + */ + +/* + * Data structures and functions that are visible only within the + * kernel are declared here. + */ + +#ifndef _NET_HANDSHAKE_H +#define _NET_HANDSHAKE_H + +struct handshake_info { + struct list_head hi_list; + + struct socket *hi_sock; + int hi_fd; + int hi_mcgrp; + int hi_protocol; + + struct sk_buff *(*hi_accept)(struct handshake_info *hsi, + struct sk_buff *skb, + struct nlmsghdr *nlh); + void (*hi_done)(struct handshake_info *hsi, + struct sk_buff *skb, + struct nlmsghdr *nlh, + struct nlattr *args); +}; + +extern int handshake_request(struct handshake_info *hsi, gfp_t flags); + +#endif /* _NET_HANDSHAKE_H */ diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 8c3587d5c308..88fd0442249c 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -105,6 +105,7 @@ struct net { struct sock *genl_sock; struct uevent_sock *uevent_sock; /* uevent socket */ + struct sock *hs_sock; /* handshake requests */ struct hlist_head *dev_name_head; struct hlist_head *dev_index_head; diff --git a/include/net/sock.h b/include/net/sock.h index e0517ecc6531..de0510306e28 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -515,6 +515,7 @@ struct sock { struct socket *sk_socket; void *sk_user_data; + void *sk_handshake_info; #ifdef CONFIG_SECURITY void *sk_security; #endif diff --git a/include/uapi/linux/handshake.h b/include/uapi/linux/handshake.h new file mode 100644 index 000000000000..39cab687eece --- /dev/null +++ b/include/uapi/linux/handshake.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * NETLINK_HANDSHAKE service. + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2023, Oracle and/or its affiliates. + */ + +/* + * Data structures and functions that are visible to user space are + * declared here. This file constitutes an API contract between the + * Linux kernel and user space. + */ + +#ifndef _UAPI_LINUX_HANDSHAKE_H +#define _UAPI_LINUX_HANDSHAKE_H + +/* Multicast Netlink socket groups */ +enum handshake_nlgrps { + HANDSHAKE_NLGRP_NONE = 0, + __HANDSHAKE_NLGRP_MAX +}; +#define HSNLGRP_MAX (__HANDSHAKE_NLGRP_MAX - 1) + +enum handshake_nl_msgs { + HANDSHAKE_NL_MSG_BASE = NLMSG_MIN_TYPE, + HANDSHAKE_NL_MSG_READY, + HANDSHAKE_NL_MSG_ACCEPT, + HANDSHAKE_NL_MSG_DONE, + __HANDSHAKE_NL_MSG_MAX +}; +#define HANDSHAKE_NL_MSG_MAX (__HANDSHAKE_NL_MSG_MAX - 1) + +enum handshake_nl_attrs { + HANDSHAKE_NL_ATTR_UNSPEC = 0, + HANDSHAKE_NL_ATTR_MSG_STATUS, + HANDSHAKE_NL_ATTR_SOCKFD, + HANDSHAKE_NL_ATTR_PROTOCOL, + HANDSHAKE_NL_ATTR_ACCEPT_RESP, + HANDSHAKE_NL_ATTR_DONE_ARGS, + + __HANDSHAKE_NL_ATTR_MAX +}; +#define HANDSHAKE_NL_ATTR_MAX (__HANDSHAKE_NL_ATTR_MAX - 1) + +enum handshake_nl_status { + HANDSHAKE_NL_STATUS_OK = 0, + HANDSHAKE_NL_STATUS_INVAL, + HANDSHAKE_NL_STATUS_BADF, + HANDSHAKE_NL_STATUS_NOTREADY, + HANDSHAKE_NL_STATUS_SYSTEMFAULT, +}; + +enum handshake_nl_protocol { + HANDSHAKE_NL_PROTO_UNSPEC = 0, +}; + +enum handshake_nl_tls_session_status { + HANDSHAKE_NL_TLS_SESS_STATUS_OK = 0, /* session established */ + HANDSHAKE_NL_TLS_SESS_STATUS_FAULT, /* failure to launch */ + HANDSHAKE_NL_TLS_SESS_STATUS_REJECTED, /* remote hates us */ +}; + +#endif /* _UAPI_LINUX_HANDSHAKE_H */ diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index e2ae82e3f9f7..a29b2db5fa8a 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -29,6 +29,7 @@ #define NETLINK_RDMA 20 #define NETLINK_CRYPTO 21 /* Crypto layer */ #define NETLINK_SMC 22 /* SMC monitoring */ +#define NETLINK_HANDSHAKE 23 /* transport layer sec handshake requests */ #define NETLINK_INET_DIAG NETLINK_SOCK_DIAG diff --git a/net/Makefile b/net/Makefile index 6a62e5b27378..c1bb53f00486 100644 --- a/net/Makefile +++ b/net/Makefile @@ -78,3 +78,4 @@ obj-$(CONFIG_NET_NCSI) += ncsi/ obj-$(CONFIG_XDP_SOCKETS) += xdp/ obj-$(CONFIG_MPTCP) += mptcp/ obj-$(CONFIG_MCTP) += mctp/ +obj-y += handshake/ diff --git a/net/handshake/Makefile b/net/handshake/Makefile new file mode 100644 index 000000000000..b27400c01427 --- /dev/null +++ b/net/handshake/Makefile @@ -0,0 +1,11 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# Makefile for the NETLINK_HANDSHAKE service +# +# Author: Chuck Lever <chuck.lever@oracle.com> +# +# Copyright (c) 2023, Oracle and/or its affiliates. +# + +obj-y += handshake.o +handshake-y := netlink.o diff --git a/net/handshake/netlink.c b/net/handshake/netlink.c new file mode 100644 index 000000000000..49e05fa34df3 --- /dev/null +++ b/net/handshake/netlink.c @@ -0,0 +1,320 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * NETLINK_HANDSHAKE service + * + * Author: Chuck Lever <chuck.lever@oracle.com> + * + * Copyright (c) 2023, Oracle and/or its affiliates. + */ + +#include <linux/types.h> +#include <linux/socket.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/skbuff.h> +#include <linux/inet.h> + +#include <net/sock.h> +#include <net/netlink.h> +#include <net/handshake.h> + +#include <uapi/linux/handshake.h> + +static DEFINE_SPINLOCK(handshake_lock); +static LIST_HEAD(handshake_pending); + +/* + * Send a "ready" notification to the multicast group for this + * security handshake type, in the same net namespace as @hi_sock. + */ +static int handshake_notify(struct handshake_info *hsi, gfp_t flags) +{ + struct net *net = sock_net(hsi->hi_sock->sk); + struct sock *nls = net->hs_sock; + struct sk_buff *msg; + struct nlmsghdr *nlh; + int err; + + if (!netlink_has_listeners(nls, hsi->hi_mcgrp)) + return -ESRCH; + + err = -ENOMEM; + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, flags); + if (!msg) + goto out_err; + nlh = nlmsg_put(msg, 0, 0, HANDSHAKE_NL_MSG_READY, 0, 0); + if (!nlh) + goto out_err; + nlmsg_end(msg, nlh); + + return nlmsg_notify(nls, msg, 0, hsi->hi_mcgrp, nlmsg_report(nlh), + flags); + +out_err: + nlmsg_free(msg); + return err; +} + +/** + * handshake_request - consumer API to request a handshake + * @hsi: socket and callback information + * @flags: memory allocation flags + * + * Return values: + * %0: Request queued + * %-ESRCH: No user space HANDSHAKE listeners + * %-ENOMEM: Memory allocation failed + */ +int handshake_request(struct handshake_info *hsi, gfp_t flags) +{ + int ret; + + spin_lock(&handshake_lock); + list_add(&hsi->hi_list, &handshake_pending); + hsi->hi_sock->sk->sk_handshake_info = NULL; + spin_unlock(&handshake_lock); + + /* XXX: racy */ + ret = handshake_notify(hsi, flags); + if (ret) { + spin_lock(&handshake_lock); + if (!list_empty(&hsi->hi_list)) + list_del_init(&hsi->hi_list); + spin_unlock(&handshake_lock); + } + + return ret; +} +EXPORT_SYMBOL(handshake_request); + +static int handshake_accept(struct socket *sock) +{ + int flags = O_CLOEXEC; + struct file *file; + int fd; + + fd = get_unused_fd_flags(flags); + if (fd < 0) + return fd; + file = sock_alloc_file(sock, flags, sock->sk->sk_prot_creator->name); + if (IS_ERR(file)) { + put_unused_fd(fd); + return PTR_ERR(file); + } + + fd_install(fd, file); + return fd; +} + +static const struct nla_policy +handshake_nl_attr_policy[HANDSHAKE_NL_ATTR_MAX + 1] = { + [HANDSHAKE_NL_ATTR_MSG_STATUS] = { + .type = NLA_U32 + }, + [HANDSHAKE_NL_ATTR_SOCKFD] = { + .type = NLA_U32 + }, + [HANDSHAKE_NL_ATTR_PROTOCOL] = { + .type = NLA_U32 + }, + [HANDSHAKE_NL_ATTR_ACCEPT_RESP] = { + .type = NLA_NESTED, + }, + [HANDSHAKE_NL_ATTR_DONE_ARGS] = { + .type = NLA_NESTED, + }, +}; + +static int handshake_nl_msg_unsupp(struct sk_buff *skb, struct nlmsghdr *nlh, + struct nlattr **tb) +{ + pr_err("Handshake: Unknown command (%d) was ignored\n", nlh->nlmsg_type); + return -EINVAL; +} + +static int handshake_nl_status_reply(struct sk_buff *skb, struct nlmsghdr *nlh, + enum handshake_nl_status status) +{ + struct net *net = sock_net(skb->sk); + struct nlmsghdr *hdr; + struct sk_buff *msg; + int ret; + + ret = -ENOMEM; + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + goto out; + hdr = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, + nlh->nlmsg_type, 0, 0); + if (!hdr) + goto out_free; + + ret = -EMSGSIZE; + ret = nla_put_u32(msg, HANDSHAKE_NL_ATTR_MSG_STATUS, status); + if (ret < 0) + goto out_free; + + nlmsg_end(msg, hdr); + return nlmsg_unicast(net->hs_sock, msg, NETLINK_CB(skb).portid); + +out_free: + nlmsg_free(msg); +out: + return ret; +} + +static int handshake_nl_msg_accept(struct sk_buff *skb, struct nlmsghdr *nlh, + struct nlattr **tb) +{ + struct net *net = sock_net(skb->sk); + struct handshake_info *pos, *hsi; + struct sk_buff *msg; + int protocol; + + if (!tb[HANDSHAKE_NL_ATTR_PROTOCOL]) + return handshake_nl_status_reply(skb, nlh, + HANDSHAKE_NL_STATUS_INVAL); + protocol = nla_get_u32(tb[HANDSHAKE_NL_ATTR_PROTOCOL]); + + hsi = NULL; + spin_lock(&handshake_lock); + list_for_each_entry(pos, &handshake_pending, hi_list) { + if (sock_net(pos->hi_sock->sk) != net) + continue; + if (pos->hi_protocol != protocol) + continue; + + list_del_init(&pos->hi_list); + hsi = pos; + break; + } + spin_unlock(&handshake_lock); + if (!hsi) + return handshake_nl_status_reply(skb, nlh, + HANDSHAKE_NL_STATUS_NOTREADY); + + hsi->hi_fd = handshake_accept(hsi->hi_sock); + if (hsi->hi_fd < 0) + return handshake_nl_status_reply(skb, nlh, + HANDSHAKE_NL_STATUS_SYSTEMFAULT); + + msg = hsi->hi_accept(hsi, skb, nlh); + if (IS_ERR(msg)) + return PTR_ERR(msg); + + hsi->hi_sock->sk->sk_handshake_info = hsi; + return nlmsg_unicast(net->hs_sock, msg, NETLINK_CB(skb).portid); +} + +/* + * This function is careful to not close the socket. It merely removes + * it from the file descriptor table so that it is no longer visible + * to the calling process. + */ +static int handshake_nl_msg_done(struct sk_buff *skb, struct nlmsghdr *nlh, + struct nlattr **tb) +{ + struct handshake_info *hsi; + struct socket *sock; + int fd, err; + + if (!tb[HANDSHAKE_NL_ATTR_SOCKFD]) + return handshake_nl_status_reply(skb, nlh, + HANDSHAKE_NL_STATUS_INVAL); + + err = 0; + fd = nla_get_u32(tb[HANDSHAKE_NL_ATTR_SOCKFD]); + sock = sockfd_lookup(fd, &err); + if (err) + return handshake_nl_status_reply(skb, nlh, + HANDSHAKE_NL_STATUS_BADF); + + put_unused_fd(fd); + + hsi = sock->sk->sk_handshake_info; + if (hsi) { + hsi->hi_done(hsi, skb, nlh, tb[HANDSHAKE_NL_ATTR_DONE_ARGS]); + sock->sk->sk_handshake_info = NULL; + } + return 0; +} + +static const struct handshake_link { + int (*doit)(struct sk_buff *skb, struct nlmsghdr *nlh, + struct nlattr **tb); +} handshake_dispatch[] = { + [HANDSHAKE_NL_MSG_ACCEPT - HANDSHAKE_NL_MSG_BASE] = { + .doit = handshake_nl_msg_accept, + }, + [HANDSHAKE_NL_MSG_DONE - HANDSHAKE_NL_MSG_BASE] = { + .doit = handshake_nl_msg_done, + }, +}; + +static int handshake_nl_rcv_skb(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[HANDSHAKE_NL_ATTR_MAX + 1]; + const struct handshake_link *link; + int err; + + if (!netlink_net_capable(skb, CAP_NET_ADMIN)) + return -EPERM; + + if (nlh->nlmsg_type > HANDSHAKE_NL_MSG_MAX) + return handshake_nl_msg_unsupp(skb, nlh, tb); + link = &handshake_dispatch[nlh->nlmsg_type - HANDSHAKE_NL_MSG_BASE]; + if (!link->doit) + return handshake_nl_msg_unsupp(skb, nlh, tb); + + err = nlmsg_parse(nlh, 0, tb, HANDSHAKE_NL_ATTR_MAX, + handshake_nl_attr_policy, extack); + if (err < 0) + return err; + + return link->doit(skb, nlh, tb); +} + +static void handshake_nl_rcv(struct sk_buff *skb) +{ + static DEFINE_MUTEX(handshake_mutex); + + mutex_lock(&handshake_mutex); + netlink_rcv_skb(skb, &handshake_nl_rcv_skb); + mutex_unlock(&handshake_mutex); +} + +static int __net_init handshake_nl_net_init(struct net *net) +{ + struct netlink_kernel_cfg cfg = { + .groups = HSNLGRP_MAX, + .input = handshake_nl_rcv, + }; + + net->hs_sock = netlink_kernel_create(net, NETLINK_HANDSHAKE, &cfg); + return net->hs_sock == NULL ? -ENOMEM : 0; +} + +static void __net_exit handshake_nl_net_exit(struct net *net) +{ + netlink_kernel_release(net->hs_sock); + net->hs_sock = NULL; +} + +static struct pernet_operations handshake_nl_net_ops = { + .init = handshake_nl_net_init, + .exit = handshake_nl_net_exit, +}; + +static int __init handshake_nl_init(void) +{ + return register_pernet_subsys(&handshake_nl_net_ops); +} + +static void __exit handshake_nl_exit(void) +{ + unregister_pernet_subsys(&handshake_nl_net_ops); +} + +module_init(handshake_nl_init); +module_exit(handshake_nl_exit); diff --git a/tools/include/uapi/linux/netlink.h b/tools/include/uapi/linux/netlink.h index 0a4d73317759..a269d356f358 100644 --- a/tools/include/uapi/linux/netlink.h +++ b/tools/include/uapi/linux/netlink.h @@ -29,6 +29,7 @@ #define NETLINK_RDMA 20 #define NETLINK_CRYPTO 21 /* Crypto layer */ #define NETLINK_SMC 22 /* SMC monitoring */ +#define NETLINK_HANDSHAKE 23 /* transport layer sec handshake requests */ #define NETLINK_INET_DIAG NETLINK_SOCK_DIAG
When a kernel consumer needs a transport layer security session, it first needs a handshake to negotiate and establish a session. This negotiation can be done in user space via one of the several existing library implementations, or it can be done in the kernel. No in-kernel handshake implementations yet exist. In their absence, we add a netlink service akin to NETLINK_ROUTE that can: a. Notify a user space daemon that a handshake is needed. b. Once notified, the daemon calls the kernel back via this netlink service to get the handshake parameters, including an open socket on which to establish the session. The notification service uses a multicast group. Each handshake protocol (eg, TLSv1.3, PSP, etc) adopts its own group number so that the user space daemons for performing the handshakes are completely independent of one another. The kernel can then tell via netlink_has_listeners() whether a user space daemon is active and can handle a handshake request for the desired security layer protocol. A new netlink operation, ACCEPT, acts like accept(2) in that it instantiates a file descriptor in the user space daemon's fd table. If this operation is successful, the reply carries the fd number, which can be treated as an open and ready file descriptor. While user space is performing the handshake, the kernel keeps its muddy paws off the open socket. The act of closing the user space file descriptor alerts the kernel that the open socket is safe to use again. When the user daemon completes a handshake, the kernel is responsible for checking that a valid transport layer security session has been established. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- include/net/handshake.h | 37 ++++ include/net/net_namespace.h | 1 include/net/sock.h | 1 include/uapi/linux/handshake.h | 65 +++++++ include/uapi/linux/netlink.h | 1 net/Makefile | 1 net/handshake/Makefile | 11 + net/handshake/netlink.c | 320 ++++++++++++++++++++++++++++++++++++ tools/include/uapi/linux/netlink.h | 1 9 files changed, 438 insertions(+) create mode 100644 include/net/handshake.h create mode 100644 include/uapi/linux/handshake.h create mode 100644 net/handshake/Makefile create mode 100644 net/handshake/netlink.c