Message ID | 20240510030435.120935-2-kuba@kernel.org (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | add basic PSP encryption for TCP connections | expand |
On 09 May 20:04, Jakub Kicinski wrote: >Add documentation of things which belong in the docs rather >than commit messages. > >Signed-off-by: Jakub Kicinski <kuba@kernel.org> >--- > Documentation/networking/index.rst | 1 + > Documentation/networking/psp.rst | 138 +++++++++++++++++++++++++++++ > 2 files changed, 139 insertions(+) > create mode 100644 Documentation/networking/psp.rst > >diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst >index 7664c0bfe461..0376029ecbdf 100644 >--- a/Documentation/networking/index.rst >+++ b/Documentation/networking/index.rst >@@ -94,6 +94,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics. > ppp_generic > proc_net_tcp > pse-pd/index >+ psp > radiotap-headers > rds > regulatory >diff --git a/Documentation/networking/psp.rst b/Documentation/networking/psp.rst >new file mode 100644 >index 000000000000..a39b464813ab >--- /dev/null >+++ b/Documentation/networking/psp.rst >@@ -0,0 +1,138 @@ >+.. SPDX-License-Identifier: GPL-2.0-only >+ >+===================== >+PSP Security Protocol >+===================== >+ >+Protocol >+======== >+ >+PSP Security Protocol (PSP) was defined at Google and published in: >+ >+https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf >+ >+This section briefly covers protocol aspects crucial for understanding >+the kernel API. Refer to the protocol specification for further details. >+ >+Note that the kernel implementation and documentation uses the term >+"secret state" in place of "master key", it is both less confusing >+to an average developer and is less likely to run afoul any naming >+guidelines. >+ [ ... ] >+User facing API >+=============== >+ >+PSP is designed primarily for hardware offloads. There is currently >+no software fallback for systems which do not have PSP capable NICs. >+There is also no standard (or otherwise defined) way of establishing >+a PSP-secured connection or exchanging the symmetric keys. >+ >+The expectation is that higher layer protocols will take care of >+protocol and key negotiation. For example one may use TLS key exchange, >+announce the PSP capability, and switch to PSP if both endpoints >+are PSP-capable. >+ The documentation doesn't include anything about userspace, other than highlevel remarks on how this is expected to work. What are we planning for userspace? I know we have kperf basic support and some experimental python library, but nothing official or psp centric. I propose to start community driven project with a well established library, with some concrete sample implementation for key negotiation, as a plugin maybe, so anyone can implement their own key-exchange mechanisms on top of the official psp library.
On Fri, 10 May 2024 15:19:23 -0700 Saeed Mahameed wrote: > >+PSP is designed primarily for hardware offloads. There is currently > >+no software fallback for systems which do not have PSP capable NICs. > >+There is also no standard (or otherwise defined) way of establishing > >+a PSP-secured connection or exchanging the symmetric keys. > >+ > >+The expectation is that higher layer protocols will take care of > >+protocol and key negotiation. For example one may use TLS key exchange, > >+announce the PSP capability, and switch to PSP if both endpoints > >+are PSP-capable. > > The documentation doesn't include anything about userspace, other than > highlevel remarks on how this is expected to work. The cover letter does. > What are we planning for userspace? I know we have kperf basic support and > some experimental python library, but nothing official or psp centric. Remind me, how long did it take for kernel TLS support to be merged into OpenSSL? ;) > I propose to start community driven project with a well established > library, with some concrete sample implementation for key negotiation, > as a plugin maybe, so anyone can implement their own key-exchange > mechanisms on top of the official psp library. Yes, I should have CCed Meta's folks who work on TLS [1]. Adding them now. More than happy to facilitate the discussion, maybe Willem can CC the right Google folks, IDK who else... We should start moving with the kernel support, IMO, until we do the user space implementation is stalled. I don't expect that the way we install keys in the kernel would be impacted by the handshake. [1] https://github.com/facebookincubator/fizz
On 11.05.2024 01:11, Jakub Kicinski wrote: > On Fri, 10 May 2024 15:19:23 -0700 Saeed Mahameed wrote: >>> +PSP is designed primarily for hardware offloads. There is currently >>> +no software fallback for systems which do not have PSP capable NICs. >>> +There is also no standard (or otherwise defined) way of establishing >>> +a PSP-secured connection or exchanging the symmetric keys. >>> + >>> +The expectation is that higher layer protocols will take care of >>> +protocol and key negotiation. For example one may use TLS key exchange, >>> +announce the PSP capability, and switch to PSP if both endpoints >>> +are PSP-capable. >> >> The documentation doesn't include anything about userspace, other than >> highlevel remarks on how this is expected to work. > > The cover letter does. > >> What are we planning for userspace? I know we have kperf basic support and >> some experimental python library, but nothing official or psp centric. > > Remind me, how long did it take for kernel TLS support to be merged > into OpenSSL? ;) I believe it was bad timing for OpenSSL. The patches with kTLS support were sitting in the main branch for long time, the problem was postponed release with with jump to new versioning schema. But I agree, there is no easy way to start coding user-space lib without initial support from kernel. >> I propose to start community driven project with a well established >> library, with some concrete sample implementation for key negotiation, >> as a plugin maybe, so anyone can implement their own key-exchange >> mechanisms on top of the official psp library. > > Yes, I should have CCed Meta's folks who work on TLS [1]. Adding them > now. More than happy to facilitate the discussion, maybe Willem can > CC the right Google folks, IDK who else... > > We should start moving with the kernel support, IMO, until we do > the user space implementation is stalled. I don't expect that the > way we install keys in the kernel would be impacted by the handshake. > > [1] https://github.com/facebookincubator/fizz
On 5/11/24 3:41 AM, Vadim Fedorenko wrote: > But I agree, there is no easy way to start coding user-space lib without > initial > support from kernel. The 2 sides should be co-developed; it is the only way to merge a sane uapi.
Jakub Kicinski wrote: > Add documentation of things which belong in the docs rather > than commit messages. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > Documentation/networking/index.rst | 1 + > Documentation/networking/psp.rst | 138 +++++++++++++++++++++++++++++ > 2 files changed, 139 insertions(+) > create mode 100644 Documentation/networking/psp.rst > > diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst > index 7664c0bfe461..0376029ecbdf 100644 > --- a/Documentation/networking/index.rst > +++ b/Documentation/networking/index.rst > @@ -94,6 +94,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics. > ppp_generic > proc_net_tcp > pse-pd/index > + psp > radiotap-headers > rds > regulatory > diff --git a/Documentation/networking/psp.rst b/Documentation/networking/psp.rst > new file mode 100644 > index 000000000000..a39b464813ab > --- /dev/null > +++ b/Documentation/networking/psp.rst > @@ -0,0 +1,138 @@ > +.. SPDX-License-Identifier: GPL-2.0-only > + > +===================== > +PSP Security Protocol > +===================== > + > +Protocol > +======== > + > +PSP Security Protocol (PSP) was defined at Google and published in: > + > +https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf > + > +This section briefly covers protocol aspects crucial for understanding > +the kernel API. Refer to the protocol specification for further details. > + > +Note that the kernel implementation and documentation uses the term > +"secret state" in place of "master key", it is both less confusing > +to an average developer and is less likely to run afoul any naming > +guidelines. There is some value in using the same terminology in the code as in the spec. And the session keys are derived from a key. That is more precise than state. Specifically, counter-mode KDF from an AES key. Perhaps device key, instead of master key? > +Derived Rx keys > +--------------- > + > +PSP borrows some terms and mechanisms from IPsec. PSP was designed > +with HW offloads in mind. The key feature of PSP is that Rx keys for every > +connection do not have to be stored by the receiver but can be derived > +from secret state and information present in packet headers. A second less obvious, but neat, feature is that it supports an encryption offset, such that (say) the L4 ports are integrity protected, but not encrypted, to allow for in-network telemetry. > +This makes it possible to implement receivers which require a constant > +amount of memory regardless of the number of connections (``O(1)`` scaling). > + > +Tx keys have to be stored like with any other protocol, Keys can optionally be passed in descriptor. > +The expectation is that higher layer protocols will take care of > +protocol and key negotiation. For example one may use TLS key exchange, > +announce the PSP capability, and switch to PSP if both endpoints > +are PSP-capable. > +Securing a connection > +--------------------- > + > +PSP encryption is currently only supported for TCP connections. > +Rx and Tx keys are allocated separately. First the ``rx-assoc`` > +Netlink command needs to be issued, specifying a target TCP socket. > +Kernel will allocate a new PSP Rx key from the NIC and associate it > +with given socket. At this stage socket will accept both PSP-secured > +and plain text TCP packets. > + > +Tx keys are installed using the ``tx-assoc`` Netlink command. > +Once the Tx keys are installed all data read from the socket will > +be PSP-secured. In other words act of installing Tx keys has the secondary > +effect on the Rx direction, requring all received packets to be encrypted. Consider clarifying the entire state diagram from when one pair initiates upgrade. And some edge cases: - retransmits - TCP fin handshake, if only one peer succeeds - TCP control socket response to encrypted pkt What is the expectation for data already queued for transmission when the tx assocation is made? More generally, what happens for data in flight. One possible simplification is to only allow an upgrade sequence (possibly including in-band exchange of keys) when no other data is in flight. > +Since packet reception is asynchronous, to make it possible for the > +application to trust that any data read from the socket after the ``tx-assoc`` > +call returns success has been encrypted, the kernel will scan the receive > +queue of the socket at ``tx-assoc`` time. If any enqueued packet was received > +in clear text the Tx association will fail, and application should retry > +installing the Tx key after draining the socket (this should not be necessary > +if both endpoints are well behaved). > + > +Rotation notifications > +---------------------- > + > +The rotations of secret state happen asynchornously and are usually typo: asynchronously > +performed by management daemons, not under application control. > +The PSP netlink family will generate a notification whenever keys > +are rotated. The applications are expected to re-establish connections > +before keys are rotated again. Connection key rotation is not supported? I did notice that tx key insertion fails if a key is already present, so this does appear to be the behavior.
On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote: > Jakub Kicinski wrote: > > +PSP Security Protocol (PSP) was defined at Google and published in: > > + > > +https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf > > + > > +This section briefly covers protocol aspects crucial for understanding > > +the kernel API. Refer to the protocol specification for further details. > > + > > +Note that the kernel implementation and documentation uses the term > > +"secret state" in place of "master key", it is both less confusing > > +to an average developer and is less likely to run afoul any naming > > +guidelines. > > There is some value in using the same terminology in the code as in > the spec. > > And the session keys are derived from a key. That is more precise than > state. Specifically, counter-mode KDF from an AES key. > > Perhaps device key, instead of master key? Weak preference towards secret state, but device key works, too. > > +Derived Rx keys > > +--------------- > > + > > +PSP borrows some terms and mechanisms from IPsec. PSP was designed > > +with HW offloads in mind. The key feature of PSP is that Rx keys for every > > +connection do not have to be stored by the receiver but can be derived > > +from secret state and information present in packet headers. > > A second less obvious, but neat, feature is that it supports an > encryption offset, such that (say) the L4 ports are integrity > protected, but not encrypted, to allow for in-network telemetry. I know, but the opening paragraph has: This section briefly covers protocol aspects crucial for understanding the kernel API. Refer to the protocol specification for further details. :) .. and I didn't implement the offset, yet. (It's trivial to add and ETOOMANYPATCHES.) > > +This makes it possible to implement receivers which require a constant > > +amount of memory regardless of the number of connections (``O(1)`` scaling). > > + > > +Tx keys have to be stored like with any other protocol, > > Keys can optionally be passed in descriptor. Added: Preferably, the Tx keys should be provided with the packet (e.g. as part of the descriptors). > > +The expectation is that higher layer protocols will take care of > > +protocol and key negotiation. For example one may use TLS key exchange, > > +announce the PSP capability, and switch to PSP if both endpoints > > +are PSP-capable. > > > +Securing a connection > > +--------------------- > > + > > +PSP encryption is currently only supported for TCP connections. > > +Rx and Tx keys are allocated separately. First the ``rx-assoc`` > > +Netlink command needs to be issued, specifying a target TCP socket. > > +Kernel will allocate a new PSP Rx key from the NIC and associate it > > +with given socket. At this stage socket will accept both PSP-secured > > +and plain text TCP packets. > > + > > +Tx keys are installed using the ``tx-assoc`` Netlink command. > > +Once the Tx keys are installed all data read from the socket will > > +be PSP-secured. In other words act of installing Tx keys has the secondary > > +effect on the Rx direction, requring all received packets to be encrypted. > > Consider clarifying the entire state diagram from when one pair > initiates upgrade. Not sure about state diagram, there are only 3 states. Or do you mean extend TCP state diagrams? I think a table may be better: Event | Normal TCP | Rx PSP key present | Tx PSP key present | --------------------------------------------------------------------------- Rx plain text | accept | accept | drop | Rx PSP (good) | drop | accept | accept | Rx PSP (bad) | drop | drop | drop | Tx | plain text | plain text | encrypted * | * data enqueued before Tx key in installed will not be encrypted (either initial send nor retranmissions) What should I add? > And some edge cases: > > - retransmits > - TCP fin handshake, if only one peer succeeds So FIN when one end is "locked down" and the other isn't? > - TCP control socket response to encrypted pkt Control sock ignores PSP. > What is the expectation for data already queued for transmission when > the tx assocation is made? > > More generally, what happens for data in flight. One possible > simplification is to only allow an upgrade sequence (possibly > including in-band exchange of keys) when no other data is in > flight. Like TLS offload, the data is annotated "for encryption" when queued. So data queued earlier or retransmits of such data will never be encrypted. > > +performed by management daemons, not under application control. > > +The PSP netlink family will generate a notification whenever keys > > +are rotated. The applications are expected to re-establish connections > > +before keys are rotated again. > > Connection key rotation is not supported? I did notice that tx key > insertion fails if a key is already present, so this does appear to be > the behavior. Correct, for now connections need to be re-established once a day. Rx should be easy, Tx we can make easy by only supporting rotation when there's no data queued.
Jakub Kicinski wrote: > On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote: > > Jakub Kicinski wrote: > > > +PSP Security Protocol (PSP) was defined at Google and published in: > > > + > > > +https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf > > > + > > > +This section briefly covers protocol aspects crucial for understanding > > > +the kernel API. Refer to the protocol specification for further details. > > > + > > > +Note that the kernel implementation and documentation uses the term > > > +"secret state" in place of "master key", it is both less confusing > > > +to an average developer and is less likely to run afoul any naming > > > +guidelines. > > > > There is some value in using the same terminology in the code as in > > the spec. > > > > And the session keys are derived from a key. That is more precise than > > state. Specifically, counter-mode KDF from an AES key. > > > > Perhaps device key, instead of master key? > > Weak preference towards secret state, but device key works, too. Totally your choice. I just wanted to make sure this was considered. > > > +Derived Rx keys > > > +--------------- > > > + > > > +PSP borrows some terms and mechanisms from IPsec. PSP was designed > > > +with HW offloads in mind. The key feature of PSP is that Rx keys for every > > > +connection do not have to be stored by the receiver but can be derived > > > +from secret state and information present in packet headers. > > > > A second less obvious, but neat, feature is that it supports an > > encryption offset, such that (say) the L4 ports are integrity > > protected, but not encrypted, to allow for in-network telemetry. > > I know, but the opening paragraph has: > > This section briefly covers protocol aspects crucial for > understanding the kernel API. Refer to the protocol specification for further details. > > :) .. and I didn't implement the offset, yet. (It's trivial to add and > ETOOMANYPATCHES.) Ack, sounds good. > > > > +This makes it possible to implement receivers which require a constant > > > +amount of memory regardless of the number of connections (``O(1)`` scaling). > > > + > > > +Tx keys have to be stored like with any other protocol, > > > > Keys can optionally be passed in descriptor. > > Added: Preferably, the Tx keys should be provided with the packet (e.g. > as part of the descriptors). > > > > +The expectation is that higher layer protocols will take care of > > > +protocol and key negotiation. For example one may use TLS key exchange, > > > +announce the PSP capability, and switch to PSP if both endpoints > > > +are PSP-capable. > > > > > +Securing a connection > > > +--------------------- > > > + > > > +PSP encryption is currently only supported for TCP connections. > > > +Rx and Tx keys are allocated separately. First the ``rx-assoc`` > > > +Netlink command needs to be issued, specifying a target TCP socket. > > > +Kernel will allocate a new PSP Rx key from the NIC and associate it > > > +with given socket. At this stage socket will accept both PSP-secured > > > +and plain text TCP packets. > > > + > > > +Tx keys are installed using the ``tx-assoc`` Netlink command. > > > +Once the Tx keys are installed all data read from the socket will > > > +be PSP-secured. In other words act of installing Tx keys has the secondary > > > +effect on the Rx direction, requring all received packets to be encrypted. > > > > Consider clarifying the entire state diagram from when one pair > > initiates upgrade. > > Not sure about state diagram, there are only 3 states. Or do you mean > extend TCP state diagrams? I think a table may be better: > > Event | Normal TCP | Rx PSP key present | Tx PSP key present | > --------------------------------------------------------------------------- > Rx plain text | accept | accept | drop | > > Rx PSP (good) | drop | accept | accept | > > Rx PSP (bad) | drop | drop | drop | > > Tx | plain text | plain text | encrypted * | > > * data enqueued before Tx key in installed will not be encrypted > (either initial send nor retranmissions) > > > What should I add? I've mostly been concerned about the below edge cases. If both peers are in TCP_ESTABLISHED for the during of the upgrade, and data is aligned on message boundary, things are straightforward. The retransmit logic is clear, as this is controlled by skb->decrypted on the individual skbs on the retransmit queue. That also solves another edge case: skb geometry changes on retransmit (due to different MSS or segs, using tcp_fragment, tso_fragment, tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not possible that skb is accidentally created that combines plaintext and ciphertext content. Although.. does this require adding that skb->decrypted check to tcp_skb_can_collapse? > > And some edge cases: > > > > - retransmits > > - TCP fin handshake, if only one peer succeeds > > So FIN when one end is "locked down" and the other isn't? If one peer can enter the state where it drops all plaintext, while the other decides to close the connection before completing the upgrade, and thus sends a plaintext FIN. If (big if) that can happen, then the connection cannot be cleanly closed. > > - TCP control socket response to encrypted pkt > > Control sock ignores PSP. Another example where a peer stays open and stays retrying if it has upgraded and drops all plaintext.
On Wed, 29 May 2024 20:47:02 -0400 Willem de Bruijn wrote: > Jakub Kicinski wrote: > > On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote: > > > There is some value in using the same terminology in the code as in > > > the spec. > > > > > > And the session keys are derived from a key. That is more precise than > > > state. Specifically, counter-mode KDF from an AES key. > > > > > > Perhaps device key, instead of master key? > > > > Weak preference towards secret state, but device key works, too. > > Totally your choice. I just wanted to make sure this was considered. Already run the sed, device key it is :) > > > Consider clarifying the entire state diagram from when one pair > > > initiates upgrade. > > > > Not sure about state diagram, there are only 3 states. Or do you mean > > extend TCP state diagrams? I think a table may be better: > > > > Event | Normal TCP | Rx PSP key present | Tx PSP key present | > > --------------------------------------------------------------------------- > > Rx plain text | accept | accept | drop | > > > > Rx PSP (good) | drop | accept | accept | > > > > Rx PSP (bad) | drop | drop | drop | > > > > Tx | plain text | plain text | encrypted * | > > > > * data enqueued before Tx key in installed will not be encrypted > > (either initial send nor retranmissions) > > > > > > What should I add? > > I've mostly been concerned about the below edge cases. > > If both peers are in TCP_ESTABLISHED for the during of the upgrade, > and data is aligned on message boundary, things are straightforward. > > The retransmit logic is clear, as this is controlled by skb->decrypted > on the individual skbs on the retransmit queue. > > That also solves another edge case: skb geometry changes on retransmit > (due to different MSS or segs, using tcp_fragment, tso_fragment, > tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not > possible that skb is accidentally created that combines plaintext and > ciphertext content. > > Although.. does this require adding that skb->decrypted check to > tcp_skb_can_collapse? Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP). We've grown the check in tcp_shift_skb_data() and the logic in tcp_grow_skb(), both missing the decrypted check. I'll send some fixes, these are existing bugs :( > > > And some edge cases: > > > > > > - retransmits > > > - TCP fin handshake, if only one peer succeeds > > > > So FIN when one end is "locked down" and the other isn't? > > If one peer can enter the state where it drops all plaintext, while > the other decides to close the connection before completing the > upgrade, and thus sends a plaintext FIN. > > If (big if) that can happen, then the connection cannot be cleanly > closed. Hm. And we can avoid this by only enforcing encryption of data-less segments once we've seen some encrypted data? > > > - TCP control socket response to encrypted pkt > > > > Control sock ignores PSP. > > Another example where a peer stays open and stays retrying if it has > upgraded and drops all plaintext.
On Thu, 30 May 2024 12:51:20 -0700 Jakub Kicinski wrote: > > I've mostly been concerned about the below edge cases. > > > > If both peers are in TCP_ESTABLISHED for the during of the upgrade, > > and data is aligned on message boundary, things are straightforward. > > > > The retransmit logic is clear, as this is controlled by skb->decrypted > > on the individual skbs on the retransmit queue. > > > > That also solves another edge case: skb geometry changes on retransmit > > (due to different MSS or segs, using tcp_fragment, tso_fragment, > > tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not > > possible that skb is accidentally created that combines plaintext and > > ciphertext content. > > > > Although.. does this require adding that skb->decrypted check to > > tcp_skb_can_collapse? > > Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP). > We've grown the check in tcp_shift_skb_data() and the logic > in tcp_grow_skb(), both missing the decrypted check. > > I'll send some fixes, these are existing bugs :( I take that back, we can depend on EOR like TLS does.
Jakub Kicinski wrote: > On Thu, 30 May 2024 12:51:20 -0700 Jakub Kicinski wrote: > > > I've mostly been concerned about the below edge cases. > > > > > > If both peers are in TCP_ESTABLISHED for the during of the upgrade, > > > and data is aligned on message boundary, things are straightforward. > > > > > > The retransmit logic is clear, as this is controlled by skb->decrypted > > > on the individual skbs on the retransmit queue. > > > > > > That also solves another edge case: skb geometry changes on retransmit > > > (due to different MSS or segs, using tcp_fragment, tso_fragment, > > > tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not > > > possible that skb is accidentally created that combines plaintext and > > > ciphertext content. > > > > > > Although.. does this require adding that skb->decrypted check to > > > tcp_skb_can_collapse? > > > > Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP). > > We've grown the check in tcp_shift_skb_data() and the logic > > in tcp_grow_skb(), both missing the decrypted check. > > > > I'll send some fixes, these are existing bugs :( > > I take that back, we can depend on EOR like TLS does. Oh yes. Neat solution. This relies on userspace doing the right thing by passing MSG_EOR right? That is easy to get wrong. Should we still add a check or a WARN_ONCE. That would be net-next material.
Jakub Kicinski wrote: > On Wed, 29 May 2024 20:47:02 -0400 Willem de Bruijn wrote: > > Jakub Kicinski wrote: > > > On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote: > > > > There is some value in using the same terminology in the code as in > > > > the spec. > > > > > > > > And the session keys are derived from a key. That is more precise than > > > > state. Specifically, counter-mode KDF from an AES key. > > > > > > > > Perhaps device key, instead of master key? > > > > > > Weak preference towards secret state, but device key works, too. > > > > Totally your choice. I just wanted to make sure this was considered. > > Already run the sed, device key it is :) > > > > > Consider clarifying the entire state diagram from when one pair > > > > initiates upgrade. > > > > > > Not sure about state diagram, there are only 3 states. Or do you mean > > > extend TCP state diagrams? I think a table may be better: > > > > > > Event | Normal TCP | Rx PSP key present | Tx PSP key present | > > > --------------------------------------------------------------------------- > > > Rx plain text | accept | accept | drop | > > > > > > Rx PSP (good) | drop | accept | accept | > > > > > > Rx PSP (bad) | drop | drop | drop | > > > > > > Tx | plain text | plain text | encrypted * | > > > > > > * data enqueued before Tx key in installed will not be encrypted > > > (either initial send nor retranmissions) > > > > > > > > > What should I add? > > > > I've mostly been concerned about the below edge cases. > > > > If both peers are in TCP_ESTABLISHED for the during of the upgrade, > > and data is aligned on message boundary, things are straightforward. > > > > The retransmit logic is clear, as this is controlled by skb->decrypted > > on the individual skbs on the retransmit queue. > > > > That also solves another edge case: skb geometry changes on retransmit > > (due to different MSS or segs, using tcp_fragment, tso_fragment, > > tcp_retrans_try_collapse, ..) maintain skb->decrypted. It's not > > possible that skb is accidentally created that combines plaintext and > > ciphertext content. > > > > Although.. does this require adding that skb->decrypted check to > > tcp_skb_can_collapse? > > Good catch. The TLS checks predate tcp_skb_can_collapse() (and MPTCP). > We've grown the check in tcp_shift_skb_data() and the logic > in tcp_grow_skb(), both missing the decrypted check. > > I'll send some fixes, these are existing bugs :( > > > > > And some edge cases: > > > > > > > > - retransmits > > > > - TCP fin handshake, if only one peer succeeds > > > > > > So FIN when one end is "locked down" and the other isn't? > > > > If one peer can enter the state where it drops all plaintext, while > > the other decides to close the connection before completing the > > upgrade, and thus sends a plaintext FIN. > > > > If (big if) that can happen, then the connection cannot be cleanly > > closed. > > Hm. And we can avoid this by only enforcing encryption of data-less > segments once we've seen some encrypted data? That would help. It may also be needed to accept a pure ACK right at the upgrade seqno. Depends on the upgrade process. Which may be worth documenting explicitly: the system call and network packet exchange from when one peer initiates (by generating its local key) until the connection is fully encrypted. That also allows poking at the various edge cases that may happen if packets are lost, or when actions can race. One unexpected example of the latter that I came across was Tx SADB key insertion in tail edge cases taking longer than network RTT, for instance. The kernel API can be exercised in a variety of ways, not all of them will uphold the correctness. Documenting how it should be used should help. Even better when it reduces the option space. As it already does by failing a Tx key install until Rx is configured. > > > > - TCP control socket response to encrypted pkt > > > > > > Control sock ignores PSP. > > > > Another example where a peer stays open and stays retrying if it has > > upgraded and drops all plaintext. May want to always allow plaintext RSTs. This is a potential DoS vector. In all these cases, I suppose this has already been figured out for TLS.
On Fri, 31 May 2024 09:56:42 -0400 Willem de Bruijn wrote: > > > If one peer can enter the state where it drops all plaintext, while > > > the other decides to close the connection before completing the > > > upgrade, and thus sends a plaintext FIN. > > > > > > If (big if) that can happen, then the connection cannot be cleanly > > > closed. > > > > Hm. And we can avoid this by only enforcing encryption of data-less > > segments once we've seen some encrypted data? > > That would help. It may also be needed to accept a pure ACK right at > the upgrade seqno. Depends on the upgrade process. > > Which may be worth documenting explicitly: the system call and network > packet exchange from when one peer initiates (by generating its local > key) until the connection is fully encrypted. That also allows poking > at the various edge cases that may happen if packets are lost, or when > actions can race. Dunno if the format below is good, but you're very right. At least to me writing the diagram was an hour well spent :) > One unexpected example of the latter that I came across was Tx SADB > key insertion in tail edge cases taking longer than network RTT, for > instance. > > The kernel API can be exercised in a variety of ways, not all of them > will uphold the correctness. Documenting how it should be used should > help. > > Even better when it reduces the option space. As it already does by > failing a Tx key install until Rx is configured. Something along these lines? "Sequence" diagram of the worst case scenario: 01 p Host A Host B 02 l t ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~ 03 a e ~~~~~~[crypto negotiation]~~~~~~ 04 i x [Rx key alloc = K-B] 05 n t <--- [app] K-B key send 06 ------[Rx key alloc = K-A]- 07 [app] K-A key send -->| 08 [TCP] K-B input <----- 08 P [TCP] K-B ACK ---->| 09 S R [app] recv(K-B) | 10 P x [app] [Tx key set] | 11 -------------------------- 12 P T [app] send(RPC) #####>| 13 S x |<---- [TCP] Seq OoO! queue RPC, SACK 14 P [TCP] retr K-A --->| 15 | `-> [TCP] K-A input 16 | <--- [TCP] K-A ACK (or FIN) 17 | [app] recv(K-A) 18 | [app] [Tx key set] 19 ----------------------------------- 20 There is a causal dependency between Host B allocating the key (line 4), sending it (line 5) and Host A receiving it (line 8). Since Host B will accept PSP packets as soon as it allocated the key, Host A does not need to wait to start using the key (line 12). Host B will queue the RPC to the socket (line 13). [Problem #1] However, because Host B does not have a Tx key, the ACK / SACK packet (line 13) will not be encrypted. (Similarly if Host B decided to close the connection at this point, the resulting FIN packet would not be encrypted.) Host B needs to accept unencrypted non-data segments (pure acks, pure FIN) until it sees an encrypted packet from Host B. [Problem #2] The retansmissions of K-A are unencrypted, to avoid sending the same data in encrypted and unencrypted form. This poses a risk if an ACK gets lost but both hosts end up in the PSP Tx state. Assume that Host A did not send the RPC (line 12), and the retransmission (line 14) happens as an RTO or TLP. Host B may already reach PSP Tx state (line "20") and expect encrypted data. Plain text retransmissions (with sequence number before rcv_nxt) must be accepted until Host B sees encrypted data from Host A. With that I think the state machine needs to be amended: Event | Normal TCP | Rx PSP | Tx PSP | PSP full | ----------------------------------------------------------------------- Rx plain (new) | accept | accept | drop | drop | Rx plain | accept | accept | accept | drop | (ACK|FIN|rtx) | | | | | Rx PSP (good) | drop | accept | accept | accept | Rx PSP (bad | drop | drop | drop | drop | (crypt, !=SPI) | | | | | Tx | plain text | plain text | encrypted | encrypted | | | | (excl. rtx) | (excl. rtx) | > > > Another example where a peer stays open and stays retrying if it has > > > upgraded and drops all plaintext. > > May want to always allow plaintext RSTs. This is a potential DoS > vector. Because of key exhaustion? Or we can be tricked into spamming someone with retranmissions and ignoring their RST? > In all these cases, I suppose this has already been figured > out for TLS. Assuming the answer above is "key exhaustion" - I wouldn't be surprised if it wasn't :(
Jakub Kicinski wrote: > On Fri, 31 May 2024 09:56:42 -0400 Willem de Bruijn wrote: > > > > If one peer can enter the state where it drops all plaintext, while > > > > the other decides to close the connection before completing the > > > > upgrade, and thus sends a plaintext FIN. > > > > > > > > If (big if) that can happen, then the connection cannot be cleanly > > > > closed. > > > > > > Hm. And we can avoid this by only enforcing encryption of data-less > > > segments once we've seen some encrypted data? > > > > That would help. It may also be needed to accept a pure ACK right at > > the upgrade seqno. Depends on the upgrade process. > > > > Which may be worth documenting explicitly: the system call and network > > packet exchange from when one peer initiates (by generating its local > > key) until the connection is fully encrypted. That also allows poking > > at the various edge cases that may happen if packets are lost, or when > > actions can race. > > Dunno if the format below is good, but you're very right. > At least to me writing the diagram was an hour well spent :) Great :) > > One unexpected example of the latter that I came across was Tx SADB > > key insertion in tail edge cases taking longer than network RTT, for > > instance. > > > > The kernel API can be exercised in a variety of ways, not all of them > > will uphold the correctness. Documenting how it should be used should > > help. > > > > Even better when it reduces the option space. As it already does by > > failing a Tx key install until Rx is configured. > > Something along these lines? > > "Sequence" diagram of the worst case scenario: > > 01 p Host A Host B > 02 l t ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~ > 03 a e ~~~~~~[crypto negotiation]~~~~~~ > 04 i x [Rx key alloc = K-B] > 05 n t <--- [app] K-B key send > 06 ------[Rx key alloc = K-A]- > 07 [app] K-A key send -->| > 08 [TCP] K-B input <----- > 08 P [TCP] K-B ACK ---->| > 09 S R [app] recv(K-B) | > 10 P x [app] [Tx key set] | > 11 -------------------------- > 12 P T [app] send(RPC) #####>| > 13 S x |<---- [TCP] Seq OoO! queue RPC, SACK > 14 P [TCP] retr K-A --->| > 15 | `-> [TCP] K-A input > 16 | <--- [TCP] K-A ACK (or FIN) > 17 | [app] recv(K-A) > 18 | [app] [Tx key set] > 19 ----------------------------------- > 20 > > There is a causal dependency between Host B allocating the key (line 4), > sending it (line 5) and Host A receiving it (line 8). Since Host B will > accept PSP packets as soon as it allocated the key, Host A does not > need to wait to start using the key (line 12). Host B will queue the > RPC to the socket (line 13). > > [Problem #1] > > However, because Host B does not have a Tx key, the ACK / SACK packet > (line 13) will not be encrypted. (Similarly if Host B decided to close > the connection at this point, the resulting FIN packet would not be > encrypted.) Or if it plays SO_LINGER games the resulting RST. > Host B needs to accept unencrypted non-data segments > (pure acks, pure FIN) until it sees an encrypted packet from Host B. > > [Problem #2] > > The retansmissions of K-A are unencrypted, to avoid sending the same > data in encrypted and unencrypted form. This poses a risk if an ACK > gets lost but both hosts end up in the PSP Tx state. Assume that Host A > did not send the RPC (line 12), and the retransmission (line 14) > happens as an RTO or TLP. Host B may already reach PSP Tx state (line > "20") and expect encrypted data. Plain text retransmissions (with > sequence number before rcv_nxt) must be accepted until Host B sees > encrypted data from Host A. Is that sufficient if an initial encrypted packet could get reordered by the network to arrive before a plaintext retransmit of a lower seqno? Both scenarios make sense. It is unfortunately harder to be sure that we have captured all edge cases. An issue related to the rcv_nxt cut-point, not sure how important: the plaintext packet contents are protected by user crypto before upgrade. But the TCP headers are not. PSP relies on TCP PAWS against replay protection. It is possible for a MITM to offset all seqno from the start of connection establishment. I don't see an immediate issue. But at a minimum it could be possible to insert or delete before PSP is upgraded. > > With that I think the state machine needs to be amended: > > Event | Normal TCP | Rx PSP | Tx PSP | PSP full | > ----------------------------------------------------------------------- > Rx plain (new) | accept | accept | drop | drop | > > Rx plain | accept | accept | accept | drop | > (ACK|FIN|rtx) | | | | | > > Rx PSP (good) | drop | accept | accept | accept | > > Rx PSP (bad | drop | drop | drop | drop | > (crypt, !=SPI) | | | | | > > Tx | plain text | plain text | encrypted | encrypted | > | | | (excl. rtx) | (excl. rtx) | > > > > > Another example where a peer stays open and stays retrying if it has > > > > upgraded and drops all plaintext. > > > > May want to always allow plaintext RSTs. This is a potential DoS > > vector. > > Because of key exhaustion? Or we can be tricked into spamming someone > with retranmissions and ignoring their RST? Simpler: this falls back onto unencrypted TCP where someone capable of spoofing valid data is capable of terminating a connection. If denying all plaintext after upgrade, PSP protects against this. It is arguably low on the list of concerns, especially in a closed world hyperscaler setting. As it is hardly the only DoS vector. > > In all these cases, I suppose this has already been figured > > out for TLS. > > Assuming the answer above is "key exhaustion" - I wouldn't be surprised > if it wasn't :(
On Wed, 05 Jun 2024 16:11:31 -0400 Willem de Bruijn wrote: > > The retansmissions of K-A are unencrypted, to avoid sending the same > > data in encrypted and unencrypted form. This poses a risk if an ACK > > gets lost but both hosts end up in the PSP Tx state. Assume that Host A > > did not send the RPC (line 12), and the retransmission (line 14) > > happens as an RTO or TLP. Host B may already reach PSP Tx state (line > > "20") and expect encrypted data. Plain text retransmissions (with > > sequence number before rcv_nxt) must be accepted until Host B sees > > encrypted data from Host A. > > Is that sufficient if an initial encrypted packet could get reordered > by the network to arrive before a plaintext retransmit of a lower > seqno? Yes, I believe that's fine. I will document this clearer but both sides must be pretty precise in their understanding when the switchover happens. They must read what they expect to be clear text, and then install the Tx key thus locking down the socket. 1. If they under-read and clear text data is already queued - the kernel will error out. 2. If they under-read and clear text arrives later - the connection will hang. 3. If they over-read they will presumably get PSP-protected data which they have no way of validating, since it won't be secured by user crypto. We could protect from over-read (case 3) by refusing to give out PSP-protected data until keys are installed. But it adds to the fast path and I don't think it's all that beneficial, since there's no way to protect a sloppy application from under-read (case 2). Back to your question about reordering plain text with cipher text: the application should not lock down the socket until it gets all its clear text. So clear text retransmissions _after_ lock down must be spurious. The only worry is that we lose an ACK and never tell the other side that we got all the clear text. But we're guaranteed to successfully ACK any PSP-protected data, so if we receive some there is no way to get stuck. Let me copy / paste the diagram: 01 p Host A Host B 02 l t ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~ 03 a e ~~~~~~[crypto negotiation]~~~~~~ 04 i x [Rx key alloc = K-B] 05 n t <--- [app] K-B key send 06 ------[Rx key alloc = K-A]- 07 [app] K-A key send -->| 08 [TCP] K-B input <----- 08 P [TCP] K-B ACK ---->| 09 S R [app] recv(K-B) | 10 P x [app] [Tx key set] | 11 -------------------------- 12 P T [app] send(RPC) #####>| 13 S x |<---- [TCP] Seq OoO! queue RPC, SACK 14 P [TCP] retr K-A --->| 15 | `-> [TCP] K-A input 16 | <--- [TCP] K-A ACK (or FIN) 17 | [app] recv(K-A) 18 | [app] [Tx key set] 19 ----------------------------------- 20 Looking as Host A, if we receive encrypted data, we must have allocated and sent key (line 7) so we will start accepting encrypted data. But at this point we are also accepting plain text (until we reach line 9). We will send a plain text (S)ACK to encrypted data, but that's fine too since Host B hasn't seen any encrypted data from us and will accept such ACKs. > Both scenarios make sense. It is unfortunately harder to be sure that > we have captured all edge cases. Are you trying to say packetdrill without saying packetdrill? :) > An issue related to the rcv_nxt cut-point, not sure how important: the > plaintext packet contents are protected by user crypto before upgrade. > But the TCP headers are not. PSP relies on TCP PAWS against replay > protection. It is possible for a MITM to offset all seqno from the > start of connection establishment. I don't see an immediate issue. But > at a minimum it could be possible to insert or delete before PSP is > upgraded. Yes, the "cut off" point must be quite clearly defined, because both sides must precisely read out all the clear text. Then they install the Tx key and anything they read must have been PSP-protected. Hope I understood the point.
Jakub Kicinski wrote: > On Wed, 05 Jun 2024 16:11:31 -0400 Willem de Bruijn wrote: > > > The retansmissions of K-A are unencrypted, to avoid sending the same > > > data in encrypted and unencrypted form. This poses a risk if an ACK > > > gets lost but both hosts end up in the PSP Tx state. Assume that Host A > > > did not send the RPC (line 12), and the retransmission (line 14) > > > happens as an RTO or TLP. Host B may already reach PSP Tx state (line > > > "20") and expect encrypted data. Plain text retransmissions (with > > > sequence number before rcv_nxt) must be accepted until Host B sees > > > encrypted data from Host A. > > > > Is that sufficient if an initial encrypted packet could get reordered > > by the network to arrive before a plaintext retransmit of a lower > > seqno? > > Yes, I believe that's fine. > > I will document this clearer but both sides must be pretty precise in > their understanding when the switchover happens. They must read what > they expect to be clear text, and then install the Tx key thus locking > down the socket. > > 1. If they under-read and clear text data is already queued - the kernel > will error out. > 2. If they under-read and clear text arrives later - the connection will > hang. > 3. If they over-read they will presumably get PSP-protected data > which they have no way of validating, since it won't be secured by > user crypto. > > We could protect from over-read (case 3) by refusing to give out > PSP-protected data until keys are installed. But it adds to the fast > path and I don't think it's all that beneficial, since there's no way > to protect a sloppy application from under-read (case 2). > > Back to your question about reordering plain text with cipher text: > the application should not lock down the socket until it gets all > its clear text. So clear text retransmissions _after_ lock down must be > spurious. Ah yes, good point. > The only worry is that we lose an ACK and never tell > the other side that we got all the clear text. But we're guaranteed > to successfully ACK any PSP-protected data, so if we receive some > there is no way to get stuck. Let me copy / paste the diagram: > > 01 p Host A Host B > 02 l t ~~~~~~~~~~~[TCP 3 WHS]~~~~~~~~~~ > 03 a e ~~~~~~[crypto negotiation]~~~~~~ > 04 i x [Rx key alloc = K-B] > 05 n t <--- [app] K-B key send > 06 ------[Rx key alloc = K-A]- > 07 [app] K-A key send -->| > 08 [TCP] K-B input <----- > 08 P [TCP] K-B ACK ---->| > 09 S R [app] recv(K-B) | > 10 P x [app] [Tx key set] | > 11 -------------------------- > 12 P T [app] send(RPC) #####>| > 13 S x |<---- [TCP] Seq OoO! queue RPC, SACK > 14 P [TCP] retr K-A --->| > 15 | `-> [TCP] K-A input > 16 | <--- [TCP] K-A ACK (or FIN) > 17 | [app] recv(K-A) > 18 | [app] [Tx key set] > 19 ----------------------------------- > 20 > > Looking as Host A, if we receive encrypted data, we must have > allocated and sent key (line 7) so we will start accepting encrypted > data. But at this point we are also accepting plain text (until we > reach line 9). We will send a plain text (S)ACK to encrypted data, > but that's fine too since Host B hasn't seen any encrypted data from us > and will accept such ACKs. > > > Both scenarios make sense. It is unfortunately harder to be sure that > > we have captured all edge cases. > > Are you trying to say packetdrill without saying packetdrill? :) Ha, no, no such hint implied. I did expand packetdrill to PSP to exercise the cases that I could come up with, at a minimum to ensure coverage of all branches. But does that cover all edge cases possible? Including drops, reorders, geometry changes from MTU changes, SO_LINGER 0, races with slow OS operations (like that slow SADB insertion I mentioned)? The unknown unknowns. Stuck connections are a low risk, bugs that can be fixed later. As long as it is easy to reason that actual crypto issues like plaintext leaks are not reachable. Extending packetdrill to netlink would be quite some work, I suspect. A quick scan shows that it knows NLA, but only for OPT_STATS decoding. > > An issue related to the rcv_nxt cut-point, not sure how important: the > > plaintext packet contents are protected by user crypto before upgrade. > > But the TCP headers are not. PSP relies on TCP PAWS against replay > > protection. It is possible for a MITM to offset all seqno from the > > start of connection establishment. I don't see an immediate issue. But > > at a minimum it could be possible to insert or delete before PSP is > > upgraded. > > Yes, the "cut off" point must be quite clearly defined, because both > sides must precisely read out all the clear text. Then they install > the Tx key and anything they read must have been PSP-protected. > > Hope I understood the point. I think the issue, if any, is that there may be a gap between the two methods of integrity protection. What we call "cleartext" here is integrity protected such that no insertion or deletion attacks are possible. And PSP ensures the same. But is a a deletion of the last plaintext or first ciphertext possible? An insertion is not an issue as it will be protected by neither, while PSP is expected, so it is dropped. As long as the application (or is it presentation?) layer has a clear definition of at what point in the stream it must insert the Tx key, plaintext deletion is not possible, as the key is not inserted until all plaintext has been received. Which leaves: is it possible for a MITM to offset the seqno, such that the first PSP encrypted packet can be removed from the stream and this goes undetected?
On Fri, May 10, 2024 at 05:11:32PM -0700, Jakub Kicinski wrote: >Yes, I should have CCed Meta's folks who work on TLS [1]. Adding them >now. More than happy to facilitate the discussion, maybe Willem can >CC the right Google folks, IDK who else... > >We should start moving with the kernel support, IMO, until we do >the user space implementation is stalled. I don't expect that the >way we install keys in the kernel would be impacted by the handshake. > >[1] https://github.com/facebookincubator/fizz Having worked on this in the past, I was curious about looking at the kernel/userspace implementation, and was surprised there's nothing at all on the userspace side of things. What's the rush in merging this in prior to having a usable, open userspace? Can't they be developed side by side and merged in together?
On Wed, May 29, 2024 at 1:35 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Sun, 12 May 2024 21:24:23 -0400 Willem de Bruijn wrote: ... snip ... > > Connection key rotation is not supported? I did notice that tx key > > insertion fails if a key is already present, so this does appear to be > > the behavior. > > Correct, for now connections need to be re-established once a day. > Rx should be easy, Tx we can make easy by only supporting rotation > when there's no data queued. > Could you elaborate on why updating the Tx key should only be allowed when no data is queued? At the point rekeying is being done, the receiver should accept both the new and previous key:spi. The lack of support for rekeying existing connections is a significant gap. At a minimum the API for notifying the application that a rotation has occurred should be defined, and the implementation should allow the configuration of a new Tx key:spi for rekeying. A tiny bit logic would also be needed on the Rx side to track the current and previous SPI, if the hardware supports keys indescriptors then nothing more should be needed on the Tx side. If the NIC maintains an SA database and doesn't allow existing entries to be updated, a small amount of additional logic would be needed, but perhaps that could be (waving hands a bit) the responsibility of the driver. - Lance
On Thu, 27 Jun 2024 11:14:39 -0400 Lance Richardson wrote: > > > Connection key rotation is not supported? I did notice that tx key > > > insertion fails if a key is already present, so this does appear to be > > > the behavior. > > > > Correct, for now connections need to be re-established once a day. > > Rx should be easy, Tx we can make easy by only supporting rotation > > when there's no data queued. > > Could you elaborate on why updating the Tx key should only be allowed when > no data is queued? At the point rekeying is being done, the receiver should > accept both the new and previous key:spi. I didn't say it shouldn't be allowed, just that disallowing it initially would make the implementation easier ;) > The lack of support for rekeying existing connections is a significant gap. At > a minimum the API for notifying the application that a rotation has occurred > should be defined, Notifications are in place, that's one of the reasons I chose netlink. > and the implementation should allow the configuration of a new Tx > key:spi for rekeying. I was under the possibly mistaken impression that Google have used PSP for years without rekeying... Did I misunderstand? > A tiny bit logic would also be needed on the Rx > side to track the current and previous SPI, if the hardware supports > keys indescriptors then nothing more should be needed on the Tx side. > If the NIC maintains an SA database and doesn't allow existing > entries to be updated, a small amount of additional logic would be > needed, but perhaps that could be (waving hands a bit) the > responsibility of the driver. Interesting. Hm. But SADB drivers would then have to implement some complex logic to make sure all rings have cycled, or take references. I'd rather have an opt-in for core to delay reaping old keys until all sockets which used them went empty at least once (in wmem sense).
On Thu, Jun 27, 2024 at 6:33 PM Jakub Kicinski <kuba@kernel.org> wrote: > > I was under the possibly mistaken impression that Google have used PSP > for years without rekeying... Did I misunderstand? > Actually Google does implement connection rekeying when master key rotation occurs. I believe this was the case even in the first production deployment (Willem would know the history better). > > A tiny bit logic would also be needed on the Rx > > side to track the current and previous SPI, if the hardware supports > > keys indescriptors then nothing more should be needed on the Tx side. > > If the NIC maintains an SA database and doesn't allow existing > > entries to be updated, a small amount of additional logic would be > > needed, but perhaps that could be (waving hands a bit) the > > responsibility of the driver. > > Interesting. Hm. But SADB drivers would then have to implement some > complex logic to make sure all rings have cycled, or take references. > I'd rather have an opt-in for core to delay reaping old keys until > all sockets which used them went empty at least once (in wmem sense). Right, ensuring that the old entry is no longer referenced by packets in the transmit pipeline before removing is definitely a concern. One simple approach is to simply keep the old entry around for long enough (e.g. a minute or two) to ensure that any packets referencing it have been transmitted.
On Fri, 28 Jun 2024 15:33:28 -0400 Lance Richardson wrote: > > Interesting. Hm. But SADB drivers would then have to implement some > > complex logic to make sure all rings have cycled, or take references. > > I'd rather have an opt-in for core to delay reaping old keys until > > all sockets which used them went empty at least once (in wmem sense). > > Right, ensuring that the old entry is no longer referenced by packets in the > transmit pipeline before removing is definitely a concern. One simple > approach is to simply keep the old entry around for long enough (e.g. a > minute or two) to ensure that any packets referencing it have been transmitted. Sounds good, we can opportunistically remove immediately if socket is idle, otherwise use a timeout. I'll try to have the patches on GH before I post v1, it won't all fit in one reviewable series.
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 7664c0bfe461..0376029ecbdf 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -94,6 +94,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics. ppp_generic proc_net_tcp pse-pd/index + psp radiotap-headers rds regulatory diff --git a/Documentation/networking/psp.rst b/Documentation/networking/psp.rst new file mode 100644 index 000000000000..a39b464813ab --- /dev/null +++ b/Documentation/networking/psp.rst @@ -0,0 +1,138 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +===================== +PSP Security Protocol +===================== + +Protocol +======== + +PSP Security Protocol (PSP) was defined at Google and published in: + +https://raw.githubusercontent.com/google/psp/main/doc/PSP_Arch_Spec.pdf + +This section briefly covers protocol aspects crucial for understanding +the kernel API. Refer to the protocol specification for further details. + +Note that the kernel implementation and documentation uses the term +"secret state" in place of "master key", it is both less confusing +to an average developer and is less likely to run afoul any naming +guidelines. + +Derived Rx keys +--------------- + +PSP borrows some terms and mechanisms from IPsec. PSP was designed +with HW offloads in mind. The key feature of PSP is that Rx keys for every +connection do not have to be stored by the receiver but can be derived +from secret state and information present in packet headers. +This makes it possible to implement receivers which require a constant +amount of memory regardless of the number of connections (``O(1)`` scaling). + +Tx keys have to be stored like with any other protocol, but Tx is much +less latency sensitive than Rx, and delays in fetching keys from slow +memory is less likely to cause packet drops. + +Key rotation +------------ + +The secret state known only to the receiver is fundamental to the design. +Per specification this state cannot be directly accessible (it must be +impossible to read it out of the hardware of the receiver NIC). +Moreover, it has to be "rotated" periodically (usually daily). Rotation +means that new secret state gets generated (by a random number generator +of the device), and used for all new connections. To avoid disrupting +old connections the old secret state remains in the NIC. A phase bit +carried in the packet headers indicates which generation of secret state +the packet has been encrypted with. + +User facing API +=============== + +PSP is designed primarily for hardware offloads. There is currently +no software fallback for systems which do not have PSP capable NICs. +There is also no standard (or otherwise defined) way of establishing +a PSP-secured connection or exchanging the symmetric keys. + +The expectation is that higher layer protocols will take care of +protocol and key negotiation. For example one may use TLS key exchange, +announce the PSP capability, and switch to PSP if both endpoints +are PSP-capable. + +All configuration of PSP is performed via the PSP netlink family. + +Device discovery +---------------- + +The PSP netlink family defines operations to retrieve information +about the PSP devices available on the system, configure them and +access PSP related statistics. + +Securing a connection +--------------------- + +PSP encryption is currently only supported for TCP connections. +Rx and Tx keys are allocated separately. First the ``rx-assoc`` +Netlink command needs to be issued, specifying a target TCP socket. +Kernel will allocate a new PSP Rx key from the NIC and associate it +with given socket. At this stage socket will accept both PSP-secured +and plain text TCP packets. + +Tx keys are installed using the ``tx-assoc`` Netlink command. +Once the Tx keys are installed all data read from the socket will +be PSP-secured. In other words act of installing Tx keys has the secondary +effect on the Rx direction, requring all received packets to be encrypted. +Since packet reception is asynchronous, to make it possible for the +application to trust that any data read from the socket after the ``tx-assoc`` +call returns success has been encrypted, the kernel will scan the receive +queue of the socket at ``tx-assoc`` time. If any enqueued packet was received +in clear text the Tx association will fail, and application should retry +installing the Tx key after draining the socket (this should not be necessary +if both endpoints are well behaved). + +Rotation notifications +---------------------- + +The rotations of secret state happen asynchornously and are usually +performed by management daemons, not under application control. +The PSP netlink family will generate a notification whenever keys +are rotated. The applications are expected to re-establish connections +before keys are rotated again. + +Kernel implementation +===================== + +Driver notes +------------ + +Drivers are expected to start with no PSP enabled (``psp-versions-ena`` +in ``dev-get`` set to ``0``) whenever possible. The user space should +not depend on this behavior, as future extension may necessitate creation +of devices with PSP already enabled, nonetheless drivers should not enable +PSP by default. Enabling PSP should be the responsibility of the system +component which also takes care of key rotation. + +Note that ``psp-versions-ena`` is expected to be used only for enabling +receive processing. The device is not expected to reject transmit requests +after ``psp-versions-ena`` has been disabled. User may also disable +``psp-versions-ena`` while there are active associations, which will +break all PSP Rx processing. + +Drivers are expected to ensure that secret state is usable upon init +(working keys can be allocated), and that no duplicate keys may be generated +(reuse of SPI without key rotation). Drivers may achieve this by rotating +keys twice before registering the PSP device. + +Drivers must use ``psp_skb_get_assoc_rcu()`` to check if PSP Tx offload +was requested for given skb. On Rx drivers should allocate and populate +the ``SKB_EXT_PSP`` skb extension, and set the skb->decrypted bit to 1. + +Kernel implementation notes +--------------------------- + +PSP implementation follows the TLS offload more closely than the IPsec +offload, with per-socket state, and the use of skb->decrypted to prevent +clear text leaks. + +PSP device is separate from netdev, to make it possible to "delegate" +PSP offload capabilities to software devices (e.g. ``veth``).
Add documentation of things which belong in the docs rather than commit messages. Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- Documentation/networking/index.rst | 1 + Documentation/networking/psp.rst | 138 +++++++++++++++++++++++++++++ 2 files changed, 139 insertions(+) create mode 100644 Documentation/networking/psp.rst