Message ID | 20220726201600.1715505-1-dima@arista.com (mailing list archive) |
---|---|
Series | net/crypto: Introduce crypto_pool |
On Tue, Jul 26, 2022 at 09:15:54PM +0100, Dmitry Safonov wrote:
> Add crypto_pool - an API for allocating per-CPU array of crypto requests
> on slow-path (in sleep'able context) and to use them on a fast-path,
> which is RX/TX for net/ users (or in any other bh-disabled users).
> The design is based on the current implementations of md5sig_pool.
>
> Previously, I've suggested to add such API on TCP-AO patch submission [1],
> where Herbert kindly suggested to help with introducing new crypto API.

What I was suggesting is modifying the actual ahash interface so
that the tfm can be shared between different key users by moving
the key into the request object.

Thanks,

On 7/27/22 03:17, Herbert Xu wrote:
> On Tue, Jul 26, 2022 at 09:15:54PM +0100, Dmitry Safonov wrote:
>> Add crypto_pool - an API for allocating per-CPU array of crypto requests
>> on slow-path (in sleep'able context) and to use them on a fast-path,
>> which is RX/TX for net/ users (or in any other bh-disabled users).
>> The design is based on the current implementations of md5sig_pool.
>>
>> Previously, I've suggested to add such API on TCP-AO patch submission [1],
>> where Herbert kindly suggested to help with introducing new crypto API.
>
> What I was suggesting is modifying the actual ahash interface so
> that the tfm can be shared between different key users by moving
> the key into the request object.

The fact that setkey is implemented at the crypto_ahash level instead of
the ahash_request level is baked into all algorithm implementations
(including many hardware-specific ones). Changing this seems extremely
difficult.

Supporting setkey at the request level could be achieved by making it an
optional capability on a per-algorithm basis. Something like crypto_pool
could then detect this scenario and avoid allocating a per-CPU tfm. This
would also require a crypto_pool_setkey wrapper.

As it stands right now, multiple crypto API users need to duplicate the
logic for allocating a per-CPU array of transforms, so adding this "pool"
API is a useful step forward.

As far as I remember, the requirement for a per-CPU scratch buffer comes
from unusual architectures that limit what kind of memory can be passed
to the crypto API, so this will have to remain.

--
Regards,
Leonard
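
The duplicated per-CPU allocation logic mentioned above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_pool`, `model_pool_alloc`, `model_pool_get` and `model_pool_free` are hypothetical names, not the crypto_pool API from the series, and a plain array index stands in for the current CPU number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model: one fixed-size slot per possible CPU. */
struct model_pool {
	unsigned int nr_cpus;   /* number of slots */
	size_t slot_size;       /* bytes per slot (request or scratch area) */
	unsigned char *slots;   /* nr_cpus * slot_size bytes, zero-initialised */
};

/* Slow path: may allocate (and, in the kernel, sleep). Returns 0 or -1. */
static int model_pool_alloc(struct model_pool *pool, unsigned int nr_cpus,
			    size_t slot_size)
{
	assert(pool != NULL);
	if (nr_cpus == 0 || slot_size == 0)
		return -1;
	pool->slots = calloc(nr_cpus, slot_size);
	if (!pool->slots)
		return -1;
	pool->nr_cpus = nr_cpus;
	pool->slot_size = slot_size;
	return 0;
}

/* Fast path: no allocation, only pointer arithmetic into the array. */
static void *model_pool_get(const struct model_pool *pool, unsigned int cpu)
{
	assert(pool != NULL);
	if (cpu >= pool->nr_cpus)
		return NULL;
	return pool->slots + (size_t)cpu * pool->slot_size;
}

/* Slow path: release every slot at once. */
static void model_pool_free(struct model_pool *pool)
{
	assert(pool != NULL);
	free(pool->slots);
	pool->slots = NULL;
	pool->nr_cpus = 0;
	pool->slot_size = 0;
}
```

The model leaves out everything that makes the real problem hard (reference counting between users, bh-disabled sections, tfm allocation); it only shows the split between a sleeping slow path and an allocation-free fast path.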

On Wed, Jul 27, 2022 at 06:52:27PM +0300, Leonard Crestez wrote:
>
> The fact that setkey is implemented at the crypto_ahash instead of the
> ahash_request level is baked into all algorithm implementations (including
> many hardware-specific ones). Changing this seems extremely difficult.

What I had in mind is simply making the tfm setkey optional. That
way you could then have an additional setkey at the request level.
If the key is provided in either place you're allowed to perform
the hash. This should have minimal impact on existing code.

Cheers,
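
The "key in either place" rule described above can be sketched as a userspace model. All names here (`model_tfm`, `model_req`, `model_resolve_key`) are hypothetical and do not exist in the kernel crypto API. The precedence of a request-level key over a tfm-level key is an assumption of this sketch; the thread does not specify it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a shared transform with an optional key. */
struct model_tfm {
	const unsigned char *key;   /* NULL when no tfm-level key was set */
	size_t keylen;
};

/* Hypothetical model of a request that may carry its own key. */
struct model_req {
	const struct model_tfm *tfm;
	const unsigned char *key;   /* NULL when no request-level key was set */
	size_t keylen;
};

/*
 * Choose the key for one hash operation.
 * Returns 0 and fills *key / *keylen if a key exists in either place,
 * or -1 if neither the request nor the tfm carries one.
 * Assumption: a request-level key overrides the tfm-level key.
 */
static int model_resolve_key(const struct model_req *req,
			     const unsigned char **key, size_t *keylen)
{
	assert(req != NULL && req->tfm != NULL);
	assert(key != NULL && keylen != NULL);

	if (req->key) {
		*key = req->key;
		*keylen = req->keylen;
		return 0;
	}
	if (req->tfm->key) {
		*key = req->tfm->key;
		*keylen = req->tfm->keylen;
		return 0;
	}
	return -1;
}
```

In this model, existing users that set the key on the tfm keep working unchanged, while a keyless tfm can be shared by many requests that each bring their own key.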

Hi Herbert,

On 7/27/22 01:17, Herbert Xu wrote:
> On Tue, Jul 26, 2022 at 09:15:54PM +0100, Dmitry Safonov wrote:
>> Add crypto_pool - an API for allocating per-CPU array of crypto requests
>> on slow-path (in sleep'able context) and to use them on a fast-path,
>> which is RX/TX for net/ users (or in any other bh-disabled users).
>> The design is based on the current implementations of md5sig_pool.
>>
>> Previously, I've suggested to add such API on TCP-AO patch submission [1],
>> where Herbert kindly suggested to help with introducing new crypto API.
>
> What I was suggesting is modifying the actual ahash interface so
> that the tfm can be shared between different key users by moving
> the key into the request object.

My impression here is that we're looking at different issues:
1. The necessity of allocating per-CPU ahash_requests.
2. Managing the lifetime and sharing of ahash_request between different
   kernel users.

Removing (1) would save

  (num_possible_cpus() - 1) *
      (sizeof(struct ahash_request) + crypto_ahash_reqsize(tfm))

bytes, which would be very nice for the new fancy CPUs with hundreds of
threads.

For (2), many kernel users try to manage it themselves, resulting in
different implementations. Some users avoid complications such as
ref counting by allocating the request only once and not freeing it
until the module is unloaded.

Here, for example, introducing TCP-AO would result in copy'n'paste of
tcp_md5sig_pool code. Also, RFC5925 for TCP-AO lets the user choose any
supported hashing algorithm, while RFC5926 requires hmac(sha1) and
cmac(aes). If a user wants more algorithms, that implementation would
need to be patched.

Besides TCP-MD5 and TCP-AO, I see quite a few net/ users that could use
a common API for this. They share the same pattern of allocating a
crypto algorithm on a slow path (adding a key or module initialization)
and using it on a fast path, which is RX/TX.

Besides sharing and lifetime management, those users need a temporary
buffer (usually named `scratch'). IIUC, it is needed for async
algorithms that could use some hardware accelerator instead of the CPU
and need to write the result anywhere but on the vmapped stack.

So, here I'm trying to address (2) in order to avoid copy'n'pasting
tcp_md5sig_pool code when introducing TCP-AO support.
I've also patched the tcp-md5 code to dynamically disable the static
branch, which is not a crypto change.

There's also a chance I've misunderstood what your proposal is :-)

Thanks,
Dmitry
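
The saving estimated for point (1) can be written as a small helper. This is a userspace sketch: `percpu_request_savings` is a hypothetical name, and its parameters stand in for the values that num_possible_cpus(), sizeof(struct ahash_request) and crypto_ahash_reqsize(tfm) would supply in the kernel.

```c
#include <stddef.h>

/*
 * Bytes saved by keeping a single shared request instead of one
 * request per possible CPU.
 *
 * nr_cpus:         stand-in for num_possible_cpus()
 * req_struct_size: stand-in for sizeof(struct ahash_request)
 * reqsize:         stand-in for crypto_ahash_reqsize(tfm)
 */
static size_t percpu_request_savings(unsigned int nr_cpus,
				     size_t req_struct_size, size_t reqsize)
{
	if (nr_cpus <= 1)
		return 0;   /* a single CPU already has only one request */
	return (size_t)(nr_cpus - 1) * (req_struct_size + reqsize);
}
```

With purely illustrative sizes (not measured on any kernel), `percpu_request_savings(256, 80, 432)` evaluates to 255 * 512 = 130560 bytes per pooled algorithm.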