| Message ID | 20181129230217.158038-4-ebiggers@kernel.org (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to | Herbert Xu |
| Series | crypto: x86_64 optimized XChaCha and NHPoly1305 (for Adiantum) |

> To improve responsiveness, disable preemption for each step of the
> walk (which is at most PAGE_SIZE) rather than for the entire
> encryption/decryption operation.

It seems that it is not that uncommon for IPsec to get small inputs
scattered over multiple blocks. Doing FPU context saving for each walk
step can then slow things down.

An alternative approach could be to re-enable preemption not based on
the walk steps, but on the number of bytes processed. This would
satisfy both users, I guess.

In the long run we probably need a better approach to FPU context
saving, as this really hurts performance-wise. For IPsec we should find
a way to avoid the (multiple) per-packet FPU save/restores in softirq
context, but I guess this requires support from process context
switching.

Best regards
Martin

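To make the byte-count idea concrete, here is a minimal sketch layered on the loop from the patch below; the SZ_4K budget, its placement, and the reliance on an atomic walk are illustrative assumptions, not part of Martin's message:

```c
/*
 * Sketch only: keep the FPU section open across walk steps and
 * re-enable preemption once roughly 4 KiB (an assumed threshold)
 * has been processed. Assumes an atomic walk, i.e.
 * skcipher_walk_virt(&walk, req, true), since skcipher_walk_done()
 * may run here with preemption disabled.
 */
unsigned int budget = SZ_4K;

kernel_fpu_begin();
while (walk.nbytes > 0) {
	unsigned int nbytes = walk.nbytes;

	if (nbytes < walk.total)
		nbytes = round_down(nbytes, walk.stride);

	chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
			nbytes);

	if (nbytes >= budget) {
		/* Enough bytes processed: open a preemption point. */
		kernel_fpu_end();
		kernel_fpu_begin();
		budget = SZ_4K;
	} else {
		budget -= nbytes;
	}

	err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
kernel_fpu_end();
```

With this shape, small scattered inputs pay for a single FPU save/restore, while long requests still yield to the scheduler every few kilobytes.
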
On Sun, 2 Dec 2018 at 11:47, Martin Willi <martin@strongswan.org> wrote:
>
> > To improve responsiveness, disable preemption for each step of the
> > walk (which is at most PAGE_SIZE) rather than for the entire
> > encryption/decryption operation.
>
> It seems that it is not that uncommon for IPsec to get small inputs
> scattered over multiple blocks. Doing FPU context saving for each walk
> step can then slow things down.
>
> An alternative approach could be to re-enable preemption not based on
> the walk steps, but on the number of bytes processed. This would
> satisfy both users, I guess.
>
> In the long run we probably need a better approach to FPU context
> saving, as this really hurts performance-wise. For IPsec we should find
> a way to avoid the (multiple) per-packet FPU save/restores in softirq
> context, but I guess this requires support from process context
> switching.

At Jason's Zinc talk at Plumbers, this came up, and apparently someone
is working on it: ensuring that on x86 the FPU restore only occurs
lazily, when returning to userland, rather than every time
kernel_fpu_end() is called [like we do on arm64 as well].

I'm not sure what the ETA for that work is, though, nor did I get the
name of the person working on it.

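For readers unfamiliar with the idea, a rough conceptual sketch of such lazy restoring follows; every name in it (the flag, the helpers) is hypothetical, and it is not actual x86 or arm64 kernel code:

```c
/*
 * Conceptual sketch of lazy FPU restore. Instead of restoring the
 * user's FPU registers eagerly, kernel_fpu_end() only marks the task;
 * repeated begin/end pairs then pay for a single restore, performed
 * once on the way back to userspace. All names are hypothetical.
 */
void kernel_fpu_end(void)
{
	set_thread_flag(TIF_FPU_NEEDS_RELOAD);	/* hypothetical flag */
	preempt_enable();
}

/* Hypothetical hook on the return-to-userspace path: */
void fpu_reload_before_user(void)
{
	if (test_and_clear_thread_flag(TIF_FPU_NEEDS_RELOAD))
		fpu_restore_user_state(current);	/* hypothetical helper */
}
```
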
On Mon, Dec 03, 2018 at 03:13:37PM +0100, Ard Biesheuvel wrote:
> On Sun, 2 Dec 2018 at 11:47, Martin Willi <martin@strongswan.org> wrote:
> >
> > > To improve responsiveness, disable preemption for each step of the
> > > walk (which is at most PAGE_SIZE) rather than for the entire
> > > encryption/decryption operation.
> >
> > It seems that it is not that uncommon for IPsec to get small inputs
> > scattered over multiple blocks. Doing FPU context saving for each walk
> > step can then slow things down.
> >
> > An alternative approach could be to re-enable preemption not based on
> > the walk steps, but on the number of bytes processed. This would
> > satisfy both users, I guess.
> >
> > In the long run we probably need a better approach to FPU context
> > saving, as this really hurts performance-wise. For IPsec we should find
> > a way to avoid the (multiple) per-packet FPU save/restores in softirq
> > context, but I guess this requires support from process context
> > switching.
>
> At Jason's Zinc talk at Plumbers, this came up, and apparently someone
> is working on it: ensuring that on x86 the FPU restore only occurs
> lazily, when returning to userland, rather than every time
> kernel_fpu_end() is called [like we do on arm64 as well].
>
> I'm not sure what the ETA for that work is, though, nor did I get the
> name of the person working on it.

Thanks for the suggestion; I'll replace this with a patch that
re-enables preemption every 4 KiB encrypted. That also avoids having to
do a kernel_fpu_begin(), kernel_fpu_end() pair just for
hchacha_block_ssse3().

But yes, I'd definitely like repeated kernel_fpu_begin(),
kernel_fpu_end() to not be incredibly slow. That would help in a lot of
other places too.

- Eric

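For concreteness, a sketch of what splitting at 4 KiB might look like; this is a hypothetical adaptation (the wrapper name is invented), not the actual follow-up patch:

```c
/*
 * Sketch only: process at most 4 KiB per FPU section. Relies on
 * chacha20_dosimd() advancing the block counter in state[] as it
 * goes, as the existing code does. The wrapper name is hypothetical.
 */
static void chacha20_dosimd_preemptible(u32 *state, u8 *dst, const u8 *src,
					unsigned int bytes)
{
	do {
		unsigned int todo = min_t(unsigned int, bytes, SZ_4K);

		kernel_fpu_begin();
		chacha20_dosimd(state, dst, src, todo);
		kernel_fpu_end();

		bytes -= todo;
		src += todo;
		dst += todo;
	} while (bytes);
}
```

A short request then costs exactly one save/restore pair, while long requests re-enable preemption every 4 KiB.
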
```diff
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 773d075a1483..036de144aab6 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -135,26 +135,24 @@ static int chacha20_simd(struct skcipher_request *req)
 	if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
 		return crypto_chacha_crypt(req);
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
 	crypto_chacha_init(state, ctx, walk.iv);
 
-	kernel_fpu_begin();
-
 	while (walk.nbytes > 0) {
 		unsigned int nbytes = walk.nbytes;
 
 		if (nbytes < walk.total)
 			nbytes = round_down(nbytes, walk.stride);
 
+		kernel_fpu_begin();
 		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
 				nbytes);
+		kernel_fpu_end();
 
 		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
 
-	kernel_fpu_end();
-
 	return err;
 }
```