Message ID | 20220326071159.56056-1-ebiggers@kernel.org (mailing list archive)
---|---
State | Accepted
Delegated to | Herbert Xu
Series | crypto: testmgr - test in-place en/decryption with two sglists
On Sat, Mar 26, 2022 at 10:13 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> From: Eric Biggers <ebiggers@google.com>
>
> As was established in the thread
> https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> many crypto API users doing in-place en/decryption don't use the same
> scatterlist pointers for the source and destination, but rather use
> separate scatterlists that point to the same memory. This case isn't
> tested by the self-tests, resulting in bugs.
>
> This is the natural usage of the crypto API in some cases, so requiring
> API users to avoid this usage is not reasonable.
>
> Therefore, update the self-tests to start testing this case.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Thank you Eric.

I have given this a lot of thought, and here is what I predict will
happen thanks to this added test:

- We will not find a driver that this breaks, in the sense of producing
  wrong results and triggering a failure in this test.
- We probably will see drivers that, when this test is run with DMA
  debug compiled in and enabled, trigger the debug warning about double
  DMA mapping of the same cache line.

The reason is that the double mappings stemming from this test will come
from mapping the same buffer as both source and destination. As such,
the situation the DMA debug warning is meant to catch - a mapping
causing a cache flush/invalidate, followed by DMA, followed by another
mapping causing a cache flush/invalidate while the DMA is in flight -
will not happen. Instead we will have mapping -> flush/invalidate ->
another mapping -> flush/invalidate -> DMA ...

Note, this is certainly not a claim that we should not add this test!
On the contrary ...

In fact, I would be tempted to claim that this means the real problem is
an overzealous DMA debug logic. Unfortunately, I can think of other
scenarios where things are not so simple.

For example, what happens if a crypto API user has a buffer, divides it
into two parts, and then submits a crypto op on one part and another
crypto op on the second part (say encrypt and hash, just as an example)?
To the best of my knowledge, there is nothing forcing the split between
the two parts to fall on a cache line boundary. This can cause a double
mapping of the same cache line - and this time the warning is real,
because we are not guaranteed a single DMA operation following the two
mappings. There is not much a crypto driver can do about it, either -
the two operations don't even have to be handled by the same driver...

I believe the scenario you are proposing to test is a benign example of
a larger issue. I also believe this is an example of Worse is Better*
and that the right solution is to dictate certain rules for callers of
the crypto API. Whether these rules should or should not include a
limitation against passing the same buffer via two different scatter
gather lists to the same crypto op is debatable, but I think we cannot
run away from defining some rules.

I would really love for others to voice an opinion on this. So far it
has been a rather narrow discussion between the two of us on what I
feel is a broader issue.

Thanks!
Gilad

* https://dreamsongs.com/RiseOfWorseIsBetter.html
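[Editor's note: for readers following the thread, the usage pattern at issue
looks roughly like the sketch below: the caller builds two distinct
scatterlists that both describe the same buffer and passes them as req->src
and req->dst, so the operation is in-place even though the scatterlist
pointers differ. This is a minimal illustrative sketch, not code from the
patch or from any driver; the "cbc(aes)" algorithm name, the PAGE_SIZE
buffer, the all-zero key/IV, and the error handling are assumptions made
purely for the example.]

#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* Hypothetical example: in-place encryption via two separate sglists. */
static int example_inplace_two_sglists(void)
{
        struct crypto_skcipher *tfm;
        struct skcipher_request *req = NULL;
        struct scatterlist src_sg, dst_sg;
        DECLARE_CRYPTO_WAIT(wait);
        u8 key[16] = {};
        u8 iv[16] = {};
        u8 *buf = NULL;
        int err;

        tfm = crypto_alloc_skcipher("cbc(aes)", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        err = crypto_skcipher_setkey(tfm, key, sizeof(key));
        if (err)
                goto out_free_tfm;

        req = skcipher_request_alloc(tfm, GFP_KERNEL);
        buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
        if (!req || !buf) {
                err = -ENOMEM;
                goto out;
        }

        /*
         * Two distinct scatterlists, both pointing at the same memory:
         * req->src != req->dst, yet the operation is in-place.
         */
        sg_init_one(&src_sg, buf, PAGE_SIZE);
        sg_init_one(&dst_sg, buf, PAGE_SIZE);

        skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
                                      CRYPTO_TFM_REQ_MAY_BACKLOG,
                                      crypto_req_done, &wait);
        skcipher_request_set_crypt(req, &src_sg, &dst_sg, PAGE_SIZE, iv);
        err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
out:
        kfree(buf);
        skcipher_request_free(req);
out_free_tfm:
        crypto_free_skcipher(tfm);
        return err;
}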
On Sat, Mar 26, 2022 at 12:11:59AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> As was established in the thread
> https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> many crypto API users doing in-place en/decryption don't use the same
> scatterlist pointers for the source and destination, but rather use
> separate scatterlists that point to the same memory. This case isn't
> tested by the self-tests, resulting in bugs.
>
> This is the natural usage of the crypto API in some cases, so requiring
> API users to avoid this usage is not reasonable.
>
> Therefore, update the self-tests to start testing this case.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>

FWIW, I tried this with QAT and I don't see any issue.

Regards,
On Sun, Mar 27, 2022 at 09:04:43AM +0300, Gilad Ben-Yossef wrote:
> On Sat, Mar 26, 2022 at 10:13 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > From: Eric Biggers <ebiggers@google.com>
> >
> > As was established in the thread
> > https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> > many crypto API users doing in-place en/decryption don't use the same
> > scatterlist pointers for the source and destination, but rather use
> > separate scatterlists that point to the same memory. This case isn't
> > tested by the self-tests, resulting in bugs.
> >
> > This is the natural usage of the crypto API in some cases, so requiring
> > API users to avoid this usage is not reasonable.
> >
> > Therefore, update the self-tests to start testing this case.
> >
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
>
> Thank you Eric.
>
> I have given this a lot of thought, and here is what I predict will
> happen thanks to this added test:
>
> - We will not find a driver that this breaks, in the sense of producing
>   wrong results and triggering a failure in this test.
> - We probably will see drivers that, when this test is run with DMA
>   debug compiled in and enabled, trigger the debug warning about double
>   DMA mapping of the same cache line.
>
> The reason is that the double mappings stemming from this test will come
> from mapping the same buffer as both source and destination. As such,
> the situation the DMA debug warning is meant to catch - a mapping
> causing a cache flush/invalidate, followed by DMA, followed by another
> mapping causing a cache flush/invalidate while the DMA is in flight -
> will not happen. Instead we will have mapping -> flush/invalidate ->
> another mapping -> flush/invalidate -> DMA ...
>
> Note, this is certainly not a claim that we should not add this test!
> On the contrary ...
>
> In fact, I would be tempted to claim that this means the real problem is
> an overzealous DMA debug logic. Unfortunately, I can think of other
> scenarios where things are not so simple.
>
> For example, what happens if a crypto API user has a buffer, divides it
> into two parts, and then submits a crypto op on one part and another
> crypto op on the second part (say encrypt and hash, just as an example)?
> To the best of my knowledge, there is nothing forcing the split between
> the two parts to fall on a cache line boundary. This can cause a double
> mapping of the same cache line - and this time the warning is real,
> because we are not guaranteed a single DMA operation following the two
> mappings. There is not much a crypto driver can do about it, either -
> the two operations don't even have to be handled by the same driver...
>
> I believe the scenario you are proposing to test is a benign example of
> a larger issue. I also believe this is an example of Worse is Better*
> and that the right solution is to dictate certain rules for callers of
> the crypto API. Whether these rules should or should not include a
> limitation against passing the same buffer via two different scatter
> gather lists to the same crypto op is debatable, but I think we cannot
> run away from defining some rules.
>
> I would really love for others to voice an opinion on this. So far it
> has been a rather narrow discussion between the two of us on what I
> feel is a broader issue.

I don't have an answer, sorry.

I personally don't actually have a lot of interest in the crypto accelerator
support in the crypto API, since in the domain I work in (storage encryption)
it's much more common for inline encryption hardware to be used instead, and
that has its own support in the Linux block layer, separate from the crypto
API.

If there are fundamental issues with how crypto accelerators are supported in
the crypto API, then I think that the people who actually care about such
hardware need to get together to create a plan for correctly supporting it.
Doing separate crypto operations on contiguous buffers is absolutely something
that users can expect to work, so if that in fact cannot work, then I expect
that this limitation will need to be very explicitly documented and checked in
the crypto API, and users will need to explicitly opt in to being able to use
crypto accelerators rather than having them (sort of) be used by default.

- Eric
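[Editor's note: to make the contiguous-buffer scenario concrete, the sketch
below is a purely illustrative example, not code from the patch or the
thread; the function name, sizes, and the encrypt/hash split are invented.
It shows one allocation divided at an arbitrary offset, with each part
described by its own scatterlist for a separate crypto operation. If the
split offset is not a multiple of the cache line size, the cache line that
straddles the boundary is covered by both scatterlists, which is the
double-mapping case Gilad describes.]

#include <linux/cache.h>
#include <linux/printk.h>
#include <linux/scatterlist.h>
#include <linux/types.h>

/*
 * Hypothetical illustration: one contiguous buffer, split at an offset that
 * is not necessarily cache-line aligned, with each part described by its own
 * scatterlist (e.g. one part to be encrypted, the other to be hashed,
 * possibly by different drivers).
 */
static void example_split_buffer(struct scatterlist *enc_sg,
                                 struct scatterlist *hash_sg,
                                 u8 *buf, size_t len, size_t split)
{
        /* Part 1: bytes [0, split) go to the encryption request. */
        sg_init_one(enc_sg, buf, split);
        /* Part 2: bytes [split, len) go to the hash request. */
        sg_init_one(hash_sg, buf + split, len - split);

        /*
         * If 'split' is not a multiple of cache_line_size(), the cache line
         * containing buf[split] belongs to both scatterlists.  DMA-mapping
         * both of them at the same time is what trips the DMA debug warning
         * about the same cache line being mapped twice.
         */
        if (split % cache_line_size())
                pr_info("split is not cache-line aligned\n");
}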
On Tue, Apr 5, 2022 at 2:34 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Sun, Mar 27, 2022 at 09:04:43AM +0300, Gilad Ben-Yossef wrote:
> > On Sat, Mar 26, 2022 at 10:13 AM Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > From: Eric Biggers <ebiggers@google.com>
> > >
> > > As was established in the thread
> > > https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> > > many crypto API users doing in-place en/decryption don't use the same
> > > scatterlist pointers for the source and destination, but rather use
> > > separate scatterlists that point to the same memory. This case isn't
> > > tested by the self-tests, resulting in bugs.
> > >
> > > This is the natural usage of the crypto API in some cases, so requiring
> > > API users to avoid this usage is not reasonable.
> > >
> > > Therefore, update the self-tests to start testing this case.
> > >
> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> >
> > Thank you Eric.
> >
> > I have given this a lot of thought, and here is what I predict will
> > happen thanks to this added test:
> >
> > - We will not find a driver that this breaks, in the sense of producing
> >   wrong results and triggering a failure in this test.
> > - We probably will see drivers that, when this test is run with DMA
> >   debug compiled in and enabled, trigger the debug warning about double
> >   DMA mapping of the same cache line.
> >
> > The reason is that the double mappings stemming from this test will come
> > from mapping the same buffer as both source and destination. As such,
> > the situation the DMA debug warning is meant to catch - a mapping
> > causing a cache flush/invalidate, followed by DMA, followed by another
> > mapping causing a cache flush/invalidate while the DMA is in flight -
> > will not happen. Instead we will have mapping -> flush/invalidate ->
> > another mapping -> flush/invalidate -> DMA ...
> >
> > Note, this is certainly not a claim that we should not add this test!
> > On the contrary ...
> >
> > In fact, I would be tempted to claim that this means the real problem is
> > an overzealous DMA debug logic. Unfortunately, I can think of other
> > scenarios where things are not so simple.
> >
> > For example, what happens if a crypto API user has a buffer, divides it
> > into two parts, and then submits a crypto op on one part and another
> > crypto op on the second part (say encrypt and hash, just as an example)?
> > To the best of my knowledge, there is nothing forcing the split between
> > the two parts to fall on a cache line boundary. This can cause a double
> > mapping of the same cache line - and this time the warning is real,
> > because we are not guaranteed a single DMA operation following the two
> > mappings. There is not much a crypto driver can do about it, either -
> > the two operations don't even have to be handled by the same driver...
> >
> > I believe the scenario you are proposing to test is a benign example of
> > a larger issue. I also believe this is an example of Worse is Better*
> > and that the right solution is to dictate certain rules for callers of
> > the crypto API. Whether these rules should or should not include a
> > limitation against passing the same buffer via two different scatter
> > gather lists to the same crypto op is debatable, but I think we cannot
> > run away from defining some rules.
> >
> > I would really love for others to voice an opinion on this. So far it
> > has been a rather narrow discussion between the two of us on what I
> > feel is a broader issue.
>
> I don't have an answer, sorry.
>
> I personally don't actually have a lot of interest in the crypto accelerator
> support in the crypto API, since in the domain I work in (storage encryption)
> it's much more common for inline encryption hardware to be used instead, and
> that has its own support in the Linux block layer, separate from the crypto
> API.
>
> If there are fundamental issues with how crypto accelerators are supported in
> the crypto API, then I think that the people who actually care about such
> hardware need to get together to create a plan for correctly supporting it.
> Doing separate crypto operations on contiguous buffers is absolutely something
> that users can expect to work, so if that in fact cannot work, then I expect
> that this limitation will need to be very explicitly documented and checked in
> the crypto API, and users will need to explicitly opt in to being able to use
> crypto accelerators rather than having them (sort of) be used by default.

Very well. Since the issue I've raised is, right now, a theoretical issue,
I'm going to leave it be for now.

FWIW, the ccree driver passes this extra test with no failures.

Thanks,
Gilad
Eric Biggers <ebiggers@kernel.org> wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> As was established in the thread
> https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> many crypto API users doing in-place en/decryption don't use the same
> scatterlist pointers for the source and destination, but rather use
> separate scatterlists that point to the same memory. This case isn't
> tested by the self-tests, resulting in bugs.
>
> This is the natural usage of the crypto API in some cases, so requiring
> API users to avoid this usage is not reasonable.
>
> Therefore, update the self-tests to start testing this case.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  crypto/testmgr.c | 75 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 63 insertions(+), 12 deletions(-)

Patch applied.  Thanks.
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 2d632a285869f..24a1f9365b9bb 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -232,6 +232,20 @@ enum finalization_type {
 	FINALIZATION_TYPE_DIGEST,	/* use digest() */
 };
 
+/*
+ * Whether the crypto operation will occur in-place, and if so whether the
+ * source and destination scatterlist pointers will coincide (req->src ==
+ * req->dst), or whether they'll merely point to two separate scatterlists
+ * (req->src != req->dst) that reference the same underlying memory.
+ *
+ * This is only relevant for algorithm types that support in-place operation.
+ */
+enum inplace_mode {
+	OUT_OF_PLACE,
+	INPLACE_ONE_SGLIST,
+	INPLACE_TWO_SGLISTS,
+};
+
 #define TEST_SG_TOTAL	10000
 
 /**
@@ -265,7 +279,7 @@ struct test_sg_division {
  * crypto test vector can be tested.
  *
  * @name: name of this config, logged for debugging purposes if a test fails
- * @inplace: operate on the data in-place, if applicable for the algorithm type?
+ * @inplace_mode: whether and how to operate on the data in-place, if applicable
  * @req_flags: extra request_flags, e.g. CRYPTO_TFM_REQ_MAY_SLEEP
  * @src_divs: description of how to arrange the source scatterlist
  * @dst_divs: description of how to arrange the dst scatterlist, if applicable
@@ -282,7 +296,7 @@ struct test_sg_division {
  */
 struct testvec_config {
 	const char *name;
-	bool inplace;
+	enum inplace_mode inplace_mode;
 	u32 req_flags;
 	struct test_sg_division src_divs[XBUFSIZE];
 	struct test_sg_division dst_divs[XBUFSIZE];
@@ -307,11 +321,16 @@ struct testvec_config {
 /* Configs for skciphers and aeads */
 static const struct testvec_config default_cipher_testvec_configs[] = {
 	{
-		.name = "in-place",
-		.inplace = true,
+		.name = "in-place (one sglist)",
+		.inplace_mode = INPLACE_ONE_SGLIST,
+		.src_divs = { { .proportion_of_total = 10000 } },
+	}, {
+		.name = "in-place (two sglists)",
+		.inplace_mode = INPLACE_TWO_SGLISTS,
 		.src_divs = { { .proportion_of_total = 10000 } },
 	}, {
 		.name = "out-of-place",
+		.inplace_mode = OUT_OF_PLACE,
 		.src_divs = { { .proportion_of_total = 10000 } },
 	}, {
 		.name = "unaligned buffer, offset=1",
@@ -349,7 +368,7 @@ static const struct testvec_config default_cipher_testvec_configs[] = {
 		.key_offset = 3,
 	}, {
 		.name = "misaligned splits crossing pages, inplace",
-		.inplace = true,
+		.inplace_mode = INPLACE_ONE_SGLIST,
 		.src_divs = {
 			{
 				.proportion_of_total = 7500,
@@ -749,18 +768,39 @@ static int build_cipher_test_sglists(struct cipher_test_sglists *tsgls,
 	iov_iter_kvec(&input, WRITE, inputs, nr_inputs, src_total_len);
 	err = build_test_sglist(&tsgls->src, cfg->src_divs, alignmask,
-				cfg->inplace ?
+				cfg->inplace_mode != OUT_OF_PLACE ?
 				max(dst_total_len, src_total_len) :
 				src_total_len,
 				&input, NULL);
 	if (err)
 		return err;
 
-	if (cfg->inplace) {
+	/*
+	 * In-place crypto operations can use the same scatterlist for both the
+	 * source and destination (req->src == req->dst), or can use separate
+	 * scatterlists (req->src != req->dst) which point to the same
+	 * underlying memory.  Make sure to test both cases.
+	 */
+	if (cfg->inplace_mode == INPLACE_ONE_SGLIST) {
 		tsgls->dst.sgl_ptr = tsgls->src.sgl;
 		tsgls->dst.nents = tsgls->src.nents;
 		return 0;
 	}
+	if (cfg->inplace_mode == INPLACE_TWO_SGLISTS) {
+		/*
+		 * For now we keep it simple and only test the case where the
+		 * two scatterlists have identical entries, rather than
+		 * different entries that split up the same memory differently.
+		 */
+		memcpy(tsgls->dst.sgl, tsgls->src.sgl,
+		       tsgls->src.nents * sizeof(tsgls->src.sgl[0]));
+		memcpy(tsgls->dst.sgl_saved, tsgls->src.sgl,
+		       tsgls->src.nents * sizeof(tsgls->src.sgl[0]));
+		tsgls->dst.sgl_ptr = tsgls->dst.sgl;
+		tsgls->dst.nents = tsgls->src.nents;
+		return 0;
+	}
+	/* Out of place */
 	return build_test_sglist(&tsgls->dst,
 				 cfg->dst_divs[0].proportion_of_total ?
					cfg->dst_divs : cfg->src_divs,
@@ -995,9 +1035,19 @@ static void generate_random_testvec_config(struct testvec_config *cfg,
 
 	p += scnprintf(p, end - p, "random:");
 
-	if (prandom_u32() % 2 == 0) {
-		cfg->inplace = true;
-		p += scnprintf(p, end - p, " inplace");
+	switch (prandom_u32() % 4) {
+	case 0:
+	case 1:
+		cfg->inplace_mode = OUT_OF_PLACE;
+		break;
+	case 2:
+		cfg->inplace_mode = INPLACE_ONE_SGLIST;
+		p += scnprintf(p, end - p, " inplace_one_sglist");
+		break;
+	default:
+		cfg->inplace_mode = INPLACE_TWO_SGLISTS;
+		p += scnprintf(p, end - p, " inplace_two_sglists");
+		break;
 	}
 
 	if (prandom_u32() % 2 == 0) {
@@ -1034,7 +1084,7 @@ static void generate_random_testvec_config(struct testvec_config *cfg,
 			    cfg->req_flags);
 	p += scnprintf(p, end - p, "]");
 
-	if (!cfg->inplace && prandom_u32() % 2 == 0) {
+	if (cfg->inplace_mode == OUT_OF_PLACE && prandom_u32() % 2 == 0) {
 		p += scnprintf(p, end - p, " dst_divs=[");
 		p = generate_random_sgl_divisions(cfg->dst_divs,
 						  ARRAY_SIZE(cfg->dst_divs),
@@ -2085,7 +2135,8 @@ static int test_aead_vec_cfg(int enc, const struct aead_testvec *vec,
 	/* Check for the correct output (ciphertext or plaintext) */
 	err = verify_correct_output(&tsgls->dst, enc ? vec->ctext : vec->ptext,
				    enc ? vec->clen : vec->plen,
-				    vec->alen, enc || !cfg->inplace);
+				    vec->alen,
+				    enc || cfg->inplace_mode == OUT_OF_PLACE);
 	if (err == -EOVERFLOW) {
 		pr_err("alg: aead: %s %s overran dst buffer on test vector %s, cfg=\"%s\"\n",
 		       driver, op, vec_name, cfg->name);
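[Editor's note: one practical consequence of the new "in-place (two sglists)"
config is that a driver can no longer detect in-place operation by checking
req->src == req->dst alone. A driver that needs to know whether source and
destination alias could instead compare the scatterlist entries themselves,
roughly as in the hypothetical helper below. This helper is not part of the
patch; it only covers the identical-entries layout that INPLACE_TWO_SGLISTS
currently generates (see the comment in build_cipher_test_sglists above),
not lists that split the same memory differently.]

#include <linux/scatterlist.h>

/*
 * Hypothetical helper (not from this patch): decide whether two scatterlists
 * describe the same memory even when the scatterlist pointers differ, by
 * comparing the entries pairwise.
 */
static bool sgls_describe_same_memory(struct scatterlist *src,
                                      struct scatterlist *dst)
{
        while (src && dst) {
                if (sg_page(src) != sg_page(dst) ||
                    src->offset != dst->offset ||
                    src->length != dst->length)
                        return false;
                src = sg_next(src);
                dst = sg_next(dst);
        }
        /* Both NULL means the lists had the same number of entries. */
        return src == dst;
}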