
crypto: testmgr - test in-place en/decryption with two sglists

Message ID 20220326071159.56056-1-ebiggers@kernel.org (mailing list archive)
State Accepted
Delegated to: Herbert Xu
Series: crypto: testmgr - test in-place en/decryption with two sglists

Commit Message

Eric Biggers March 26, 2022, 7:11 a.m. UTC
From: Eric Biggers <ebiggers@google.com>

As was established in the thread
https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
many crypto API users doing in-place en/decryption don't use the same
scatterlist pointers for the source and destination, but rather use
separate scatterlists that point to the same memory.  This case isn't
tested by the self-tests, resulting in bugs.

This is the natural usage of the crypto API in some cases, so requiring
API users to avoid this usage is not reasonable.

Therefore, update the self-tests to start testing this case.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/testmgr.c | 75 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 12 deletions(-)


base-commit: 46f538bf2404ee9c32648deafb886f49144bfd5e

Comments

Gilad Ben-Yossef March 27, 2022, 6:04 a.m. UTC | #1
On Sat, Mar 26, 2022 at 10:13 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> From: Eric Biggers <ebiggers@google.com>
>
> As was established in the thread
> https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> many crypto API users doing in-place en/decryption don't use the same
> scatterlist pointers for the source and destination, but rather use
> separate scatterlists that point to the same memory.  This case isn't
> tested by the self-tests, resulting in bugs.
>
> This is the natural usage of the crypto API in some cases, so requiring
> API users to avoid this usage is not reasonable.
>
> Therefore, update the self-tests to start testing this case.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>


Thank you Eric.

I have given this a lot of thought and here is what I predict will
happen thanks to this added test:
- We will not find a driver that this breaks, in the sense of
producing wrong results and triggering a failure in this test.
- We will probably see drivers that, when this test is run with DMA
debugging compiled in and enabled, trigger the debug warning about
double DMA mapping of the same cache line.

The reason is that the double mappings stemming from this test will
come from mapping the same buffer as both source and destination.
As such, the situation that causes the DMA debug warning - a mapping
causing a cache flush/invalidate, followed by DMA, followed by another
mapping causing a cache flush/invalidate while the DMA is in
flight - will not happen. Instead we will have mapping ->
flush/invalidate -> another mapping -> flush/invalidate -> DMA ...

Note, this is certainly not a claim that we should not add this test!
On the contrary ...

In fact, I would be tempted to claim that this means the real problem
is with overzealous DMA debug logic. Unfortunately, I can think of
other scenarios where things are not so simple:

For example, what happens if a crypto API user has a buffer which it
divides into two parts, and then submits a crypto op on one part and
another crypto op on the second part (say encrypt and hash, just as an
example)? To the best of my knowledge, there is nothing forcing the
split between the two parts to fall on a cache line boundary. This can
cause a double mapping of the same cache line - and this time the
warning is real, because we are not guaranteed a single DMA operation
following the two mappings. There is not much a crypto driver can do
about it, either - the two operations don't even have to be done by
the same driver...

I believe the scenario you are proposing to test is a benign example
of a larger issue. I also believe this is an example of Worse is
Better* and that the right solution is to dictate certain rules for
the callers of the crypto API. Whether these rules should or should
not include a limitation against passing the same buffer via two
different scatter-gather lists to the same crypto op is debatable, but
I think we cannot run away from defining some rules.

I would really love for others to voice an opinion on this. It seems a
rather narrow discussion so far between the two of us on what I feel
is a broader issue.

Thanks!
Gilad

* https://dreamsongs.com/RiseOfWorseIsBetter.html
Cabiddu, Giovanni April 1, 2022, 8:30 a.m. UTC | #2
On Sat, Mar 26, 2022 at 12:11:59AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> As was established in the thread
> https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> many crypto API users doing in-place en/decryption don't use the same
> scatterlist pointers for the source and destination, but rather use
> separate scatterlists that point to the same memory.  This case isn't
> tested by the self-tests, resulting in bugs.
> 
> This is the natural usage of the crypto API in some cases, so requiring
> API users to avoid this usage is not reasonable.
> 
> Therefore, update the self-tests to start testing this case.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
FWIW, I tried this with QAT and I don't see any issue.

Regards,
Eric Biggers April 4, 2022, 11:34 p.m. UTC | #3
On Sun, Mar 27, 2022 at 09:04:43AM +0300, Gilad Ben-Yossef wrote:
> [...]
>
> Thank you Eric.
> 
> I have given this a lot of thought and here is what I predict will
> happen thanks to this added test:
> - We will not find a driver that this breaks, in the sense of
> producing wrong results and triggering a failure in this test.
> - We will probably see drivers that, when this test is run with DMA
> debugging compiled in and enabled, trigger the debug warning about
> double DMA mapping of the same cache line.
> 
> The reason is that the double mappings stemming from this test will
> come from mapping the same buffer as both source and destination.
> As such, the situation that causes the DMA debug warning - a mapping
> causing a cache flush/invalidate, followed by DMA, followed by another
> mapping causing a cache flush/invalidate while the DMA is in
> flight - will not happen. Instead we will have mapping ->
> flush/invalidate -> another mapping -> flush/invalidate -> DMA ...
> 
> Note, this is certainly not a claim that we should not add this test!
> On the contrary ...
> 
> In fact, I would be tempted to claim that this means the real problem
> is with overzealous DMA debug logic. Unfortunately, I can think of
> other scenarios where things are not so simple:
> 
> For example, what happens if a crypto API user has a buffer which it
> divides into two parts, and then submits a crypto op on one part and
> another crypto op on the second part (say encrypt and hash, just as an
> example)? To the best of my knowledge, there is nothing forcing the
> split between the two parts to fall on a cache line boundary. This can
> cause a double mapping of the same cache line - and this time the
> warning is real, because we are not guaranteed a single DMA operation
> following the two mappings. There is not much a crypto driver can do
> about it, either - the two operations don't even have to be done by
> the same driver...
> 
> I believe the scenario you are proposing to test is a benign example
> of a larger issue. I also believe this is an example of Worse is
> Better* and that the right solution is to dictate certain rules for
> the callers of the crypto API. Whether these rules should or should
> not include a limitation against passing the same buffer via two
> different scatter-gather lists to the same crypto op is debatable, but
> I think we cannot run away from defining some rules.
> 
> I would really love for others to voice an opinion on this. It seems a
> rather narrow discussion so far between the two of us on what I feel
> is a broader issue.

I don't have an answer, sorry.

I personally don't actually have a lot of interest in the crypto accelerator
support in the crypto API, since in the domain I work in (storage encryption)
it's much more common for inline encryption hardware to be used instead, and
that has its own support in the Linux block layer, separate from the crypto API.

If there are fundamental issues with how crypto accelerators are supported in
the crypto API, then I think that the people who actually care about such
hardware need to get together to create a plan for correctly supporting it.
Doing separate crypto operations on contiguous buffers is absolutely something
that users can expect to work, so if that in fact cannot work, then I expect
that this limitation will need to be very explicitly documented and checked in
the crypto API, and users will need to explicitly opt-in to being able to use
crypto accelerators rather than having them (sort of) be used by default.

- Eric
Gilad Ben-Yossef April 6, 2022, 8:16 a.m. UTC | #4
On Tue, Apr 5, 2022 at 2:34 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Sun, Mar 27, 2022 at 09:04:43AM +0300, Gilad Ben-Yossef wrote:
> > [...]
>
> I don't have an answer, sorry.
>
> I personally don't actually have a lot of interest in the crypto accelerator
> support in the crypto API, since in the domain I work in (storage encryption)
> it's much more common for inline encryption hardware to be used instead, and
> that has its own support in the Linux block layer, separate from the crypto API.
>
> If there are fundamental issues with how crypto accelerators are supported in
> the crypto API, then I think that the people who actually care about such
> hardware need to get together to create a plan for correctly supporting it.
> Doing separate crypto operations on contiguous buffers is absolutely something
> that users can expect to work, so if that in fact cannot work, then I expect
> that this limitation will need to be very explicitly documented and checked in
> the crypto API, and users will need to explicitly opt-in to being able to use
> crypto accelerators rather than having them (sort of) be used by default.

Very well. Since the issue I've raised is, right now, a theoretical
issue, I'm going to leave it be for now.

FWIW, the ccree driver passes this extra test with no failures.

Thanks,
Gilad
Herbert Xu April 8, 2022, 8:31 a.m. UTC | #5
Eric Biggers <ebiggers@kernel.org> wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> As was established in the thread
> https://lore.kernel.org/linux-crypto/20220223080400.139367-1-gilad@benyossef.com/T/#u,
> many crypto API users doing in-place en/decryption don't use the same
> scatterlist pointers for the source and destination, but rather use
> separate scatterlists that point to the same memory.  This case isn't
> tested by the self-tests, resulting in bugs.
> 
> This is the natural usage of the crypto API in some cases, so requiring
> API users to avoid this usage is not reasonable.
> 
> Therefore, update the self-tests to start testing this case.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> crypto/testmgr.c | 75 ++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 63 insertions(+), 12 deletions(-)

Patch applied.  Thanks.

Patch

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 2d632a285869f..24a1f9365b9bb 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -232,6 +232,20 @@  enum finalization_type {
 	FINALIZATION_TYPE_DIGEST,	/* use digest() */
 };
 
+/*
+ * Whether the crypto operation will occur in-place, and if so whether the
+ * source and destination scatterlist pointers will coincide (req->src ==
+ * req->dst), or whether they'll merely point to two separate scatterlists
+ * (req->src != req->dst) that reference the same underlying memory.
+ *
+ * This is only relevant for algorithm types that support in-place operation.
+ */
+enum inplace_mode {
+	OUT_OF_PLACE,
+	INPLACE_ONE_SGLIST,
+	INPLACE_TWO_SGLISTS,
+};
+
 #define TEST_SG_TOTAL	10000
 
 /**
@@ -265,7 +279,7 @@  struct test_sg_division {
  * crypto test vector can be tested.
  *
  * @name: name of this config, logged for debugging purposes if a test fails
- * @inplace: operate on the data in-place, if applicable for the algorithm type?
+ * @inplace_mode: whether and how to operate on the data in-place, if applicable
  * @req_flags: extra request_flags, e.g. CRYPTO_TFM_REQ_MAY_SLEEP
  * @src_divs: description of how to arrange the source scatterlist
  * @dst_divs: description of how to arrange the dst scatterlist, if applicable
@@ -282,7 +296,7 @@  struct test_sg_division {
  */
 struct testvec_config {
 	const char *name;
-	bool inplace;
+	enum inplace_mode inplace_mode;
 	u32 req_flags;
 	struct test_sg_division src_divs[XBUFSIZE];
 	struct test_sg_division dst_divs[XBUFSIZE];
@@ -307,11 +321,16 @@  struct testvec_config {
 /* Configs for skciphers and aeads */
 static const struct testvec_config default_cipher_testvec_configs[] = {
 	{
-		.name = "in-place",
-		.inplace = true,
+		.name = "in-place (one sglist)",
+		.inplace_mode = INPLACE_ONE_SGLIST,
+		.src_divs = { { .proportion_of_total = 10000 } },
+	}, {
+		.name = "in-place (two sglists)",
+		.inplace_mode = INPLACE_TWO_SGLISTS,
 		.src_divs = { { .proportion_of_total = 10000 } },
 	}, {
 		.name = "out-of-place",
+		.inplace_mode = OUT_OF_PLACE,
 		.src_divs = { { .proportion_of_total = 10000 } },
 	}, {
 		.name = "unaligned buffer, offset=1",
@@ -349,7 +368,7 @@  static const struct testvec_config default_cipher_testvec_configs[] = {
 		.key_offset = 3,
 	}, {
 		.name = "misaligned splits crossing pages, inplace",
-		.inplace = true,
+		.inplace_mode = INPLACE_ONE_SGLIST,
 		.src_divs = {
 			{
 				.proportion_of_total = 7500,
@@ -749,18 +768,39 @@  static int build_cipher_test_sglists(struct cipher_test_sglists *tsgls,
 
 	iov_iter_kvec(&input, WRITE, inputs, nr_inputs, src_total_len);
 	err = build_test_sglist(&tsgls->src, cfg->src_divs, alignmask,
-				cfg->inplace ?
+				cfg->inplace_mode != OUT_OF_PLACE ?
 					max(dst_total_len, src_total_len) :
 					src_total_len,
 				&input, NULL);
 	if (err)
 		return err;
 
-	if (cfg->inplace) {
+	/*
+	 * In-place crypto operations can use the same scatterlist for both the
+	 * source and destination (req->src == req->dst), or can use separate
+	 * scatterlists (req->src != req->dst) which point to the same
+	 * underlying memory.  Make sure to test both cases.
+	 */
+	if (cfg->inplace_mode == INPLACE_ONE_SGLIST) {
 		tsgls->dst.sgl_ptr = tsgls->src.sgl;
 		tsgls->dst.nents = tsgls->src.nents;
 		return 0;
 	}
+	if (cfg->inplace_mode == INPLACE_TWO_SGLISTS) {
+		/*
+		 * For now we keep it simple and only test the case where the
+		 * two scatterlists have identical entries, rather than
+		 * different entries that split up the same memory differently.
+		 */
+		memcpy(tsgls->dst.sgl, tsgls->src.sgl,
+		       tsgls->src.nents * sizeof(tsgls->src.sgl[0]));
+		memcpy(tsgls->dst.sgl_saved, tsgls->src.sgl,
+		       tsgls->src.nents * sizeof(tsgls->src.sgl[0]));
+		tsgls->dst.sgl_ptr = tsgls->dst.sgl;
+		tsgls->dst.nents = tsgls->src.nents;
+		return 0;
+	}
+	/* Out of place */
 	return build_test_sglist(&tsgls->dst,
 				 cfg->dst_divs[0].proportion_of_total ?
 					cfg->dst_divs : cfg->src_divs,
@@ -995,9 +1035,19 @@  static void generate_random_testvec_config(struct testvec_config *cfg,
 
 	p += scnprintf(p, end - p, "random:");
 
-	if (prandom_u32() % 2 == 0) {
-		cfg->inplace = true;
-		p += scnprintf(p, end - p, " inplace");
+	switch (prandom_u32() % 4) {
+	case 0:
+	case 1:
+		cfg->inplace_mode = OUT_OF_PLACE;
+		break;
+	case 2:
+		cfg->inplace_mode = INPLACE_ONE_SGLIST;
+		p += scnprintf(p, end - p, " inplace_one_sglist");
+		break;
+	default:
+		cfg->inplace_mode = INPLACE_TWO_SGLISTS;
+		p += scnprintf(p, end - p, " inplace_two_sglists");
+		break;
 	}
 
 	if (prandom_u32() % 2 == 0) {
@@ -1034,7 +1084,7 @@  static void generate_random_testvec_config(struct testvec_config *cfg,
 					  cfg->req_flags);
 	p += scnprintf(p, end - p, "]");
 
-	if (!cfg->inplace && prandom_u32() % 2 == 0) {
+	if (cfg->inplace_mode == OUT_OF_PLACE && prandom_u32() % 2 == 0) {
 		p += scnprintf(p, end - p, " dst_divs=[");
 		p = generate_random_sgl_divisions(cfg->dst_divs,
 						  ARRAY_SIZE(cfg->dst_divs),
@@ -2085,7 +2135,8 @@  static int test_aead_vec_cfg(int enc, const struct aead_testvec *vec,
 	/* Check for the correct output (ciphertext or plaintext) */
 	err = verify_correct_output(&tsgls->dst, enc ? vec->ctext : vec->ptext,
 				    enc ? vec->clen : vec->plen,
-				    vec->alen, enc || !cfg->inplace);
+				    vec->alen,
+				    enc || cfg->inplace_mode == OUT_OF_PLACE);
 	if (err == -EOVERFLOW) {
 		pr_err("alg: aead: %s %s overran dst buffer on test vector %s, cfg=\"%s\"\n",
 		       driver, op, vec_name, cfg->name);