diff mbox series

[v2,09/11] crypto: blake2s - share the "shash" API boilerplate code

Message ID 20201217222138.170526-10-ebiggers@kernel.org (mailing list archive)
State New, archived
Headers show
Series crypto: arm32-optimized BLAKE2b and BLAKE2s | expand

Commit Message

Eric Biggers Dec. 17, 2020, 10:21 p.m. UTC
From: Eric Biggers <ebiggers@google.com>

Move the boilerplate code for setkey(), init(), update(), and final()
from blake2s_generic.ko into a new module blake2s_helpers.ko, and export
it so that it can be used by other shash implementations of BLAKE2s.

setkey() and init() are exported as-is, while update() and final() have
a blake2s_compress_t function pointer argument added.  This allows the
implementation of the compression function to be overridden, which is
the only part that optimized implementations really care about.

The helper functions are defined in a separate module blake2s_helpers.ko
(rather than just than in blake2s_generic.ko) because we can't simply
select CRYPTO_BLAKE2B from CRYPTO_BLAKE2S_X86.  Doing this selection
unconditionally would make the library API select the shash API, while
doing it conditionally on CRYPTO_HASH would create a recursive kconfig
dependency on CRYPTO_HASH.  As a bonus, using a separate module also
allows the generic implementation to be omitted when unneeded.

These helper functions very closely match the ones I defined for
BLAKE2b, except the BLAKE2b ones didn't go in a separate module yet
because BLAKE2b isn't exposed through the library API yet.

Finally, use these new helper functions in the x86 implementation of
BLAKE2s.  (This part should be a separate patch, but unfortunately the
x86 implementation used the exact same function names like
"crypto_blake2s_update()", so it had to be updated at the same time.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/blake2s-glue.c    | 74 +++-----------------------
 crypto/Kconfig                    |  5 ++
 crypto/Makefile                   |  1 +
 crypto/blake2s_generic.c          | 79 ++++------------------------
 crypto/blake2s_helpers.c          | 87 +++++++++++++++++++++++++++++++
 include/crypto/internal/blake2s.h | 17 ++++++
 6 files changed, 126 insertions(+), 137 deletions(-)
 create mode 100644 crypto/blake2s_helpers.c

Comments

Jason A. Donenfeld Dec. 18, 2020, 4:14 p.m. UTC | #1
On Thu, Dec 17, 2020 at 11:25 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> From: Eric Biggers <ebiggers@google.com>
>
> Move the boilerplate code for setkey(), init(), update(), and final()
> from blake2s_generic.ko into a new module blake2s_helpers.ko, and export
> it so that it can be used by other shash implementations of BLAKE2s.
>
> setkey() and init() are exported as-is, while update() and final() have
> a blake2s_compress_t function pointer argument added.  This allows the
> implementation of the compression function to be overridden, which is
> the only part that optimized implementations really care about.
>
> The helper functions are defined in a separate module blake2s_helpers.ko
> (rather than just than in blake2s_generic.ko) because we can't simply
> select CRYPTO_BLAKE2B from CRYPTO_BLAKE2S_X86.  Doing this selection
> unconditionally would make the library API select the shash API, while
> doing it conditionally on CRYPTO_HASH would create a recursive kconfig
> dependency on CRYPTO_HASH.  As a bonus, using a separate module also
> allows the generic implementation to be omitted when unneeded.
>
> These helper functions very closely match the ones I defined for
> BLAKE2b, except the BLAKE2b ones didn't go in a separate module yet
> because BLAKE2b isn't exposed through the library API yet.
>
> Finally, use these new helper functions in the x86 implementation of
> BLAKE2s.  (This part should be a separate patch, but unfortunately the
> x86 implementation used the exact same function names like
> "crypto_blake2s_update()", so it had to be updated at the same time.)

There's a similar situation happening with chacha20poly1305 and with
curve25519. Each of these uses a mildly different approach to how we
split things up between library and crypto api code. The _helpers.ko
is another one. There any opportunity here to take a more
unified/consistant approach? Or is shash slightly different than the
others and benefits from a third way?

Jason
Eric Biggers Dec. 18, 2020, 8:08 p.m. UTC | #2
On Fri, Dec 18, 2020 at 05:14:58PM +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 17, 2020 at 11:25 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > From: Eric Biggers <ebiggers@google.com>
> >
> > Move the boilerplate code for setkey(), init(), update(), and final()
> > from blake2s_generic.ko into a new module blake2s_helpers.ko, and export
> > it so that it can be used by other shash implementations of BLAKE2s.
> >
> > setkey() and init() are exported as-is, while update() and final() have
> > a blake2s_compress_t function pointer argument added.  This allows the
> > implementation of the compression function to be overridden, which is
> > the only part that optimized implementations really care about.
> >
> > The helper functions are defined in a separate module blake2s_helpers.ko
> > (rather than just than in blake2s_generic.ko) because we can't simply
> > select CRYPTO_BLAKE2B from CRYPTO_BLAKE2S_X86.  Doing this selection
> > unconditionally would make the library API select the shash API, while
> > doing it conditionally on CRYPTO_HASH would create a recursive kconfig
> > dependency on CRYPTO_HASH.  As a bonus, using a separate module also
> > allows the generic implementation to be omitted when unneeded.
> >
> > These helper functions very closely match the ones I defined for
> > BLAKE2b, except the BLAKE2b ones didn't go in a separate module yet
> > because BLAKE2b isn't exposed through the library API yet.
> >
> > Finally, use these new helper functions in the x86 implementation of
> > BLAKE2s.  (This part should be a separate patch, but unfortunately the
> > x86 implementation used the exact same function names like
> > "crypto_blake2s_update()", so it had to be updated at the same time.)
> 
> There's a similar situation happening with chacha20poly1305 and with
> curve25519. Each of these uses a mildly different approach to how we
> split things up between library and crypto api code. The _helpers.ko
> is another one. There any opportunity here to take a more
> unified/consistant approach? Or is shash slightly different than the
> others and benefits from a third way?

What I'm doing isn't that new; it's basically the same thing that's already done
for sha1, sha256, sha512, sm3, and nhpoly1305.  The difference is just where the
shash helper functions are defined.  For sha1, sha256, sha512, and sm3 it's in
headers (include/crypto/{sha1,sha256,sha512,sm3}_base.h).  For nhpoly1305 it's
in the same module as the generic implementation (crypto/nhpoly1305.c).

As I mentioned in the commit message, putting the shash helpers in the same
module as the generic implementation doesn't work for blake2s due to the need
for the architecture-specific blake2s modules to support the library API (and
possibly *only* the library API) too.  So that mainly leaves the options of
adding include/crypto/blake2s_base.h, or adding blake2s_helpers.ko.  But these
approaches aren't fundamentally different -- it's just a matter of whether the
code is short enough for it to make sense to always inline it or not.

It's hard to compare to chacha20 and curve25519 (or chacha20poly1305), as those
are different types of algorithms.  poly1305 is the most comparable, but for
poly1305 the library API works differently in that there are
poly1305_{init,update,final}_arch(), not just blake2s_compress_arch().  This
seems to be due to some of the characteristics that optimized poly1305
implementations usually have -- they need to compute key powers differently, and
they may have an optimized implementation of emitting the final hash value, not
just processing blocks.

So that is going to result in a different implementation, as the
architecture-specific poly1305 modules have to implement the full
init/update/final sequence for the library API, while for blake2s they only need
that full sequence for the shash API.  Hence the usefulness of adding helper
functions for blake2s shash implementations but not poly1305.  I don't think
this difference is necessarily bad; it just means that optimized blake2s
implementations don't need to do as much themselves, in comparison to poly1305.

What we *could* do to reuse even more code than what I've proposed is to make
the functions in lib/crypto/blake2s.c usable as the helper functions for shash
implementations, like the following.  Then blake2s_helpers.ko wouldn't really be
necessary.  (A few bits would still be needed, but they could be inline
functions in a header.)  I thought this might be controversial though, as it
would add some indirection into the library implementation:

diff --git a/lib/crypto/blake2s.c b/lib/crypto/blake2s.c
index 41025a30c524c..5188e1ca43c60 100644
--- a/lib/crypto/blake2s.c
+++ b/lib/crypto/blake2s.c
@@ -20,6 +20,16 @@
 bool blake2s_selftest(void);
 
 void blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen)
+{
+	if (IS_ENABLED(CONFIG_CRYPTO_ARCH_HAVE_LIB_BLAKE2S))
+		__blake2s_update(state, in, inlen, blake2s_compress_arch);
+	else
+		__blake2s_update(state, in, inlen, blake2s_compress_generic);
+}
+EXPORT_SYMBOL(blake2s_update);
+
+void __blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen,
+		      blake2s_compress_t compress)
 {
 	const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen;
 
@@ -27,12 +37,7 @@ void blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen)
 		return;
 	if (inlen > fill) {
 		memcpy(state->buf + state->buflen, in, fill);
-		if (IS_ENABLED(CONFIG_CRYPTO_ARCH_HAVE_LIB_BLAKE2S))
-			blake2s_compress_arch(state, state->buf, 1,
-					      BLAKE2S_BLOCK_SIZE);
-		else
-			blake2s_compress_generic(state, state->buf, 1,
-						 BLAKE2S_BLOCK_SIZE);
+		(*compress)(state, state->buf, 1, BLAKE2S_BLOCK_SIZE);
 		state->buflen = 0;
 		in += fill;
 		inlen -= fill;
@@ -40,35 +45,37 @@ void blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen)
 	if (inlen > BLAKE2S_BLOCK_SIZE) {
 		const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE);
 		/* Hash one less (full) block than strictly possible */
-		if (IS_ENABLED(CONFIG_CRYPTO_ARCH_HAVE_LIB_BLAKE2S))
-			blake2s_compress_arch(state, in, nblocks - 1,
-					      BLAKE2S_BLOCK_SIZE);
-		else
-			blake2s_compress_generic(state, in, nblocks - 1,
-						 BLAKE2S_BLOCK_SIZE);
+		(*compress)(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE);
 		in += BLAKE2S_BLOCK_SIZE * (nblocks - 1);
 		inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1);
 	}
 	memcpy(state->buf + state->buflen, in, inlen);
 	state->buflen += inlen;
 }
-EXPORT_SYMBOL(blake2s_update);
+EXPORT_SYMBOL(__blake2s_update);
 
 void blake2s_final(struct blake2s_state *state, u8 *out)
+{
+	if (IS_ENABLED(CONFIG_CRYPTO_ARCH_HAVE_LIB_BLAKE2S))
+		__blake2s_final(state, out, blake2s_compress_arch);
+	else
+		__blake2s_final(state, out, blake2s_compress_generic);
+}
+EXPORT_SYMBOL(blake2s_final);
+
+void __blake2s_final(struct blake2s_state *state, u8 *out,
+		     blake2s_compress_t compress)
 {
 	WARN_ON(IS_ENABLED(DEBUG) && !out);
 	blake2s_set_lastblock(state);
 	memset(state->buf + state->buflen, 0,
 	       BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */
-	if (IS_ENABLED(CONFIG_CRYPTO_ARCH_HAVE_LIB_BLAKE2S))
-		blake2s_compress_arch(state, state->buf, 1, state->buflen);
-	else
-		blake2s_compress_generic(state, state->buf, 1, state->buflen);
+	(*compress)(state, state->buf, 1, state->buflen);
 	cpu_to_le32_array(state->h, ARRAY_SIZE(state->h));
 	memcpy(out, state->h, state->outlen);
 	memzero_explicit(state, sizeof(*state));
 }
-EXPORT_SYMBOL(blake2s_final);
+EXPORT_SYMBOL(__blake2s_final);
 
 void blake2s256_hmac(u8 *out, const u8 *in, const u8 *key, const size_t inlen,
 		     const size_t keylen)
Jason A. Donenfeld Dec. 19, 2020, 12:01 a.m. UTC | #3
Hey Eric,

The solution you've proposed at the end of your email is actually kind
of similar to what we do with curve25519. Check out
include/crypto/curve25519.h. The critical difference between that and
the blake proposal is that it's in the header for curve25519, so the
indirection disappears.

Could we do that with headers for blake?

Jason
Eric Biggers Dec. 22, 2020, 8:55 a.m. UTC | #4
On Sat, Dec 19, 2020 at 01:01:53AM +0100, Jason A. Donenfeld wrote:
> Hey Eric,
> 
> The solution you've proposed at the end of your email is actually kind
> of similar to what we do with curve25519. Check out
> include/crypto/curve25519.h. The critical difference between that and
> the blake proposal is that it's in the header for curve25519, so the
> indirection disappears.
> 
> Could we do that with headers for blake?
> 

That doesn't look too similar, since most of include/crypto/curve25519.h is just
for the library API.  curve25519_generate_secret() is shared, but it's only a
few lines of code and there's no function pointer argument.

Either way, it would be possible to add __blake2s_update() and __blake2s_final()
(taking a blake2s_compress_t argument) to include/crypto/internal/blake2s.h, and
make these used by (and inlined into) both the library and shash functions.

Note, that's mostly separate from the question of whether blake2s_helpers.ko
should exist, since that depends on whether we want the functions in it to get
inlined into every shash implementation or not.  I don't really have a strong
preference.  They did seem long enough to make them out-of-line; however,
indirect calls are bad too.  If we go with inlining, then the shash helper
functions (crypto_blake2s_{setkey,init,update,final}()) would just be inline
functions in include/crypto/internal/blake2s.h too, similar to sha256_base.h,
and they would get compiled into both blake2s_generic.ko and blake2s-${arch}.ko.

- Eric
diff mbox series

Patch

diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c
index 4dcb2ee89efc9..1306b3272c77f 100644
--- a/arch/x86/crypto/blake2s-glue.c
+++ b/arch/x86/crypto/blake2s-glue.c
@@ -58,75 +58,15 @@  void blake2s_compress_arch(struct blake2s_state *state,
 }
 EXPORT_SYMBOL(blake2s_compress_arch);
 
-static int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key,
-				 unsigned int keylen)
+static int crypto_blake2s_update_x86(struct shash_desc *desc, const u8 *in,
+				     unsigned int inlen)
 {
-	struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(tfm);
-
-	if (keylen == 0 || keylen > BLAKE2S_KEY_SIZE)
-		return -EINVAL;
-
-	memcpy(tctx->key, key, keylen);
-	tctx->keylen = keylen;
-
-	return 0;
-}
-
-static int crypto_blake2s_init(struct shash_desc *desc)
-{
-	struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
-	struct blake2s_state *state = shash_desc_ctx(desc);
-	const int outlen = crypto_shash_digestsize(desc->tfm);
-
-	if (tctx->keylen)
-		blake2s_init_key(state, outlen, tctx->key, tctx->keylen);
-	else
-		blake2s_init(state, outlen);
-
-	return 0;
-}
-
-static int crypto_blake2s_update(struct shash_desc *desc, const u8 *in,
-				 unsigned int inlen)
-{
-	struct blake2s_state *state = shash_desc_ctx(desc);
-	const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen;
-
-	if (unlikely(!inlen))
-		return 0;
-	if (inlen > fill) {
-		memcpy(state->buf + state->buflen, in, fill);
-		blake2s_compress_arch(state, state->buf, 1, BLAKE2S_BLOCK_SIZE);
-		state->buflen = 0;
-		in += fill;
-		inlen -= fill;
-	}
-	if (inlen > BLAKE2S_BLOCK_SIZE) {
-		const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE);
-		/* Hash one less (full) block than strictly possible */
-		blake2s_compress_arch(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE);
-		in += BLAKE2S_BLOCK_SIZE * (nblocks - 1);
-		inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1);
-	}
-	memcpy(state->buf + state->buflen, in, inlen);
-	state->buflen += inlen;
-
-	return 0;
+	return crypto_blake2s_update(desc, in, inlen, blake2s_compress_arch);
 }
 
-static int crypto_blake2s_final(struct shash_desc *desc, u8 *out)
+static int crypto_blake2s_final_x86(struct shash_desc *desc, u8 *out)
 {
-	struct blake2s_state *state = shash_desc_ctx(desc);
-
-	blake2s_set_lastblock(state);
-	memset(state->buf + state->buflen, 0,
-	       BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */
-	blake2s_compress_arch(state, state->buf, 1, state->buflen);
-	cpu_to_le32_array(state->h, ARRAY_SIZE(state->h));
-	memcpy(out, state->h, state->outlen);
-	memzero_explicit(state, sizeof(*state));
-
-	return 0;
+	return crypto_blake2s_final(desc, out, blake2s_compress_arch);
 }
 
 #define BLAKE2S_ALG(name, driver_name, digest_size)			\
@@ -141,8 +81,8 @@  static int crypto_blake2s_final(struct shash_desc *desc, u8 *out)
 		.digestsize		= digest_size,			\
 		.setkey			= crypto_blake2s_setkey,	\
 		.init			= crypto_blake2s_init,		\
-		.update			= crypto_blake2s_update,	\
-		.final			= crypto_blake2s_final,		\
+		.update			= crypto_blake2s_update_x86,	\
+		.final			= crypto_blake2s_final_x86,	\
 		.descsize		= sizeof(struct blake2s_state),	\
 	}
 
diff --git a/crypto/Kconfig b/crypto/Kconfig
index a367fcfeb5d45..e3a51154ac0cf 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -681,6 +681,7 @@  config CRYPTO_BLAKE2B
 config CRYPTO_BLAKE2S
 	tristate "BLAKE2s digest algorithm"
 	select CRYPTO_LIB_BLAKE2S_GENERIC
+	select CRYPTO_BLAKE2S_HELPERS
 	select CRYPTO_HASH
 	help
 	  Implementation of cryptographic hash function BLAKE2s
@@ -696,11 +697,15 @@  config CRYPTO_BLAKE2S
 
 	  See https://blake2.net for further information.
 
+config CRYPTO_BLAKE2S_HELPERS
+	tristate
+
 config CRYPTO_BLAKE2S_X86
 	tristate "BLAKE2s digest algorithm (x86 accelerated version)"
 	depends on X86 && 64BIT
 	select CRYPTO_LIB_BLAKE2S_GENERIC
 	select CRYPTO_ARCH_HAVE_LIB_BLAKE2S
+	select CRYPTO_BLAKE2S_HELPERS if CRYPTO_HASH
 
 config CRYPTO_CRCT10DIF
 	tristate "CRCT10DIF algorithm"
diff --git a/crypto/Makefile b/crypto/Makefile
index b279483fba50b..75f99ab82bfc2 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -82,6 +82,7 @@  CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns)  # https://gcc.gnu.org/b
 obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
 obj-$(CONFIG_CRYPTO_BLAKE2B) += blake2b_generic.o
 obj-$(CONFIG_CRYPTO_BLAKE2S) += blake2s_generic.o
+obj-$(CONFIG_CRYPTO_BLAKE2S_HELPERS) += blake2s_helpers.o
 obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o
 obj-$(CONFIG_CRYPTO_ECB) += ecb.o
 obj-$(CONFIG_CRYPTO_CBC) += cbc.o
diff --git a/crypto/blake2s_generic.c b/crypto/blake2s_generic.c
index b89536c3671cf..ea2f1c6eb3231 100644
--- a/crypto/blake2s_generic.c
+++ b/crypto/blake2s_generic.c
@@ -1,84 +1,23 @@ 
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /*
+ * shash interface to the generic implementation of BLAKE2s
+ *
  * Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  */
 
 #include <crypto/internal/blake2s.h>
 #include <crypto/internal/hash.h>
-
-#include <linux/types.h>
-#include <linux/kernel.h>
 #include <linux/module.h>
 
-static int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key,
-				 unsigned int keylen)
-{
-	struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(tfm);
-
-	if (keylen == 0 || keylen > BLAKE2S_KEY_SIZE)
-		return -EINVAL;
-
-	memcpy(tctx->key, key, keylen);
-	tctx->keylen = keylen;
-
-	return 0;
-}
-
-static int crypto_blake2s_init(struct shash_desc *desc)
-{
-	struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
-	struct blake2s_state *state = shash_desc_ctx(desc);
-	const int outlen = crypto_shash_digestsize(desc->tfm);
-
-	if (tctx->keylen)
-		blake2s_init_key(state, outlen, tctx->key, tctx->keylen);
-	else
-		blake2s_init(state, outlen);
-
-	return 0;
-}
-
-static int crypto_blake2s_update(struct shash_desc *desc, const u8 *in,
-				 unsigned int inlen)
+static int crypto_blake2s_update_generic(struct shash_desc *desc,
+					 const u8 *in, unsigned int inlen)
 {
-	struct blake2s_state *state = shash_desc_ctx(desc);
-	const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen;
-
-	if (unlikely(!inlen))
-		return 0;
-	if (inlen > fill) {
-		memcpy(state->buf + state->buflen, in, fill);
-		blake2s_compress_generic(state, state->buf, 1, BLAKE2S_BLOCK_SIZE);
-		state->buflen = 0;
-		in += fill;
-		inlen -= fill;
-	}
-	if (inlen > BLAKE2S_BLOCK_SIZE) {
-		const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE);
-		/* Hash one less (full) block than strictly possible */
-		blake2s_compress_generic(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE);
-		in += BLAKE2S_BLOCK_SIZE * (nblocks - 1);
-		inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1);
-	}
-	memcpy(state->buf + state->buflen, in, inlen);
-	state->buflen += inlen;
-
-	return 0;
+	return crypto_blake2s_update(desc, in, inlen, blake2s_compress_generic);
 }
 
-static int crypto_blake2s_final(struct shash_desc *desc, u8 *out)
+static int crypto_blake2s_final_generic(struct shash_desc *desc, u8 *out)
 {
-	struct blake2s_state *state = shash_desc_ctx(desc);
-
-	blake2s_set_lastblock(state);
-	memset(state->buf + state->buflen, 0,
-	       BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */
-	blake2s_compress_generic(state, state->buf, 1, state->buflen);
-	cpu_to_le32_array(state->h, ARRAY_SIZE(state->h));
-	memcpy(out, state->h, state->outlen);
-	memzero_explicit(state, sizeof(*state));
-
-	return 0;
+	return crypto_blake2s_final(desc, out, blake2s_compress_generic);
 }
 
 #define BLAKE2S_ALG(name, driver_name, digest_size)			\
@@ -93,8 +32,8 @@  static int crypto_blake2s_final(struct shash_desc *desc, u8 *out)
 		.digestsize		= digest_size,			\
 		.setkey			= crypto_blake2s_setkey,	\
 		.init			= crypto_blake2s_init,		\
-		.update			= crypto_blake2s_update,	\
-		.final			= crypto_blake2s_final,		\
+		.update			= crypto_blake2s_update_generic, \
+		.final			= crypto_blake2s_final_generic,	\
 		.descsize		= sizeof(struct blake2s_state),	\
 	}
 
diff --git a/crypto/blake2s_helpers.c b/crypto/blake2s_helpers.c
new file mode 100644
index 0000000000000..0c3b9fcbd022c
--- /dev/null
+++ b/crypto/blake2s_helpers.c
@@ -0,0 +1,87 @@ 
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Helper functions for shash implementations of BLAKE2s.  The caller must
+ * provide a pointer to their blake2s_compress_t implementation.
+ *
+ * Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <crypto/internal/blake2s.h>
+#include <crypto/internal/hash.h>
+#include <linux/export.h>
+
+int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key,
+			  unsigned int keylen)
+{
+	struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(tfm);
+
+	if (keylen == 0 || keylen > BLAKE2S_KEY_SIZE)
+		return -EINVAL;
+
+	memcpy(tctx->key, key, keylen);
+	tctx->keylen = keylen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_blake2s_setkey);
+
+int crypto_blake2s_init(struct shash_desc *desc)
+{
+	struct blake2s_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
+	struct blake2s_state *state = shash_desc_ctx(desc);
+	const int outlen = crypto_shash_digestsize(desc->tfm);
+
+	if (tctx->keylen)
+		blake2s_init_key(state, outlen, tctx->key, tctx->keylen);
+	else
+		blake2s_init(state, outlen);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_blake2s_init);
+
+int crypto_blake2s_update(struct shash_desc *desc, const u8 *in,
+			  unsigned int inlen, blake2s_compress_t compress)
+{
+	struct blake2s_state *state = shash_desc_ctx(desc);
+	const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen;
+
+	if (unlikely(!inlen))
+		return 0;
+	if (inlen > fill) {
+		memcpy(state->buf + state->buflen, in, fill);
+		(*compress)(state, state->buf, 1, BLAKE2S_BLOCK_SIZE);
+		state->buflen = 0;
+		in += fill;
+		inlen -= fill;
+	}
+	if (inlen > BLAKE2S_BLOCK_SIZE) {
+		const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE);
+		/* Hash one less (full) block than strictly possible */
+		(*compress)(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE);
+		in += BLAKE2S_BLOCK_SIZE * (nblocks - 1);
+		inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1);
+	}
+	memcpy(state->buf + state->buflen, in, inlen);
+	state->buflen += inlen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_blake2s_update);
+
+int crypto_blake2s_final(struct shash_desc *desc, u8 *out,
+			 blake2s_compress_t compress)
+{
+	struct blake2s_state *state = shash_desc_ctx(desc);
+
+	blake2s_set_lastblock(state);
+	memset(state->buf + state->buflen, 0,
+	       BLAKE2S_BLOCK_SIZE - state->buflen); /* Padding */
+	(*compress)(state, state->buf, 1, state->buflen);
+	cpu_to_le32_array(state->h, ARRAY_SIZE(state->h));
+	memcpy(out, state->h, state->outlen);
+	memzero_explicit(state, sizeof(*state));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_blake2s_final);
diff --git a/include/crypto/internal/blake2s.h b/include/crypto/internal/blake2s.h
index 6e376ae6b6b58..8c79af7662aa2 100644
--- a/include/crypto/internal/blake2s.h
+++ b/include/crypto/internal/blake2s.h
@@ -23,4 +23,21 @@  static inline void blake2s_set_lastblock(struct blake2s_state *state)
 	state->f[0] = -1;
 }
 
+typedef void (*blake2s_compress_t)(struct blake2s_state *state,
+				   const u8 *block, size_t nblocks, u32 inc);
+
+struct crypto_shash;
+struct shash_desc;
+
+int crypto_blake2s_setkey(struct crypto_shash *tfm, const u8 *key,
+			  unsigned int keylen);
+
+int crypto_blake2s_init(struct shash_desc *desc);
+
+int crypto_blake2s_update(struct shash_desc *desc, const u8 *in,
+			  unsigned int inlen, blake2s_compress_t compress);
+
+int crypto_blake2s_final(struct shash_desc *desc, u8 *out,
+			 blake2s_compress_t compress);
+
 #endif /* BLAKE2S_INTERNAL_H */