
[v2,04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

Message ID 20231127070703.1697-5-jerry.shih@sifive.com (mailing list archive)
State Superseded
Series RISC-V: provide some accelerated cryptography implementations using vector extensions

Checks

conchuod/vmtest-fixes-PR: fail (merge-conflict)

Commit Message

Jerry Shih Nov. 27, 2023, 7:06 a.m. UTC
Add an AES cipher implementation using the Zvkned vector crypto extension,
ported from OpenSSL (openssl/openssl#21923).

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Co-developed-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
---
Changelog v2:
 - Do not turn on the kconfig `AES_RISCV64` option by default.
 - Switch to the `crypto_aes_ctx` structure for the AES key.
 - Use the `Zvkned` extension for AES-128/256 key expansion.
 - Export the riscv64_aes_* symbols for other modules.
 - Add the `asmlinkage` qualifier to the crypto asm functions.
 - Initialize the riscv64_aes_alg_zvkned structure members in
   declaration order.
---
 arch/riscv/crypto/Kconfig               |  11 +
 arch/riscv/crypto/Makefile              |  11 +
 arch/riscv/crypto/aes-riscv64-glue.c    | 151 ++++++
 arch/riscv/crypto/aes-riscv64-glue.h    |  18 +
 arch/riscv/crypto/aes-riscv64-zvkned.pl | 593 ++++++++++++++++++++++++
 5 files changed, 784 insertions(+)
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl

Comments

Eric Biggers Nov. 28, 2023, 3:56 a.m. UTC | #1
On Mon, Nov 27, 2023 at 03:06:54PM +0800, Jerry Shih wrote:
> +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
> +		       unsigned int keylen)
> +{
> +	int ret;
> +
> +	ret = aes_check_keylen(keylen);
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	/*
> +	 * The RISC-V AES vector crypto key expanding doesn't support AES-192.
> +	 * Use the generic software key expanding for that case.
> +	 */
> +	if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
> +		/*
> +		 * All zvkned-based functions use encryption expanding keys for both
> +		 * encryption and decryption.
> +		 */
> +		kernel_vector_begin();
> +		rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
> +		kernel_vector_end();
> +	} else {
> +		ret = aes_expandkey(ctx, key, keylen);
> +	}

rv64i_zvkned_set_encrypt_key() does not initialize crypto_aes_ctx::key_dec.
So, decryption results will be incorrect if !crypto_simd_usable() later.

> +static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
> +		      unsigned int keylen)

It's best to avoid generic-sounding function names like this that could collide
with functions in crypto/ or lib/crypto/.  A better name for this function, for
example, would be aes_setkey_zvkned().

> diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> new file mode 100644
> index 000000000000..303e82d9f6f0
> --- /dev/null
> +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
[...]
> +L_enc_128:
[...]
> +L_enc_192:
[...]
> +L_enc_256:

There's some severe source code duplication going on in the AES assembly, with
the three AES variants having separate source code.  You can just leave this
as-is since this is what was merged into OpenSSL and we are borrowing that for
now, but I do expect that we'll want to clean this up later.

- Eric
Jerry Shih Nov. 28, 2023, 4:22 a.m. UTC | #2
On Nov 28, 2023, at 11:56, Eric Biggers <ebiggers@kernel.org> wrote:
> On Mon, Nov 27, 2023 at 03:06:54PM +0800, Jerry Shih wrote:
>> +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
>> +		       unsigned int keylen)
>> +{
>> +	int ret;
>> +
>> +	ret = aes_check_keylen(keylen);
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * The RISC-V AES vector crypto key expanding doesn't support AES-192.
>> +	 * Use the generic software key expanding for that case.
>> +	 */
>> +	if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
>> +		/*
>> +		 * All zvkned-based functions use encryption expanding keys for both
>> +		 * encryption and decryption.
>> +		 */
>> +		kernel_vector_begin();
>> +		rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
>> +		kernel_vector_end();
>> +	} else {
>> +		ret = aes_expandkey(ctx, key, keylen);
>> +	}
> 
> rv64i_zvkned_set_encrypt_key() does not initialize crypto_aes_ctx::key_dec.
> So, decryption results will be incorrect if !crypto_simd_usable() later.

Could the `crypto_simd_usable()` condition be inconsistent between aes_setkey()
and aes_enc/dec()? If so, all accelerated (or HW-specific) crypto algorithms
would have to stay compatible with the sw fallback path, since
`crypto_simd_usable()` could change back and forth.

>> +static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
>> +		      unsigned int keylen)
> 
> It's best to avoid generic-sounding function names like this that could collide
> with functions in crypto/ or lib/crypto/.  A better name for this function, for
> example, would be aes_setkey_zvkned().

Thx, I will fix that.

>> diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
>> new file mode 100644
>> index 000000000000..303e82d9f6f0
>> --- /dev/null
>> +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> [...]
>> +L_enc_128:
> [...]
>> +L_enc_192:
> [...]
>> +L_enc_256:
> 
> There's some severe source code duplication going on in the AES assembly, with
> the three AES variants having separate source code.  You can just leave this
> as-is since this is what was merged into OpenSSL and we are borrowing that for
> now, but I do expect that we'll want to clean this up later.

Do we prefer code with branches instead of the specialized implementations?
We could handle AES-128/192/256 together like this:

    @{[vaesz_vs $V24, $V1]}
    @{[vaesem_vs $V24, $V2]}
    @{[vaesem_vs $V24, $V3]}
    @{[vaesem_vs $V24, $V4]}
    @{[vaesem_vs $V24, $V5]}
    @{[vaesem_vs $V24, $V6]}
    @{[vaesem_vs $V24, $V7]}
    @{[vaesem_vs $V24, $V8]}
    @{[vaesem_vs $V24, $V9]}
    @{[vaesem_vs $V24, $V10]}
    beq $ROUND, $ROUND_11, 1f
    @{[vaesem_vs $V24, $V11]}
    @{[vaesem_vs $V24, $V12]}
    beq $ROUND, $ROUND_13, 1f
    @{[vaesem_vs $V24, $V13]}
    @{[vaesem_vs $V24, $V14]}
1:
    @{[vaesef_vs $V24, $V15]}

But we would incur additional cost for the branches.

> - Eric
Eric Biggers Nov. 28, 2023, 4:38 a.m. UTC | #3
On Tue, Nov 28, 2023 at 12:22:26PM +0800, Jerry Shih wrote:
> On Nov 28, 2023, at 11:56, Eric Biggers <ebiggers@kernel.org> wrote:
> > On Mon, Nov 27, 2023 at 03:06:54PM +0800, Jerry Shih wrote:
> >> +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
> >> +		       unsigned int keylen)
> >> +{
> >> +	int ret;
> >> +
> >> +	ret = aes_check_keylen(keylen);
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> >> +
> >> +	/*
> >> +	 * The RISC-V AES vector crypto key expanding doesn't support AES-192.
> >> +	 * Use the generic software key expanding for that case.
> >> +	 */
> >> +	if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
> >> +		/*
> >> +		 * All zvkned-based functions use encryption expanding keys for both
> >> +		 * encryption and decryption.
> >> +		 */
> >> +		kernel_vector_begin();
> >> +		rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
> >> +		kernel_vector_end();
> >> +	} else {
> >> +		ret = aes_expandkey(ctx, key, keylen);
> >> +	}
> > 
> > rv64i_zvkned_set_encrypt_key() does not initialize crypto_aes_ctx::key_dec.
> > So, decryption results will be incorrect if !crypto_simd_usable() later.
> 
> Will we have the situation that `crypto_simd_usable()` condition is not consistent
> during the aes_setkey(), aes_enc/dec()? If yes, all accelerated(or HW specific)
> crypto algorithms should do the same implementations as the sw fallback path
> since the `crypto_simd_usable()` will change back and forth.

Yes, the calls to one "crypto_cipher" can happen in different contexts.  For
example, crypto_simd_usable() can be true during setkey and false during
decrypt, or vice versa.

If the RISC-V decryption code wants to use the regular key schedule (key_enc)
instead of the "Equivalent Inverse Cipher key schedule" (key_dec), that's
perfectly fine, but setkey still needs to initialize key_dec in case the
fallback to aes_decrypt() gets taken.
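
For illustration, here is a minimal sketch of one way the setkey path could
keep key_dec valid (an assumed fix, not the code in this patch): run the
generic expansion unconditionally, then redo key_enc with the Zvkned expander
when the vector unit is usable.

	ret = aes_expandkey(ctx, key, keylen);	/* fills key_enc, key_dec and key_length */
	if (ret)
		return ret;

	if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
		/* Overwrite key_enc with the vector-expanded schedule. */
		kernel_vector_begin();
		rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
		kernel_vector_end();
	}

	return 0;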

> >> diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> >> new file mode 100644
> >> index 000000000000..303e82d9f6f0
> >> --- /dev/null
> >> +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> > [...]
> >> +L_enc_128:
> > [...]
> >> +L_enc_192:
> > [...]
> >> +L_enc_256:
> > 
> > There's some severe source code duplication going on in the AES assembly, with
> > the three AES variants having separate source code.  You can just leave this
> > as-is since this is what was merged into OpenSSL and we are borrowing that for
> > now, but I do expect that we'll want to clean this up later.
> 
> Do we prefer the code with the branches instead of the specified implementation?
> We could make AES-128/192/256 together like:
> 
>     @{[vaesz_vs $V24, $V1]}
>     @{[vaesem_vs $V24, $V2]}
>     @{[vaesem_vs $V24, $V3]}
>     @{[vaesem_vs $V24, $V4]}
>     @{[vaesem_vs $V24, $V5]}
>     @{[vaesem_vs $V24, $V6]}
>     @{[vaesem_vs $V24, $V7]}
>     @{[vaesem_vs $V24, $V8]}
>     @{[vaesem_vs $V24, $V9]}
>     @{[vaesem_vs $V24, $V10]}
>     beq $ROUND, $ROUND_11, 1f
>     @{[vaesem_vs $V24, $V11]}
>     @{[vaesem_vs $V24, $V12]}
>     beq $ROUND, $ROUND_13, 1f
>     @{[vaesem_vs $V24, $V13]}
>     @{[vaesem_vs $V24, $V14]}
> 1:
>     @{[vaesef_vs $V24, $V15]}
> 
> But we will have the additional costs for the branches.
> 

That needs to be decided on a case-by-case basis depending on the performance
impact and how much binary code is saved.  On some architectures, separate
binary code for AES-{128,192,256} has been found to be worthwhile.  However,
that does *not* mean that they need to have separate source code.  Take a look
at how arch/x86/crypto/aes_ctrby8_avx-x86_64.S generates code for all the AES
variants using macros, for example.

Anyway, I don't think you should bother making too many changes to the "perlasm"
files.  If we decide to make major cleanups I think we should just replace them
with .S files (which already support macros).

- Eric
Conor Dooley Nov. 28, 2023, 5:54 p.m. UTC | #4
> +static inline bool check_aes_ext(void)
> +{
> +	return riscv_isa_extension_available(NULL, ZVKNED) &&
> +	       riscv_vector_vlen() >= 128;
> +}

I'm not keen on this construct, where you are checking that vlen is at least
128 and that Zvkned is present without checking for the presence of V
itself. Can you use "has_vector()" in any place where you depend on the
presence of vector, please?

Also, there are potentially a lot of places in this driver where you
can replace "riscv_isa_extension_available()" with
"riscv_has_extension_likely()". The latter is optimised with
alternatives, so in places that are going to be evaluated frequently it
may be beneficial for you.
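
For example (assuming the Zvkned extension ID can be passed to that helper),
the check above might then read:

	static inline bool check_aes_ext(void)
	{
		return riscv_has_extension_likely(RISCV_ISA_EXT_ZVKNED) &&
		       riscv_vector_vlen() >= 128;
	}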

Cheers,
Conor.
Eric Biggers Nov. 28, 2023, 8:12 p.m. UTC | #5
On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
> > +static inline bool check_aes_ext(void)
> > +{
> > +	return riscv_isa_extension_available(NULL, ZVKNED) &&
> > +	       riscv_vector_vlen() >= 128;
> > +}
> 
> I'm not keen on this construct, where you are checking vlen greater than
> 128 and the presence of Zvkned without checking for the presence of V
> itself. Can you use "has_vector()" in any places where you depend on the
> presence of vector please?

Shouldn't both of those things imply vector support already?

> Also, there are potentially a lot of places in this drivers where you
> can replace "riscv_isa_extension_available()" with
> "riscv_has_extension_likely()". The latter is optimised with
> alternatives, so in places that are going to be evaluated frequently it
> may be beneficial for you.

These extension checks are only executed in module_init functions, so they're
not performance critical.

- Eric
Jerry Shih Nov. 29, 2023, 2:39 a.m. UTC | #6
On Nov 29, 2023, at 04:12, Eric Biggers <ebiggers@kernel.org> wrote:
> On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
>>> +static inline bool check_aes_ext(void)
>>> +{
>>> +	return riscv_isa_extension_available(NULL, ZVKNED) &&
>>> +	       riscv_vector_vlen() >= 128;
>>> +}
>> 
>> I'm not keen on this construct, where you are checking vlen greater than
>> 128 and the presence of Zvkned without checking for the presence of V
>> itself. Can you use "has_vector()" in any places where you depend on the
>> presence of vector please?
> 
> Shouldn't both of those things imply vector support already?

The vector crypto extensions imply the `V` extension. Do we still need to check
for `V` explicitly?
https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-spec-vector.adoc#1-extensions-overview

>> Also, there are potentially a lot of places in this drivers where you
>> can replace "riscv_isa_extension_available()" with
>> "riscv_has_extension_likely()". The latter is optimised with
>> alternatives, so in places that are going to be evaluated frequently it
>> may be beneficial for you.
> 
> These extension checks are only executed in module_init functions, so they're
> not performance critical.

All `riscv_isa_extension_available()` calls in the crypto drivers run only once,
in the module init functions. Do we still need `riscv_has_extension_likely()`,
which costs a little more code size?

> - Eric
Conor Dooley Nov. 29, 2023, 11:12 a.m. UTC | #7
On Wed, Nov 29, 2023 at 10:39:56AM +0800, Jerry Shih wrote:
> On Nov 29, 2023, at 04:12, Eric Biggers <ebiggers@kernel.org> wrote:
> > On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
> >>> +static inline bool check_aes_ext(void)
> >>> +{
> >>> +	return riscv_isa_extension_available(NULL, ZVKNED) &&
> >>> +	       riscv_vector_vlen() >= 128;
> >>> +}
> >> 
> >> I'm not keen on this construct, where you are checking vlen greater than
> >> 128 and the presence of Zvkned without checking for the presence of V
> >> itself. Can you use "has_vector()" in any places where you depend on the
> >> presence of vector please?
> > 
> > Shouldn't both of those things imply vector support already?
> 
> The vector crypto extensions imply `V` extension. Should we still need to check
> the `V` explicitly?
> https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-spec-vector.adoc#1-extensions-overview

The check for Zvkned only tells you whether Zvkned has been provided in
the DT or ACPI tables; it doesn't mean that the kernel supports the V
extension. I could see something like a hypervisor that does not support
vector stripping "v" out of the DT or ACPI tables but not eliminating
every single extension that may depend on vector support.

The latter check is, IMO, an implementation detail and also should not
be used to imply that vector is supported.

Actually, Andy - questions for you. If the vsize is not homogeneous we do
not support vector for userspace and we disable vector in hwcap, but
riscv_v_size will have been set by riscv_fill_hwcap(). Is the disabling
of vector propagated to other locations in the kernel that inform
userspace, like hwprobe? I only skimmed the in-kernel vector patchset,
but I could not see anything there that ensures homogeneity either.
Should has_vector() calls start to fail if the vsize is not homogeneous?
I feel like they should, but I might very well be missing something here.

> >> Also, there are potentially a lot of places in this drivers where you
> >> can replace "riscv_isa_extension_available()" with
> >> "riscv_has_extension_likely()". The latter is optimised with
> >> alternatives, so in places that are going to be evaluated frequently it
> >> may be beneficial for you.
> > 
> > These extension checks are only executed in module_init functions, so they're
> > not performance critical.

That's fine, they can continue as they are so.

Cheers,
Conor.
Eric Biggers Nov. 29, 2023, 8:26 p.m. UTC | #8
On Wed, Nov 29, 2023 at 11:12:16AM +0000, Conor Dooley wrote:
> On Wed, Nov 29, 2023 at 10:39:56AM +0800, Jerry Shih wrote:
> > On Nov 29, 2023, at 04:12, Eric Biggers <ebiggers@kernel.org> wrote:
> > > On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
> > >>> +static inline bool check_aes_ext(void)
> > >>> +{
> > >>> +	return riscv_isa_extension_available(NULL, ZVKNED) &&
> > >>> +	       riscv_vector_vlen() >= 128;
> > >>> +}
> > >> 
> > >> I'm not keen on this construct, where you are checking vlen greater than
> > >> 128 and the presence of Zvkned without checking for the presence of V
> > >> itself. Can you use "has_vector()" in any places where you depend on the
> > >> presence of vector please?
> > > 
> > > Shouldn't both of those things imply vector support already?
> > 
> > The vector crypto extensions imply `V` extension. Should we still need to check
> > the `V` explicitly?
> > https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-spec-vector.adoc#1-extensions-overview
> 
> The check for Zkvned is only for whether or not Zvkned has been provided
> in the DT or ACPI tables, it doesn't mean that the kernel supports the V
> extension. I could see something like a hypervisor that does not support
> vector parsing the "v" out of the DT or ACPI tables but not eliminating
> every single extension that may depend on vector support.
> 
> The latter check is, IMO, an implementation detail and also should not
> be used to imply that vector is supported.

First, the RISC-V crypto files are only compiled when CONFIG_RISCV_ISA_V=y.
So in those files, we know that the kernel supports V if the hardware does.

If the hardware can indeed declare extensions like Zvkned without declaring V,
that sounds problematic.  Would /proc/cpuinfo end up with the same misleading
information, in which case userspace would have the same problem
too?  I think that such misconfigurations are best handled centrally by having
the low-level architecture code in the kernel clear all extensions that depend
on missing extensions.  IIRC there have been issues like this on x86, and that
was the fix that was implemented.  See arch/x86/kernel/cpu/cpuid-deps.c
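
As a rough sketch of that idea for riscv (the structure, extension IDs and
filter helper below are illustrative names, not existing kernel symbols):

	struct riscv_ext_dep {
		unsigned int ext;	/* dependent extension */
		unsigned int depends;	/* extension it requires */
	};

	static const struct riscv_ext_dep riscv_ext_deps[] = {
		{ RISCV_ISA_EXT_ZVKNED, RISCV_ISA_EXT_v },
		{ RISCV_ISA_EXT_ZVKNHA, RISCV_ISA_EXT_v },
	};

	/* Clear any extension whose prerequisite is missing from the bitmap. */
	static void riscv_filter_ext_deps(unsigned long *isa_bitmap)
	{
		unsigned int i;

		for (i = 0; i < ARRAY_SIZE(riscv_ext_deps); i++)
			if (!test_bit(riscv_ext_deps[i].depends, isa_bitmap))
				clear_bit(riscv_ext_deps[i].ext, isa_bitmap);
	}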

- Eric

Patch

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 10d60edc0110..65189d4d47b3 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,4 +2,15 @@ 
 
 menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
+config CRYPTO_AES_RISCV64
+	tristate "Ciphers: AES"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_ALGAPI
+	select CRYPTO_LIB_AES
+	help
+	  Block ciphers: AES cipher algorithms (FIPS-197)
+
+	  Architecture: riscv64 using:
+	  - Zvkned vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b3b6332c9f6d..90ca91d8df26 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -2,3 +2,14 @@ 
 #
 # linux/arch/riscv/crypto/Makefile
 #
+
+obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
+aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
+
+quiet_cmd_perlasm = PERLASM $@
+      cmd_perlasm = $(PERL) $(<) void $(@)
+
+$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
+	$(call cmd,perlasm)
+
+clean-files += aes-riscv64-zvkned.S
diff --git a/arch/riscv/crypto/aes-riscv64-glue.c b/arch/riscv/crypto/aes-riscv64-glue.c
new file mode 100644
index 000000000000..091e368edb30
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.c
@@ -0,0 +1,151 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES implementation for RISC-V
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <jerry.shih@sifive.com>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "aes-riscv64-glue.h"
+
+/* aes cipher using zvkned vector crypto extension */
+asmlinkage int rv64i_zvkned_set_encrypt_key(const u8 *user_key, const int bytes,
+					    const struct crypto_aes_ctx *key);
+asmlinkage void rv64i_zvkned_encrypt(const u8 *in, u8 *out,
+				     const struct crypto_aes_ctx *key);
+asmlinkage void rv64i_zvkned_decrypt(const u8 *in, u8 *out,
+				     const struct crypto_aes_ctx *key);
+
+int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
+		       unsigned int keylen)
+{
+	int ret;
+
+	ret = aes_check_keylen(keylen);
+	if (ret < 0)
+		return -EINVAL;
+
+	/*
+	 * The RISC-V AES vector crypto key expanding doesn't support AES-192.
+	 * Use the generic software key expanding for that case.
+	 */
+	if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
+		/*
+		 * All zvkned-based functions use encryption expanding keys for both
+		 * encryption and decryption.
+		 */
+		kernel_vector_begin();
+		rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
+		kernel_vector_end();
+	} else {
+		ret = aes_expandkey(ctx, key, keylen);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(riscv64_aes_setkey);
+
+void riscv64_aes_encrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+				const u8 *src)
+{
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		rv64i_zvkned_encrypt(src, dst, ctx);
+		kernel_vector_end();
+	} else {
+		aes_encrypt(ctx, dst, src);
+	}
+}
+EXPORT_SYMBOL(riscv64_aes_encrypt_zvkned);
+
+void riscv64_aes_decrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+				const u8 *src)
+{
+	if (crypto_simd_usable()) {
+		kernel_vector_begin();
+		rv64i_zvkned_decrypt(src, dst, ctx);
+		kernel_vector_end();
+	} else {
+		aes_decrypt(ctx, dst, src);
+	}
+}
+EXPORT_SYMBOL(riscv64_aes_decrypt_zvkned);
+
+static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
+		      unsigned int keylen)
+{
+	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return riscv64_aes_setkey(ctx, key, keylen);
+}
+
+static void aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	riscv64_aes_encrypt_zvkned(ctx, dst, src);
+}
+
+static void aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	riscv64_aes_decrypt_zvkned(ctx, dst, src);
+}
+
+static struct crypto_alg riscv64_aes_alg_zvkned = {
+	.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize = AES_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct crypto_aes_ctx),
+	.cra_priority = 300,
+	.cra_name = "aes",
+	.cra_driver_name = "aes-riscv64-zvkned",
+	.cra_cipher = {
+		.cia_min_keysize = AES_MIN_KEY_SIZE,
+		.cia_max_keysize = AES_MAX_KEY_SIZE,
+		.cia_setkey = aes_setkey,
+		.cia_encrypt = aes_encrypt_zvkned,
+		.cia_decrypt = aes_decrypt_zvkned,
+	},
+	.cra_module = THIS_MODULE,
+};
+
+static inline bool check_aes_ext(void)
+{
+	return riscv_isa_extension_available(NULL, ZVKNED) &&
+	       riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_aes_mod_init(void)
+{
+	if (check_aes_ext())
+		return crypto_register_alg(&riscv64_aes_alg_zvkned);
+
+	return -ENODEV;
+}
+
+static void __exit riscv64_aes_mod_fini(void)
+{
+	crypto_unregister_alg(&riscv64_aes_alg_zvkned);
+}
+
+module_init(riscv64_aes_mod_init);
+module_exit(riscv64_aes_mod_fini);
+
+MODULE_DESCRIPTION("AES (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/riscv/crypto/aes-riscv64-glue.h b/arch/riscv/crypto/aes-riscv64-glue.h
new file mode 100644
index 000000000000..0416bbc4318e
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.h
@@ -0,0 +1,18 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef AES_RISCV64_GLUE_H
+#define AES_RISCV64_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
+		       unsigned int keylen);
+
+void riscv64_aes_encrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+				const u8 *src);
+
+void riscv64_aes_decrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+				const u8 *src);
+
+#endif /* AES_RISCV64_GLUE_H */
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
new file mode 100644
index 000000000000..303e82d9f6f0
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -0,0 +1,593 @@ 
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
+# Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
+# Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+    $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+    $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+    $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+{
+################################################################################
+# int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int bytes,
+#                                  AES_KEY *key)
+my ($UKEY, $BYTES, $KEYP) = ("a0", "a1", "a2");
+my ($T0) = ("t0");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_set_encrypt_key
+.type rv64i_zvkned_set_encrypt_key,\@function
+rv64i_zvkned_set_encrypt_key:
+    beqz $UKEY, L_fail_m1
+    beqz $KEYP, L_fail_m1
+
+    # Store the key length.
+    sw $BYTES, 480($KEYP)
+
+    li $T0, 32
+    beq $BYTES, $T0, L_set_key_256
+    li $T0, 16
+    beq $BYTES, $T0, L_set_key_128
+
+    j L_fail_m2
+
+L_set_key_128:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    # Load the key
+    @{[vle32_v $V10, $UKEY]}
+
+    # Generate keys for round 2-11 into registers v11-v20.
+    @{[vaeskf1_vi $V11, $V10, 1]}   # v11 <- rk2  (w[ 4, 7])
+    @{[vaeskf1_vi $V12, $V11, 2]}   # v12 <- rk3  (w[ 8,11])
+    @{[vaeskf1_vi $V13, $V12, 3]}   # v13 <- rk4  (w[12,15])
+    @{[vaeskf1_vi $V14, $V13, 4]}   # v14 <- rk5  (w[16,19])
+    @{[vaeskf1_vi $V15, $V14, 5]}   # v15 <- rk6  (w[20,23])
+    @{[vaeskf1_vi $V16, $V15, 6]}   # v16 <- rk7  (w[24,27])
+    @{[vaeskf1_vi $V17, $V16, 7]}   # v17 <- rk8  (w[28,31])
+    @{[vaeskf1_vi $V18, $V17, 8]}   # v18 <- rk9  (w[32,35])
+    @{[vaeskf1_vi $V19, $V18, 9]}   # v19 <- rk10 (w[36,39])
+    @{[vaeskf1_vi $V20, $V19, 10]}  # v20 <- rk11 (w[40,43])
+
+    # Store the round keys
+    @{[vse32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V11, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V12, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V13, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V14, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V15, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V16, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V17, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V18, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V19, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V20, $KEYP]}
+
+    li a0, 1
+    ret
+
+L_set_key_256:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    # Load the key
+    @{[vle32_v $V10, $UKEY]}
+    addi $UKEY, $UKEY, 16
+    @{[vle32_v $V11, $UKEY]}
+
+    @{[vmv_v_v $V12, $V10]}
+    @{[vaeskf2_vi $V12, $V11, 2]}
+    @{[vmv_v_v $V13, $V11]}
+    @{[vaeskf2_vi $V13, $V12, 3]}
+    @{[vmv_v_v $V14, $V12]}
+    @{[vaeskf2_vi $V14, $V13, 4]}
+    @{[vmv_v_v $V15, $V13]}
+    @{[vaeskf2_vi $V15, $V14, 5]}
+    @{[vmv_v_v $V16, $V14]}
+    @{[vaeskf2_vi $V16, $V15, 6]}
+    @{[vmv_v_v $V17, $V15]}
+    @{[vaeskf2_vi $V17, $V16, 7]}
+    @{[vmv_v_v $V18, $V16]}
+    @{[vaeskf2_vi $V18, $V17, 8]}
+    @{[vmv_v_v $V19, $V17]}
+    @{[vaeskf2_vi $V19, $V18, 9]}
+    @{[vmv_v_v $V20, $V18]}
+    @{[vaeskf2_vi $V20, $V19, 10]}
+    @{[vmv_v_v $V21, $V19]}
+    @{[vaeskf2_vi $V21, $V20, 11]}
+    @{[vmv_v_v $V22, $V20]}
+    @{[vaeskf2_vi $V22, $V21, 12]}
+    @{[vmv_v_v $V23, $V21]}
+    @{[vaeskf2_vi $V23, $V22, 13]}
+    @{[vmv_v_v $V24, $V22]}
+    @{[vaeskf2_vi $V24, $V23, 14]}
+
+    @{[vse32_v $V10, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V11, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V12, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V13, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V14, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V15, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V16, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V17, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V18, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V19, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V20, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V21, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V22, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V23, $KEYP]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $V24, $KEYP]}
+
+    li a0, 1
+    ret
+.size rv64i_zvkned_set_encrypt_key,.-rv64i_zvkned_set_encrypt_key
+___
+}
+
+{
+################################################################################
+# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+my ($INP, $OUTP, $KEYP) = ("a0", "a1", "a2");
+my ($T0) = ("t0");
+my ($KEY_LEN) = ("a3");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_encrypt
+.type rv64i_zvkned_encrypt,\@function
+rv64i_zvkned_encrypt:
+    # Load key length.
+    lwu $KEY_LEN, 480($KEYP)
+
+    # Get proper routine for key length.
+    li $T0, 32
+    beq $KEY_LEN, $T0, L_enc_256
+    li $T0, 24
+    beq $KEY_LEN, $T0, L_enc_192
+    li $T0, 16
+    beq $KEY_LEN, $T0, L_enc_128
+
+    j L_fail_m2
+.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_128:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, $INP]}
+
+    @{[vle32_v $V10, $KEYP]}
+    @{[vaesz_vs $V1, $V10]}    # with round key w[ 0, 3]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+    @{[vaesem_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, $KEYP]}
+    @{[vaesem_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, $KEYP]}
+    @{[vaesem_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, $KEYP]}
+    @{[vaesem_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, $KEYP]}
+    @{[vaesem_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V16, $KEYP]}
+    @{[vaesem_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V17, $KEYP]}
+    @{[vaesem_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V18, $KEYP]}
+    @{[vaesem_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V19, $KEYP]}
+    @{[vaesem_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V20, $KEYP]}
+    @{[vaesef_vs $V1, $V20]}   # with round key w[40,43]
+
+    @{[vse32_v $V1, $OUTP]}
+
+    ret
+.size L_enc_128,.-L_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_192:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, $INP]}
+
+    @{[vle32_v $V10, $KEYP]}
+    @{[vaesz_vs $V1, $V10]}     # with round key w[ 0, 3]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+    @{[vaesem_vs $V1, $V11]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, $KEYP]}
+    @{[vaesem_vs $V1, $V12]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, $KEYP]}
+    @{[vaesem_vs $V1, $V13]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, $KEYP]}
+    @{[vaesem_vs $V1, $V14]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, $KEYP]}
+    @{[vaesem_vs $V1, $V15]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V16, $KEYP]}
+    @{[vaesem_vs $V1, $V16]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V17, $KEYP]}
+    @{[vaesem_vs $V1, $V17]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V18, $KEYP]}
+    @{[vaesem_vs $V1, $V18]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V19, $KEYP]}
+    @{[vaesem_vs $V1, $V19]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V20, $KEYP]}
+    @{[vaesem_vs $V1, $V20]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V21, $KEYP]}
+    @{[vaesem_vs $V1, $V21]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V22, $KEYP]}
+    @{[vaesef_vs $V1, $V22]}
+
+    @{[vse32_v $V1, $OUTP]}
+    ret
+.size L_enc_192,.-L_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_256:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, $INP]}
+
+    @{[vle32_v $V10, $KEYP]}
+    @{[vaesz_vs $V1, $V10]}     # with round key w[ 0, 3]
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V11, $KEYP]}
+    @{[vaesem_vs $V1, $V11]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V12, $KEYP]}
+    @{[vaesem_vs $V1, $V12]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V13, $KEYP]}
+    @{[vaesem_vs $V1, $V13]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V14, $KEYP]}
+    @{[vaesem_vs $V1, $V14]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V15, $KEYP]}
+    @{[vaesem_vs $V1, $V15]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V16, $KEYP]}
+    @{[vaesem_vs $V1, $V16]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V17, $KEYP]}
+    @{[vaesem_vs $V1, $V17]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V18, $KEYP]}
+    @{[vaesem_vs $V1, $V18]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V19, $KEYP]}
+    @{[vaesem_vs $V1, $V19]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V20, $KEYP]}
+    @{[vaesem_vs $V1, $V20]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V21, $KEYP]}
+    @{[vaesem_vs $V1, $V21]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V22, $KEYP]}
+    @{[vaesem_vs $V1, $V22]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V23, $KEYP]}
+    @{[vaesem_vs $V1, $V23]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $V24, $KEYP]}
+    @{[vaesef_vs $V1, $V24]}
+
+    @{[vse32_v $V1, $OUTP]}
+    ret
+.size L_enc_256,.-L_enc_256
+___
+
+################################################################################
+# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_decrypt
+.type rv64i_zvkned_decrypt,\@function
+rv64i_zvkned_decrypt:
+    # Load key length.
+    lwu $KEY_LEN, 480($KEYP)
+
+    # Get proper routine for key length.
+    li $T0, 32
+    beq $KEY_LEN, $T0, L_dec_256
+    li $T0, 24
+    beq $KEY_LEN, $T0, L_dec_192
+    li $T0, 16
+    beq $KEY_LEN, $T0, L_dec_128
+
+    j L_fail_m2
+.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_128:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, $INP]}
+
+    addi $KEYP, $KEYP, 160
+    @{[vle32_v $V20, $KEYP]}
+    @{[vaesz_vs $V1, $V20]}    # with round key w[40,43]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V19, $KEYP]}
+    @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V18, $KEYP]}
+    @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V17, $KEYP]}
+    @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V16, $KEYP]}
+    @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V15, $KEYP]}
+    @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V14, $KEYP]}
+    @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V13, $KEYP]}
+    @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V12, $KEYP]}
+    @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V11, $KEYP]}
+    @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V10, $KEYP]}
+    @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $V1, $OUTP]}
+
+    ret
+.size L_dec_128,.-L_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_192:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, $INP]}
+
+    addi $KEYP, $KEYP, 192
+    @{[vle32_v $V22, $KEYP]}
+    @{[vaesz_vs $V1, $V22]}    # with round key w[48,51]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V21, $KEYP]}
+    @{[vaesdm_vs $V1, $V21]}   # with round key w[44,47]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V20, $KEYP]}
+    @{[vaesdm_vs $V1, $V20]}    # with round key w[40,43]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V19, $KEYP]}
+    @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V18, $KEYP]}
+    @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V17, $KEYP]}
+    @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V16, $KEYP]}
+    @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V15, $KEYP]}
+    @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V14, $KEYP]}
+    @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V13, $KEYP]}
+    @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V12, $KEYP]}
+    @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V11, $KEYP]}
+    @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V10, $KEYP]}
+    @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $V1, $OUTP]}
+
+    ret
+.size L_dec_192,.-L_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_256:
+    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+    @{[vle32_v $V1, $INP]}
+
+    addi $KEYP, $KEYP, 224
+    @{[vle32_v $V24, $KEYP]}
+    @{[vaesz_vs $V1, $V24]}    # with round key w[56,59]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V23, $KEYP]}
+    @{[vaesdm_vs $V1, $V23]}   # with round key w[52,55]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V22, $KEYP]}
+    @{[vaesdm_vs $V1, $V22]}    # with round key w[48,51]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V21, $KEYP]}
+    @{[vaesdm_vs $V1, $V21]}   # with round key w[44,47]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V20, $KEYP]}
+    @{[vaesdm_vs $V1, $V20]}    # with round key w[40,43]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V19, $KEYP]}
+    @{[vaesdm_vs $V1, $V19]}   # with round key w[36,39]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V18, $KEYP]}
+    @{[vaesdm_vs $V1, $V18]}   # with round key w[32,35]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V17, $KEYP]}
+    @{[vaesdm_vs $V1, $V17]}   # with round key w[28,31]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V16, $KEYP]}
+    @{[vaesdm_vs $V1, $V16]}   # with round key w[24,27]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V15, $KEYP]}
+    @{[vaesdm_vs $V1, $V15]}   # with round key w[20,23]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V14, $KEYP]}
+    @{[vaesdm_vs $V1, $V14]}   # with round key w[16,19]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V13, $KEYP]}
+    @{[vaesdm_vs $V1, $V13]}   # with round key w[12,15]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V12, $KEYP]}
+    @{[vaesdm_vs $V1, $V12]}   # with round key w[ 8,11]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V11, $KEYP]}
+    @{[vaesdm_vs $V1, $V11]}   # with round key w[ 4, 7]
+    addi $KEYP, $KEYP, -16
+    @{[vle32_v $V10, $KEYP]}
+    @{[vaesdf_vs $V1, $V10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $V1, $OUTP]}
+
+    ret
+.size L_dec_256,.-L_dec_256
+___
+}
+
+$code .= <<___;
+L_fail_m1:
+    li a0, -1
+    ret
+.size L_fail_m1,.-L_fail_m1
+
+L_fail_m2:
+    li a0, -2
+    ret
+.size L_fail_m2,.-L_fail_m2
+
+L_end:
+  ret
+.size L_end,.-L_end
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";