Message ID | E1pZ2fs-000e27-4H@formenos.hmeau.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v5,1/7] crypto: stm32 - Save 54 CSR registers | expand |
On Mon, Mar 6, 2023 at 5:42 AM Herbert Xu <herbert@gondor.apana.org.au> wrote: > The Crypto API hashing paradigm requires the hardware state to > be exported between *each* request because multiple unrelated > hashes may be processed concurrently. > > The stm32 hardware is capable of producing the hardware hashing > state but it was only doing it in the export function. This is > not only broken for export as you can't export a kernel pointer > and reimport it, but it also means that concurrent hashing was > fundamentally broken. > > Fix this by moving the saving and restoring of hardware hash > state between each and every hashing request. > > Also change the emptymsg check in stm32_hash_copy_hash to rely > on whether we have any existing hash state, rather than whether > this particular update request is empty. > > Fixes: 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module") > Reported-by: Li kunyu <kunyu@nfschina.com> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> This partly works (after my folded in fix in patch 5)! Clean SHA1 and SHA256 works flawlessly. HMAC still fails, but not until we start testing random vectors: [ 7.541954] alg: ahash: stm32-hmac-sha256 digest() failed on test vector "random: psize=0 ksize=80"; expected_error=0, actual_error=-110, cfg="random: may_sleep" [ 7.567212] alg: self-tests for hmac(sha256) using stm32-hmac-sha256 failed (rc=-110) [ 7.567222] ------------[ cut here ]------------ [ 7.579669] WARNING: CPU: 0 PID: 89 at crypto/testmgr.c:5858 alg_test.part.0+0x4d0/0x4dc [ 7.587809] alg: self-tests for hmac(sha256) using stm32-hmac-sha256 failed (rc=-110) [ 7.587817] Modules linked in: [ 7.598702] CPU: 0 PID: 89 Comm: cryptomgr_test Not tainted 6.2.0-13572-gcdc48b2701b2 #87 [ 7.606877] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support) [ 7.613842] unwind_backtrace from show_stack+0x10/0x14 [ 7.619080] show_stack from dump_stack_lvl+0x40/0x4c [ 7.624145] dump_stack_lvl from __warn+0x94/0xc0 [ 7.628861] __warn from warn_slowpath_fmt+0x118/0x164 [ 7.634007] warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc [ 7.639936] alg_test.part.0 from cryptomgr_test+0x18/0x38 [ 7.645430] cryptomgr_test from kthread+0xc0/0xc4 [ 7.650229] kthread from ret_from_fork+0x14/0x2c [ 7.654936] Exception stack(0xf10b5fb0 to 0xf10b5ff8) [ 7.659984] 5fa0: 00000000 00000000 00000000 00000000 [ 7.668154] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 7.676325] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 7.682986] ---[ end trace 0000000000000000 ]--- [ 7.688219] stm32-hash a03c2000.hash: allocated sha256 fallback [ 10.675002] stm32-hash a03c2000.hash: allocated hmac(sha1) fallback [ 11.269604] alg: ahash: stm32-hmac-sha1 finup() failed with err -110 on test vector "random: psize=0 ksize=15", cfg="random: use_finup src_divs=[100.0%@+4081] i" [ 11.285037] alg: self-tests for hmac(sha1) using stm32-hmac-sha1 failed (rc=-110) [ 11.285048] ------------[ cut here ]------------ [ 11.297141] WARNING: CPU: 1 PID: 102 at crypto/testmgr.c:5858 alg_test.part.0+0x4d0/0x4dc [ 11.305352] alg: self-tests for hmac(sha1) using stm32-hmac-sha1 failed (rc=-110) [ 11.305361] Modules linked in: [ 11.315894] CPU: 1 PID: 102 Comm: cryptomgr_test Tainted: G W 6.2.0-13572-gcdc48b2701b2 #87 [ 11.325633] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support) [ 11.332594] unwind_backtrace from show_stack+0x10/0x14 [ 11.337832] show_stack from dump_stack_lvl+0x40/0x4c [ 11.342897] dump_stack_lvl from __warn+0x94/0xc0 [ 11.347611] __warn from warn_slowpath_fmt+0x118/0x164 [ 11.352758] warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc [ 11.358687] alg_test.part.0 from cryptomgr_test+0x18/0x38 [ 11.364181] cryptomgr_test from kthread+0xc0/0xc4 [ 11.368981] kthread from ret_from_fork+0x14/0x2c [ 11.373687] Exception stack(0xf10f1fb0 to 0xf10f1ff8) [ 11.378734] 1fa0: 00000000 00000000 00000000 00000000 [ 11.386906] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 11.395076] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 11.401724] ---[ end trace 0000000000000000 ]--- [ 11.407162] stm32-hash a03c2000.hash: allocated sha1 fallback I will try to investigate further as time permits. (Right now I have to go off for lunch...) Yours, Linus Walleij
On Mon, Mar 06, 2023 at 11:08:13AM +0100, Linus Walleij wrote: > > This partly works (after my folded in fix in patch 5)! > > Clean SHA1 and SHA256 works flawlessly. > HMAC still fails, but not until we start testing random vectors: > > [ 7.541954] alg: ahash: stm32-hmac-sha256 digest() failed on test > vector "random: psize=0 ksize=80"; expected_error=0, > actual_error=-110, cfg="random: may_sleep" > [ 7.567212] alg: self-tests for hmac(sha256) using > stm32-hmac-sha256 failed (rc=-110) So it's timing out. I wonder if the timeout in stm32_hash_wait_busy is long enough. Perhaps try adding a zero so that the timeout becomes 100,000us and see if it still breaks? Thanks,
On Tue, Mar 7, 2023 at 11:10 AM Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Mar 06, 2023 at 11:08:13AM +0100, Linus Walleij wrote: > > > > This partly works (after my folded in fix in patch 5)! > > > > Clean SHA1 and SHA256 works flawlessly. > > HMAC still fails, but not until we start testing random vectors: > > > > [ 7.541954] alg: ahash: stm32-hmac-sha256 digest() failed on test > > vector "random: psize=0 ksize=80"; expected_error=0, > > actual_error=-110, cfg="random: may_sleep" > > [ 7.567212] alg: self-tests for hmac(sha256) using > > stm32-hmac-sha256 failed (rc=-110) > > So it's timing out. I wonder if the timeout in stm32_hash_wait_busy > is long enough. Perhaps try adding a zero so that the timeout becomes > 100,000us and see if it still breaks? Sadly this doesn't work. I tried increasing with one and even two orders of magnitude, but the timeouts still happen, usually two of them, sometimes one sometimes three, depending on randomness, as can be expected. I think you mentioned something about that we need to store the key in the state as well though? Yours, Linus Walleij
On Tue, Mar 07, 2023 at 04:31:44PM +0100, Linus Walleij wrote: > > Sadly this doesn't work. Nevermind, I know why it doesn't work. It's specific to ux500 where you're polling the DCAL bit. But the DCAL bit only works for the final hash, it doesn't work for the intermediate state. Let me check how the old ux500 handled this case. Thanks,
On Wed, Mar 08, 2023 at 11:23:24AM +0800, Herbert Xu wrote: > > Nevermind, I know why it doesn't work. It's specific to ux500 > where you're polling the DCAL bit. But the DCAL bit only works > for the final hash, it doesn't work for the intermediate state. > > Let me check how the old ux500 handled this case. Hmm, it seems to use the same bit. I guess the meaning must be different with ux500. Could you check for me which wait_busy() call is actually failing? Is it the one I added right before we save the state, or is it something else? If it's something perhaps we aren't restoring the state in the right way, because the stm32 state restoring code is quite different compared to the ux500 code. Thanks,
On Wed, Mar 08, 2023 at 11:40:02AM +0800, Herbert Xu wrote: > > If it's something perhaps we aren't restoring the state in the > right way, because the stm32 state restoring code is quite different > compared to the ux500 code. Could you also confirm that the old ux500 driver actually passes all the extra tests on your hardware? It literally saves and restores the state every 256 bytes :) Thanks,
On Wed, Mar 8, 2023 at 4:40 AM Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Wed, Mar 08, 2023 at 11:23:24AM +0800, Herbert Xu wrote: > > > > Nevermind, I know why it doesn't work. It's specific to ux500 > > where you're polling the DCAL bit. But the DCAL bit only works > > for the final hash, it doesn't work for the intermediate state. > > > > Let me check how the old ux500 handled this case. > > Hmm, it seems to use the same bit. I guess the meaning must be > different with ux500. Not sure if it's that or if we were just lucky, or the tests were not as extensive back on the old driver :/ > Could you check for me which wait_busy() call is actually failing? > Is it the one I added right before we save the state, or is it > something else? It times out - always - before writing the HMAC key in stm32_hash_xmit_cpu(). So this: stm32_hash_set_nblw(hdev, length); reg = stm32_hash_read(hdev, HASH_STR); reg |= HASH_STR_DCAL; stm32_hash_write(hdev, HASH_STR, reg); if (hdev->flags & HASH_FLAGS_HMAC) { - if (stm32_hash_wait_busy(hdev)) + if (stm32_hash_wait_busy(hdev)) { + dev_err(hdev->dev, + "timeout before writing key in stm32_hash_xmit_cpu()\n"); return -ETIMEDOUT; + } stm32_hash_write_key(hdev); } return -EINPROGRESS; Gives: [ 4.812106] stm32-hash a03c2000.hash: allocated hmac(sha256) fallback [ 5.008829] stm32-hash a03c2000.hash: timeout before writing key in stm32_hash_xmit_cpu() [ 5.017167] alg: ahash: stm32-hmac-sha256 final() failed with err -110 on test vector "random: psize=0 ksize=70", cfg="random: may_sleep use_final src_divs=[<fl" [ 5.041034] alg: self-tests for hmac(sha256) using stm32-hmac-sha256 failed (rc=-110) I put error messages at all other timeouts too but they do not trigger. This is why non-HMAC works fine all of the time. > If it's something perhaps we aren't restoring the state in the > right way, because the stm32 state restoring code is quite different > compared to the ux500 code. It's just the HMAC that is failing so I'm puzzled, because until this point it is essentially a clean SHA sum. I also noticed that it only happens with long keys (more than 64 bytes) that need to set bit 16 (HASH_CR_LKEY) in the control register. > Could you also confirm that the old ux500 driver actually passes > all the extra tests on your hardware? It literally saves and > restores the state every 256 bytes :) I had this old patch set where I tried to clean up the old driver: https://lore.kernel.org/linux-crypto/20220816140049.102306-1-linus.walleij@linaro.org/ I put this old patch set on top of v6.0. First I enabled the old crypto driver without extended tests: all works fine. Then I enabled the extended tests. It does seem like it has the same problem! [ 25.486878] CPU: 1 PID: 91 Comm: cryptomgr_test Not tainted 6.0.0-00016-g23791565aac5 #3 [ 25.494967] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support) [ 25.501932] unwind_backtrace from show_stack+0x10/0x14 [ 25.507175] show_stack from dump_stack_lvl+0x40/0x4c [ 25.512236] dump_stack_lvl from nmi_cpu_backtrace+0xd4/0x104 [ 25.517988] nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xdc/0x128 [ 25.525034] nmi_trigger_cpumask_backtrace from trigger_single_cpu_backtrace+0x24/0x2c [ 25.532952] trigger_single_cpu_backtrace from rcu_dump_cpu_stacks+0x10c/0x150 [ 25.540173] rcu_dump_cpu_stacks from print_cpu_stall+0x14c/0x1d0 [ 25.546268] print_cpu_stall from check_cpu_stall+0x1cc/0x26c [ 25.552013] check_cpu_stall from rcu_sched_clock_irq+0x74/0x16c [ 25.558017] rcu_sched_clock_irq from update_process_times+0x68/0x94 [ 25.564377] update_process_times from tick_sched_timer+0x60/0xec [ 25.570470] tick_sched_timer from __hrtimer_run_queues+0x15c/0x218 [ 25.576737] __hrtimer_run_queues from hrtimer_interrupt+0x124/0x2b0 [ 25.583090] hrtimer_interrupt from twd_handler+0x34/0x3c [ 25.588489] twd_handler from handle_percpu_devid_irq+0x78/0x134 [ 25.594498] handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34 [ 25.601640] generic_handle_domain_irq from gic_handle_irq+0x74/0x88 [ 25.608000] gic_handle_irq from generic_handle_arch_irq+0x34/0x44 [ 25.614187] generic_handle_arch_irq from call_with_stack+0x18/0x20 [ 25.620462] call_with_stack from __irq_svc+0x98/0xb0 [ 25.625514] Exception stack(0xf10ad970 to 0xf10ad9b8) [ 25.630563] d960: c1d59a00 40000013 c056bbfc 00005f32 [ 25.638734] d980: 00000000 00000008 f10ad9e0 00000020 c1f69bc0 c1f69bc0 c2364140 f10adb34 [ 25.646904] d9a0: 00000000 f10ad9c0 c056da50 c099d1ec 20000013 ffffffff [ 25.653512] __irq_svc from _raw_spin_unlock_irqrestore+0x1c/0x20 [ 25.659605] _raw_spin_unlock_irqrestore from regmap_read+0x50/0x60 [ 25.665882] regmap_read from hash_hw_write_key+0xd8/0x100 [ 25.671377] hash_hw_write_key from init_hash_hw+0xd8/0xfc [ 25.676862] init_hash_hw from hash_hw_final+0x88/0x36c [ 25.682089] hash_hw_final from ahash_final+0x58/0x9c [ 25.687138] ahash_final from do_ahash_op+0x20/0x98 [ 25.692019] do_ahash_op from test_ahash_vec_cfg+0x68c/0x984 [ 25.697677] test_ahash_vec_cfg from test_hash_vec+0x64/0x168 [ 25.703421] test_hash_vec from __alg_test_hash+0x158/0x2d8 [ 25.708991] __alg_test_hash from alg_test_hash+0xc0/0x170 [ 25.714474] alg_test_hash from alg_test.part.0+0x378/0x4b8 [ 25.720044] alg_test.part.0 from cryptomgr_test+0x24/0x44 [ 25.725537] cryptomgr_test from kthread+0xc0/0xc4 [ 25.730334] kthread from ret_from_fork+0x14/0x2c [ 25.735036] Exception stack(0xf10adfb0 to 0xf10adff8) [ 25.740081] dfa0: 00000000 00000000 00000000 00000000 [ 25.748252] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 25.756422] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 I can't see 100% if it is the same issue but it does hang on something related to writing keys. So for Ux500 at least I suppose it would be best to inhibit .import and .export when using HMAC with long keys unless I can figure out exactly what the issue is here. I wonder if that is possible? Or do I have to remove it from the HMAC algos altogether? Yours, Linus Walleij
On Wed, Mar 08, 2023 at 10:05:14AM +0100, Linus Walleij wrote: > > So for Ux500 at least I suppose it would be best to inhibit .import and > .export when using HMAC with long keys unless I can figure out exactly > what the issue is here. I wonder if that is possible? > Or do I have to remove it from the HMAC algos altogether? If the hardware is buggered it's not a big deal if it's just HMAC. Because any HMAC hash can easily be broken down into three underlying hash operations. But let me digest your new information first, and see if we can figure out a way to get it to work. If not then we could just disable hmac (unless we can get confirmation from stm32 hardware we should just disable it for everything in stm32) and I'll fix the generic hmac to actually work with hardware drivers. Thanks,
On Wed, Mar 08, 2023 at 10:05:14AM +0100, Linus Walleij wrote: > > [ 4.812106] stm32-hash a03c2000.hash: allocated hmac(sha256) fallback > [ 5.008829] stm32-hash a03c2000.hash: timeout before writing key in > stm32_hash_xmit_cpu() > [ 5.017167] alg: ahash: stm32-hmac-sha256 final() failed with err > -110 on test vector "random: psize=0 ksize=70", cfg="random: may_sleep > use_final src_divs=[<fl" Wait a second, this is an empty message. Can you reproduce the hang if you exclude all psize=0 test vectors? If it's just empty messages, which we know are broken with ux500 to begin with, then we can simply not do the hash at all (doing it and then throwing it away seems pointless). Thanks,
On Wed, Mar 08, 2023 at 06:10:14PM +0800, Herbert Xu wrote: > > If it's just empty messages, which we know are broken with ux500 > to begin with, then we can simply not do the hash at all (doing > it and then throwing it away seems pointless). Here is a patch to not process empty messages at all. Cheers,
On Wed, Mar 8, 2023 at 11:19 AM Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Wed, Mar 08, 2023 at 06:10:14PM +0800, Herbert Xu wrote: > > > > If it's just empty messages, which we know are broken with ux500 > > to begin with, then we can simply not do the hash at all (doing > > it and then throwing it away seems pointless). > > Here is a patch to not process empty messages at all. Hey it works :D I had to tweak it slightly because we need to set the state right: @@ -374,9 +387,20 @@ static int stm32_hash_xmit_cpu(struct stm32_hash_dev *hdev, const u32 *buffer = (const u32 *)buf; u32 reg; - if (final) + if (final) { hdev->flags |= HASH_FLAGS_FINAL; + /* Do not process empty messages if hw is buggy. */ + if (!(hdev->flags & HASH_FLAGS_INIT) && !length && + hdev->pdata->broken_emptymsg) { + struct stm32_hash_request_ctx *rctx = ahash_request_ctx(hdev->req); + struct stm32_hash_state *state = &rctx->state; + + state->flags |= HASH_FLAGS_EMPTY; + return 0; + } + } + len32 = DIV_ROUND_UP(length, sizeof(u32)); dev_dbg(hdev->dev, "%s: length: %zd, final: %x len32 %i\n", After this it WORKS! For the "v5.5" patch and this (and my other patch) folded in: Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Linus Walleij <linus.walleij@linaro.org> So now the driver is fixed from a Ux500 point of view. I actually have the STM32MP board, I can try to find some time to test the final patch set on it as well if it has the same has as the other STM32 SoCs. Yours, Linus Walleij
On Wed, Mar 08, 2023 at 10:19:48PM +0100, Linus Walleij wrote: > > So now the driver is fixed from a Ux500 point of view. I think there is actually a nasty bug in it that may be hard to trigger. The stm32 driver as it stands will write up to 256 bytes into the FIFO which on the ux500 is limited to 64 bytes. We need to change the fixed 256-byte size to be dependent on the hardware type. Cheers,
On Thu, Mar 9, 2023 at 6:58 AM Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Wed, Mar 08, 2023 at 10:19:48PM +0100, Linus Walleij wrote: > > > > So now the driver is fixed from a Ux500 point of view. > > I think there is actually a nasty bug in it that may be hard to > trigger. > > The stm32 driver as it stands will write up to 256 bytes into > the FIFO which on the ux500 is limited to 64 bytes. We need to > change the fixed 256-byte size to be dependent on the hardware > type. Right so that is done implicitly by using a buffer of 256 bytes. But actually I think the bug will never trigger, because the datasheet for the DB8500 (Ux500) says this: "Then the message can be sent, by writing it word per word into the HASH_DIN register. When a block of 512 bits, i.e. 16 words have been written, a partial digest computation will start upon writing the first data of the next block. The AHB bus will be busy for 82 cycles for SHA-1 algorithm (66 cycles for SHA-256 algorithm)." The way I interpret it is that if you write 64 bytes (16 32bit words) the AHB bus will simply stall until the data is processed, so the writel() hangs there and then 66/82 bus cycles later it will continue. This isn't the prettiest from a system PoV, as it can stall interrupt handling and cause latency jitter, but it's not actually a bug. It's kind of similar to that user experience "bug" on x86 PCs where the sound starts breaking up if you have too intense graphics going on, because the bus is too busy so the sound FIFO goes empty. But I can certainly make a patch to shrink the buffer from 256 to 64 bytes on Ux500 if it's the right thing to do. Yours, Linus Walleij
On Thu, Mar 09, 2023 at 08:35:21AM +0100, Linus Walleij wrote: > > The way I interpret it is that if you write 64 bytes (16 32bit words) > the AHB bus will simply > stall until the data is processed, so the writel() hangs there and > then 66/82 bus cycles > later it will continue. You're right. I'll respin the patches with your updates in it. Thanks,
From: Linus Walleij > Sent: 09 March 2023 07:35 ... > But actually I think the bug will never trigger, because the datasheet > for the DB8500 (Ux500) says this: > > "Then the message can be sent, by writing it word per word into the > HASH_DIN register. > When a block of 512 bits, i.e. 16 words have been written, a partial > digest computation will > start upon writing the first data of the next block. The AHB bus will > be busy for 82 cycles for > SHA-1 algorithm (66 cycles for SHA-256 algorithm)." What speed clock is that? 4 or 5 extra clocks/word may (or may not) be significant. In terms of latency it may be noise compared to some PCIe reads done by hardware interrupt handlers. Some slow PCIe targets (like the fpga one we use) pretty much take 1us to handle a read cycle. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, Mar 9, 2023 at 11:19 PM David Laight <David.Laight@aculab.com> wrote: > > But actually I think the bug will never trigger, because the datasheet > > for the DB8500 (Ux500) says this: > > > > "Then the message can be sent, by writing it word per word into the > > HASH_DIN register. > > When a block of 512 bits, i.e. 16 words have been written, a partial > > digest computation will > > start upon writing the first data of the next block. The AHB bus will > > be busy for 82 cycles for > > SHA-1 algorithm (66 cycles for SHA-256 algorithm)." > > What speed clock is that? 133 MHz. > 4 or 5 extra clocks/word may (or may not) be significant. > > In terms of latency it may be noise compared to some PCIe > reads done by hardware interrupt handlers. > Some slow PCIe targets (like the fpga one we use) pretty > much take 1us to handle a read cycle. So in this case it's 1/133M s = 8ns cycle time, 82 in worst case, so 82*8 = 656 ns < 1 us. Yours, Linus Walleij
diff --git a/drivers/crypto/stm32/stm32-hash.c b/drivers/crypto/stm32/stm32-hash.c index 95cd8876689b..e87a909dd97e 100644 --- a/drivers/crypto/stm32/stm32-hash.c +++ b/drivers/crypto/stm32/stm32-hash.c @@ -95,6 +95,7 @@ #define HASH_FLAGS_SHA1 BIT(19) #define HASH_FLAGS_SHA224 BIT(20) #define HASH_FLAGS_SHA256 BIT(21) +#define HASH_FLAGS_EMPTY BIT(22) #define HASH_FLAGS_HMAC BIT(23) #define HASH_OP_UPDATE 1 @@ -134,7 +135,7 @@ struct stm32_hash_state { u8 buffer[HASH_BUFLEN] __aligned(4); /* hash state */ - u32 *hw_context; + u32 hw_context[3 + HASH_CSR_REGISTER_NUMBER]; }; struct stm32_hash_request_ctx { @@ -314,8 +315,11 @@ static void stm32_hash_write_ctrl(struct stm32_hash_dev *hdev, int bufcnt) * On the Ux500 we need to set a special flag to indicate that * the message is zero length. */ - if (hdev->pdata->ux500 && bufcnt == 0) + if (hdev->pdata->ux500 && bufcnt == 0 && + (state->flags & HASH_FLAGS_FINAL)) { reg |= HASH_CR_UX500_EMPTYMSG; + state->flags |= HASH_FLAGS_EMPTY; + } if (!hdev->polled) stm32_hash_write(hdev, HASH_IMR, HASH_DCIE); @@ -419,7 +423,9 @@ static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev) { struct stm32_hash_request_ctx *rctx = ahash_request_ctx(hdev->req); struct stm32_hash_state *state = &rctx->state; + u32 *preg = state->hw_context; int bufcnt, err = 0, final; + int i; dev_dbg(hdev->dev, "%s flags %x\n", __func__, state->flags); @@ -440,9 +446,24 @@ static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev) if (final) { bufcnt = state->bufcnt; state->bufcnt = 0; - err = stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 1); + return stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 1); } + if (!(hdev->flags & HASH_FLAGS_INIT)) + return 0; + + if (stm32_hash_wait_busy(hdev)) + return -ETIMEDOUT; + + if (!hdev->pdata->ux500) + *preg++ = stm32_hash_read(hdev, HASH_IMR); + *preg++ = stm32_hash_read(hdev, HASH_STR); + *preg++ = stm32_hash_read(hdev, HASH_CR); + for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++) + *preg++ = stm32_hash_read(hdev, HASH_CSR(i)); + + state->flags |= HASH_FLAGS_INIT; + return err; } @@ -824,7 +845,7 @@ static void stm32_hash_copy_hash(struct ahash_request *req) __be32 *hash = (void *)rctx->digest; unsigned int i, hashsize; - if (hdev->pdata->broken_emptymsg && !req->nbytes) + if (hdev->pdata->broken_emptymsg && (state->flags & HASH_FLAGS_EMPTY)) return stm32_hash_emptymsg_fallback(req); switch (state->flags & HASH_FLAGS_ALGO_MASK) { @@ -874,11 +895,6 @@ static void stm32_hash_finish_req(struct ahash_request *req, int err) if (!err && (HASH_FLAGS_FINAL & hdev->flags)) { stm32_hash_copy_hash(req); err = stm32_hash_finish(req); - hdev->flags &= ~(HASH_FLAGS_FINAL | HASH_FLAGS_CPU | - HASH_FLAGS_INIT | HASH_FLAGS_DMA_READY | - HASH_FLAGS_OUTPUT_READY | HASH_FLAGS_HMAC | - HASH_FLAGS_HMAC_INIT | HASH_FLAGS_HMAC_FINAL | - HASH_FLAGS_HMAC_KEY); } pm_runtime_mark_last_busy(hdev->dev); @@ -887,66 +903,54 @@ static void stm32_hash_finish_req(struct ahash_request *req, int err) crypto_finalize_hash_request(hdev->engine, req, err); } -static int stm32_hash_hw_init(struct stm32_hash_dev *hdev, - struct stm32_hash_request_ctx *rctx) -{ - pm_runtime_get_sync(hdev->dev); - - if (!(HASH_FLAGS_INIT & hdev->flags)) { - stm32_hash_write(hdev, HASH_CR, HASH_CR_INIT); - stm32_hash_write(hdev, HASH_STR, 0); - stm32_hash_write(hdev, HASH_DIN, 0); - stm32_hash_write(hdev, HASH_IMR, 0); - } - - return 0; -} - -static int stm32_hash_one_request(struct crypto_engine *engine, void *areq); -static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq); - static int stm32_hash_handle_queue(struct stm32_hash_dev *hdev, struct ahash_request *req) { return crypto_transfer_hash_request_to_engine(hdev->engine, req); } -static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq) +static int stm32_hash_one_request(struct crypto_engine *engine, void *areq) { struct ahash_request *req = container_of(areq, struct ahash_request, base); struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req)); + struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req); struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx); - struct stm32_hash_request_ctx *rctx; + struct stm32_hash_state *state = &rctx->state; + int err = 0; if (!hdev) return -ENODEV; - hdev->req = req; - - rctx = ahash_request_ctx(req); - dev_dbg(hdev->dev, "processing new req, op: %lu, nbytes %d\n", rctx->op, req->nbytes); - return stm32_hash_hw_init(hdev, rctx); -} + pm_runtime_get_sync(hdev->dev); -static int stm32_hash_one_request(struct crypto_engine *engine, void *areq) -{ - struct ahash_request *req = container_of(areq, struct ahash_request, - base); - struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req)); - struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx); - struct stm32_hash_request_ctx *rctx; - int err = 0; + hdev->req = req; + hdev->flags = 0; + + if (state->flags & HASH_FLAGS_INIT) { + u32 *preg = rctx->state.hw_context; + u32 reg; + int i; + + if (!hdev->pdata->ux500) + stm32_hash_write(hdev, HASH_IMR, *preg++); + stm32_hash_write(hdev, HASH_STR, *preg++); + stm32_hash_write(hdev, HASH_CR, *preg); + reg = *preg++ | HASH_CR_INIT; + stm32_hash_write(hdev, HASH_CR, reg); - if (!hdev) - return -ENODEV; + for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++) + stm32_hash_write(hdev, HASH_CSR(i), *preg++); - hdev->req = req; + hdev->flags |= HASH_FLAGS_INIT; - rctx = ahash_request_ctx(req); + if (state->flags & HASH_FLAGS_HMAC) + hdev->flags |= HASH_FLAGS_HMAC | + HASH_FLAGS_HMAC_KEY; + } if (rctx->op == HASH_OP_UPDATE) err = stm32_hash_update_req(hdev); @@ -1041,34 +1045,8 @@ static int stm32_hash_digest(struct ahash_request *req) static int stm32_hash_export(struct ahash_request *req, void *out) { struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req); - struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req)); - struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx); - struct stm32_hash_state *state = &rctx->state; - u32 *preg; - unsigned int i; - int ret; - - pm_runtime_get_sync(hdev->dev); - - ret = stm32_hash_wait_busy(hdev); - if (ret) - return ret; - - state->hw_context = kmalloc_array(3 + HASH_CSR_REGISTER_NUMBER, - sizeof(u32), GFP_KERNEL); - preg = state->hw_context; - if (!hdev->pdata->ux500) - *preg++ = stm32_hash_read(hdev, HASH_IMR); - *preg++ = stm32_hash_read(hdev, HASH_STR); - *preg++ = stm32_hash_read(hdev, HASH_CR); - for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++) - *preg++ = stm32_hash_read(hdev, HASH_CSR(i)); - - pm_runtime_mark_last_busy(hdev->dev); - pm_runtime_put_autosuspend(hdev->dev); - - memcpy(out, rctx, sizeof(*rctx)); + memcpy(out, &rctx->state, sizeof(rctx->state)); return 0; } @@ -1076,33 +1054,9 @@ static int stm32_hash_export(struct ahash_request *req, void *out) static int stm32_hash_import(struct ahash_request *req, const void *in) { struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req); - struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req)); - struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx); - struct stm32_hash_state *state = &rctx->state; - const u32 *preg = in; - u32 reg; - unsigned int i; - - memcpy(rctx, in, sizeof(*rctx)); - - preg = state->hw_context; - - pm_runtime_get_sync(hdev->dev); - - if (!hdev->pdata->ux500) - stm32_hash_write(hdev, HASH_IMR, *preg++); - stm32_hash_write(hdev, HASH_STR, *preg++); - stm32_hash_write(hdev, HASH_CR, *preg); - reg = *preg++ | HASH_CR_INIT; - stm32_hash_write(hdev, HASH_CR, reg); - - for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++) - stm32_hash_write(hdev, HASH_CSR(i), *preg++); - - pm_runtime_mark_last_busy(hdev->dev); - pm_runtime_put_autosuspend(hdev->dev); - kfree(state->hw_context); + stm32_hash_init(req); + memcpy(&rctx->state, in, sizeof(rctx->state)); return 0; } @@ -1159,8 +1113,6 @@ static int stm32_hash_cra_init_algs(struct crypto_tfm *tfm, ctx->flags |= HASH_FLAGS_HMAC; ctx->enginectx.op.do_one_request = stm32_hash_one_request; - ctx->enginectx.op.prepare_request = stm32_hash_prepare_req; - ctx->enginectx.op.unprepare_request = NULL; return stm32_hash_init_fallback(tfm); } @@ -1252,7 +1204,7 @@ static struct ahash_alg algs_md5[] = { .import = stm32_hash_import, .halg = { .digestsize = MD5_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "md5", .cra_driver_name = "stm32-md5", @@ -1279,7 +1231,7 @@ static struct ahash_alg algs_md5[] = { .setkey = stm32_hash_setkey, .halg = { .digestsize = MD5_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "hmac(md5)", .cra_driver_name = "stm32-hmac-md5", @@ -1308,7 +1260,7 @@ static struct ahash_alg algs_sha1[] = { .import = stm32_hash_import, .halg = { .digestsize = SHA1_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "sha1", .cra_driver_name = "stm32-sha1", @@ -1335,7 +1287,7 @@ static struct ahash_alg algs_sha1[] = { .setkey = stm32_hash_setkey, .halg = { .digestsize = SHA1_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "hmac(sha1)", .cra_driver_name = "stm32-hmac-sha1", @@ -1364,7 +1316,7 @@ static struct ahash_alg algs_sha224[] = { .import = stm32_hash_import, .halg = { .digestsize = SHA224_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "sha224", .cra_driver_name = "stm32-sha224", @@ -1391,7 +1343,7 @@ static struct ahash_alg algs_sha224[] = { .import = stm32_hash_import, .halg = { .digestsize = SHA224_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "hmac(sha224)", .cra_driver_name = "stm32-hmac-sha224", @@ -1420,7 +1372,7 @@ static struct ahash_alg algs_sha256[] = { .import = stm32_hash_import, .halg = { .digestsize = SHA256_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "sha256", .cra_driver_name = "stm32-sha256", @@ -1447,7 +1399,7 @@ static struct ahash_alg algs_sha256[] = { .setkey = stm32_hash_setkey, .halg = { .digestsize = SHA256_DIGEST_SIZE, - .statesize = sizeof(struct stm32_hash_request_ctx), + .statesize = sizeof(struct stm32_hash_state), .base = { .cra_name = "hmac(sha256)", .cra_driver_name = "stm32-hmac-sha256",
The Crypto API hashing paradigm requires the hardware state to be exported between *each* request because multiple unrelated hashes may be processed concurrently. The stm32 hardware is capable of producing the hardware hashing state but it was only doing it in the export function. This is not only broken for export as you can't export a kernel pointer and reimport it, but it also means that concurrent hashing was fundamentally broken. Fix this by moving the saving and restoring of hardware hash state between each and every hashing request. Also change the emptymsg check in stm32_hash_copy_hash to rely on whether we have any existing hash state, rather than whether this particular update request is empty. Fixes: 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module") Reported-by: Li kunyu <kunyu@nfschina.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- drivers/crypto/stm32/stm32-hash.c | 172 +++++++++++++------------------------- 1 file changed, 62 insertions(+), 110 deletions(-)