
[RFC,v1,0/3] remove remaining users of SHA-1

Message ID: 20220112131204.800307-1-Jason@zx2c4.com

Message

Jason A. Donenfeld Jan. 12, 2022, 1:12 p.m. UTC
Hi,

There are currently two remaining users of SHA-1 left in the kernel: bpf
tag generation, and ipv6 address calculation. In an effort to reduce
code size and rid ourselves of insecure primitives, this RFC patchset
moves to using the more secure BLAKE2s function. I wanted to get your
feedback on how feasible this patchset is, and if there is some
remaining attachment to SHA-1, why exactly, and what could be done to
mitigate it. Rather than sending a mailing list post just asking, "what
do you think?" I figured it'd be easier to send this as an RFC patchset,
so you see specifically what I mean.

Thoughts? Comments?

Thanks,
Jason

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: linux-crypto@vger.kernel.org

Jason A. Donenfeld (3):
  bpf: move from sha1 to blake2s in tag calculation
  ipv6: move from sha1 to blake2s in address calculation
  crypto: sha1_generic - import lib/sha1.c locally

 crypto/sha1_generic.c | 114 +++++++++++++++++++++++++++++++++++
 include/crypto/sha1.h |  10 ---
 kernel/bpf/core.c     |  39 ++----------
 lib/Makefile          |   2 +-
 lib/sha1.c            | 137 ------------------------------------------
 net/ipv6/addrconf.c   |  31 +++-------
 6 files changed, 128 insertions(+), 205 deletions(-)
 delete mode 100644 lib/sha1.c

Comments

David Sterba Jan. 12, 2022, 6:50 p.m. UTC | #1
On Wed, Jan 12, 2022 at 02:12:01PM +0100, Jason A. Donenfeld wrote:
> Hi,
> 
> There are currently two remaining users of SHA-1 left in the kernel: bpf
> tag generation, and ipv6 address calculation. In an effort to reduce
> code size and rid ourselves of insecure primitives, this RFC patchset
> moves to using the more secure BLAKE2s function.

What's the rationale for using 2s and not 2b? Everywhere I look, the 2s
version is said to be for 8-bit up to 32-bit machines, and it's worse than
2b in benchmarks (reading https://bench.cr.yp.to/results-hash.html).

I'd understand if you went with 2s because you also chose it for WireGuard,
but I'd like to know why 2s again, even though it's not made for the 64-bit
architectures that are preferred nowadays.
Jason A. Donenfeld Jan. 12, 2022, 6:57 p.m. UTC | #2
On Wed, Jan 12, 2022 at 7:50 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Jan 12, 2022 at 02:12:01PM +0100, Jason A. Donenfeld wrote:
> > Hi,
> >
> > There are currently two remaining users of SHA-1 left in the kernel: bpf
> > tag generation, and ipv6 address calculation. In an effort to reduce
> > code size and rid ourselves of insecure primitives, this RFC patchset
> > moves to using the more secure BLAKE2s function.
>
> What's the rationale for using 2s and not 2b? Everywhere I look, the 2s
> version is said to be for 8-bit up to 32-bit machines, and it's worse than
> 2b in benchmarks (reading https://bench.cr.yp.to/results-hash.html).
>
> I'd understand if you went with 2s because you also chose it for WireGuard,
> but I'd like to know why 2s again, even though it's not made for the 64-bit
> architectures that are preferred nowadays.

Fast for small inputs on all architectures, small code size. And it
performs well on Intel: there are AVX-512 and SSSE3 implementations.
Even BLAKE3 went with the 32-bit choice and abandoned 2b's approach.
Plus, this makes it even more similar to the well-trusted ChaCha
permutation. As far as a general-purpose, high-security (keyed) library
hash function for internal kernel use goes, it seems pretty ideal.
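
To make the ChaCha comparison concrete, here is a sketch that puts the ChaCha quarter round (RFC 8439) next to the BLAKE2s mixing function G (RFC 7693). It is not kernel code, and the helper names are illustrative. Both work on 32-bit words with the same add-xor-rotate pattern and the same rotation distances (16, 12, 8, 7); G rotates right instead of left and mixes in two message words, x and y.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotations; n must be in 1..31 to avoid an undefined shift by 32. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

static uint32_t rotr32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v >> n) | (v << (32 - n));
}

/* ChaCha quarter round: add, xor, rotate left by 16, 12, 8, 7. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* BLAKE2s G: same skeleton, rotating right by 16, 12, 8, 7,
 * with message words x and y added into a. */
static void blake2s_g(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d,
                      uint32_t x, uint32_t y)
{
    *a = *a + *b + x; *d = rotr32(*d ^ *a, 16);
    *c = *c + *d;     *b = rotr32(*b ^ *c, 12);
    *a = *a + *b + y; *d = rotr32(*d ^ *a, 8);
    *c = *c + *d;     *b = rotr32(*b ^ *c, 7);
}
```

As a sanity check, the RFC 8439 quarter-round test vector (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567) should come out as a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb. BLAKE2b's G has the same shape but works on 64-bit words with rotations of 32, 24, 16 and 63, which is the 2s-versus-2b distinction David asked about.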

Your choice for btrfs though is fine; don't let this patchset change
your thinking on that.

Anyway, I hope that's interesting to you, but I'm not so much
interested in bikeshedding about blake variants as I am in learning
from the net people on the feasibility of getting rid of sha1 in those
two places. So I'd appreciate it if we can keep the discussion focused
on that and not let this veer off into a tangential thread on blakes.
Sandy Harris Jan. 13, 2022, 3:24 a.m. UTC | #3
Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> There are currently two remaining users of SHA-1 left in the kernel: bpf
> tag generation, and ipv6 address calculation.

I think there are three, since drivers/char/random.c also uses it.
Moreover, there's some inefficiency there (or was last time I
looked) since it produces a 160-bit hash then folds it in half
to give an 80-bit output.

A possible fix would be to use a more modern 512-bit hash.
SHA3 would be the obvious one, but Blake2 would work,
Blake3 might be faster & there are several other possibilities.
Hash context size would then match ChaCha so you could
update the whole CC context at once, maybe even use the
same context for both.

That approach has difficulties: extracting 512 bits every
time might drain the input pool too quickly & it is overkill
for ChaCha, which should be secure with smaller rekeyings.

If you look at IPsec, SSL & other such protocols, many
have now mostly replaced the hash-based HMAC
constructions used in previous generations with things
like Galois field calculations (e.g. AES-GCM) or other
strange math (e.g. Poly1305). These have most of the
desirable properties of hashes & are much faster. As
far as I know, they all give 128-bit outputs.

I think we should replace SHA-1 with GCM. Give
ChaCha 128 bits somewhat more often than current
code gives it 256.
Ard Biesheuvel Jan. 13, 2022, 8:08 a.m. UTC | #4
On Thu, 13 Jan 2022 at 04:24, Sandy Harris <sandyinchina@gmail.com> wrote:
>
> Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> > There are currently two remaining users of SHA-1 left in the kernel: bpf
> > tag generation, and ipv6 address calculation.
>
> I think there are three, since drivers/char/random.c also uses it.
> Moreover, there's some inefficiency there (or was last time I
> looked) since it produces a 160-bit hash then folds it in half
> to give an 80-bit output.
>

That code was removed, hence the two /remaining/ users.

> A possible fix would be to use a more modern 512-bit hash.
> SHA3 would be the obvious one, but Blake2 would work,
> Blake3 might be faster & there are several other possibilities.
> Hash context size would then match ChaCha so you could
> update the whole CC context at once, maybe even use the
> same context for both.
>
> That approach has difficulties: extracting 512 bits every
> time might drain the input pool too quickly & it is overkill
> for ChaCha, which should be secure with smaller rekeyings.
>
> If you look at IPsec, SSL & other such protocols, many
> have now mostly replaced the hash-based HMAC
> constructions used in previous generations with things
> like Galois field calculations (e.g. AES-GCM) or other
> strange math (e.g. Poly1305). These have most of the
> desirable properties of hashes & are much faster. As
> far as I know, they all give 128-bit outputs.
>
> I think we should replace SHA-1 with GCM. Give
> ChaCha 128 bits somewhat more often than current
> code gives it 256.

You are conflating MACs with hashes. A MAC is not suitable for
backtrack resistance, and GHASH in particular is really only suited to
be used in the context of GCM.
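
Ard's point about GHASH can be seen from its definition in NIST SP 800-38D: with hash subkey H, GHASH is a polynomial evaluation over GF(2^128), and therefore linear in its input. The following sketches that reasoning, assuming an attacker who has learned H, as one would after compromising the generator's state.

```latex
\mathrm{GHASH}_H(X_1,\dots,X_m) = \bigoplus_{i=1}^{m} X_i \cdot H^{m-i+1}
  \qquad \text{(multiplication in } \mathrm{GF}(2^{128})\text{)}

\mathrm{GHASH}_H(X \oplus X') = \mathrm{GHASH}_H(X) \oplus \mathrm{GHASH}_H(X')
  \qquad \text{for equal-length } X, X'

m = 1,\; H \neq 0: \qquad X_1 = \mathrm{GHASH}_H(X_1) \cdot H^{-1}
```

A function with this structure can be run backwards by anyone holding the key. Backtrack resistance, however, relies on the extraction function being hard to invert even for someone who knows the current state.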
Theodore Ts'o Jan. 13, 2022, 5:28 p.m. UTC | #5
On Thu, Jan 13, 2022 at 11:24:10AM +0800, Sandy Harris wrote:
> Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
> > There are currently two remaining users of SHA-1 left in the kernel: bpf
> > tag generation, and ipv6 address calculation.
> 
> I think there are three, since drivers/char/random.c also uses it.

This was changed as of commit 9f9eff85a008 ("random: use BLAKE2s
instead of SHA1 in extraction"), which just landed in Linus's tree.

> Moreover, there's some inefficiency there (or was last time I
> looked) since it produces a 160-bit hash then folds it in half
> to give an 80-bit output.

This dates back to very early days of the /dev/random driver, back
when all that was known about SHA-1 was that it was designed by the
NSA using classified design principles, and it had not yet been as
well studied outside of the halls of the NSA.  So folding the SHA-1
hash in half was done deliberately, since at the time, performance was
*not* the primary goal; security was.
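
The fold Sandy and Ted describe can be sketched as follows. This is a generic byte-wise XOR fold with a hypothetical helper name, written only for illustration; the historical drivers/char/random.c code differed in detail. Apparently, the intent was that only 80 of SHA-1's 160 output bits are ever exposed, so any recognizable pattern in the full digest is not handed out directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold a digest in half by XORing its two halves
 * together, so only in_len / 2 bytes are ever output. With a 20-byte
 * SHA-1 digest this yields 10 bytes (80 bits).
 */
static void fold_in_half(const uint8_t *in, size_t in_len, uint8_t *out)
{
    assert(in_len % 2 == 0);
    size_t half = in_len / 2;
    for (size_t i = 0; i < half; i++)
        out[i] = in[i] ^ in[i + half];
}
```

Feeding a full 20-byte digest through it returns 10 bytes. The cost is that each 160-bit hash computation yields only 80 bits of output, which is the inefficiency Sandy points out.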

(This was also back in the days when encryption algorithms could run
you into export control difficulties. This was around the time when
the source code of PGP was being published by MIT Press in an OCR
font, with a barcode containing a checksum of the content of every
single page, and we were publishing Kerberos with all of the *calls*
to the crypto stripped out and calling it "Bones", since there were
assertions that code that *called* cryptographic algorithms might be
subject to export control, even if it didn't contain any crypto
algorithms itself.  This is also why HMAC-based constructions were so
popular.  People seem to forget how much things have changed since the
late 1980's....)

- Ted