Message ID | 160518586534.2277919.14475638653680231924.stgit@warthog.procyon.org.uk
---|---
Series | crypto: Add generic Kerberos library
Would it be possible/practical to make the skcipher encrypt functions take an
offset into the scatterlist rather than always starting at the beginning?

David
> On Nov 12, 2020, at 7:57 AM, David Howells <dhowells@redhat.com> wrote:
>
> Hi Herbert, Bruce,
>
> Here's my first cut at a generic Kerberos crypto library in the kernel so
> that I can share code between rxrpc and sunrpc (and cifs?).
>
> I derived some of the parts from the sunrpc gss library and added more
> advanced AES and Camellia crypto.  I haven't ported across the DES-based
> crypto yet - I figure that can wait a bit till the interface is sorted.
>
> Whilst I have put it into a directory under crypto/, I haven't made an
> interface that goes and loads it (analogous to crypto_alloc_skcipher,
> say).  Instead, you call:
>
>	const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype);
>
> to go and get a handler table and then use a bunch of accessor functions to
> jump through the hoops.  This is basically the way the sunrpc gsslib does
> things.  It might be worth making it so you do something like:
>
>	struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)");
>
> to get enctype 18, but I'm not sure if it's worth the effort.  Also, I'm
> not sure if there are any alternatives to kerberos we will need to support.
>
> There are three main interfaces to it:
>
>  (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.
>
>      These all do in-place crypto, using an sglist to define the buffer
>      with the data in it.  Is it necessary to make it able to take separate
>      input and output buffers?

Hi David, Wondering if these "I/O" APIs use synchronous or async
crypto under the covers.  For small data items like MICs, synchronous
might be a better choice, especially if asynchronous crypto could
result in incoming requests getting re-ordered and falling out of
the GSS sequence number window.

What say ye?

>  (*) PRF+ calculation for key derivation.
>
>  (*) Kc, Ke, Ki derivation.
>
>      These use krb5_buffer structs to pass objects around.  This is akin to
>      the xdr_netobj, but has a void* instead of a u8* data pointer.
>
> In terms of rxrpc's rxgk, there's another step in key derivation that isn't
> part of the kerberos standard, but uses the PRF+ function to generate a key
> that is then used to generate Kc, Ke and Ki.  Is it worth putting this into
> the directory or maybe having a callout to insert an intermediate step in
> key derivation?
>
> Note that, for purposes of illustration, I've included some rxrpc patches
> that use this interface to implement the rxgk Rx security class.  The
> branch also is based on some rxrpc patches that are a prerequisite for
> this, but the crypto patches don't need it.
>
> ---
> The patches can be found here also:
>
>	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=crypto-krb5
>
> David
> ---
> David Howells (18):
>       crypto/krb5: Implement Kerberos crypto core
>       crypto/krb5: Add some constants out of sunrpc headers
>       crypto/krb5: Provide infrastructure and key derivation
>       crypto/krb5: Implement the Kerberos5 rfc3961 key derivation
>       crypto/krb5: Implement the Kerberos5 rfc3961 encrypt and decrypt functions
>       crypto/krb5: Implement the Kerberos5 rfc3961 get_mic and verify_mic
>       crypto/krb5: Implement the AES enctypes from rfc3962
>       crypto/krb5: Implement crypto self-testing
>       crypto/krb5: Implement the AES enctypes from rfc8009
>       crypto/krb5: Implement the AES encrypt/decrypt from rfc8009
>       crypto/krb5: Add the AES self-testing data from rfc8009
>       crypto/krb5: Implement the Camellia enctypes from rfc6803
>       rxrpc: Add the security index for yfs-rxgk
>       rxrpc: Add YFS RxGK (GSSAPI) security class
>       rxrpc: rxgk: Provide infrastructure and key derivation
>       rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)
>       rxrpc: rxgk: Implement connection rekeying
>       rxgk: Support OpenAFS's rxgk implementation
>
>
>  crypto/krb5/Kconfig              |    9 +
>  crypto/krb5/Makefile             |   11 +-
>  crypto/krb5/internal.h           |  101 +++
>  crypto/krb5/kdf.c                |  223 ++++++
>  crypto/krb5/main.c               |  190 +++++
>  crypto/krb5/rfc3961_simplified.c |  732 ++++++++++++++++++
>  crypto/krb5/rfc3962_aes.c        |  140 ++++
>  crypto/krb5/rfc6803_camellia.c   |  249 ++++++
>  crypto/krb5/rfc8009_aes2.c       |  440 +++++++++++
>  crypto/krb5/selftest.c           |  543 +++++++++++++
>  crypto/krb5/selftest_data.c      |  289 +++++++
>  fs/afs/misc.c                    |   13 +
>  include/crypto/krb5.h            |  100 +++
>  include/keys/rxrpc-type.h        |   17 +
>  include/trace/events/rxrpc.h     |    4 +
>  include/uapi/linux/rxrpc.h       |   17 +
>  net/rxrpc/Kconfig                |   10 +
>  net/rxrpc/Makefile               |    5 +
>  net/rxrpc/ar-internal.h          |   20 +
>  net/rxrpc/conn_object.c          |    2 +
>  net/rxrpc/key.c                  |  319 ++++++++
>  net/rxrpc/rxgk.c                 | 1232 ++++++++++++++++++++++++++++++
>  net/rxrpc/rxgk_app.c             |  424 ++++++++++
>  net/rxrpc/rxgk_common.h          |  164 ++++
>  net/rxrpc/rxgk_kdf.c             |  271 +++++++
>  net/rxrpc/security.c             |    6 +
>  26 files changed, 5530 insertions(+), 1 deletion(-)
>  create mode 100644 crypto/krb5/kdf.c
>  create mode 100644 crypto/krb5/rfc3961_simplified.c
>  create mode 100644 crypto/krb5/rfc3962_aes.c
>  create mode 100644 crypto/krb5/rfc6803_camellia.c
>  create mode 100644 crypto/krb5/rfc8009_aes2.c
>  create mode 100644 crypto/krb5/selftest.c
>  create mode 100644 crypto/krb5/selftest_data.c
>  create mode 100644 net/rxrpc/rxgk.c
>  create mode 100644 net/rxrpc/rxgk_app.c
>  create mode 100644 net/rxrpc/rxgk_common.h
>  create mode 100644 net/rxrpc/rxgk_kdf.c

--
Chuck Lever
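For readers unfamiliar with the shape of the proposed interface, a rough
sketch of how a caller might drive the handler table follows.  Only
crypto_krb5_find_enctype() is named in the cover letter; the ->encrypt() op
and its arguments below are illustrative assumptions, not the actual
proposed accessor API.

	#include <crypto/krb5.h>

	static int example_wrap(u32 enctype, struct scatterlist *sg,
				unsigned int len)
	{
		const struct krb5_enctype *krb5;

		/* e.g. enctype 18 == aes256-cts-hmac-sha1-96 */
		krb5 = crypto_krb5_find_enctype(enctype);
		if (!krb5)
			return -ENOPKG;

		/* Hypothetical accessor: encrypt in place within the sglist. */
		return krb5->encrypt(krb5, sg, len);
	}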
Chuck Lever <chuck.lever@oracle.com> wrote: > > There are three main interfaces to it: > > > > (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic. > > > > These all do in-place crypto, using an sglist to define the buffer > > with the data in it. Is it necessary to make it able to take separate > > input and output buffers? > > Hi David, Wondering if these "I/O" APIs use synchronous or async > crypto under the covers. For small data items like MICs, synchronous > might be a better choice, especially if asynchronous crypto could > result in incoming requests getting re-ordered and falling out of > the GSS sequence number window. > > What say ye? For the moment I'm using synchronous APIs as that's what sunrpc is using (I borrowed the basic code from there). It would be interesting to consider using async, but there's a potential issue. For the simplified profile, encryption and integrity checksum generation can be done simultaneously, but decryption and verification can't. For the AES-2 profile, the reverse is true. For my purposes in rxrpc, async mode isn't actually that useful since I'm only doing the contents of a UDP packet at a time. Either I'm encrypting with the intention of immediate transmission or decrypting with the intention of immediately using the data, so I'm in a context where I can wait anyway. What might get me more of a boost would be to encrypt the app data directly into a UDP packet and decrypt the UDP packet directly into the app buffers. This is easier said than done, though, as there's typically security metadata inserted into the packet inside the encrypted region. David
> On Nov 12, 2020, at 10:42 AM, David Howells <dhowells@redhat.com> wrote: > > Chuck Lever <chuck.lever@oracle.com> wrote: > >>> There are three main interfaces to it: >>> >>> (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic. >>> >>> These all do in-place crypto, using an sglist to define the buffer >>> with the data in it. Is it necessary to make it able to take separate >>> input and output buffers? >> >> Hi David, Wondering if these "I/O" APIs use synchronous or async >> crypto under the covers. For small data items like MICs, synchronous >> might be a better choice, especially if asynchronous crypto could >> result in incoming requests getting re-ordered and falling out of >> the GSS sequence number window. >> >> What say ye? > > For the moment I'm using synchronous APIs as that's what sunrpc is using (I > borrowed the basic code from there). Really? My understanding of the Linux kernel SUNRPC implementation is that it uses asynchronous, even for small data items. Maybe I'm using the terminology incorrectly. The problem that arises is on the server. The asynchronous API can schedule, and if the server has other work to do, that can delay a verify_mic long enough that the request drops out of the GSS sequence number window (even a large window). Whatever the mechanism, we need to have deterministic ordering, at least on the server-side. > It would be interesting to consider using async, but there's a potential > issue. For the simplified profile, encryption and integrity checksum > generation can be done simultaneously, but decryption and verification can't. > For the AES-2 profile, the reverse is true. > > For my purposes in rxrpc, async mode isn't actually that useful since I'm only > doing the contents of a UDP packet at a time. Either I'm encrypting with the > intention of immediate transmission or decrypting with the intention of > immediately using the data, so I'm in a context where I can wait anyway. > > What might get me more of a boost would be to encrypt the app data directly > into a UDP packet and decrypt the UDP packet directly into the app buffers. > This is easier said than done, though, as there's typically security metadata > inserted into the packet inside the encrypted region. -- Chuck Lever
Chuck Lever <chuck.lever@oracle.com> wrote: > Really? My understanding of the Linux kernel SUNRPC implementation is > that it uses asynchronous, even for small data items. Maybe I'm using > the terminology incorrectly. Seems to be synchronous, at least in its use of skcipher: grep -e skcipher * gss_krb5_crypto.c:#include <crypto/skcipher.h> gss_krb5_crypto.c: struct crypto_sync_skcipher *tfm, gss_krb5_crypto.c: if (length % crypto_sync_skcipher_blocksize(tfm) != 0) gss_krb5_crypto.c: if (crypto_sync_skcipher_ivsize(tfm) > GSS_KRB5_MAX_BLOCKSIZE) { gss_krb5_crypto.c: crypto_sync_skcipher_ivsize(tfm)); gss_krb5_crypto.c: memcpy(local_iv, iv, crypto_sync_skcipher_ivsize(tfm)); gss_krb5_crypto.c: skcipher_request_set_sync_tfm(req, tfm); gss_krb5_crypto.c: skcipher_request_set_callback(req, 0, NULL, NULL); gss_krb5_crypto.c: skcipher_request_set_crypt(req, sg, sg, length, local_iv); gss_krb5_crypto.c: ret = crypto_skcipher_encrypt(req); gss_krb5_crypto.c: skcipher_request_zero(req); gss_krb5_crypto.c: struct crypto_sync_skcipher *tfm, gss_krb5_crypto.c: if (length % crypto_sync_skcipher_blocksize(tfm) != 0) gss_krb5_crypto.c: if (crypto_sync_skcipher_ivsize(tfm) > GSS_KRB5_MAX_BLOCKSIZE) { gss_krb5_crypto.c: crypto_sync_skcipher_ivsize(tfm)); gss_krb5_crypto.c: memcpy(local_iv, iv, crypto_sync_skcipher_ivsize(tfm)); gss_krb5_crypto.c: skcipher_request_set_sync_tfm(req, tfm); gss_krb5_crypto.c: skcipher_request_set_callback(req, 0, NULL, NULL); gss_krb5_crypto.c: skcipher_request_set_crypt(req, sg, sg, length, local_iv); gss_krb5_crypto.c: ret = crypto_skcipher_decrypt(req); gss_krb5_crypto.c: skcipher_request_zero(req); gss_krb5_crypto.c: struct skcipher_request *req; gss_krb5_crypto.c: struct crypto_sync_skcipher *tfm = gss_krb5_crypto.c: crypto_sync_skcipher_reqtfm(desc->req); gss_krb5_crypto.c: fraglen = thislen & (crypto_sync_skcipher_blocksize(tfm) - 1); gss_krb5_crypto.c: skcipher_request_set_crypt(desc->req, desc->infrags, desc->outfrags, gss_krb5_crypto.c: ret = crypto_skcipher_encrypt(desc->req); gss_krb5_crypto.c:gss_encrypt_xdr_buf(struct crypto_sync_skcipher *tfm, struct xdr_buf *buf, gss_krb5_crypto.c: BUG_ON((buf->len - offset) % crypto_sync_skcipher_blocksize(tfm) != 0); gss_krb5_crypto.c: skcipher_request_set_sync_tfm(req, tfm); gss_krb5_crypto.c: skcipher_request_set_callback(req, 0, NULL, NULL); gss_krb5_crypto.c: skcipher_request_zero(req); gss_krb5_crypto.c: struct skcipher_request *req; gss_krb5_crypto.c: struct crypto_sync_skcipher *tfm = gss_krb5_crypto.c: crypto_sync_skcipher_reqtfm(desc->req); gss_krb5_crypto.c: fraglen = thislen & (crypto_sync_skcipher_blocksize(tfm) - 1); gss_krb5_crypto.c: skcipher_request_set_crypt(desc->req, desc->frags, desc->frags, gss_krb5_crypto.c: ret = crypto_skcipher_decrypt(desc->req); gss_krb5_crypto.c:gss_decrypt_xdr_buf(struct crypto_sync_skcipher *tfm, struct xdr_buf *buf, gss_krb5_crypto.c: BUG_ON((buf->len - offset) % crypto_sync_skcipher_blocksize(tfm) != 0); gss_krb5_crypto.c: skcipher_request_set_sync_tfm(req, tfm); gss_krb5_crypto.c: skcipher_request_set_callback(req, 0, NULL, NULL); gss_krb5_crypto.c: skcipher_request_zero(req); gss_krb5_crypto.c:gss_krb5_cts_crypt(struct crypto_sync_skcipher *cipher, struct xdr_buf *buf, gss_krb5_crypto.c: skcipher_request_set_sync_tfm(req, cipher); gss_krb5_crypto.c: skcipher_request_set_callback(req, 0, NULL, NULL); gss_krb5_crypto.c: skcipher_request_set_crypt(req, sg, sg, len, iv); gss_krb5_crypto.c: ret = crypto_skcipher_encrypt(req); gss_krb5_crypto.c: ret = crypto_skcipher_decrypt(req); 
gss_krb5_crypto.c: skcipher_request_zero(req); gss_krb5_crypto.c: struct crypto_sync_skcipher *cipher, *aux_cipher; gss_krb5_crypto.c: blocksize = crypto_sync_skcipher_blocksize(cipher); gss_krb5_crypto.c: skcipher_request_set_sync_tfm(req, aux_cipher); gss_krb5_crypto.c: skcipher_request_set_callback(req, 0, NULL, NULL); gss_krb5_crypto.c: skcipher_request_zero(req); gss_krb5_crypto.c: struct crypto_sync_skcipher *cipher, *aux_cipher; gss_krb5_crypto.c: blocksize = crypto_sync_skcipher_blocksize(cipher); gss_krb5_crypto.c: skcipher_request_set_sync_tfm(req, aux_cipher); gss_krb5_crypto.c: skcipher_request_set_callback(req, 0, NULL, NULL); gss_krb5_crypto.c: skcipher_request_zero(req); gss_krb5_keys.c:#include <crypto/skcipher.h> gss_krb5_keys.c: struct crypto_sync_skcipher *cipher; gss_krb5_keys.c: cipher = crypto_alloc_sync_skcipher(gk5e->encrypt_name, 0, 0); gss_krb5_keys.c: if (crypto_sync_skcipher_setkey(cipher, inkey->data, inkey->len)) gss_krb5_keys.c: crypto_free_sync_skcipher(cipher); gss_krb5_mech.c:#include <crypto/skcipher.h> gss_krb5_mech.c: struct krb5_ctx *ctx, struct crypto_sync_skcipher **res) gss_krb5_mech.c: *res = crypto_alloc_sync_skcipher(ctx->gk5e->encrypt_name, 0, 0); gss_krb5_mech.c: if (crypto_sync_skcipher_setkey(*res, key.data, key.len)) { gss_krb5_mech.c: crypto_free_sync_skcipher(*res); gss_krb5_mech.c: crypto_free_sync_skcipher(ctx->seq); gss_krb5_mech.c: crypto_free_sync_skcipher(ctx->enc); gss_krb5_mech.c:static struct crypto_sync_skcipher * gss_krb5_mech.c: struct crypto_sync_skcipher *cp; gss_krb5_mech.c: cp = crypto_alloc_sync_skcipher(cname, 0, 0); gss_krb5_mech.c: if (crypto_sync_skcipher_setkey(cp, key, ctx->gk5e->keylength)) { gss_krb5_mech.c: crypto_free_sync_skcipher(cp); gss_krb5_mech.c: crypto_free_sync_skcipher(ctx->enc); gss_krb5_mech.c: crypto_free_sync_skcipher(ctx->seq); gss_krb5_mech.c: crypto_free_sync_skcipher(ctx->initiator_enc_aux); gss_krb5_mech.c: crypto_free_sync_skcipher(ctx->acceptor_enc); gss_krb5_mech.c: crypto_free_sync_skcipher(ctx->initiator_enc); gss_krb5_mech.c: crypto_free_sync_skcipher(kctx->seq); gss_krb5_mech.c: crypto_free_sync_skcipher(kctx->enc); gss_krb5_mech.c: crypto_free_sync_skcipher(kctx->acceptor_enc); gss_krb5_mech.c: crypto_free_sync_skcipher(kctx->initiator_enc); gss_krb5_mech.c: crypto_free_sync_skcipher(kctx->acceptor_enc_aux); gss_krb5_mech.c: crypto_free_sync_skcipher(kctx->initiator_enc_aux); gss_krb5_seqnum.c:#include <crypto/skcipher.h> gss_krb5_seqnum.c: struct crypto_sync_skcipher *key, gss_krb5_seqnum.c: struct crypto_sync_skcipher *key = kctx->seq; gss_krb5_wrap.c:#include <crypto/skcipher.h> gss_krb5_wrap.c: blocksize = crypto_sync_skcipher_blocksize(kctx->enc); gss_krb5_wrap.c: blocksize = crypto_sync_skcipher_blocksize(kctx->enc); David
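As a concrete illustration of the synchronous pattern the grep output above
reflects, here is a minimal sketch of a sync_skcipher call (buffer and key
names are placeholders; the buffer must not be stack memory and, for plain
cbc(aes), its length must be a multiple of the block size).  With a
crypto_sync_skcipher, crypto_skcipher_encrypt() completes before returning
rather than deferring to a callback.

	#include <crypto/skcipher.h>
	#include <linux/scatterlist.h>

	/* Encrypt a linear buffer in place with cbc(aes), synchronously. */
	static int example_sync_encrypt(const u8 *key, unsigned int keylen,
					u8 *iv, u8 *buf, unsigned int len)
	{
		struct crypto_sync_skcipher *tfm;
		struct scatterlist sg;
		int ret;

		tfm = crypto_alloc_sync_skcipher("cbc(aes)", 0, 0);
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		ret = crypto_sync_skcipher_setkey(tfm, key, keylen);
		if (ret == 0) {
			SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);

			sg_init_one(&sg, buf, len);
			skcipher_request_set_sync_tfm(req, tfm);
			skcipher_request_set_callback(req, 0, NULL, NULL);
			skcipher_request_set_crypt(req, &sg, &sg, len, iv);

			/* Returns 0 or -errno; never -EINPROGRESS for a sync tfm. */
			ret = crypto_skcipher_encrypt(req);
			skcipher_request_zero(req);
		}

		crypto_free_sync_skcipher(tfm);
		return ret;
	}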
On Thu, Nov 12, 2020 at 12:57:45PM +0000, David Howells wrote: > > Hi Herbert, Bruce, > > Here's my first cut at a generic Kerberos crypto library in the kernel so > that I can share code between rxrpc and sunrpc (and cifs?). > > I derived some of the parts from the sunrpc gss library and added more > advanced AES and Camellia crypto. I haven't ported across the DES-based > crypto yet - I figure that can wait a bit till the interface is sorted. > > Whilst I have put it into a directory under crypto/, I haven't made an > interface that goes and loads it (analogous to crypto_alloc_skcipher, > say). Instead, you call: > > const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype); > > to go and get a handler table and then use a bunch of accessor functions to > jump through the hoops. This is basically the way the sunrpc gsslib does > things. It might be worth making it so you do something like: > > struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)"); > > to get enctype 18, but I'm not sure if it's worth the effort. Also, I'm > not sure if there are any alternatives to kerberos we will need to support. We did have code for a non-krb5 mechanism at some point, but it was torn out. So I don't think that's a priority. (Chuck, will RPC-over-SSL need a new non-krb5 mechanism?) > There are three main interfaces to it: > > (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic. > > These all do in-place crypto, using an sglist to define the buffer > with the data in it. Is it necessary to make it able to take separate > input and output buffers? I don't know. My memory is that the buffer management in the existing rpcsec_gss code is complex and fragile. See e.g. the long comment in gss_krb5_remove_padding. --b. > (*) PRF+ calculation for key derivation. > (*) Kc, Ke, Ki derivation. > > These use krb5_buffer structs to pass objects around. This is akin to > the xdr_netobj, but has a void* instead of a u8* data pointer. > > In terms of rxrpc's rxgk, there's another step in key derivation that isn't > part of the kerberos standard, but uses the PRF+ function to generate a key > that is then used to generate Kc, Ke and Ki. Is it worth putting this into > the directory or maybe having a callout to insert an intermediate step in > key derivation? > > Note that, for purposes of illustration, I've included some rxrpc patches > that use this interface to implement the rxgk Rx security class. The > branch also is based on some rxrpc patches that are a prerequisite for > this, but the crypto patches don't need it. 
> On Nov 12, 2020, at 1:37 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Thu, Nov 12, 2020 at 12:57:45PM +0000, David Howells wrote: >> >> Hi Herbert, Bruce, >> >> Here's my first cut at a generic Kerberos crypto library in the kernel so >> that I can share code between rxrpc and sunrpc (and cifs?). >> >> I derived some of the parts from the sunrpc gss library and added more >> advanced AES and Camellia crypto. I haven't ported across the DES-based >> crypto yet - I figure that can wait a bit till the interface is sorted. >> >> Whilst I have put it into a directory under crypto/, I haven't made an >> interface that goes and loads it (analogous to crypto_alloc_skcipher, >> say). Instead, you call: >> >> const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype); >> >> to go and get a handler table and then use a bunch of accessor functions to >> jump through the hoops. This is basically the way the sunrpc gsslib does >> things. It might be worth making it so you do something like: >> >> struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)"); >> >> to get enctype 18, but I'm not sure if it's worth the effort. Also, I'm >> not sure if there are any alternatives to kerberos we will need to support. > > We did have code for a non-krb5 mechanism at some point, but it was torn > out. So I don't think that's a priority. > > (Chuck, will RPC-over-SSL need a new non-krb5 mechanism?) No, RPC-over-TLS does not involve the GSS infrastructure in any way. >> There are three main interfaces to it: >> >> (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic. >> >> These all do in-place crypto, using an sglist to define the buffer >> with the data in it. Is it necessary to make it able to take separate >> input and output buffers? > > I don't know. My memory is that the buffer management in the existing > rpcsec_gss code is complex and fragile. See e.g. the long comment in > gss_krb5_remove_padding. And even worse, the buffer handling is slightly different in the NFS client and server code paths. > --b. > >> (*) PRF+ calculation for key derivation. >> (*) Kc, Ke, Ki derivation. >> >> These use krb5_buffer structs to pass objects around. This is akin to >> the xdr_netobj, but has a void* instead of a u8* data pointer. >> >> In terms of rxrpc's rxgk, there's another step in key derivation that isn't >> part of the kerberos standard, but uses the PRF+ function to generate a key >> that is then used to generate Kc, Ke and Ki. Is it worth putting this into >> the directory or maybe having a callout to insert an intermediate step in >> key derivation? >> >> Note that, for purposes of illustration, I've included some rxrpc patches >> that use this interface to implement the rxgk Rx security class. The >> branch also is based on some rxrpc patches that are a prerequisite for >> this, but the crypto patches don't need it. 
--
Chuck Lever
On Thu, Nov 12, 2020 at 04:54:06PM +0000, David Howells wrote: > Chuck Lever <chuck.lever@oracle.com> wrote: > > > Really? My understanding of the Linux kernel SUNRPC implementation is > > that it uses asynchronous, even for small data items. Maybe I'm using > > the terminology incorrectly. > > Seems to be synchronous, at least in its use of skcipher: Yes, it's all synchronous. The only cases where we defer and revisit a request is when we need to do upcalls to userspace. (And those upcalls mostly come after we're done with unwrapping and verifying a request, so now I'm sort of curious exactly what Chuck was seeing.) --b.
> On Nov 12, 2020, at 4:07 PM, Bruce Fields <bfields@fieldses.org> wrote: > > On Thu, Nov 12, 2020 at 04:54:06PM +0000, David Howells wrote: >> Chuck Lever <chuck.lever@oracle.com> wrote: >> >>> Really? My understanding of the Linux kernel SUNRPC implementation is >>> that it uses asynchronous, even for small data items. Maybe I'm using >>> the terminology incorrectly. >> >> Seems to be synchronous, at least in its use of skcipher: > > Yes, it's all synchronous. The only cases where we defer and revisit a > request is when we need to do upcalls to userspace. > > (And those upcalls mostly come after we're done with unwrapping and > verifying a request, so now I'm sort of curious exactly what Chuck was > seeing.) I vaguely recall that setting CRYPTO_TFM_REQ_MAY_SLEEP allows the crypto API to sleep and defer completion. -- Chuck Lever
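A short sketch of the distinction may help here: CRYPTO_TFM_REQ_MAY_SLEEP
only tells the implementation that it is allowed to sleep, while deferred
completion (-EINPROGRESS) comes from asynchronous tfms with a completion
callback, which callers conventionally absorb with crypto_wait_req().  The
request is assumed to have been set up as in the earlier sketch.

	#include <crypto/skcipher.h>
	#include <linux/crypto.h>

	/* Submit @req and wait for it, even if the underlying tfm is
	 * asynchronous: the callback + crypto_wait_req() pair turns a
	 * deferred completion back into a synchronous result. */
	static int example_encrypt_and_wait(struct skcipher_request *req)
	{
		DECLARE_CRYPTO_WAIT(wait);

		skcipher_request_set_callback(req,
					      CRYPTO_TFM_REQ_MAY_SLEEP |
					      CRYPTO_TFM_REQ_MAY_BACKLOG,
					      crypto_req_done, &wait);
		return crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
	}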
On Thu, Nov 12, 2020 at 12:57:45PM +0000, David Howells wrote: > > Hi Herbert, Bruce, > > Here's my first cut at a generic Kerberos crypto library in the kernel so > that I can share code between rxrpc and sunrpc (and cifs?). Hi David: I can't find the bit where you are actually sharing this code with sunrpc, am I missing something? Cheers,
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> > Here's my first cut at a generic Kerberos crypto library in the kernel so
> > that I can share code between rxrpc and sunrpc (and cifs?).
>
> I can't find the bit where you are actually sharing this code with
> sunrpc, am I missing something?

I haven't done that yet.  Sorry, I should've been more explicit with what I
was after.  I was wanting to find out if the nfs/nfsd people are okay with
this (and if there are any gotchas I should know about - it turns out, if I
understand it correctly, the relevant code may be being rewritten a bit
anyway).

And from you, I was wanting to find out if you're okay with an interface of
this kind in crypto/ where the code is just used directly - or whether I'll
be required to wrap it up in the autoloading, module-handling mechanisms.

David
On Thu, Nov 26, 2020 at 08:19:41AM +0000, David Howells wrote:
>
> I haven't done that yet.  Sorry, I should've been more explicit with what I
> was after.  I was wanting to find out if the nfs/nfsd people are okay with
> this (and if there are any gotchas I should know about - it turns out, if I
> understand it correctly, the relevant code may be being rewritten a bit
> anyway).
>
> And from you, I was wanting to find out if you're okay with an interface of
> this kind in crypto/ where the code is just used directly - or whether I'll
> be required to wrap it up in the autoloading, module-handling mechanisms.

I don't have any problems with it living under crypto.  However, I'd
like to see what the sunrpc code looks like before going one way or
another.

Cheers,
Btw, would it be feasible to make it so that an extra parameter can be added to the cipher buffer-supplying functions, e.g.: skcipher_request_set_crypt(req, input, ciphertext_sg, esize, iv); such that we can pass in an offset into the output sg as well? David
On Tue, Dec 01, 2020 at 08:44:33AM +0000, David Howells wrote: > Btw, would it be feasible to make it so that an extra parameter can be added > to the cipher buffer-supplying functions, e.g.: > > skcipher_request_set_crypt(req, input, ciphertext_sg, esize, iv); > > such that we can pass in an offset into the output sg as well? Couldn't you just change the output sg to include the offset? Cheers,
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Couldn't you just change the output sg to include the offset?
That depends on whether the caller has passed it elsewhere for some other
parallel purpose, but I think I'm going to have to go down that road and
restore it afterwards.
David
On Tue, Dec 01, 2020 at 09:12:38AM +0000, David Howells wrote:
>
> That depends on whether the caller has passed it elsewhere for some other
> parallel purpose, but I think I'm going to have to go down that road and
> restore it afterwards.

Sure, but even if you added it to the API the underlying implementations
would just have to do the same thing.  Since this is particular to your
use-case, it's better to leave the complexity where it's needed rather than
propagating it to all the crypto drivers.

Cheers,
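To make the suggestion concrete, a minimal sketch for the simple case where
the output region is a linear buffer (the hdr_len and buffer names are
assumptions, and @req is assumed to already have its tfm and callback set):
rather than a new offset parameter, the caller points the output scatterlist
at the offset directly.

	#include <crypto/skcipher.h>
	#include <linux/scatterlist.h>

	/* Encrypt @len bytes of @ptext so the ciphertext lands @hdr_len bytes
	 * into @obuf, by building the output sg at that offset.  @obuf must
	 * not be stack memory. */
	static int example_encrypt_at_offset(struct skcipher_request *req,
					     const u8 *ptext, u8 *obuf,
					     unsigned int len,
					     unsigned int hdr_len, u8 *iv)
	{
		struct scatterlist sg_in, sg_out;

		sg_init_one(&sg_in, ptext, len);
		sg_init_one(&sg_out, obuf + hdr_len, len);	/* offset folded in */

		skcipher_request_set_crypt(req, &sg_in, &sg_out, len, iv);
		return crypto_skcipher_encrypt(req);
	}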
Hi Chuck, Bruce,

Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
gss_krb5_aes_encrypt() code looks like the attached.

From what I can tell, in AES mode, the difference between the main cipher and
the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
"cts(cbc(aes))" - but they have the same key.

Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
same as the non-CTS, except for the last two blocks, but the non-CTS one is
more efficient.

David
---
	nbytes = buf->len - offset - GSS_KRB5_TOK_HDR_LEN;
	nblocks = (nbytes + blocksize - 1) / blocksize;
	cbcbytes = 0;
	if (nblocks > 2)
		cbcbytes = (nblocks - 2) * blocksize;

	memset(desc.iv, 0, sizeof(desc.iv));

	if (cbcbytes) {
		SYNC_SKCIPHER_REQUEST_ON_STACK(req, aux_cipher);

		desc.pos = offset + GSS_KRB5_TOK_HDR_LEN;
		desc.fragno = 0;
		desc.fraglen = 0;
		desc.pages = pages;
		desc.outbuf = buf;
		desc.req = req;

		skcipher_request_set_sync_tfm(req, aux_cipher);
		skcipher_request_set_callback(req, 0, NULL, NULL);

		sg_init_table(desc.infrags, 4);
		sg_init_table(desc.outfrags, 4);

		err = xdr_process_buf(buf, offset + GSS_KRB5_TOK_HDR_LEN,
				      cbcbytes, encryptor, &desc);
		skcipher_request_zero(req);
		if (err)
			goto out_err;
	}

	/* Make sure IV carries forward from any CBC results. */
	err = gss_krb5_cts_crypt(cipher, buf,
				 offset + GSS_KRB5_TOK_HDR_LEN + cbcbytes,
				 desc.iv, pages, 1);
	if (err) {
		err = GSS_S_FAILURE;
		goto out_err;
	}
On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote:
> Hi Chuck, Bruce,
>
> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> gss_krb5_aes_encrypt() code looks like the attached.
>
> From what I can tell, in AES mode, the difference between the main cipher and
> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> "cts(cbc(aes))" - but they have the same key.
>
> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> same as the non-CTS, except for the last two blocks, but the non-CTS one is
> more efficient.

CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
that did that, and I don't remember the history.  I thought it was
required by some spec or peer implementation (maybe Windows?) but I
really don't remember.  It may predate git.  I'll dig around and see
what I can find.

--b.
Bruce Fields <bfields@fieldses.org> wrote: > > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the > > same as the non-CTS, except for the last two blocks, but the non-CTS one is > > more efficient. > > CTS is cipher-text stealing, isn't it? I think it was Kevin Coffman > that did that, and I don't remember the history. I thought it was > required by some spec or peer implementation (maybe Windows?) but I > really don't remember. It may predate git. I'll dig around and see > what I can find. rfc3961 and rfc3962 specify CTS-CBC with AES. David
On Fri, Dec 04, 2020 at 04:01:53PM +0000, David Howells wrote:
> Bruce Fields <bfields@fieldses.org> wrote:
>
> > > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> > > same as the non-CTS, except for the last two blocks, but the non-CTS one is
> > > more efficient.
> >
> > CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> > that did that, and I don't remember the history.  I thought it was
> > required by some spec or peer implementation (maybe Windows?) but I
> > really don't remember.  It may predate git.  I'll dig around and see
> > what I can find.
>
> rfc3961 and rfc3962 specify CTS-CBC with AES.

OK, I guess I don't understand the question.  I haven't thought about
this code in at least a decade.  What's an auxiliary cipher?  Is this a
question about why we're implementing something, or how we're
implementing it?

--b.
> On Dec 4, 2020, at 10:46 AM, Bruce Fields <bfields@fieldses.org> wrote: > > On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote: >> Hi Chuck, Bruce, >> >> Why is gss_krb5_crypto.c using an auxiliary cipher? For reference, the >> gss_krb5_aes_encrypt() code looks like the attached. >> >>> From what I can tell, in AES mode, the difference between the main cipher and >> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is >> "cts(cbc(aes))" - but they have the same key. >> >> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the >> same as the non-CTS, except for the last two blocks, but the non-CTS one is >> more efficient. > > CTS is cipher-text stealing, isn't it? I think it was Kevin Coffman > that did that, and I don't remember the history. I thought it was > required by some spec or peer implementation (maybe Windows?) but I > really don't remember. It may predate git. I'll dig around and see > what I can find. I can't add more here, this design comes from well before I started working on this body of code (though, I worked near Kevin when he implemented it). -- Chuck Lever
On Fri, Dec 04, 2020 at 10:46:26AM -0500, Bruce Fields wrote: > On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote: > > Hi Chuck, Bruce, > > > > Why is gss_krb5_crypto.c using an auxiliary cipher? For reference, the > > gss_krb5_aes_encrypt() code looks like the attached. > > > > >From what I can tell, in AES mode, the difference between the main cipher and > > the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is > > "cts(cbc(aes))" - but they have the same key. > > > > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the > > same as the non-CTS, except for the last two blocks, but the non-CTS one is > > more efficient. > > CTS is cipher-text stealing, isn't it? I think it was Kevin Coffman > that did that, and I don't remember the history. I thought it was > required by some spec or peer implementation (maybe Windows?) but I > really don't remember. It may predate git. I'll dig around and see > what I can find. Like I say, I've got no insight here, I'm just grepping through mailboxes and stuff, but maybe some of this history's useful; Addition of CTS mode: https://lore.kernel.org/linux-crypto/20080220202543.3209.47410.stgit@jazz.citi.umich.edu/ This rpc/krb5 code went in with 934a95aa1c9c "gss_krb5: add remaining pieces to enable AES encryption support"; may be worth looking at that and the series leading up to it, I see the changelogs have some RFC references that might explain why it's using the crypto it is. --b.
Bruce Fields <bfields@fieldses.org> wrote:

> OK, I guess I don't understand the question.  I haven't thought about
> this code in at least a decade.  What's an auxiliary cipher?  Is this a
> question about why we're implementing something, or how we're
> implementing it?

That's what the Linux sunrpc implementation calls them:

	struct crypto_sync_skcipher *acceptor_enc;
	struct crypto_sync_skcipher *initiator_enc;
	struct crypto_sync_skcipher *acceptor_enc_aux;
	struct crypto_sync_skcipher *initiator_enc_aux;

Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
something peculiar to that implementation.

So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
key, and the implementation seems to pass the IV from one to the other.  The
only difference is that the 'aux' cipher lacks the CTS wrapping - which only
makes a difference for the final two blocks[*] of the encryption (or
decryption) - and only if the data doesn't fully fill out the last block
(ie. it needs padding in some way so that the encryption algorithm can handle
it).

[*] Encryption cipher blocks, that is.

So I think its purpose is twofold:

 (1) It's a way to be a bit more efficient, cutting out the CTS layer's
     indirection and additional buffering.

 (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
     operation in one go and will always impose the final CTS bit, so you
     can't call it repeatedly to progress through a buffer (as
     xdr_process_buf() would like to do) as that would corrupt the data being
     encrypted - unless you made sure that the data was always block-size
     aligned (in which case, there's no point using CTS).

I wonder how much going through three layers of crypto modules costs.  Looking
at how AES can be implemented using, say, Intel AES instructions, it looks like
AES+CBC should be easy to do in a single module.  I wonder if we could have
optimised kerberos crypto that do the AES and the SHA together in a single
loop.

David
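For illustration, the arrangement described above boils down to something
like the sketch below (the Ke key name is a placeholder and error handling is
simplified): two sync skciphers sharing one key, with the plain-CBC one
handling the bulk of the data and the CTS one finishing the final two blocks
using the carried-over IV.

	#include <crypto/skcipher.h>

	/* Allocate the main CTS cipher and the auxiliary plain-CBC cipher
	 * with the same key, as the sunrpc code does. */
	static int example_alloc_cipher_pair(const u8 *Ke, unsigned int Ke_len,
					     struct crypto_sync_skcipher **enc,
					     struct crypto_sync_skcipher **enc_aux)
	{
		*enc = crypto_alloc_sync_skcipher("cts(cbc(aes))", 0, 0);
		if (IS_ERR(*enc))
			return PTR_ERR(*enc);

		*enc_aux = crypto_alloc_sync_skcipher("cbc(aes)", 0, 0);
		if (IS_ERR(*enc_aux)) {
			crypto_free_sync_skcipher(*enc);
			return PTR_ERR(*enc_aux);
		}

		/* Same key for both; the bulk goes through *enc_aux and the
		 * IV it leaves behind is fed into *enc for the final two
		 * blocks. */
		if (crypto_sync_skcipher_setkey(*enc, Ke, Ke_len) ||
		    crypto_sync_skcipher_setkey(*enc_aux, Ke, Ke_len))
			return -EINVAL;

		return 0;
	}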
On Fri, 4 Dec 2020 at 17:52, David Howells <dhowells@redhat.com> wrote:
>
> Bruce Fields <bfields@fieldses.org> wrote:
>
> > OK, I guess I don't understand the question.  I haven't thought about
> > this code in at least a decade.  What's an auxiliary cipher?  Is this a
> > question about why we're implementing something, or how we're
> > implementing it?
>
> That's what the Linux sunrpc implementation calls them:
>
>	struct crypto_sync_skcipher *acceptor_enc;
>	struct crypto_sync_skcipher *initiator_enc;
>	struct crypto_sync_skcipher *acceptor_enc_aux;
>	struct crypto_sync_skcipher *initiator_enc_aux;
>
> Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
> something peculiar to that implementation.
>
> So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
> key, and the implementation seems to pass the IV from one to the other.  The
> only difference is that the 'aux' cipher lacks the CTS wrapping - which only
> makes a difference for the final two blocks[*] of the encryption (or
> decryption) - and only if the data doesn't fully fill out the last block
> (ie. it needs padding in some way so that the encryption algorithm can handle
> it).
>
> [*] Encryption cipher blocks, that is.
>
> So I think its purpose is twofold:
>
>  (1) It's a way to be a bit more efficient, cutting out the CTS layer's
>      indirection and additional buffering.
>
>  (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
>      operation in one go and will always impose the final CTS bit, so you
>      can't call it repeatedly to progress through a buffer (as
>      xdr_process_buf() would like to do) as that would corrupt the data being
>      encrypted - unless you made sure that the data was always block-size
>      aligned (in which case, there's no point using CTS).
>
> I wonder how much going through three layers of crypto modules costs.  Looking
> at how AES can be implemented using, say, Intel AES instructions, it looks like
> AES+CBC should be easy to do in a single module.  I wonder if we could have
> optimised kerberos crypto that do the AES and the SHA together in a single
> loop.
>

The tricky thing with CTS is that you have to ensure that the final
full and partial blocks are presented to the crypto driver as one
chunk, or it won't be able to perform the ciphertext stealing.  This
might be the reason for the current approach.  If the sunrpc code has
multiple disjoint chunks of data to encrypt, it is always better to
wrap it in a single scatterlist and call into the skcipher only once.

However, I would recommend against it: at least for ARM and arm64, I
have already contributed SIMD based implementations that use SIMD
permutation instructions and overlapping loads and stores to perform
the ciphertext stealing, which means that there is only a single layer
which implements CTS+CBC+AES, and this layer can consume the entire
scatterlist in one go.  We could easily do something similar in the
AES-NI driver as well.
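A minimal sketch of the "wrap it in a single scatterlist" suggestion, for a
message spread over an array of non-contiguous pages (the pages/npages/len
names are assumptions for illustration):

	#include <linux/scatterlist.h>
	#include <linux/slab.h>

	/* Build one scatterlist covering @len bytes spread over @npages
	 * pages, so a single skcipher call can see the whole message
	 * (including the final full + partial block that CTS needs to find
	 * in one chunk). */
	static struct scatterlist *example_wrap_pages(struct page **pages,
						      unsigned int npages,
						      unsigned int len)
	{
		struct scatterlist *sgl;
		unsigned int i, remain = len;

		sgl = kmalloc_array(npages, sizeof(*sgl), GFP_KERNEL);
		if (!sgl)
			return NULL;

		sg_init_table(sgl, npages);
		for (i = 0; i < npages; i++) {
			unsigned int part = min_t(unsigned int, remain, PAGE_SIZE);

			sg_set_page(&sgl[i], pages[i], part, 0);
			remain -= part;
		}
		return sgl;
	}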
Ard Biesheuvel <ardb@kernel.org> wrote: > The tricky thing with CTS is that you have to ensure that the final > full and partial blocks are presented to the crypto driver as one > chunk, or it won't be able to perform the ciphertext stealing. This > might be the reason for the current approach. If the sunrpc code has > multiple disjoint chunks of data to encrypto, it is always better to > wrap it in a single scatterlist and call into the skcipher only once. Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB plus bits of non-contiguous pages, requiring >8K of scatterlist elements (admittedly, we can chain them, but we may have to do one or more large allocations). > However, I would recommend against it: Sorry, recommend against what? > at least for ARM and arm64, I > have already contributed SIMD based implementations that use SIMD > permutation instructions and overlapping loads and stores to perform > the ciphertext stealing, which means that there is only a single layer > which implements CTS+CBC+AES, and this layer can consume the entire > scatterlist in one go. We could easily do something similar in the > AES-NI driver as well. Can you point me at that in the sources? Can you also do SHA at the same time in the same loop? Note that the rfc3962 AES does the checksum over the plaintext, but rfc8009 does it over the ciphertext. David
On Fri, 4 Dec 2020 at 18:19, David Howells <dhowells@redhat.com> wrote:
>
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > The tricky thing with CTS is that you have to ensure that the final
> > full and partial blocks are presented to the crypto driver as one
> > chunk, or it won't be able to perform the ciphertext stealing.  This
> > might be the reason for the current approach.  If the sunrpc code has
> > multiple disjoint chunks of data to encrypt, it is always better to
> > wrap it in a single scatterlist and call into the skcipher only once.
>
> Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> (admittedly, we can chain them, but we may have to do one or more large
> allocations).
>
> > However, I would recommend against it:
>
> Sorry, recommend against what?
>

Recommend against the current approach of manipulating the input like
this and feeding it into the skcipher piecemeal.

Herbert recently made some changes for MSG_MORE support in the AF_ALG
code, which permits a skcipher encryption to be split into several
invocations of the skcipher layer without the need for this complexity
on the side of the caller.  Maybe there is a way to reuse that here.
Herbert?

> > at least for ARM and arm64, I
> > have already contributed SIMD based implementations that use SIMD
> > permutation instructions and overlapping loads and stores to perform
> > the ciphertext stealing, which means that there is only a single layer
> > which implements CTS+CBC+AES, and this layer can consume the entire
> > scatterlist in one go.  We could easily do something similar in the
> > AES-NI driver as well.
>
> Can you point me at that in the sources?
>

arm64 has

  arch/arm64/crypto/aes-glue.c
  arch/arm64/crypto/aes-modes.S

where the former implements the skcipher wrapper for an implementation of
"cts(cbc(aes))":

  static int cts_cbc_encrypt(struct skcipher_request *req)

walks over the src/dst scatterlist and feeds the data into the asm helpers,
one for the bulk of the input, and one for the final full and partial blocks
(or two final full blocks).

The SIMD asm helpers are

  aes_cbc_encrypt
  aes_cbc_decrypt
  aes_cbc_cts_encrypt
  aes_cbc_cts_decrypt

> Can you also do SHA at the same time in the same loop?
>

SHA-1 or HMAC-SHA1?  The latter could probably be modeled as an AEAD.
The former doesn't really fit the current API so we'd have to invent
something for it.

> Note that the rfc3962 AES does the checksum over the plaintext, but rfc8009
> does it over the ciphertext.
>
> David
>
On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote:
> Hi Chuck, Bruce,
>
> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> gss_krb5_aes_encrypt() code looks like the attached.
>
> From what I can tell, in AES mode, the difference between the main cipher and
> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> "cts(cbc(aes))" - but they have the same key.
>
> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> same as the non-CTS, except for the last two blocks, but the non-CTS one is
> more efficient.

The reason to use CTS is if you don't want to expand the size of the
cipher text to the cipher block size.  e.g., if you have a 53 byte
plaintext, and you can't afford to let the ciphertext be 56 bytes, the
cryptographic engineer will reach for CTS instead of CBC.  So that
probably explains the decision to use CTS (and it's required by the
spec in any case).

As far as why CBC is being used instead of CTS, the only reason I can
think of is the one you posted.  Perhaps there was some hardware or
software configuration where cbc(aes) was hardware accelerated, and
cts(cbc(aes)) would not be?

In any case, using cbc(aes) for all but the last two blocks, and using
cts(cbc(aes)) for the last two blocks, is identical to using
cts(cbc(aes)) for the whole encryption.  So the only reason to do this
in the more complex way would be for performance reasons.

						- Ted
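A worked example of the split being described, mirroring the cbcbytes
computation in the snippet quoted earlier (reusing the 53-byte figure above,
but with 16-byte AES blocks):

	/* 53-byte plaintext, 16-byte AES blocks: the first two blocks can go
	 * through plain cbc(aes) and the final 21 bytes (one full block plus
	 * one partial) through cts(cbc(aes)), carrying the IV across - giving
	 * exactly the same ciphertext as cts(cbc(aes)) over all 53 bytes. */
	unsigned int nbytes    = 53, blocksize = 16;
	unsigned int nblocks   = (nbytes + blocksize - 1) / blocksize;		/* 4  */
	unsigned int cbcbytes  = nblocks > 2 ? (nblocks - 2) * blocksize : 0;	/* 32 */
	unsigned int ctsbytes  = nbytes - cbcbytes;				/* 21 */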
On Fri, Dec 04, 2020 at 06:35:48PM +0100, Ard Biesheuvel wrote:
>
> Herbert recently made some changes for MSG_MORE support in the AF_ALG
> code, which permits a skcipher encryption to be split into several
> invocations of the skcipher layer without the need for this complexity
> on the side of the caller.  Maybe there is a way to reuse that here.
> Herbert?

Yes, this was one of the reasons I was pursuing the continuation work.
It should allow us to kill the special case for CTS in the krb5 code.

Hopefully I can get some time to restart work on this soon.

Cheers,
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> > Herbert recently made some changes for MSG_MORE support in the AF_ALG
> > code, which permits a skcipher encryption to be split into several
> > invocations of the skcipher layer without the need for this complexity
> > on the side of the caller.  Maybe there is a way to reuse that here.
> > Herbert?
>
> Yes, this was one of the reasons I was pursuing the continuation
> work.  It should allow us to kill the special case for CTS in the
> krb5 code.
>
> Hopefully I can get some time to restart work on this soon.

In the krb5 case, we know in advance how much data we're going to be dealing
with, if that helps.

David
Ard Biesheuvel <ardb@kernel.org> wrote:

> > Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> > plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> > (admittedly, we can chain them, but we may have to do one or more large
> > allocations).
> >
> > > However, I would recommend against it:
> >
> > Sorry, recommend against what?
>
> Recommend against the current approach of manipulating the input like
> this and feeding it into the skcipher piecemeal.

Right.  I understand the problem, but as I mentioned above, the scatterlist
itself becomes a performance issue as it may exceed two pages in size.  Double
that as there may need to be separate input and output scatterlists.

> Herbert recently made some changes for MSG_MORE support in the AF_ALG
> code, which permits a skcipher encryption to be split into several
> invocations of the skcipher layer without the need for this complexity
> on the side of the caller.  Maybe there is a way to reuse that here.
> Herbert?

I wonder if it would help if the input buffer and output buffer didn't have to
correspond exactly in usage - ie. the output buffer could be used at a slower
rate than the input to allow for buffering inside the crypto algorithm.

> > Can you also do SHA at the same time in the same loop?
>
> SHA-1 or HMAC-SHA1?  The latter could probably be modeled as an AEAD.
> The former doesn't really fit the current API so we'd have to invent
> something for it.

The hashes corresponding to the kerberos enctypes I'm supporting are:

	HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.

	HMAC-SHA256 for aes128-cts-hmac-sha256-128

	HMAC-SHA384 for aes256-cts-hmac-sha384-192

	CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac

I'm not sure you can support all of those with the instructions available.

David
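For reference, computing one of these checksums with the synchronous hash API
might look like the sketch below (key and data names are placeholders).
Swapping the algorithm string between "hmac(sha1)", "hmac(sha256)" and
"hmac(sha384)" covers the SHA-based enctypes above; the Camellia ones would
presumably use the cmac template over the camellia cipher instead.

	#include <crypto/hash.h>

	/* Compute a keyed checksum over @data with the synchronous shash API.
	 * Kerberos then truncates the output per enctype (e.g. sha1-96 keeps
	 * the first 12 bytes). */
	static int example_mic(const char *alg, const u8 *key, unsigned int keylen,
			       const u8 *data, unsigned int len, u8 *out)
	{
		struct crypto_shash *tfm;
		int ret;

		tfm = crypto_alloc_shash(alg, 0, 0);	/* e.g. "hmac(sha1)" */
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		ret = crypto_shash_setkey(tfm, key, keylen);
		if (ret == 0) {
			SHASH_DESC_ON_STACK(desc, tfm);

			desc->tfm = tfm;
			ret = crypto_shash_digest(desc, data, len, out);
			shash_desc_zero(desc);
		}

		crypto_free_shash(tfm);
		return ret;
	}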
On Mon, 7 Dec 2020 at 13:02, David Howells <dhowells@redhat.com> wrote: > > Ard Biesheuvel <ardb@kernel.org> wrote: > > > > Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB > > > plus bits of non-contiguous pages, requiring >8K of scatterlist elements > > > (admittedly, we can chain them, but we may have to do one or more large > > > allocations). > > > > > > > However, I would recommend against it: > > > > > > Sorry, recommend against what? > > > > > > > Recommend against the current approach of manipulating the input like > > this and feeding it into the skcipher piecemeal. > > Right. I understand the problem, but as I mentioned above, the scatterlist > itself becomes a performance issue as it may exceed two pages in size. Double > that as there may need to be separate input and output scatterlists. > I wasn't aware that Herbert's work hadn't been merged yet. So that means it is entirely reasonable to split the input like this and feed the first part into a cbc(aes) skcipher and the last part into a cts(cbc(aes)) skcipher, provided that you ensure that the last part covers the final two blocks (one full block and one block that is either full or partial) With Herbert's changes, you will be able to use the same skcipher, and pass a flag to all but the final part that more data is coming. But for lack of that, the current approach is optimal for cases where having to cover the entire input with a single scatterlist is undesirable. > > Herbert recently made some changes for MSG_MORE support in the AF_ALG > > code, which permits a skcipher encryption to be split into several > > invocations of the skcipher layer without the need for this complexity > > on the side of the caller. Maybe there is a way to reuse that here. > > Herbert? > > I wonder if it would help if the input buffer and output buffer didn't have to > correspond exactly in usage - ie. the output buffer could be used at a slower > rate than the input to allow for buffering inside the crypto algorithm. > I don't follow - how could one be used at a slower rate? > > > Can you also do SHA at the same time in the same loop? > > > > SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD. > > The former doesn't really fit the current API so we'd have to invent > > something for it. > > The hashes corresponding to the kerberos enctypes I'm supporting are: > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96. > > HMAC-SHA256 for aes128-cts-hmac-sha256-128 > > HMAC-SHA384 for aes256-cts-hmac-sha384-192 > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac > > I'm not sure you can support all of those with the instructions available. > It depends on whether the caller can make use of the authenc() pattern, which is a type of AEAD we support. There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)), including h/w accelerated ones, but none that implement ciphertext stealing. So that means that, even if you manage to use the AEAD layer to perform both at the same time, the generic authenc() template will perform the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes, respectively, which won't give you any benefit until accelerated implementations turn up that perform the whole operation in one pass over the input. And even then, I don't think the performance benefit will be worth it.
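To ground what "the authenc() pattern" refers to: a rough sketch of
allocating such an AEAD and feeding it the two keys follows (buffer names
assumed, checks trimmed).  The setkey blob is the rtattr-wrapped layout the
authenc template expects; note that, per the discussion above, this pairs
hmac with plain cbc(aes), not cts(cbc(aes)).

	#include <crypto/aead.h>
	#include <crypto/authenc.h>
	#include <linux/rtnetlink.h>
	#include <linux/slab.h>

	/* Allocate an authenc AEAD and set its auth + enc keys. */
	static struct crypto_aead *example_alloc_authenc(const u8 *authkey,
							 unsigned int authkeylen,
							 const u8 *enckey,
							 unsigned int enckeylen)
	{
		struct crypto_authenc_key_param *param;
		struct crypto_aead *aead;
		struct rtattr *rta;
		unsigned int keylen;
		u8 *key, *p;

		aead = crypto_alloc_aead("authenc(hmac(sha1),cbc(aes))", 0, 0);
		if (IS_ERR(aead))
			return aead;

		keylen = RTA_SPACE(sizeof(*param)) + authkeylen + enckeylen;
		key = kmalloc(keylen, GFP_KERNEL);
		if (!key) {
			crypto_free_aead(aead);
			return ERR_PTR(-ENOMEM);
		}

		/* [rtattr: enckeylen][authkey][enckey] - the layout authenc expects. */
		rta = (struct rtattr *)key;
		rta->rta_type = CRYPTO_AUTHENC_KEYA_PARAM;
		rta->rta_len = RTA_LENGTH(sizeof(*param));
		param = RTA_DATA(rta);
		param->enckeylen = cpu_to_be32(enckeylen);

		p = key + RTA_SPACE(sizeof(*param));
		memcpy(p, authkey, authkeylen);
		memcpy(p + authkeylen, enckey, enckeylen);

		crypto_aead_setkey(aead, key, keylen);
		crypto_aead_setauthsize(aead, 12);	/* e.g. a 96-bit tag */

		kfree_sensitive(key);
		return aead;
	}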
Ard Biesheuvel <ardb@kernel.org> wrote: > > I wonder if it would help if the input buffer and output buffer didn't > > have to correspond exactly in usage - ie. the output buffer could be used > > at a slower rate than the input to allow for buffering inside the crypto > > algorithm. > > > > I don't follow - how could one be used at a slower rate? I mean that the crypto algorithm might need to buffer the last part of the input until it has a block's worth before it can write to the output. > > The hashes corresponding to the kerberos enctypes I'm supporting are: > > > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96. > > > > HMAC-SHA256 for aes128-cts-hmac-sha256-128 > > > > HMAC-SHA384 for aes256-cts-hmac-sha384-192 > > > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac > > > > I'm not sure you can support all of those with the instructions available. > > It depends on whether the caller can make use of the authenc() > pattern, which is a type of AEAD we support. Interesting. I didn't realise AEAD was an API. > There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)), > including h/w accelerated ones, but none that implement ciphertext > stealing. So that means that, even if you manage to use the AEAD layer to > perform both at the same time, the generic authenc() template will perform > the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes, > respectively, which won't give you any benefit until accelerated > implementations turn up that perform the whole operation in one pass over > the input. And even then, I don't think the performance benefit will be > worth it. Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext, not the plaintext. For the moment, it's probably not worth worrying about, then. If I can manage to abstract the sunrpc bits out into a krb5 library, we can improve the library later. David
On Mon, 7 Dec 2020 at 15:15, David Howells <dhowells@redhat.com> wrote:
>
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > I wonder if it would help if the input buffer and output buffer didn't
> > > have to correspond exactly in usage - ie. the output buffer could be used
> > > at a slower rate than the input to allow for buffering inside the crypto
> > > algorithm.
> >
> > I don't follow - how could one be used at a slower rate?
>
> I mean that the crypto algorithm might need to buffer the last part of the
> input until it has a block's worth before it can write to the output.
>

This is what is typically handled transparently by the driver. When
you populate a scatterlist, it doesn't matter how misaligned the
individual elements are, the scatterlist walker will always present
the data in chunks that the crypto algorithm can manage. This is why
using a single scatterlist for the entire input is preferable in
general.

> > > The hashes corresponding to the kerberos enctypes I'm supporting are:
> > >
> > >         HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
> > >
> > >         HMAC-SHA256 for aes128-cts-hmac-sha256-128
> > >
> > >         HMAC-SHA384 for aes256-cts-hmac-sha384-192
> > >
> > >         CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
> > >
> > > I'm not sure you can support all of those with the instructions available.
> >
> > It depends on whether the caller can make use of the authenc()
> > pattern, which is a type of AEAD we support.
>
> Interesting. I didn't realise AEAD was an API.
>
> > There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)),
> > including h/w accelerated ones, but none that implement ciphertext
> > stealing. So that means that, even if you manage to use the AEAD layer to
> > perform both at the same time, the generic authenc() template will perform
> > the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
> > respectively, which won't give you any benefit until accelerated
> > implementations turn up that perform the whole operation in one pass over
> > the input. And even then, I don't think the performance benefit will be
> > worth it.
>
> Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext,
> not the plaintext.
>
> For the moment, it's probably not worth worrying about, then. If I can manage
> to abstract the sunrpc bits out into a krb5 library, we can improve the
> library later.
>
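To illustrate the point about misaligned elements, a minimal sketch (names
invented; the header and trailer buffers are assumed to be kmalloc'ed rather
than on the stack) of one scatterlist whose individual entries are not
block-aligned, handed to cbc(aes) in a single request - the walk code
re-blocks it internally, so the cipher never sees a partial block even though
no single element is a multiple of 16 bytes:

```c
#include <crypto/skcipher.h>
#include <linux/scatterlist.h>

static int example_one_sg(struct crypto_skcipher *tfm, u8 *hdr,
			  struct page *page, u8 *trailer, u8 *iv)
{
	struct scatterlist sg[3];
	struct skcipher_request *req;
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	/* 23 + 2012 + 13 = 2048 bytes in total, a whole number of AES
	 * blocks, but no individual element is block-aligned. */
	sg_init_table(sg, 3);
	sg_set_buf(&sg[0], hdr, 23);
	sg_set_page(&sg[1], page, 2012, 100);	/* odd offset and length */
	sg_set_buf(&sg[2], trailer, 13);

	req = skcipher_request_alloc(tfm, GFP_NOFS);
	if (!req)
		return -ENOMEM;
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
				      CRYPTO_TFM_REQ_MAY_BACKLOG,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, sg, sg, 2048, iv);
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
	skcipher_request_free(req);
	return err;
}
```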
Ard Biesheuvel <ardb@kernel.org> wrote:

> > > > I wonder if it would help if the input buffer and output buffer didn't
> > > > have to correspond exactly in usage - ie. the output buffer could be
> > > > used at a slower rate than the input to allow for buffering inside the
> > > > crypto algorithm.
> > >
> > > I don't follow - how could one be used at a slower rate?
> >
> > I mean that the crypto algorithm might need to buffer the last part of the
> > input until it has a block's worth before it can write to the output.
>
> This is what is typically handled transparently by the driver. When
> you populate a scatterlist, it doesn't matter how misaligned the
> individual elements are, the scatterlist walker will always present
> the data in chunks that the crypto algorithm can manage. This is why
> using a single scatterlist for the entire input is preferable in
> general.

Yep - but the assumption currently on the part of the callers is that they
provide the input buffer and corresponding output buffer - and that the
algorithm will transfer data from one to the other, such that the same amount
of input and output bufferage will be used.

However, if we start pushing data in progressively, this would no longer hold
true unless we also require the caller to only present in block-size chunks.

For example, if I gave the encryption function 120 bytes of data and a 120
byte output buffer, but the algorithm has a 16-byte blocksize, it will,
presumably, consume 120 bytes of input, but it can only write 112 bytes of
output at this time.  So the current interface would need to evolve to
indicate separately how much input has been consumed and how much output has
been produced - in which case it can't be handled transparently.

For krb5, it's actually worse than that, since we want to be able to
insert/remove a header and a trailer (and might need to go back and update the
header after) - but I think in the krb5 case, we need to treat the header and
trailer specially and update them after the fact in the wrapping case
(unwrapping is not a problem, since we can just cache the header).

David
I wonder - would it make sense to reserve two arrays of scatterlist structs
and a mutex per CPU, sufficient to map up to 1MiB of pages with each array,
while the krb5 service is in use?

That way sunrpc could, say, grab the mutex, map the input and output buffers,
do the entire crypto op in one go and then release the mutex - at least for
big ops; small ops needn't use this service.

For rxrpc/afs's use case this would probably be overkill - it's doing crypto
on each packet, not on whole operations - but I could still make use of it
there.

However, that then limits the maximum size of an op to 1MiB, plus dangly bits
on either side (which can be managed with chained scatterlist structs), and
also limits the number of large simultaneous krb5 crypto ops we can do.

David
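A rough sketch of this reservation idea, purely illustrative and with all
names invented (error unwinding on init failure is elided):

```c
#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/scatterlist.h>
#include <linux/sizes.h>
#include <linux/smp.h>
#include <linux/mm.h>

/* Enough entries to map 1MiB of pages plus a header and trailer buffer. */
#define KRB5_SG_MAX	(SZ_1M / PAGE_SIZE + 2)

struct krb5_sg_pool {
	struct mutex		lock;
	struct scatterlist	*in;	/* KRB5_SG_MAX entries */
	struct scatterlist	*out;	/* KRB5_SG_MAX entries */
};

static struct krb5_sg_pool __percpu *krb5_sg_pools;

static int __init krb5_sg_pool_init(void)
{
	int cpu;

	krb5_sg_pools = alloc_percpu(struct krb5_sg_pool);
	if (!krb5_sg_pools)
		return -ENOMEM;

	for_each_possible_cpu(cpu) {
		struct krb5_sg_pool *pool = per_cpu_ptr(krb5_sg_pools, cpu);

		mutex_init(&pool->lock);
		pool->in  = kvcalloc(KRB5_SG_MAX, sizeof(*pool->in), GFP_KERNEL);
		pool->out = kvcalloc(KRB5_SG_MAX, sizeof(*pool->out), GFP_KERNEL);
		if (!pool->in || !pool->out)
			return -ENOMEM;		/* unwind elided */
	}
	return 0;
}

/* Take whichever pool belongs to the CPU we happen to be on; it's the
 * mutex, not preemption-disabling, that serialises users, so migrating
 * to another CPU afterwards is harmless. */
static struct krb5_sg_pool *krb5_get_sg_pool(void)
{
	struct krb5_sg_pool *pool =
		per_cpu_ptr(krb5_sg_pools, raw_smp_processor_id());

	mutex_lock(&pool->lock);
	return pool;
}

static void krb5_put_sg_pool(struct krb5_sg_pool *pool)
{
	mutex_unlock(&pool->lock);
}
```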
David Howells <dhowells@redhat.com> wrote:

> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?

Actually, simply reserving a set per CPU is probably unnecessary.  We could,
say, set a minimum and a maximum on the reservations (say 2 -> 2*nr_cpus) and
then allocate new ones when we run out.  Then let the memory shrinker clean
them up off an lru list.

David
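A sketch of how such a shrinker-trimmed free list might be wired up, again
with invented names, and with allocation of the sets themselves and the
2*nr_cpus cap elided:

```c
#include <linux/shrinker.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/mm.h>

struct krb5_sg_set {
	struct list_head	lru;
	struct scatterlist	*in;
	struct scatterlist	*out;
};

#define KRB5_SG_SETS_MIN	2	/* keep at least this many around */

static LIST_HEAD(krb5_sg_free_list);
static DEFINE_SPINLOCK(krb5_sg_free_lock);
static unsigned int krb5_sg_free_count;

static unsigned long krb5_sg_shrink_count(struct shrinker *s,
					  struct shrink_control *sc)
{
	return krb5_sg_free_count > KRB5_SG_SETS_MIN ?
		krb5_sg_free_count - KRB5_SG_SETS_MIN : 0;
}

static unsigned long krb5_sg_shrink_scan(struct shrinker *s,
					 struct shrink_control *sc)
{
	unsigned long freed = 0;
	LIST_HEAD(graveyard);
	struct krb5_sg_set *set;

	/* Move victims to a local list so they can be freed unlocked. */
	spin_lock(&krb5_sg_free_lock);
	while (freed < sc->nr_to_scan &&
	       krb5_sg_free_count > KRB5_SG_SETS_MIN) {
		set = list_first_entry(&krb5_sg_free_list,
				       struct krb5_sg_set, lru);
		list_move(&set->lru, &graveyard);
		krb5_sg_free_count--;
		freed++;
	}
	spin_unlock(&krb5_sg_free_lock);

	while (!list_empty(&graveyard)) {
		set = list_first_entry(&graveyard, struct krb5_sg_set, lru);
		list_del(&set->lru);
		kvfree(set->in);
		kvfree(set->out);
		kfree(set);
	}
	return freed;
}

static struct shrinker krb5_sg_shrinker = {
	.count_objects	= krb5_sg_shrink_count,
	.scan_objects	= krb5_sg_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* register_shrinker(&krb5_sg_shrinker) at module init,
 * unregister_shrinker(&krb5_sg_shrinker) at exit. */
```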
On Tue, 8 Dec 2020 at 14:25, David Howells <dhowells@redhat.com> wrote:
>
> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?
>
> That way sunrpc could, say, grab the mutex, map the input and output buffers,
> do the entire crypto op in one go and then release the mutex - at least for
> big ops, small ops needn't use this service.
>
> For rxrpc/afs's use case this would probably be overkill - it's doing crypto
> on each packet, not on whole operations - but I could still make use of it
> there.
>
> However, that then limits the maximum size of an op to 1MiB, plus dangly bits
> on either side (which can be managed with chained scatterlist structs) and
> also limits the number of large simultaneous krb5 crypto ops we can do.
>

Apparently, it is permitted for gss_krb5_cts_crypt() to do a
kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
being invoked, and so I don't see why it wouldn't be possible to
simply kmalloc() a scatterlist[] of the appropriate size, populate it
with all the pages, bufs and whatever else gets passed into the
skcipher, and pass it into the skcipher in one go.
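A sketch of what Ard suggests - kmalloc() one scatterlist covering the head
buffer, the data pages and the tail buffer, then issue a single skcipher
request - might look like this.  Names are invented; it assumes the data pages
are fully used, the head/tail buffers are kmalloc'ed, and 'len' is the number
of bytes to process:

```c
#include <crypto/skcipher.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static int krb5_crypt_in_one_go(struct crypto_skcipher *tfm,
				u8 *head, unsigned int head_len,
				struct page **pages, unsigned int nr_pages,
				u8 *tail, unsigned int tail_len,
				unsigned int len, u8 *iv)
{
	unsigned int nents = nr_pages + !!head_len + !!tail_len;
	struct skcipher_request *req;
	struct scatterlist *sg;
	DECLARE_CRYPTO_WAIT(wait);
	unsigned int i, n = 0;
	int err;

	/* One sg entry per page plus the head and tail buffers. */
	sg = kmalloc_array(nents, sizeof(*sg), GFP_NOFS);
	if (!sg)
		return -ENOMEM;
	sg_init_table(sg, nents);

	if (head_len)
		sg_set_buf(&sg[n++], head, head_len);
	for (i = 0; i < nr_pages; i++)
		sg_set_page(&sg[n++], pages[i], PAGE_SIZE, 0);
	if (tail_len)
		sg_set_buf(&sg[n++], tail, tail_len);

	req = skcipher_request_alloc(tfm, GFP_NOFS);
	if (!req) {
		kfree(sg);
		return -ENOMEM;
	}
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
				      CRYPTO_TFM_REQ_MAY_BACKLOG,
				      crypto_req_done, &wait);
	/* In-place crypto: src == dst, processing 'len' bytes of the mapping. */
	skcipher_request_set_crypt(req, sg, sg, len, iv);
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
	kfree(sg);
	return err;
}
```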
Ard Biesheuvel <ardb@kernel.org> wrote:

> Apparently, it is permitted for gss_krb5_cts_crypt() to do a
> kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
> being invoked, and so I don't see why it wouldn't be possible to
> simply kmalloc() a scatterlist[] of the appropriate size, populate it
> with all the pages, bufs and whatever else gets passed into the
> skcipher, and pass it into the skcipher in one go.

I never said it wasn't possible.  But doing a pair of order-1 allocations from
there might have a significant detrimental effect on performance - in which
case Trond and co. will say "no".

Remember: to crypt 1MiB of data on a 64-bit machine requires 2 x minimum 8KiB
scatterlist arrays.  That's assuming the pages in the middle are contiguous,
which might not be the case for a direct I/O read/write.  So for the DIO case,
it could involve an order-2 allocation (or chaining of single pages).

David
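For reference, the arithmetic behind those numbers, spelled out (the 32-byte
figure for struct scatterlist is an approximation - it is 24-32 bytes on a
64-bit kernel depending on config options):

```c
/* Back-of-the-envelope sizing for a 1MiB op with 4KiB pages. */
#define KRB5_OP_MAX		(1024 * 1024)		/* 1MiB payload        */
#define KRB5_SG_ENTRIES		(KRB5_OP_MAX / 4096)	/* 256 entries         */
#define KRB5_SG_BYTES		(KRB5_SG_ENTRIES * 32)	/* 8KiB -> order-1     */
#define KRB5_SG_BYTES_BOTH	(2 * KRB5_SG_BYTES)	/* src + dst = 16KiB   */
/* Add entries for a header/trailer and a misaligned direct-I/O buffer and
 * a single array can spill past 8KiB, i.e. into an order-2 allocation,
 * unless the scatterlist is chained instead. */
```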