From patchwork Mon Dec 10 19:57:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Watson X-Patchwork-Id: 10722361 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7730E91E for ; Mon, 10 Dec 2018 19:58:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B1AC12A62D for ; Mon, 10 Dec 2018 19:58:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A12792A684; Mon, 10 Dec 2018 19:58:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6662B2A62D for ; Mon, 10 Dec 2018 19:58:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727515AbeLJT6m (ORCPT ); Mon, 10 Dec 2018 14:58:42 -0500 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:52098 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727330AbeLJT6m (ORCPT ); Mon, 10 Dec 2018 14:58:42 -0500 Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id wBAJn4e9022812; Mon, 10 Dec 2018 11:58:28 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : references : in-reply-to : content-type : content-id : content-transfer-encoding : mime-version; s=facebook; bh=Gcri6K+36Vf1mB12ai79m9IS59FUFriccc+mPr6oOHg=; b=aiC8yXQDu0C3bSoUanY7dnWrH3WGzsP66LoTSmaJhnOl3T4+yinv3SnYUYBvgrzYoWM7 bTVh50iU155ZLZ5rs4Hnk0SvQMUGXonC42V0iqW7njm3YLhY3CfNA+0x1tiTY1q+mfFk gRz4Ygidg0VwuY3Fj0gMAtVdUwkNrxYkopQ= Received: from mail.thefacebook.com ([199.201.64.23]) by mx0a-00082601.pphosted.com with ESMTP id 2p9wvx86vk-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Mon, 10 Dec 2018 11:58:28 -0800 Received: from prn-hub05.TheFacebook.com (2620:10d:c081:35::129) by prn-hub05.TheFacebook.com (2620:10d:c081:35::129) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.1.1531.3; Mon, 10 Dec 2018 11:57:43 -0800 Received: from NAM05-DM3-obe.outbound.protection.outlook.com (192.168.54.28) by o365-in.thefacebook.com (192.168.16.29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.1.1531.3 via Frontend Transport; Mon, 10 Dec 2018 11:57:43 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.onmicrosoft.com; s=selector1-fb-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Gcri6K+36Vf1mB12ai79m9IS59FUFriccc+mPr6oOHg=; b=IFUqczeaQElkcjHhdZ39RsYlfRHwKggTiEOT94BQ6OsCk0ARkys7+QdQ1AAfOBCELTiXNk9srR5o1mK0Oddu2yj+iPEKT/i2OklHse2Dz6ig/xaAdkfPcxoBq7T1WIY9rM0zCGad0qKfwWgi26k8yGgPNDQXkT8GJfHFmZHuqMY= Received: from MWHPR15MB1134.namprd15.prod.outlook.com (10.175.2.12) by MWHPR15MB1710.namprd15.prod.outlook.com (10.174.254.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1404.21; Mon, 10 Dec 2018 19:57:37 +0000 Received: from MWHPR15MB1134.namprd15.prod.outlook.com ([fe80::911d:ed1a:7e45:6434]) by MWHPR15MB1134.namprd15.prod.outlook.com ([fe80::911d:ed1a:7e45:6434%4]) with mapi id 15.20.1404.026; Mon, 10 Dec 2018 19:57:37 +0000 From: Dave Watson To: Herbert Xu , Junaid Shahid , Steffen Klassert , "linux-crypto@vger.kernel.org" CC: Doron Roberts-Kedes , Sabrina Dubroca , "linux-kernel@vger.kernel.org" , Stephan Mueller Subject: [PATCH 04/12] x86/crypto: aesni: support 256 byte keys in avx asm Thread-Topic: [PATCH 04/12] x86/crypto: aesni: support 256 byte keys in avx asm Thread-Index: AQHUkMKZ2822VImdnUWfQoXG/klO8g== Date: Mon, 10 Dec 2018 19:57:36 +0000 Message-ID: <01b7ee2e738b5014d58b60489d831235bcc5f424.1544471415.git.davejwatson@fb.com> References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: NeoMutt/20180716 x-clientproxiedby: MWHPR06CA0004.namprd06.prod.outlook.com (2603:10b6:301:39::17) To MWHPR15MB1134.namprd15.prod.outlook.com (2603:10b6:320:22::12) x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [2620:10d:c090:180::1:2261] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1;MWHPR15MB1710;20:p1/WUDQQ7g3LN2XkbAZ+GMPNGgeCJQFTAKVgPxYUp9lPGUjgzBqfMXjpsxlYyXGre6hVQigD95OjhJXKaeloDzOeczlxv7H/1vbinBIOEDfdjMgjspQGEmgTQpJBZzK1ktf/HhyUwzBeTfRbj3S6/kxU4zr5jIesmOKxc4YROs0= x-ms-office365-filtering-correlation-id: d7e0ef07-436e-4e17-5800-08d65ed9bb6c x-microsoft-antispam: BCL:0;PCL:0;RULEID:(2390098)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600074)(711020)(2017052603328)(7153060)(7193020);SRVR:MWHPR15MB1710; x-ms-traffictypediagnostic: MWHPR15MB1710: x-microsoft-antispam-prvs: x-ms-exchange-senderadcheck: 1 x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(8211001083)(6040522)(2401047)(8121501046)(5005006)(3231455)(999002)(11241501185)(944501520)(52105112)(10201501046)(3002001)(93006095)(93001095)(148016)(149066)(150057)(6041310)(20161123558120)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123562045)(20161123564045)(20161123560045)(201708071742011)(7699051)(76991095);SRVR:MWHPR15MB1710;BCL:0;PCL:0;RULEID:;SRVR:MWHPR15MB1710; x-forefront-prvs: 08828D20BC x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(39860400002)(366004)(396003)(346002)(136003)(376002)(189003)(199004)(386003)(5660300001)(25786009)(11346002)(46003)(446003)(68736007)(2616005)(478600001)(7736002)(105586002)(305945005)(6436002)(575784001)(476003)(186003)(71190400001)(71200400001)(6506007)(8936002)(81156014)(4744004)(14454004)(81166006)(86362001)(6486002)(52116002)(8676002)(102836004)(316002)(14444005)(256004)(54906003)(2501003)(6512007)(58126008)(76176011)(118296001)(99286004)(110136005)(2906002)(36756003)(53946003)(97736004)(6116002)(486006)(53936002)(106356001)(4326008)(579004);DIR:OUT;SFP:1102;SCL:1;SRVR:MWHPR15MB1710;H:MWHPR15MB1134.namprd15.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;MX:1;A:1; received-spf: None (protection.outlook.com: fb.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: QQXevmKLoUvJo7xWija4zlxi2ldxmOtW/6MA5esQDh/hA3VhaPaCgupKqnihiLxa+EsHJnu1LN6mH1X0NxmmpjJAlQUgeFf/qKLVqDpDIen1wNqYHwvo3WHd6atOsuJqrs32eagqsecAojuprnIu8C5ByqFbHHiDQVsc8viscmxwMayfLMcBbBJxPVk7kombs4RKbcKSDANc3QYgm/lK6Yv8fB6h8BHBtVOZhcu4zqMWSm7dMTWIBeJkNekN/V4uZ5TNJYyCTluN2xte3VTMUchp9Gy7gogYkkHvtVAYvrr61aUCDc0LILv9ghLx4umo spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-ID: <4BA9702F574DE948943F4B1F13FDA14B@namprd15.prod.outlook.com> MIME-Version: 1.0 X-MS-Exchange-CrossTenant-Network-Message-Id: d7e0ef07-436e-4e17-5800-08d65ed9bb6c X-MS-Exchange-CrossTenant-originalarrivaltime: 10 Dec 2018 19:57:36.8991 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 8ae927fe-1255-47a7-a2af-5f3a069daaa2 X-MS-Exchange-Transport-CrossTenantHeadersStamped: MWHPR15MB1710 X-OriginatorOrg: fb.com X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-12-10_07:,, signatures=0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add support for 192/256-bit keys using the avx gcm/aes routines. The sse routines were previously updated in e31ac32d3b (Add support for 192 & 256 bit keys to AESNI RFC4106). Instead of adding an additional loop in the hotpath as in e31ac32d3b, this diff instead generates separate versions of the code using macros, and the entry routines choose which version once. This results in a 5% performance improvement vs. adding a loop to the hot path. This is the same strategy chosen by the intel isa-l_crypto library. The key size checks are removed from the c code where appropriate. Note that this diff depends on using gcm_context_data - 256 bit keys require 16 HashKeys + 15 expanded keys, which is larger than struct crypto_aes_ctx, so they are stored in struct gcm_context_data. Signed-off-by: Dave Watson --- arch/x86/crypto/aesni-intel_avx-x86_64.S | 188 +++++++++++++++++------ arch/x86/crypto/aesni-intel_glue.c | 18 +-- 2 files changed, 145 insertions(+), 61 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S index dd895f69399b..2aa11c503bb9 100644 --- a/arch/x86/crypto/aesni-intel_avx-x86_64.S +++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S @@ -209,6 +209,7 @@ HashKey_8_k = 16*21 # store XOR of HashKey^8 <<1 mod poly here (for Karatsu #define arg8 STACK_OFFSET+8*2(%r14) #define arg9 STACK_OFFSET+8*3(%r14) #define arg10 STACK_OFFSET+8*4(%r14) +#define keysize 2*15*16(arg1) i = 0 j = 0 @@ -272,22 +273,22 @@ VARIABLE_OFFSET = 16*8 .endm # Encryption of a single block -.macro ENCRYPT_SINGLE_BLOCK XMM0 +.macro ENCRYPT_SINGLE_BLOCK REP XMM0 vpxor (arg1), \XMM0, \XMM0 - i = 1 - setreg -.rep 9 + i = 1 + setreg +.rep \REP vaesenc 16*i(arg1), \XMM0, \XMM0 - i = (i+1) - setreg + i = (i+1) + setreg .endr - vaesenclast 16*10(arg1), \XMM0, \XMM0 + vaesenclast 16*i(arg1), \XMM0, \XMM0 .endm # combined for GCM encrypt and decrypt functions # clobbering all xmm registers # clobbering r10, r11, r12, r13, r14, r15 -.macro GCM_ENC_DEC INITIAL_BLOCKS GHASH_8_ENCRYPT_8_PARALLEL GHASH_LAST_8 GHASH_MUL ENC_DEC +.macro GCM_ENC_DEC INITIAL_BLOCKS GHASH_8_ENCRYPT_8_PARALLEL GHASH_LAST_8 GHASH_MUL ENC_DEC REP vmovdqu HashKey(arg2), %xmm13 # xmm13 = HashKey mov arg5, %r13 # save the number of bytes of plaintext/ciphertext @@ -314,42 +315,42 @@ VARIABLE_OFFSET = 16*8 jmp _initial_num_blocks_is_1\@ _initial_num_blocks_is_7\@: - \INITIAL_BLOCKS 7, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 7, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*7, %r13 jmp _initial_blocks_encrypted\@ _initial_num_blocks_is_6\@: - \INITIAL_BLOCKS 6, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 6, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*6, %r13 jmp _initial_blocks_encrypted\@ _initial_num_blocks_is_5\@: - \INITIAL_BLOCKS 5, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 5, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*5, %r13 jmp _initial_blocks_encrypted\@ _initial_num_blocks_is_4\@: - \INITIAL_BLOCKS 4, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 4, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*4, %r13 jmp _initial_blocks_encrypted\@ _initial_num_blocks_is_3\@: - \INITIAL_BLOCKS 3, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 3, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*3, %r13 jmp _initial_blocks_encrypted\@ _initial_num_blocks_is_2\@: - \INITIAL_BLOCKS 2, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 2, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*2, %r13 jmp _initial_blocks_encrypted\@ _initial_num_blocks_is_1\@: - \INITIAL_BLOCKS 1, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 1, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*1, %r13 jmp _initial_blocks_encrypted\@ _initial_num_blocks_is_0\@: - \INITIAL_BLOCKS 0, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC + \INITIAL_BLOCKS \REP, 0, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC _initial_blocks_encrypted\@: @@ -374,7 +375,7 @@ _encrypt_by_8_new\@: add $8, %r15b - \GHASH_8_ENCRYPT_8_PARALLEL %xmm0, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm15, out_order, \ENC_DEC + \GHASH_8_ENCRYPT_8_PARALLEL \REP, %xmm0, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm15, out_order, \ENC_DEC add $128, %r11 sub $128, %r13 jne _encrypt_by_8_new\@ @@ -385,7 +386,7 @@ _encrypt_by_8_new\@: _encrypt_by_8\@: vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 add $8, %r15b - \GHASH_8_ENCRYPT_8_PARALLEL %xmm0, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm15, in_order, \ENC_DEC + \GHASH_8_ENCRYPT_8_PARALLEL \REP, %xmm0, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm15, in_order, \ENC_DEC vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 add $128, %r11 sub $128, %r13 @@ -414,7 +415,7 @@ _zero_cipher_left\@: vpaddd ONE(%rip), %xmm9, %xmm9 # INCR CNT to get Yn vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 - ENCRYPT_SINGLE_BLOCK %xmm9 # E(K, Yn) + ENCRYPT_SINGLE_BLOCK \REP, %xmm9 # E(K, Yn) sub $16, %r11 add %r13, %r11 @@ -440,7 +441,7 @@ _only_less_than_16\@: vpaddd ONE(%rip), %xmm9, %xmm9 # INCR CNT to get Yn vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 - ENCRYPT_SINGLE_BLOCK %xmm9 # E(K, Yn) + ENCRYPT_SINGLE_BLOCK \REP, %xmm9 # E(K, Yn) lea SHIFT_MASK+16(%rip), %r12 @@ -525,7 +526,7 @@ _multiple_of_16_bytes\@: mov arg6, %rax # rax = *Y0 vmovdqu (%rax), %xmm9 # xmm9 = Y0 - ENCRYPT_SINGLE_BLOCK %xmm9 # E(K, Y0) + ENCRYPT_SINGLE_BLOCK \REP, %xmm9 # E(K, Y0) vpxor %xmm14, %xmm9, %xmm9 @@ -690,7 +691,7 @@ _return_T_done\@: ## r10, r11, r12, rax are clobbered ## arg1, arg3, arg4, r14 are used as a pointer only, not modified -.macro INITIAL_BLOCKS_AVX num_initial_blocks T1 T2 T3 T4 T5 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T6 T_key ENC_DEC +.macro INITIAL_BLOCKS_AVX REP num_initial_blocks T1 T2 T3 T4 T5 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T6 T_key ENC_DEC i = (8-\num_initial_blocks) j = 0 setreg @@ -786,10 +787,10 @@ _get_AAD_done\@: setreg .endr - j = 1 - setreg -.rep 9 - vmovdqa 16*j(arg1), \T_key + j = 1 + setreg +.rep \REP + vmovdqa 16*j(arg1), \T_key i = (9-\num_initial_blocks) setreg .rep \num_initial_blocks @@ -798,12 +799,11 @@ _get_AAD_done\@: setreg .endr - j = (j+1) - setreg + j = (j+1) + setreg .endr - - vmovdqa 16*10(arg1), \T_key + vmovdqa 16*j(arg1), \T_key i = (9-\num_initial_blocks) setreg .rep \num_initial_blocks @@ -891,9 +891,9 @@ _get_AAD_done\@: vpxor \T_key, \XMM7, \XMM7 vpxor \T_key, \XMM8, \XMM8 - i = 1 - setreg -.rep 9 # do 9 rounds + i = 1 + setreg +.rep \REP # do REP rounds vmovdqa 16*i(arg1), \T_key vaesenc \T_key, \XMM1, \XMM1 vaesenc \T_key, \XMM2, \XMM2 @@ -903,11 +903,10 @@ _get_AAD_done\@: vaesenc \T_key, \XMM6, \XMM6 vaesenc \T_key, \XMM7, \XMM7 vaesenc \T_key, \XMM8, \XMM8 - i = (i+1) - setreg + i = (i+1) + setreg .endr - vmovdqa 16*i(arg1), \T_key vaesenclast \T_key, \XMM1, \XMM1 vaesenclast \T_key, \XMM2, \XMM2 @@ -996,7 +995,7 @@ _initial_blocks_done\@: # ghash the 8 previously encrypted ciphertext blocks # arg1, arg3, arg4 are used as pointers only, not modified # r11 is the data offset value -.macro GHASH_8_ENCRYPT_8_PARALLEL_AVX T1 T2 T3 T4 T5 T6 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T7 loop_idx ENC_DEC +.macro GHASH_8_ENCRYPT_8_PARALLEL_AVX REP T1 T2 T3 T4 T5 T6 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T7 loop_idx ENC_DEC vmovdqa \XMM1, \T2 vmovdqa \XMM2, TMP2(%rsp) @@ -1262,6 +1261,24 @@ _initial_blocks_done\@: vmovdqu 16*10(arg1), \T5 + i = 11 + setreg +.rep (\REP-9) + + vaesenc \T5, \XMM1, \XMM1 + vaesenc \T5, \XMM2, \XMM2 + vaesenc \T5, \XMM3, \XMM3 + vaesenc \T5, \XMM4, \XMM4 + vaesenc \T5, \XMM5, \XMM5 + vaesenc \T5, \XMM6, \XMM6 + vaesenc \T5, \XMM7, \XMM7 + vaesenc \T5, \XMM8, \XMM8 + + vmovdqu 16*i(arg1), \T5 + i = i + 1 + setreg +.endr + i = 0 j = 1 setreg @@ -1560,9 +1577,23 @@ ENDPROC(aesni_gcm_precomp_avx_gen2) ############################################################################### ENTRY(aesni_gcm_enc_avx_gen2) FUNC_SAVE - GCM_ENC_DEC INITIAL_BLOCKS_AVX GHASH_8_ENCRYPT_8_PARALLEL_AVX GHASH_LAST_8_AVX GHASH_MUL_AVX ENC + mov keysize, %eax + cmp $32, %eax + je key_256_enc + cmp $16, %eax + je key_128_enc + # must be 192 + GCM_ENC_DEC INITIAL_BLOCKS_AVX, GHASH_8_ENCRYPT_8_PARALLEL_AVX, GHASH_LAST_8_AVX, GHASH_MUL_AVX, ENC, 11 FUNC_RESTORE - ret + ret +key_128_enc: + GCM_ENC_DEC INITIAL_BLOCKS_AVX, GHASH_8_ENCRYPT_8_PARALLEL_AVX, GHASH_LAST_8_AVX, GHASH_MUL_AVX, ENC, 9 + FUNC_RESTORE + ret +key_256_enc: + GCM_ENC_DEC INITIAL_BLOCKS_AVX, GHASH_8_ENCRYPT_8_PARALLEL_AVX, GHASH_LAST_8_AVX, GHASH_MUL_AVX, ENC, 13 + FUNC_RESTORE + ret ENDPROC(aesni_gcm_enc_avx_gen2) ############################################################################### @@ -1584,9 +1615,23 @@ ENDPROC(aesni_gcm_enc_avx_gen2) ############################################################################### ENTRY(aesni_gcm_dec_avx_gen2) FUNC_SAVE - GCM_ENC_DEC INITIAL_BLOCKS_AVX GHASH_8_ENCRYPT_8_PARALLEL_AVX GHASH_LAST_8_AVX GHASH_MUL_AVX DEC + mov keysize,%eax + cmp $32, %eax + je key_256_dec + cmp $16, %eax + je key_128_dec + # must be 192 + GCM_ENC_DEC INITIAL_BLOCKS_AVX, GHASH_8_ENCRYPT_8_PARALLEL_AVX, GHASH_LAST_8_AVX, GHASH_MUL_AVX, DEC, 11 FUNC_RESTORE - ret + ret +key_128_dec: + GCM_ENC_DEC INITIAL_BLOCKS_AVX, GHASH_8_ENCRYPT_8_PARALLEL_AVX, GHASH_LAST_8_AVX, GHASH_MUL_AVX, DEC, 9 + FUNC_RESTORE + ret +key_256_dec: + GCM_ENC_DEC INITIAL_BLOCKS_AVX, GHASH_8_ENCRYPT_8_PARALLEL_AVX, GHASH_LAST_8_AVX, GHASH_MUL_AVX, DEC, 13 + FUNC_RESTORE + ret ENDPROC(aesni_gcm_dec_avx_gen2) #endif /* CONFIG_AS_AVX */ @@ -1671,7 +1716,7 @@ ENDPROC(aesni_gcm_dec_avx_gen2) ## r10, r11, r12, rax are clobbered ## arg1, arg3, arg4, r14 are used as a pointer only, not modified -.macro INITIAL_BLOCKS_AVX2 num_initial_blocks T1 T2 T3 T4 T5 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T6 T_key ENC_DEC VER +.macro INITIAL_BLOCKS_AVX2 REP num_initial_blocks T1 T2 T3 T4 T5 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T6 T_key ENC_DEC VER i = (8-\num_initial_blocks) j = 0 setreg @@ -1770,7 +1815,7 @@ _get_AAD_done\@: j = 1 setreg -.rep 9 +.rep \REP vmovdqa 16*j(arg1), \T_key i = (9-\num_initial_blocks) setreg @@ -1785,7 +1830,7 @@ _get_AAD_done\@: .endr - vmovdqa 16*10(arg1), \T_key + vmovdqa 16*j(arg1), \T_key i = (9-\num_initial_blocks) setreg .rep \num_initial_blocks @@ -1876,7 +1921,7 @@ _get_AAD_done\@: i = 1 setreg -.rep 9 # do 9 rounds +.rep \REP # do REP rounds vmovdqa 16*i(arg1), \T_key vaesenc \T_key, \XMM1, \XMM1 vaesenc \T_key, \XMM2, \XMM2 @@ -1983,7 +2028,7 @@ _initial_blocks_done\@: # ghash the 8 previously encrypted ciphertext blocks # arg1, arg3, arg4 are used as pointers only, not modified # r11 is the data offset value -.macro GHASH_8_ENCRYPT_8_PARALLEL_AVX2 T1 T2 T3 T4 T5 T6 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T7 loop_idx ENC_DEC +.macro GHASH_8_ENCRYPT_8_PARALLEL_AVX2 REP T1 T2 T3 T4 T5 T6 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T7 loop_idx ENC_DEC vmovdqa \XMM1, \T2 vmovdqa \XMM2, TMP2(%rsp) @@ -2252,6 +2297,23 @@ _initial_blocks_done\@: vmovdqu 16*10(arg1), \T5 + i = 11 + setreg +.rep (\REP-9) + vaesenc \T5, \XMM1, \XMM1 + vaesenc \T5, \XMM2, \XMM2 + vaesenc \T5, \XMM3, \XMM3 + vaesenc \T5, \XMM4, \XMM4 + vaesenc \T5, \XMM5, \XMM5 + vaesenc \T5, \XMM6, \XMM6 + vaesenc \T5, \XMM7, \XMM7 + vaesenc \T5, \XMM8, \XMM8 + + vmovdqu 16*i(arg1), \T5 + i = i + 1 + setreg +.endr + i = 0 j = 1 setreg @@ -2563,7 +2625,21 @@ ENDPROC(aesni_gcm_precomp_avx_gen4) ############################################################################### ENTRY(aesni_gcm_enc_avx_gen4) FUNC_SAVE - GCM_ENC_DEC INITIAL_BLOCKS_AVX2 GHASH_8_ENCRYPT_8_PARALLEL_AVX2 GHASH_LAST_8_AVX2 GHASH_MUL_AVX2 ENC + mov keysize,%eax + cmp $32, %eax + je key_256_enc4 + cmp $16, %eax + je key_128_enc4 + # must be 192 + GCM_ENC_DEC INITIAL_BLOCKS_AVX2, GHASH_8_ENCRYPT_8_PARALLEL_AVX2, GHASH_LAST_8_AVX2, GHASH_MUL_AVX2, ENC, 11 + FUNC_RESTORE + ret +key_128_enc4: + GCM_ENC_DEC INITIAL_BLOCKS_AVX2, GHASH_8_ENCRYPT_8_PARALLEL_AVX2, GHASH_LAST_8_AVX2, GHASH_MUL_AVX2, ENC, 9 + FUNC_RESTORE + ret +key_256_enc4: + GCM_ENC_DEC INITIAL_BLOCKS_AVX2, GHASH_8_ENCRYPT_8_PARALLEL_AVX2, GHASH_LAST_8_AVX2, GHASH_MUL_AVX2, ENC, 13 FUNC_RESTORE ret ENDPROC(aesni_gcm_enc_avx_gen4) @@ -2587,9 +2663,23 @@ ENDPROC(aesni_gcm_enc_avx_gen4) ############################################################################### ENTRY(aesni_gcm_dec_avx_gen4) FUNC_SAVE - GCM_ENC_DEC INITIAL_BLOCKS_AVX2 GHASH_8_ENCRYPT_8_PARALLEL_AVX2 GHASH_LAST_8_AVX2 GHASH_MUL_AVX2 DEC + mov keysize,%eax + cmp $32, %eax + je key_256_dec4 + cmp $16, %eax + je key_128_dec4 + # must be 192 + GCM_ENC_DEC INITIAL_BLOCKS_AVX2, GHASH_8_ENCRYPT_8_PARALLEL_AVX2, GHASH_LAST_8_AVX2, GHASH_MUL_AVX2, DEC, 11 FUNC_RESTORE - ret + ret +key_128_dec4: + GCM_ENC_DEC INITIAL_BLOCKS_AVX2, GHASH_8_ENCRYPT_8_PARALLEL_AVX2, GHASH_LAST_8_AVX2, GHASH_MUL_AVX2, DEC, 9 + FUNC_RESTORE + ret +key_256_dec4: + GCM_ENC_DEC INITIAL_BLOCKS_AVX2, GHASH_8_ENCRYPT_8_PARALLEL_AVX2, GHASH_LAST_8_AVX2, GHASH_MUL_AVX2, DEC, 13 + FUNC_RESTORE + ret ENDPROC(aesni_gcm_dec_avx_gen4) #endif /* CONFIG_AS_AVX2 */ diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index d8b8cb33f608..7d1259feb0f9 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -209,8 +209,7 @@ static void aesni_gcm_enc_avx(void *ctx, u8 *hash_subkey, const u8 *aad, unsigned long aad_len, u8 *auth_tag, unsigned long auth_tag_len) { - struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx; - if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)){ + if (plaintext_len < AVX_GEN2_OPTSIZE) { aesni_gcm_enc(ctx, data, out, in, plaintext_len, iv, hash_subkey, aad, aad_len, auth_tag, auth_tag_len); @@ -227,8 +226,7 @@ static void aesni_gcm_dec_avx(void *ctx, u8 *hash_subkey, const u8 *aad, unsigned long aad_len, u8 *auth_tag, unsigned long auth_tag_len) { - struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx; - if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) { + if (ciphertext_len < AVX_GEN2_OPTSIZE) { aesni_gcm_dec(ctx, data, out, in, ciphertext_len, iv, hash_subkey, aad, aad_len, auth_tag, auth_tag_len); @@ -268,8 +266,7 @@ static void aesni_gcm_enc_avx2(void *ctx, u8 *hash_subkey, const u8 *aad, unsigned long aad_len, u8 *auth_tag, unsigned long auth_tag_len) { - struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx; - if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) { + if (plaintext_len < AVX_GEN2_OPTSIZE) { aesni_gcm_enc(ctx, data, out, in, plaintext_len, iv, hash_subkey, aad, aad_len, auth_tag, auth_tag_len); @@ -290,8 +287,7 @@ static void aesni_gcm_dec_avx2(void *ctx, u8 *hash_subkey, const u8 *aad, unsigned long aad_len, u8 *auth_tag, unsigned long auth_tag_len) { - struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx; - if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) { + if (ciphertext_len < AVX_GEN2_OPTSIZE) { aesni_gcm_dec(ctx, data, out, in, ciphertext_len, iv, hash_subkey, aad, aad_len, auth_tag, auth_tag_len); @@ -928,8 +924,7 @@ static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen, struct scatter_walk dst_sg_walk = {}; struct gcm_context_data data AESNI_ALIGN_ATTR; - if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 || - aesni_gcm_enc_tfm == aesni_gcm_enc || + if (aesni_gcm_enc_tfm == aesni_gcm_enc || req->cryptlen < AVX_GEN2_OPTSIZE) { return gcmaes_crypt_by_sg(true, req, assoclen, hash_subkey, iv, aes_ctx); @@ -1000,8 +995,7 @@ static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen, struct gcm_context_data data AESNI_ALIGN_ATTR; int retval = 0; - if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 || - aesni_gcm_enc_tfm == aesni_gcm_enc || + if (aesni_gcm_enc_tfm == aesni_gcm_enc || req->cryptlen < AVX_GEN2_OPTSIZE) { return gcmaes_crypt_by_sg(false, req, assoclen, hash_subkey, iv, aes_ctx);