From patchwork Mon Jul 1 11:20:03 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 11025633
From: Jan Beulich
To: "xen-devel@lists.xenproject.org"
Date: Mon, 1 Jul 2019 11:20:03 +0000
Message-ID: <34deb8ec-fe37-0c99-edcf-c28bae0620c6@suse.com>
Subject: [Xen-devel] [PATCH v9 07/23] x86emul: support AVX512F scatter insns
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

This completes support of AVX512F in the insn emulator.

Note that in the test harness there's a little bit of trickery needed to
get around the not fully consistent naming of AVX512VL gather and scatter
compiler built-ins. To suppress expansion of the "di" and "si" tokens
they get constructed by token concatenation in BS(), which is different
from BG().

Signed-off-by: Jan Beulich
Acked-by: Andrew Cooper , subject to the
---
TBD: I couldn't really decide whether to duplicate code or merge scatter
     into gather emulation.
---
v9: Suppress general register update upon failures.
v7: Re-base.
v6: New.
--- a/tools/tests/x86_emulator/evex-disp8.c
+++ b/tools/tests/x86_emulator/evex-disp8.c
@@ -270,6 +270,8 @@ static const struct test avx512f_all[] =
     INSN(prolv, 66, 0f38, 15, vl, dq, vl),
     INSNX(pror, 66, 0f, 72, 0, vl, dq, vl),
     INSN(prorv, 66, 0f38, 14, vl, dq, vl),
+    INSN(pscatterd, 66, 0f38, a0, vl, dq, el),
+    INSN(pscatterq, 66, 0f38, a1, vl, dq, el),
     INSN(pshufd, 66, 0f, 70, vl, d, vl),
     INSN(pslld, 66, 0f, f2, el_4, d, vl),
     INSNX(pslld, 66, 0f, 72, 6, vl, d, vl),
@@ -305,6 +307,8 @@ static const struct test avx512f_all[] =
     INSN(rsqrt14, 66, 0f38, 4f, el, sd, el),
     INSN(scalef, 66, 0f38, 2c, vl, sd, vl),
     INSN(scalef, 66, 0f38, 2d, el, sd, el),
+    INSN(scatterd, 66, 0f38, a2, vl, sd, el),
+    INSN(scatterq, 66, 0f38, a3, vl, sd, el),
     INSN_PFP(shuf, 0f, c6),
     INSN_FP(sqrt, 0f, 51),
     INSN_FP(sub, 0f, 5c),
--- a/tools/tests/x86_emulator/simd-sg.c
+++ b/tools/tests/x86_emulator/simd-sg.c
@@ -48,10 +48,14 @@ typedef long long __attribute__((vector_
 # endif
 # define BG_(dt, it, reg, mem, idx, msk, scl) \
     __builtin_ia32_gather##it##dt(reg, mem, idx, to_mask(msk), scl)
+# define BS_(dt, it, mem, idx, reg, msk, scl) \
+    __builtin_ia32_scatter##it##dt(mem, to_mask(msk), idx, reg, scl)
 # else
 # define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE)
 # define BG_(dt, it, reg, mem, idx, msk, scl) \
     __builtin_ia32_gather##it##dt(reg, mem, idx, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), scl)
+# define BS_(dt, it, mem, idx, reg, msk, scl) \
+    __builtin_ia32_scatter##it##dt(mem, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), idx, reg, scl)
 # endif
 /*
  * Instead of replicating the main IDX_SIZE conditional below three times, use
@@ -59,6 +63,7 @@ typedef long long __attribute__((vector_
  * respective relevant macro argument tokens.
  */
 # define BG(dt, it, reg, mem, idx, msk, scl) BG_(dt, it, reg, mem, idx, msk, scl)
+# define BS(dt, it, mem, idx, reg, msk, scl) BS_(dt, it##i, mem, idx, reg, msk, scl)
 # if VEC_MAX < 64
 /*
  * The sub-512-bit built-ins have an extra "3" infix, presumably because the
@@ -82,22 +87,30 @@ typedef long long __attribute__((vector_
 # if IDX_SIZE == 4
 # if INT_SIZE == 4
 # define gather(reg, mem, idx, msk, scl) BG(v16si, si, reg, mem, idx, msk, scl)
+# define scatter(mem, idx, reg, msk, scl) BS(v16si, s, mem, idx, reg, msk, scl)
 # elif INT_SIZE == 8
 # define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, si, (vdi_t)(reg), mem, idx, msk, scl))
+# define scatter(mem, idx, reg, msk, scl) BS(v8di, s, mem, idx, (vdi_t)(reg), msk, scl)
 # elif FLOAT_SIZE == 4
 # define gather(reg, mem, idx, msk, scl) BG(v16sf, si, reg, mem, idx, msk, scl)
+# define scatter(mem, idx, reg, msk, scl) BS(v16sf, s, mem, idx, reg, msk, scl)
 # elif FLOAT_SIZE == 8
 # define gather(reg, mem, idx, msk, scl) BG(v8df, si, reg, mem, idx, msk, scl)
+# define scatter(mem, idx, reg, msk, scl) BS(v8df, s, mem, idx, reg, msk, scl)
 # endif
 # elif IDX_SIZE == 8
 # if INT_SIZE == 4
 # define gather(reg, mem, idx, msk, scl) BG(v16si, di, reg, mem, (idi_t)(idx), msk, scl)
+# define scatter(mem, idx, reg, msk, scl) BS(v16si, d, mem, (idi_t)(idx), reg, msk, scl)
 # elif INT_SIZE == 8
 # define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, di, (vdi_t)(reg), mem, (idi_t)(idx), msk, scl))
+# define scatter(mem, idx, reg, msk, scl) BS(v8di, d, mem, (idi_t)(idx), (vdi_t)(reg), msk, scl)
 # elif FLOAT_SIZE == 4
 # define gather(reg, mem, idx, msk, scl) BG(v16sf, di, reg, mem, (idi_t)(idx), msk, scl)
+# define scatter(mem, idx, reg, msk, scl) BS(v16sf, d, mem, (idi_t)(idx), reg, msk, scl)
 # elif FLOAT_SIZE == 8
 # define gather(reg, mem, idx, msk, scl) BG(v8df, di, reg, mem, (idi_t)(idx), msk, scl)
+# define scatter(mem, idx, reg, msk, scl) BS(v8df, d, mem, (idi_t)(idx), reg, msk, scl)
 # endif
 # endif
 #elif defined(__AVX2__)
@@ -195,6 +208,8 @@ const typeof((vec_t){}[0]) array[] = {
     GLUE(PUT, VEC_MAX)(VEC_MAX + 1)
 };
 
+typeof((vec_t){}[0]) out[VEC_MAX * 2];
+
 int sg_test(void)
 {
     unsigned int i;
@@ -275,5 +290,41 @@ int sg_test(void)
 # endif
 #endif
 
+#ifdef scatter
+
+    for ( i = 0; i < sizeof(out) / sizeof(*out); ++i )
+        out[i] = 0;
+
+    for ( i = 0; i < ITEM_COUNT; ++i )
+        x[i] = i + 1;
+
+    touch(x);
+
+    scatter(out, (idx_t){}, x, (vec_t){ 1 } != 0, 1);
+    if ( out[0] != 1 )
+        return __LINE__;
+    for ( i = 1; i < ITEM_COUNT; ++i )
+        if ( out[i] )
+            return __LINE__;
+
+    scatter(out, (idx_t){}, x, full, 1);
+    if ( out[0] != ITEM_COUNT )
+        return __LINE__;
+    for ( i = 1; i < ITEM_COUNT; ++i )
+        if ( out[i] )
+            return __LINE__;
+
+    scatter(out, idx, x, full, ELEM_SIZE);
+    for ( i = 1; i <= ITEM_COUNT; ++i )
+        if ( out[i] != i )
+            return __LINE__;
+
+    scatter(out, inv, x, full, ELEM_SIZE);
+    for ( i = 1; i <= ITEM_COUNT; ++i )
+        if ( out[i] != ITEM_COUNT + 1 - i )
+            return __LINE__;
+
+#endif
+
     return 0;
 }
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -508,6 +508,7 @@ static const struct ext0f38_table {
     [0x9d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
     [0x9e] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
     [0x9f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
+    [0xa0 ... 0xa3] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq },
     [0xa6 ... 0xa8] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
     [0xa9] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
     [0xaa] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
@@ -9312,6 +9313,105 @@ x86_emulate(
         avx512_vlen_check(true);
         goto simd_zmm;
 
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0xa0): /* vpscatterd{d,q} [xyz]mm,mem{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0xa1): /* vpscatterq{d,q} [xyz]mm,mem{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0xa2): /* vscatterdp{s,d} [xyz]mm,mem{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0xa3): /* vscatterqp{s,d} [xyz]mm,mem{k} */
+    {
+        typeof(evex) *pevex;
+        union {
+            int32_t dw[16];
+            int64_t qw[8];
+        } index;
+        bool done = false;
+
+        ASSERT(ea.type == OP_MEM);
+        fail_if(!ops->write);
+        generate_exception_if((!evex.opmsk || evex.brs || evex.z ||
+                               evex.reg != 0xf ||
+                               modrm_reg == state->sib_index),
+                              EXC_UD);
+        avx512_vlen_check(false);
+        host_and_vcpu_must_have(avx512f);
+        get_fpu(X86EMUL_FPU_zmm);
+
+        /* Read source and index registers. */
+        opc = init_evex(stub);
+        pevex = copy_EVEX(opc, evex);
+        pevex->opcx = vex_0f;
+        opc[0] = 0x7f; /* vmovdqa{32,64} */
+        /* Use (%rax) as destination and modrm_reg as source. */
+        pevex->b = 1;
+        opc[1] = (modrm_reg & 7) << 3;
+        pevex->RX = 1;
+        opc[2] = 0xc3;
+
+        invoke_stub("", "", "=m" (*mmvalp) : "a" (mmvalp));
+
+        pevex->pfx = vex_f3; /* vmovdqu{32,64} */
+        pevex->w = b & 1;
+        /* Switch to sib_index as source. */
+        pevex->r = !mode_64bit() || !(state->sib_index & 0x08);
+        pevex->R = !mode_64bit() || !(state->sib_index & 0x10);
+        opc[1] = (state->sib_index & 7) << 3;
+
+        invoke_stub("", "", "=m" (index) : "a" (&index));
+        put_stub(stub);
+
+        /* Clear untouched parts of the mask value. */
+        n = 1 << (2 + evex.lr - ((b & 1) | evex.w));
+        op_bytes = 4 << evex.w;
+        op_mask &= (1 << n) - 1;
+
+        for ( i = 0; op_mask; ++i )
+        {
+            signed long idx = b & 1 ? index.qw[i] : index.dw[i];
+
+            if ( !(op_mask & (1 << i)) )
+                continue;
+
+            rc = ops->write(ea.mem.seg,
+                            truncate_ea(ea.mem.off + (idx << state->sib_scale)),
+                            (void *)mmvalp + i * op_bytes, op_bytes, ctxt);
+            if ( rc != X86EMUL_OKAY )
+            {
+                /* See comment in gather emulation. */
+                if ( rc != X86EMUL_EXCEPTION && done )
+                    rc = X86EMUL_RETRY;
+                break;
+            }
+
+            op_mask &= ~(1 << i);
+            done = true;
+
+#ifdef __XEN__
+            if ( op_mask && local_events_need_delivery() )
+            {
+                rc = X86EMUL_RETRY;
+                break;
+            }
+#endif
+        }
+
+        /* Write mask register. See comment in gather emulation. */
+        opc = get_stub(stub);
+        opc[0] = 0xc5;
+        opc[1] = 0xf8;
+        opc[2] = 0x90;
+        /* Use (%rax) as source. */
+        opc[3] = evex.opmsk << 3;
+        opc[4] = 0xc3;
+
+        invoke_stub("", "", "+m" (op_mask) : "a" (&op_mask));
+        put_stub(stub);
+
+        if ( rc != X86EMUL_OKAY )
+            goto done;
+
+        state->simd_size = simd_none;
+        break;
+    }
+
     case X86EMUL_OPC(0x0f38, 0xc8): /* sha1nexte xmm/m128,xmm */
     case X86EMUL_OPC(0x0f38, 0xc9): /* sha1msg1 xmm/m128,xmm */
     case X86EMUL_OPC(0x0f38, 0xca): /* sha1msg2 xmm/m128,xmm */