From patchwork Mon Jul 1 11:18:53 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 11025629
From: Jan Beulich
To: "xen-devel@lists.xenproject.org"
Date: Mon, 1 Jul 2019 11:18:53 +0000
Message-ID: <95252da8-777b-9527-6f5b-1e1a5994f845@suse.com>
Subject: [Xen-devel] [PATCH v9 05/23] x86emul: support AVX512F gather insns
List-Id: Xen developer discussion
Cc: Andrew Cooper, Wei Liu, RogerPau Monne

Signed-off-by: Jan Beulich
Acked-by: Andrew Cooper
---
v9: Suppress general register update upon failures. Split out ModR/M
    handling changes as well as independent test harness ones into
    prereq patches. Re-base.
v8: Re-base.
v7: Fix ByteOp register decode. Re-base.
v6: New.

--- a/tools/tests/x86_emulator/Makefile
+++ b/tools/tests/x86_emulator/Makefile
@@ -18,7 +18,7 @@ CFLAGS += $(CFLAGS_xeninclude)
 
 SIMD := 3dnow sse sse2 sse4 avx avx2 xop avx512f avx512bw avx512dq avx512er
 FMA := fma4 fma
-SG := avx2-sg
+SG := avx2-sg avx512f-sg avx512vl-sg
 TESTCASES := blowfish $(SIMD) $(FMA) $(SG)
 
 OPMASK := avx512f avx512dq avx512bw
@@ -66,6 +66,14 @@ xop-flts := $(avx-flts)
 avx512f-vecs := 64 16 32
 avx512f-ints := 4 8
 avx512f-flts := 4 8
+avx512f-sg-vecs := 64
+avx512f-sg-idxs := 4 8
+avx512f-sg-ints := $(avx512f-ints)
+avx512f-sg-flts := $(avx512f-flts)
+avx512vl-sg-vecs := 16 32
+avx512vl-sg-idxs := $(avx512f-sg-idxs)
+avx512vl-sg-ints := $(avx512f-ints)
+avx512vl-sg-flts := $(avx512f-flts)
 avx512bw-vecs := $(avx512f-vecs)
 avx512bw-ints := 1 2
 avx512bw-flts :=
--- a/tools/tests/x86_emulator/evex-disp8.c
+++ b/tools/tests/x86_emulator/evex-disp8.c
@@ -176,6 +176,8 @@ static const struct test avx512f_all[] =
     INSN(fnmsub213, 66, 0f38, af, el, sd, el),
     INSN(fnmsub231, 66, 0f38, be, vl, sd, vl),
     INSN(fnmsub231, 66, 0f38, bf, el, sd, el),
+    INSN(gatherd, 66, 0f38, 92, vl, sd, el),
+    INSN(gatherq, 66, 0f38, 93, vl, sd, el),
     INSN(getexp, 66, 0f38, 42, vl, sd, vl),
     INSN(getexp, 66, 0f38, 43, el, sd, el),
     INSN(getmant, 66, 0f3a, 26, vl, sd, vl),
@@ -229,6 +231,8 @@ static const struct test avx512f_all[] =
     INSN(permt2, 66, 0f38, 7e, vl, dq, vl),
     INSN(permt2, 66, 0f38, 7f, vl, sd, vl),
     INSN(pexpand, 66, 0f38, 89, vl, dq, el),
+    INSN(pgatherd, 66, 0f38, 90, vl, dq, el),
+    INSN(pgatherq, 66, 0f38, 91, vl, dq, el),
     INSN(pmaxs, 66, 0f38, 3d, vl, dq, vl),
     INSN(pmaxu, 66, 0f38, 3f, vl, dq, vl),
     INSN(pmins, 66, 0f38, 39, vl, dq, vl),
--- a/tools/tests/x86_emulator/simd-sg.c
+++ b/tools/tests/x86_emulator/simd-sg.c
@@ -35,13 +35,78 @@ typedef long long __attribute__((vector_
 #define ITEM_COUNT (VEC_SIZE / ELEM_SIZE < IVEC_SIZE / IDX_SIZE ? \
                     VEC_SIZE / ELEM_SIZE : IVEC_SIZE / IDX_SIZE)
 
-#if VEC_SIZE == 16
-# define to_bool(cmp) __builtin_ia32_ptestc128(cmp, (vec_t){} == 0)
-#else
-# define to_bool(cmp) __builtin_ia32_ptestc256(cmp, (vec_t){} == 0)
-#endif
+#if defined(__AVX512F__)
+# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT))
+# if ELEM_SIZE == 4
+#  if IDX_SIZE == 4 || defined(__AVX512VL__)
+#   define to_mask(msk) B(ptestmd, , (vsi_t)(msk), (vsi_t)(msk), ~0)
+#   define eq(x, y) (B(pcmpeqd, _mask, (vsi_t)(x), (vsi_t)(y), -1) == ALL_TRUE)
+#  else
+#   define widen(x) __builtin_ia32_pmovzxdq512_mask((vsi_t)(x), (idi_t){}, ~0)
+#   define to_mask(msk) __builtin_ia32_ptestmq512(widen(msk), widen(msk), ~0)
+#   define eq(x, y) (__builtin_ia32_pcmpeqq512_mask(widen(x), widen(y), ~0) == ALL_TRUE)
+#  endif
+#  define BG_(dt, it, reg, mem, idx, msk, scl) \
+    __builtin_ia32_gather##it##dt(reg, mem, idx, to_mask(msk), scl)
+# else
+#  define eq(x, y) (B(pcmpeqq, _mask, (vdi_t)(x), (vdi_t)(y), -1) == ALL_TRUE)
+#  define BG_(dt, it, reg, mem, idx, msk, scl) \
+    __builtin_ia32_gather##it##dt(reg, mem, idx, B(ptestmq, , (vdi_t)(msk), (vdi_t)(msk), ~0), scl)
+# endif
+/*
+ * Instead of replicating the main IDX_SIZE conditional below three times, use
+ * a double layer of macro invocations, allowing for substitution of the
+ * respective relevant macro argument tokens.
+ */
+# define BG(dt, it, reg, mem, idx, msk, scl) BG_(dt, it, reg, mem, idx, msk, scl)
+# if VEC_MAX < 64
+/*
+ * The sub-512-bit built-ins have an extra "3" infix, presumably because the
+ * 512-bit names were chosen without the AVX512VL extension in mind (and hence
+ * making the latter collide with the AVX2 ones).
+ */
+#  define si 3si
+#  define di 3di
+# endif
+# if VEC_MAX == 16
+#  define v8df v2df
+#  define v8di v2di
+#  define v16sf v4sf
+#  define v16si v4si
+# elif VEC_MAX == 32
+#  define v8df v4df
+#  define v8di v4di
+#  define v16sf v8sf
+#  define v16si v8si
+# endif
+# if IDX_SIZE == 4
+#  if INT_SIZE == 4
+#   define gather(reg, mem, idx, msk, scl) BG(v16si, si, reg, mem, idx, msk, scl)
+#  elif INT_SIZE == 8
+#   define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, si, (vdi_t)(reg), mem, idx, msk, scl))
+#  elif FLOAT_SIZE == 4
+#   define gather(reg, mem, idx, msk, scl) BG(v16sf, si, reg, mem, idx, msk, scl)
+#  elif FLOAT_SIZE == 8
+#   define gather(reg, mem, idx, msk, scl) BG(v8df, si, reg, mem, idx, msk, scl)
+#  endif
+# elif IDX_SIZE == 8
+#  if INT_SIZE == 4
+#   define gather(reg, mem, idx, msk, scl) BG(v16si, di, reg, mem, (idi_t)(idx), msk, scl)
+#  elif INT_SIZE == 8
+#   define gather(reg, mem, idx, msk, scl) (vec_t)(BG(v8di, di, (vdi_t)(reg), mem, (idi_t)(idx), msk, scl))
+#  elif FLOAT_SIZE == 4
+#   define gather(reg, mem, idx, msk, scl) BG(v16sf, di, reg, mem, (idi_t)(idx), msk, scl)
+#  elif FLOAT_SIZE == 8
+#   define gather(reg, mem, idx, msk, scl) BG(v8df, di, reg, mem, (idi_t)(idx), msk, scl)
+#  endif
+# endif
+#elif defined(__AVX2__)
+# if VEC_SIZE == 16
+#  define to_bool(cmp) __builtin_ia32_ptestc128(cmp, (vec_t){} == 0)
+# else
+#  define to_bool(cmp) __builtin_ia32_ptestc256(cmp, (vec_t){} == 0)
+# endif
 
-#if defined(__AVX2__)
 # if VEC_MAX == 16
 #  if IDX_SIZE == 4
 #   if INT_SIZE == 4
@@ -111,6 +176,10 @@ typedef long long __attribute__((vector_
 # endif
 #endif
 
+#ifndef eq
+# define eq(x, y) to_bool((x) == (y))
+#endif
+
 #define GLUE_(x, y) x ## y
 #define GLUE(x, y) GLUE_(x, y)
 
@@ -119,6 +188,7 @@ typedef long long __attribute__((vector_
 #define PUT8(n) PUT4(n), PUT4((n) + 4)
 #define PUT16(n) PUT8(n), PUT8((n) + 8)
 #define PUT32(n) PUT16(n), PUT16((n) + 16)
+#define PUT64(n) PUT32(n), PUT32((n) + 32)
 
 const typeof((vec_t){}[0]) array[] = {
     GLUE(PUT, VEC_MAX)(1),
@@ -174,7 +244,7 @@ int sg_test(void)
 
     y = gather(full, array + ITEM_COUNT, -idx, full, ELEM_SIZE);
 #if ITEM_COUNT == ELEM_COUNT
-    if ( !to_bool(y == x - 1) )
+    if ( !eq(y, x - 1) )
         return __LINE__;
 #else
     for ( i = 0; i < ITEM_COUNT; ++i )
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -22,6 +22,8 @@ asm ( ".pushsection .test, \"ax\", @prog
 #include "avx512dq-opmask.h"
 #include "avx512bw-opmask.h"
 #include "avx512f.h"
+#include "avx512f-sg.h"
+#include "avx512vl-sg.h"
 #include "avx512bw.h"
 #include "avx512dq.h"
 #include "avx512er.h"
@@ -90,11 +92,13 @@ static bool simd_check_avx512f(void)
     return cpu_has_avx512f;
 }
 #define simd_check_avx512f_opmask simd_check_avx512f
+#define simd_check_avx512f_sg simd_check_avx512f
 
 static bool simd_check_avx512f_vl(void)
 {
     return cpu_has_avx512f && cpu_has_avx512vl;
 }
+#define simd_check_avx512vl_sg simd_check_avx512f_vl
 
 static bool simd_check_avx512dq(void)
 {
@@ -291,6 +295,14 @@ static const struct {
     SIMD(AVX512F u32x16, avx512f, 64u4),
     SIMD(AVX512F s64x8, avx512f, 64i8),
     SIMD(AVX512F u64x8, avx512f, 64u8),
+    SIMD(AVX512F S/G f32[16x32], avx512f_sg, 64x4f4),
+    SIMD(AVX512F S/G f64[ 8x32], avx512f_sg, 64x4f8),
+    SIMD(AVX512F S/G f32[ 8x64], avx512f_sg, 64x8f4),
+    SIMD(AVX512F S/G f64[ 8x64], avx512f_sg, 64x8f8),
+    SIMD(AVX512F S/G i32[16x32], avx512f_sg, 64x4i4),
+    SIMD(AVX512F S/G i64[ 8x32], avx512f_sg, 64x4i8),
+    SIMD(AVX512F S/G i32[ 8x64], avx512f_sg, 64x8i4),
+    SIMD(AVX512F S/G i64[ 8x64], avx512f_sg, 64x8i8),
     AVX512VL(VL f32x4, avx512f, 16f4),
     AVX512VL(VL f64x2, avx512f, 16f8),
     AVX512VL(VL f32x8, avx512f, 32f4),
@@ -303,6 +315,22 @@ static const struct {
     AVX512VL(VL u64x2, avx512f, 16u8),
     AVX512VL(VL s64x4, avx512f, 32i8),
     AVX512VL(VL u64x4, avx512f, 32u8),
+    SIMD(AVX512VL S/G f32[4x32], avx512vl_sg, 16x4f4),
+    SIMD(AVX512VL S/G f64[2x32], avx512vl_sg, 16x4f8),
+    SIMD(AVX512VL S/G f32[2x64], avx512vl_sg, 16x8f4),
+    SIMD(AVX512VL S/G f64[2x64], avx512vl_sg, 16x8f8),
+    SIMD(AVX512VL S/G f32[8x32], avx512vl_sg, 32x4f4),
+    SIMD(AVX512VL S/G f64[4x32], avx512vl_sg, 32x4f8),
+    SIMD(AVX512VL S/G f32[4x64], avx512vl_sg, 32x8f4),
+    SIMD(AVX512VL S/G f64[4x64], avx512vl_sg, 32x8f8),
+    SIMD(AVX512VL S/G i32[4x32], avx512vl_sg, 16x4i4),
+    SIMD(AVX512VL S/G i64[2x32], avx512vl_sg, 16x4i8),
+    SIMD(AVX512VL S/G i32[2x64], avx512vl_sg, 16x8i4),
+    SIMD(AVX512VL S/G i64[2x64], avx512vl_sg, 16x8i8),
+    SIMD(AVX512VL S/G i32[8x32], avx512vl_sg, 32x4i4),
+    SIMD(AVX512VL S/G i64[4x32], avx512vl_sg, 32x4i8),
+    SIMD(AVX512VL S/G i32[4x64], avx512vl_sg, 32x8i4),
+    SIMD(AVX512VL S/G i64[4x64], avx512vl_sg, 32x8i8),
     SIMD(AVX512BW s8x64, avx512bw, 64i1),
     SIMD(AVX512BW u8x64, avx512bw, 64u1),
     SIMD(AVX512BW s16x32, avx512bw, 64i2),
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -499,7 +499,7 @@ static const struct ext0f38_table {
     [0x8c] = { .simd_size = simd_packed_int },
     [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 },
-    [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1 },
+    [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq },
     [0x96 ... 0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
     [0x99] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
     [0x9a] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
@@ -9100,6 +9100,133 @@ x86_emulate(
         put_stub(stub);
 
         if ( rc != X86EMUL_OKAY )
+            goto done;
+
+        state->simd_size = simd_none;
+        break;
+    }
+
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x90): /* vpgatherd{d,q} mem,[xyz]mm{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x91): /* vpgatherq{d,q} mem,[xyz]mm{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x92): /* vgatherdp{s,d} mem,[xyz]mm{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x93): /* vgatherqp{s,d} mem,[xyz]mm{k} */
+    {
+        typeof(evex) *pevex;
+        union {
+            int32_t dw[16];
+            int64_t qw[8];
+        } index;
+        bool done = false;
+
+        ASSERT(ea.type == OP_MEM);
+        generate_exception_if((!evex.opmsk || evex.brs || evex.z ||
+                               evex.reg != 0xf ||
+                               modrm_reg == state->sib_index),
+                              EXC_UD);
+        avx512_vlen_check(false);
+        host_and_vcpu_must_have(avx512f);
+        get_fpu(X86EMUL_FPU_zmm);
+
+        /* Read destination and index registers. */
+        opc = init_evex(stub);
+        pevex = copy_EVEX(opc, evex);
+        pevex->opcx = vex_0f;
+        opc[0] = 0x7f; /* vmovdqa{32,64} */
+        /*
+         * The register writeback below has to retain masked-off elements, but
+         * needs to clear upper portions in the index-wider-than-data cases.
+         * Therefore read (and write below) the full register. The alternative
+         * would have been to fiddle with the mask register used.
+         */
+        pevex->opmsk = 0;
+        /* Use (%rax) as destination and modrm_reg as source. */
+        pevex->b = 1;
+        opc[1] = (modrm_reg & 7) << 3;
+        pevex->RX = 1;
+        opc[2] = 0xc3;
+
+        invoke_stub("", "", "=m" (*mmvalp) : "a" (mmvalp));
+
+        pevex->pfx = vex_f3; /* vmovdqu{32,64} */
+        pevex->w = b & 1;
+        /* Switch to sib_index as source. */
+        pevex->r = !mode_64bit() || !(state->sib_index & 0x08);
+        pevex->R = !mode_64bit() || !(state->sib_index & 0x10);
+        opc[1] = (state->sib_index & 7) << 3;
+
+        invoke_stub("", "", "=m" (index) : "a" (&index));
+        put_stub(stub);
+
+        /* Clear untouched parts of the destination and mask values. */
+        n = 1 << (2 + evex.lr - ((b & 1) | evex.w));
+        op_bytes = 4 << evex.w;
+        memset((void *)mmvalp + n * op_bytes, 0, 64 - n * op_bytes);
+        op_mask &= (1 << n) - 1;
+
+        for ( i = 0; op_mask; ++i )
+        {
+            signed long idx = b & 1 ? index.qw[i] : index.dw[i];
+
+            if ( !(op_mask & (1 << i)) )
+                continue;
+
+            rc = ops->read(ea.mem.seg,
+                           truncate_ea(ea.mem.off + (idx << state->sib_scale)),
+                           (void *)mmvalp + i * op_bytes, op_bytes, ctxt);
+            if ( rc != X86EMUL_OKAY )
+            {
+                /*
+                 * If we've made some progress and the access did not fault,
+                 * force a retry instead. This is for example necessary to
+                 * cope with the limited capacity of HVM's MMIO cache.
+                 */
+                if ( rc != X86EMUL_EXCEPTION && done )
+                    rc = X86EMUL_RETRY;
+                break;
+            }
+
+            op_mask &= ~(1 << i);
+            done = true;
+
+#ifdef __XEN__
+            if ( op_mask && local_events_need_delivery() )
+            {
+                rc = X86EMUL_RETRY;
+                break;
+            }
+#endif
+        }
+
+        /* Write destination and mask registers. */
+        opc = init_evex(stub);
+        pevex = copy_EVEX(opc, evex);
+        pevex->opcx = vex_0f;
+        opc[0] = 0x6f; /* vmovdqa{32,64} */
+        pevex->opmsk = 0;
+        /* Use modrm_reg as destination and (%rax) as source. */
+        pevex->b = 1;
+        opc[1] = (modrm_reg & 7) << 3;
+        pevex->RX = 1;
+        opc[2] = 0xc3;
+
+        invoke_stub("", "", "+m" (*mmvalp) : "a" (mmvalp));
+
+        /*
+         * kmovw: This is VEX-encoded, so we can't use pevex. Avoid copy_VEX() etc
+         * as well, since we can easily use the 2-byte VEX form here.
+         */
+        opc -= EVEX_PFX_BYTES;
+        opc[0] = 0xc5;
+        opc[1] = 0xf8;
+        opc[2] = 0x90;
+        /* Use (%rax) as source. */
+        opc[3] = evex.opmsk << 3;
+        opc[4] = 0xc3;
+
+        invoke_stub("", "", "+m" (op_mask) : "a" (&op_mask));
+        put_stub(stub);
+
+        if ( rc != X86EMUL_OKAY )
             goto done;
 
         state->simd_size = simd_none;
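
For readers following the emulation loop in the x86_emulate.c hunk, the
control flow can be sketched standalone roughly as below: derive the element
count from vector length, EVEX.W and opcode bit 0, then read each
mask-selected element, clearing its mask bit on success and stopping at the
first failure. This is only an illustration under stated assumptions, not the
emulator's code: sketch_rc, sketch_read_fn, sketch_mem, sketch_mem_read,
gather_elem_count and gather_sketch are hypothetical names, the callback
merely plays the role of ops->read(), indices are assumed to be pre-widened to
64 bits, and segment handling, truncate_ea() and the
local_events_need_delivery() check are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for X86EMUL_OKAY / X86EMUL_RETRY / X86EMUL_EXCEPTION. */
enum sketch_rc { SK_OKAY, SK_RETRY, SK_EXCEPTION };

/* Hypothetical callback playing the role of ops->read(). */
typedef enum sketch_rc (*sketch_read_fn)(uint64_t addr, void *buf,
                                         unsigned int bytes, void *opaque);

/* Hypothetical flat memory backend, for demonstration only. */
struct sketch_mem { const uint8_t *bytes; uint64_t size; };

enum sketch_rc sketch_mem_read(uint64_t addr, void *buf,
                               unsigned int bytes, void *opaque)
{
    const struct sketch_mem *m = opaque;

    /* Out-of-range accesses are reported like a fault. */
    if ( addr > m->size || bytes > m->size - addr )
        return SK_EXCEPTION;
    memcpy(buf, m->bytes + addr, bytes);
    return SK_OKAY;
}

/*
 * Element count, mirroring n = 1 << (2 + evex.lr - ((b & 1) | evex.w)):
 * lr is 0/1/2 for 128/256/512-bit vectors, w is EVEX.W (64-bit data),
 * qidx is opcode bit 0 (64-bit indices).
 */
unsigned int gather_elem_count(unsigned int lr, bool w, bool qidx)
{
    return 1u << (2 + lr - (unsigned int)(qidx | w));
}

/*
 * Read every element whose mask bit is set.  A bit is cleared once its
 * element was read; on failure the remaining bits stay set.  A non-fault
 * failure after some progress is turned into a retry request.
 */
enum sketch_rc gather_sketch(sketch_read_fn rd, void *opaque,
                             uint64_t base, const int64_t *idx,
                             unsigned int scale_shift, void *dst,
                             unsigned int elem_bytes, unsigned int n,
                             uint16_t *mask)
{
    enum sketch_rc rc = SK_OKAY;
    bool progress = false;
    unsigned int i;

    assert(n <= 16);
    *mask &= (uint16_t)((1u << n) - 1);

    for ( i = 0; *mask; ++i )
    {
        if ( !(*mask & (1u << i)) )
            continue;

        /* Unsigned arithmetic avoids undefined shifts of negative indices. */
        rc = rd(base + ((uint64_t)idx[i] << scale_shift),
                (uint8_t *)dst + i * elem_bytes, elem_bytes, opaque);
        if ( rc != SK_OKAY )
        {
            if ( rc != SK_EXCEPTION && progress )
                rc = SK_RETRY;
            break;
        }

        *mask &= (uint16_t)~(1u << i);
        progress = true;
    }

    return rc;
}
```

Clearing mask bits one element at a time is what lets a restarted gather skip
the elements which were already read; only those still selected by the mask
are accessed again.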