Message ID | 745c685a-a614-6067-946d-c89fe98cb589@suse.com (mailing list archive)
---|---
State | Superseded
Series | x86emul: remaining AVX512 support
On 01/07/2019 12:22, Jan Beulich wrote:
> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -268,7 +268,7 @@ def crunch_numbers(state):
>          # AVX512 extensions acting on vectors of bytes/words are made
>          # dependents of AVX512BW (as to requiring wider than 16-bit mask
>          # registers), despite the SDM not formally making this connection.
> -        AVX512BW: [AVX512_BF16, AVX512_VBMI, AVX512_VBMI2],
> +        AVX512BW: [AVX512_BF16, AVX512_BITALG, AVX512_VBMI, AVX512_VBMI2],

BITALG should be after VBMI2, because everything in this table is
ordered by bit number.

With this fixed, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
On 04.07.2019 16:47, Andrew Cooper wrote:
> On 01/07/2019 12:22, Jan Beulich wrote:
>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -268,7 +268,7 @@ def crunch_numbers(state):
>>          # AVX512 extensions acting on vectors of bytes/words are made
>>          # dependents of AVX512BW (as to requiring wider than 16-bit mask
>>          # registers), despite the SDM not formally making this connection.
>> -        AVX512BW: [AVX512_BF16, AVX512_VBMI, AVX512_VBMI2],
>> +        AVX512BW: [AVX512_BF16, AVX512_BITALG, AVX512_VBMI, AVX512_VBMI2],
>
> BITALG should be after VBMI2, because everything in this table is
> ordered by bit number.

As said before - there's no ordering by bit number possible here. The
individual features may live on different (sub)leaves. By what you say,
BF16 shouldn't be first. The list here clearly is sorted alphabetically,
and imo that's the only future-proof sorting possible (and also for
AVX512F, where I had previously offered to put together a patch to
switch to alphabetical ordering, if only we could agree on that).

> With this fixed, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

As per above I'm not going to apply this without hearing back from you.

Jan
On 04/07/2019 15:54, Jan Beulich wrote:
> On 04.07.2019 16:47, Andrew Cooper wrote:
>> On 01/07/2019 12:22, Jan Beulich wrote:
>>> --- a/xen/tools/gen-cpuid.py
>>> +++ b/xen/tools/gen-cpuid.py
>>> @@ -268,7 +268,7 @@ def crunch_numbers(state):
>>>          # AVX512 extensions acting on vectors of bytes/words are made
>>>          # dependents of AVX512BW (as to requiring wider than 16-bit mask
>>>          # registers), despite the SDM not formally making this connection.
>>> -        AVX512BW: [AVX512_BF16, AVX512_VBMI, AVX512_VBMI2],
>>> +        AVX512BW: [AVX512_BF16, AVX512_BITALG, AVX512_VBMI, AVX512_VBMI2],
>> BITALG should be after VBMI2, because everything in this table is
>> ordered by bit number.
> As said before - there's no ordering by bit number possible here.

It's perfectly easy. Each feature has a unique number.

> The individual features may live on different (sub)leaves. By
> what you say BF16 shouldn't be first. The list here clearly is
> sorted alphabetically, and imo that's the only future proof sorting
> possible (and also for AVX512F, where I had previously offered to
> put together a patch to switch to alphabetical ordering, if only we
> could agree on that).

In which case I missed it during review.

This feature matrix is deliberately sorted by feature number in an
effort to preserve chronology, which is a much more useful way of
reasoning about feature dependencies.

~Andrew
On 04.07.2019 20:38, Andrew Cooper wrote:
> On 04/07/2019 15:54, Jan Beulich wrote:
>> On 04.07.2019 16:47, Andrew Cooper wrote:
>>> On 01/07/2019 12:22, Jan Beulich wrote:
>>>> --- a/xen/tools/gen-cpuid.py
>>>> +++ b/xen/tools/gen-cpuid.py
>>>> @@ -268,7 +268,7 @@ def crunch_numbers(state):
>>>>          # AVX512 extensions acting on vectors of bytes/words are made
>>>>          # dependents of AVX512BW (as to requiring wider than 16-bit mask
>>>>          # registers), despite the SDM not formally making this connection.
>>>> -        AVX512BW: [AVX512_BF16, AVX512_VBMI, AVX512_VBMI2],
>>>> +        AVX512BW: [AVX512_BF16, AVX512_BITALG, AVX512_VBMI, AVX512_VBMI2],
>>> BITALG should be after VBMI2, because everything in this table is
>>> ordered by bit number.
>> As said before - there's no ordering by bit number possible here.
>
> Its perfectly easy. Each feature has a unique number.

Well, okay, for sub-leaves of the same main leaf I can accept this.
But what sorting do you suggest between basic and extended leaves?

>> The individual features may live on different (sub)leaves. By
>> what you say BF16 shouldn't be first. The list here clearly is
>> sorted alphabetically, and imo that's the only future proof sorting
>> possible (and also for AVX512F, where I had previously offered to
>> put together a patch to switch to alphabetical ordering, if only we
>> could agree on that).
>
> In which case I missed it during review.
>
> This feature matrix is deliberately sorted by feature number in an
> effort to preserve chronology, which is a much more useful way of
> reasoning about feature dependencies.

Except that bit numbers are often, but not always, an indication of
chronological order.

While I clearly disagree, for there to be progress here, do you expect
me to re-arrange the dependency list above, i.e. going beyond your
initial request? I certainly object to doing _just_ the originally
requested adjustment ...

Jan
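[Editor's note: either ordering convention under discussion could be enforced mechanically so it never regresses. A minimal sketch in plain Python - not part of gen-cpuid.py, and the table literal below contains only the one entry from the patch:

```python
# Hypothetical checker for the alphabetical convention Jan describes.
# In the real gen-cpuid.py the dependents are feature constants, not
# strings; here names are used directly for illustration.
DEPS = {
    "AVX512BW": ["AVX512_BF16", "AVX512_BITALG",
                 "AVX512_VBMI", "AVX512_VBMI2"],
}

def violations(deps):
    """Yield (parent, dependents) pairs whose list is not in name order."""
    for parent, children in deps.items():
        if children != sorted(children):
            yield parent, children

print(list(violations(DEPS)))  # [] - the amended list is already sorted
```

A bit-number convention could be checked the same way by sorting on the feature constants instead of the names, though, as noted above, that only yields a total order within a single (sub)leaf.]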
--- a/tools/tests/x86_emulator/evex-disp8.c
+++ b/tools/tests/x86_emulator/evex-disp8.c
@@ -538,6 +538,11 @@ static const struct test avx512pf_512[]
     INSNX(scatterpf1q, 66, 0f38, c7, 6, vl, sd, el),
 };
 
+static const struct test avx512_bitalg_all[] = {
+    INSN(popcnt,      66, 0f38, 54, vl, bw, vl),
+    INSN(pshufbitqmb, 66, 0f38, 8f, vl,  b, vl),
+};
+
 static const struct test avx512_vbmi_all[] = {
     INSN(permb,   66, 0f38, 8d, vl, b, vl),
     INSN(permi2b, 66, 0f38, 75, vl, b, vl),
@@ -550,6 +555,10 @@ static const struct test avx512_vbmi2_al
     INSN(pexpand, 66, 0f38, 62, vl, bw, el),
 };
 
+static const struct test avx512_vpopcntdq_all[] = {
+    INSN(popcnt, 66, 0f38, 55, vl, dq, vl)
+};
+
 static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 };
 static const unsigned char vl_128[] = { VL_128 };
 static const unsigned char vl_no128[] = { VL_512, VL_256 };
@@ -919,6 +928,8 @@ void evex_disp8_test(void *instr, struct
     RUN(avx512er, 512);
 #define cpu_has_avx512pf cpu_has_avx512f
     RUN(avx512pf, 512);
+    RUN(avx512_bitalg, all);
     RUN(avx512_vbmi, all);
     RUN(avx512_vbmi2, all);
+    RUN(avx512_vpopcntdq, all);
 }
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -143,6 +143,8 @@ static inline bool xcr0_mask(uint64_t ma
 #define cpu_has_avx512vl     (cp.feat.avx512vl && xcr0_mask(0xe6))
 #define cpu_has_avx512_vbmi  (cp.feat.avx512_vbmi && xcr0_mask(0xe6))
 #define cpu_has_avx512_vbmi2 (cp.feat.avx512_vbmi2 && xcr0_mask(0xe6))
+#define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6))
+#define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6))
 
 #define cpu_has_xgetbv1      (cpu_has_xsave && cp.xstate.xgetbv1)
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -479,6 +479,7 @@ static const struct ext0f38_table {
     [0x4d] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
     [0x4e] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl },
     [0x4f] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
+    [0x54 ... 0x55] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl },
     [0x58] = { .simd_size = simd_other, .two_op = 1, .d8s = 2 },
     [0x59] = { .simd_size = simd_other, .two_op = 1, .d8s = 3 },
     [0x5a] = { .simd_size = simd_128, .two_op = 1, .d8s = 4 },
@@ -501,6 +502,7 @@ static const struct ext0f38_table {
     [0x8c] = { .simd_size = simd_packed_int },
     [0x8d] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x8e] = { .simd_size = simd_packed_int, .to_mem = 1 },
+    [0x8f] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x90 ... 0x93] = { .simd_size = simd_other, .vsib = 1, .d8s = d8s_dq },
     [0x96 ... 0x98] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
     [0x99] = { .simd_size = simd_scalar_vexw, .d8s = d8s_dq },
@@ -1883,6 +1885,8 @@ in_protmode(
 #define vcpu_has_avx512vl()    (ctxt->cpuid->feat.avx512vl)
 #define vcpu_has_avx512_vbmi() (ctxt->cpuid->feat.avx512_vbmi)
 #define vcpu_has_avx512_vbmi2() (ctxt->cpuid->feat.avx512_vbmi2)
+#define vcpu_has_avx512_bitalg() (ctxt->cpuid->feat.avx512_bitalg)
+#define vcpu_has_avx512_vpopcntdq() (ctxt->cpuid->feat.avx512_vpopcntdq)
 #define vcpu_has_rdpid()       (ctxt->cpuid->feat.rdpid)
 
 #define vcpu_must_have(feat) \
@@ -8899,6 +8903,19 @@ x86_emulate(
         generate_exception_if(vex.l, EXC_UD);
         goto simd_0f_avx;
 
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x8f): /* vpshufbitqmb [xyz]mm/mem,[xyz]mm,k{k} */
+        generate_exception_if(evex.w || !evex.r || !evex.R || evex.z, EXC_UD);
+        /* fall through */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x54): /* vpopcnt{b,w} [xyz]mm/mem,[xyz]mm{k} */
+        host_and_vcpu_must_have(avx512_bitalg);
+        generate_exception_if(evex.brs, EXC_UD);
+        elem_bytes = 1 << evex.w;
+        goto avx512f_no_sae;
+
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x55): /* vpopcnt{d,q} [xyz]mm/mem,[xyz]mm{k} */
+        host_and_vcpu_must_have(avx512_vpopcntdq);
+        goto avx512f_no_sae;
+
     case X86EMUL_OPC_VEX_66(0x0f38, 0x58): /* vpbroadcastd xmm/m32,{x,y}mm */
     case X86EMUL_OPC_VEX_66(0x0f38, 0x59): /* vpbroadcastq xmm/m64,{x,y}mm */
     case X86EMUL_OPC_VEX_66(0x0f38, 0x78): /* vpbroadcastb xmm/m8,{x,y}mm */
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -110,6 +110,8 @@
 /* CPUID level 0x00000007:0.ecx */
 #define cpu_has_avx512_vbmi     boot_cpu_has(X86_FEATURE_AVX512_VBMI)
 #define cpu_has_avx512_vbmi2    boot_cpu_has(X86_FEATURE_AVX512_VBMI2)
+#define cpu_has_avx512_bitalg   boot_cpu_has(X86_FEATURE_AVX512_BITALG)
+#define cpu_has_avx512_vpopcntdq boot_cpu_has(X86_FEATURE_AVX512_VPOPCNTDQ)
 #define cpu_has_rdpid           boot_cpu_has(X86_FEATURE_RDPID)
 
 /* CPUID level 0x80000007.edx */
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -229,6 +229,7 @@ XEN_CPUFEATURE(UMIP,          6*32+ 2) /
 XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
 XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
 XEN_CPUFEATURE(AVX512_VBMI2,  6*32+ 6) /*A  Additional AVX-512 Vector Byte Manipulation Instrs */
+XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A  Support for VPOPCNT[B,W] and VPSHUFBITQMB */
 XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
 XEN_CPUFEATURE(RDPID,         6*32+22) /*A  RDPID instruction */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -268,7 +268,7 @@ def crunch_numbers(state):
         # AVX512 extensions acting on vectors of bytes/words are made
         # dependents of AVX512BW (as to requiring wider than 16-bit mask
         # registers), despite the SDM not formally making this connection.
-        AVX512BW: [AVX512_BF16, AVX512_VBMI, AVX512_VBMI2],
+        AVX512BW: [AVX512_BF16, AVX512_BITALG, AVX512_VBMI, AVX512_VBMI2],
 
         # The features:
         #  * Single Thread Indirect Branch Predictors
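[Editor's note: for readers unfamiliar with the insns being wired up, a scalar reference model of what the emulated vpopcnt{b,w} (AVX512_BITALG, opcode 0f38 54) and vpopcnt{d,q} (AVX512_VPOPCNTDQ, opcode 0f38 55) forms compute: each vector element is replaced by its population count, the element width for the byte/word forms being selected by evex.w just as in the `elem_bytes = 1 << evex.w` line of the patch. This is an illustrative sketch, not emulator code:

```python
# Per-element population count over a little-endian byte string,
# mirroring VPOPCNT{B,W,D,Q} semantics (masking/broadcast omitted).
def vpopcnt(vector_bytes, elem_bytes):
    out = bytearray()
    for i in range(0, len(vector_bytes), elem_bytes):
        elem = int.from_bytes(vector_bytes[i:i + elem_bytes], "little")
        # The count always fits back into the element width.
        out += bin(elem).count("1").to_bytes(elem_bytes, "little")
    return bytes(out)

# Byte-element example (vpopcntb on a 4-byte fragment):
print(list(vpopcnt(bytes([0x00, 0x01, 0x03, 0xff]), 1)))  # [0, 1, 2, 8]
```
]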
Plus the only other AVX512_BITALG one.

As in a few cases before, since the insns here and in particular their
memory access patterns follow the usual scheme, I didn't think it was
necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v9: Re-base.
v7: Re-base.
v6: New.
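[Editor's note: the "Disp8 scaling" test referred to here exercises the EVEX compressed-displacement rule: the encoded signed 8-bit displacement is multiplied by a scale N, which for full-vector tuple forms like these is the whole memory operand size, or a single element when broadcasting from memory. A simplified model with hypothetical parameter names, assuming the full-tuple case only:

```python
# disp8 * N for EVEX full-tuple memory operands: N is the vector length
# in bytes without broadcast, or the element size with broadcast.
def effective_disp(disp8, vector_len_bytes, broadcast=False, elem_bytes=4):
    n = elem_bytes if broadcast else vector_len_bytes
    return disp8 * n

print(effective_disp(1, 64))                  # 64: zmm access at base+64
print(effective_disp(-2, 32))                 # -64: ymm access at base-64
print(effective_disp(3, 64, broadcast=True))  # 12: one dword element
```
]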