Message ID | 1454679743-18133-13-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---
State | New, archived
>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>
>  static void sanitise_featureset(uint32_t *fs)
>  {
> +    uint32_t disabled_features[FSCAPINTS];
>      unsigned int i;
>
>      for ( i = 0; i < FSCAPINTS; ++i )
>      {
>          /* Clamp to known mask. */
>          fs[i] &= known_features[i];
> +
> +        /*
> +         * Identify which features with deep dependencies have been
> +         * disabled.
> +         */
> +        disabled_features[i] = ~fs[i] & deep_features[i];
> +    }
> +
> +    for_each_set_bit(i, (void *)disabled_features,
> +                     sizeof(disabled_features) * 8)
> +    {
> +        const uint32_t *dfs = lookup_deep_deps(i);
> +        unsigned int j;
> +
> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
> +
> +        for ( j = 0; j < FSCAPINTS; ++j )
> +        {
> +            fs[j] &= ~dfs[j];
> +            disabled_features[j] &= ~dfs[j];
> +        }
>      }

Am I getting the logic in the Python script right that it is indeed unnecessary for this loop to be restarted even when a feature at a higher numbered bit position results in a lower numbered bit getting cleared?

> @@ -153,6 +176,36 @@ void calculate_featuresets(void)
>      calculate_hvm_featureset();
>  }
>
> +const uint32_t *lookup_deep_deps(uint32_t feature)

Do you really mean this rather internal function to be non-static? Even if there was a later use in this series, it would seem suspicious to me; in fact I'd have expected both it and ...

> +{
> +    static const struct {
> +        uint32_t feature;
> +        uint32_t fs[FSCAPINTS];
> +    } deep_deps[] = INIT_DEEP_DEPS;

... this data to again be placed in .init.* sections.

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>
> +    deps = {
> +        XSAVE:
> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
> +
> +        AVX:
> +            (FMA, FMA4, F16C, AVX2, XOP),

I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.

> +        PAE:
> +            (LM, ),
> +
> +        LM:
> +            (CX16, LAHF_LM, PAGE1GB),
> +
> +        XMM:
> +            (LM, ),
> +
> +        XMM2:
> +            (LM, ),
> +
> +        XMM3:
> +            (LM, ),

I don't think so - SSE2 is a commonly implied prereq for long mode, but not SSE3. Instead what I'm missing is a dependency of SSEn, AES, SHA and maybe more on SSE (or maybe directly FXSR as per above), and of SSE on FXSR. And there may be more, like MMX really ought to be dependent on FPU or FXSR (not currently expressible, as it seems).

Jan
On 15/02/16 14:06, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>
>>  static void sanitise_featureset(uint32_t *fs)
>>  {
>> +    uint32_t disabled_features[FSCAPINTS];
>>      unsigned int i;
>>
>>      for ( i = 0; i < FSCAPINTS; ++i )
>>      {
>>          /* Clamp to known mask. */
>>          fs[i] &= known_features[i];
>> +
>> +        /*
>> +         * Identify which features with deep dependencies have been
>> +         * disabled.
>> +         */
>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>> +    }
>> +
>> +    for_each_set_bit(i, (void *)disabled_features,
>> +                     sizeof(disabled_features) * 8)
>> +    {
>> +        const uint32_t *dfs = lookup_deep_deps(i);
>> +        unsigned int j;
>> +
>> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
>> +
>> +        for ( j = 0; j < FSCAPINTS; ++j )
>> +        {
>> +            fs[j] &= ~dfs[j];
>> +            disabled_features[j] &= ~dfs[j];
>> +        }
>>      }
> Am I getting the logic in the Python script right that it is indeed unnecessary for this loop to be restarted even when a feature at a higher numbered bit position results in a lower numbered bit getting cleared?

Correct - the python logic flattens the dependency tree, so an individual lookup_deep_deps() will get you all features eventually influenced by feature i. It might be that the deep features for i include a feature numerically lower than i, but because dfs[] contains all features on the eventual chain, we don't need to start back from 0 again.

I felt this was far more productive to compute at build time, rather than at runtime.

>> @@ -153,6 +176,36 @@ void calculate_featuresets(void)
>>      calculate_hvm_featureset();
>>  }
>>
>> +const uint32_t *lookup_deep_deps(uint32_t feature)
> Do you really mean this rather internal function to be non-static?

(You have found the answer already)

> Even if there was a later use in this series, it would seem suspicious to me; in fact I'd have expected both it and ...
>
>> +{
>> +    static const struct {
>> +        uint32_t feature;
>> +        uint32_t fs[FSCAPINTS];
>> +    } deep_deps[] = INIT_DEEP_DEPS;
> ... this data to again be placed in .init.* sections.

For this series, I think it can be .init. For the future cpuid changes, I am going to need to use it for validation during the set_cpuid hypercall, so it will have to move out of .init for that.

>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>
>> +    deps = {
>> +        XSAVE:
>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>> +
>> +        AVX:
>> +            (FMA, FMA4, F16C, AVX2, XOP),
> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.

I am sure we have had this argument before.

All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.

This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.

>> +        PAE:
>> +            (LM, ),
>> +
>> +        LM:
>> +            (CX16, LAHF_LM, PAGE1GB),
>> +
>> +        XMM:
>> +            (LM, ),
>> +
>> +        XMM2:
>> +            (LM, ),
>> +
>> +        XMM3:
>> +            (LM, ),
> I don't think so - SSE2 is a commonly implied prereq for long mode, but not SSE3. Instead what I'm missing is a dependency of SSEn, AES, SHA and maybe more on SSE (or maybe directly FXSR as per above), and of SSE on FXSR. And there may be more, like MMX really ought to be dependent on FPU or FXSR (not currently expressible, as it seems).

I will double check all of these.

~Andrew
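The flattening Andrew describes is easy to see in miniature. Below is a condensed Python sketch of the gen-cpuid.py logic (the real hunk appears with the patch at the end of this page); the bit numbers are hypothetical, chosen so that a high-numbered feature's dependency chain reaches a lower-numbered one:

    # Hypothetical bit numbers: FMA = 0, AVX = 1, XSAVE = 2.
    deps = {2: [1], 1: [0]}          # XSAVE -> AVX, AVX -> FMA

    deep_deps = {}
    for feat in sorted(deps.keys()):
        seen = [feat]                # Breadth-first walk of the tree...
        to_process = list(deps[feat])
        while to_process:
            f = to_process.pop(0)
            if f in seen:            # ...which must be cycle-free.
                raise Exception("Cycle found via %u" % f)
            seen.append(f)
            to_process.extend(deps.get(f, []))
        deep_deps[feat] = seen[1:]   # Everything reachable from feat.

    # XSAVE's entry already contains FMA, a transitive dependency
    # through AVX, so a single lookup covers the whole chain:
    assert deep_deps == {1: [0], 2: [1, 0]}

Because every entry is the transitive closure, clearing a feature's deep dependencies in one shot can never expose a dependency that still needs processing, which is why the runtime loop needs no restart.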
>>> On 15.02.16 at 16:28, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 14:06, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>>
>>>  static void sanitise_featureset(uint32_t *fs)
>>>  {
>>> +    uint32_t disabled_features[FSCAPINTS];
>>>      unsigned int i;
>>>
>>>      for ( i = 0; i < FSCAPINTS; ++i )
>>>      {
>>>          /* Clamp to known mask. */
>>>          fs[i] &= known_features[i];
>>> +
>>> +        /*
>>> +         * Identify which features with deep dependencies have been
>>> +         * disabled.
>>> +         */
>>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>>> +    }
>>> +
>>> +    for_each_set_bit(i, (void *)disabled_features,
>>> +                     sizeof(disabled_features) * 8)
>>> +    {
>>> +        const uint32_t *dfs = lookup_deep_deps(i);
>>> +        unsigned int j;
>>> +
>>> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
>>> +
>>> +        for ( j = 0; j < FSCAPINTS; ++j )
>>> +        {
>>> +            fs[j] &= ~dfs[j];
>>> +            disabled_features[j] &= ~dfs[j];
>>> +        }
>>>      }
>> Am I getting the logic in the Python script right that it is indeed unnecessary for this loop to be restarted even when a feature at a higher numbered bit position results in a lower numbered bit getting cleared?
>
> Correct - the python logic flattens the dependency tree, so an individual lookup_deep_deps() will get you all features eventually influenced by feature i. It might be that the deep features for i include a feature numerically lower than i, but because dfs[] contains all features on the eventual chain, we don't need to start back from 0 again.
>
> I felt this was far more productive to compute at build time, rather than at runtime.

Oh, definitely.

>>> --- a/xen/tools/gen-cpuid.py
>>> +++ b/xen/tools/gen-cpuid.py
>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>
>>> +    deps = {
>>> +        XSAVE:
>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>> +
>>> +        AVX:
>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
> I am sure we have had this argument before.

Indeed, hence the "I continue to ...".

> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>
> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.

The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.) Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).

Jan
On 15/02/16 15:52, Jan Beulich wrote:
>
>>>> --- a/xen/tools/gen-cpuid.py
>>>> +++ b/xen/tools/gen-cpuid.py
>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>
>>>> +    deps = {
>>>> +        XSAVE:
>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>> +
>>>> +        AVX:
>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>> I am sure we have had this argument before.
> Indeed, hence the "I continue to ...".
>
>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>
>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)

The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.

> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).

The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.

~Andrew
>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 15:52, Jan Beulich wrote:
>>
>>>>> --- a/xen/tools/gen-cpuid.py
>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>
>>>>> +    deps = {
>>>>> +        XSAVE:
>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>> +
>>>>> +        AVX:
>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>> I am sure we have had this argument before.
>> Indeed, hence the "I continue to ...".
>>
>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>
>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.

No, it's not (and again - there's no bit named AVX in XCR0): Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).

>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.

Are you sure? Did you try? Those instructions may not be very useful without other AVX instructions, but I don't think there's any coupling. And if I, as an example, look at one of the 3-operand vfmadd instructions, I also don't see any #UD resulting from the AVX bit being clear (as opposed to various of the AVX-512 extensions, which clearly document that AVX512F needs to always be checked). It's only in the textual description of e.g. FMA or AVX2 detection where such a connection is being made.

In any event, please don't misunderstand my bringing up of this as objection to the way you handle things. I merely wanted to point out again that this is not the only way the (often self-contradictory) SDM can be understood.

Jan
On 15/02/16 16:27, Jan Beulich wrote:
>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>
>>>>>> +    deps = {
>>>>>> +        XSAVE:
>>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>> +
>>>>>> +        AVX:
>>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>>> I am sure we have had this argument before.
>>> Indeed, hence the "I continue to ...".
>>>
>>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>>
>>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
>> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.
> No, it's not (and again - there's no bit named AVX in XCR0):

(and again - Intel disagree) The Intel manual uniformly refers to XCR0.AVX (bit 2). AMD uses XCR0.YMM.

> Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).

In hardware, all these bits are almost certainly hardwired on or off. Part of the issue here is that with virtualisation, there are a whole lot more combinations than exist on real hardware.

Whether right or wrong, the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset. This is based on my assumption that that's how real hardware actually works, and prevents the possibility of them getting out of sync.

>>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
> Are you sure? Did you try?

Yes

void test_main(void)
{
    printk("AVX Testing\n");

    write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);

    asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    xtf_success();
}

with disassembly:

00104000 <test_main>:
  104000:  48 83 ec 08           sub    $0x8,%rsp
  104004:  bf b0 4c 10 00        mov    $0x104cb0,%edi
  104009:  31 c0                 xor    %eax,%eax
  10400b:  e8 b0 c2 ff ff        callq  1002c0 <printk>
  104010:  0f 20 e0              mov    %cr4,%rax
  104013:  48 0d 00 06 04 00     or     $0x40600,%rax
  104019:  0f 22 e0              mov    %rax,%cr4
  10401c:  31 c9                 xor    %ecx,%ecx
  10401e:  31 d2                 xor    %edx,%edx
  104020:  b8 07 00 00 00        mov    $0x7,%eax
  104025:  0f 01 d1              xsetbv
  104028:  c4 e2 f1 98 d0        vfmadd132pd %xmm0,%xmm1,%xmm2
  10402d:  b0 03                 mov    $0x3,%al
  10402f:  0f 01 d1              xsetbv
  104032:  c4 e2 f1 98 d0        vfmadd132pd %xmm0,%xmm1,%xmm2
  104037:  48 83 c4 08           add    $0x8,%rsp
  10403b:  e9 60 d6 ff ff        jmpq   1016a0 <xtf_success>

causes a #UD exception on the second FMA instruction only:

(d3) [ 357.071427] --- Xen Test Framework ---
(d3) [ 357.094556] Environment: HVM 64bit (Long mode 4 levels)
(d3) [ 357.094709] AVX Testing
(d3) [ 357.094867] ******************************
(d3) [ 357.095050] PANIC: Unhandled exception: vec 6 at 0008:0000000000104032
(d3) [ 357.095160] ******************************

> Those instructions may not be very useful without other AVX instructions, but I don't think there's any coupling. And if I, as an example, look at one of the 3-operand vfmadd instructions, I also don't see any #UD resulting from the AVX bit being clear (as opposed to various of the AVX-512 extensions, which clearly document that AVX512F needs to always be checked). It's only in the textual description of e.g. FMA or AVX2 detection where such a connection is being made.
>
> In any event, please don't misunderstand my bringing up of this as objection to the way you handle things. I merely wanted to point out again that this is not the only way the (often self-contradictory) SDM can be understood.

The fact that there is ambiguity means that we must be even more careful when making changes like this. After all, if there are multiple ways to interpret the text, you can probably bet that different software takes contrary interpretations.

~Andrew
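As an aside, the generation Andrew mentions ("the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset") can be sketched in Python as follows. This is only a model of the scheme being described: guest_xstate_mask() is a hypothetical helper, not Xen's actual code, though the XCR0 bit positions are architectural.

    XSTATE_FP  = 1 << 0   # x87 state: must always be set in XCR0.
    XSTATE_SSE = 1 << 1   # %xmm state.
    XSTATE_YMM = 1 << 2   # %ymm upper halves: Intel's XCR0.AVX, AMD's XCR0.YMM.

    def guest_xstate_mask(has_xsave, has_avx):
        """Valid-XCR0-bits mask to advertise in guest CPUID[0xd].EDX:EAX."""
        if not has_xsave:
            return 0                    # No XSAVE: the whole leaf is zero.
        mask = XSTATE_FP | XSTATE_SSE   # Always enumerated alongside XSAVE.
        if has_avx:
            mask |= XSTATE_YMM          # The AVX feature bit gates YMM state.
        return mask

    # XSAVE offered without AVX: the guest may not set XCR0 bit 2, so
    # VEX-encoded instructions fault, as the XTF test above shows.
    assert guest_xstate_mask(True, False) == 0b011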
>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 16:27, Jan Beulich wrote:
>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>>
>>>>>>> +    deps = {
>>>>>>> +        XSAVE:
>>>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>>> +
>>>>>>> +        AVX:
>>>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>>>> I am sure we have had this argument before.
>>>> Indeed, hence the "I continue to ...".
>>>>
>>>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>>>
>>>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>>>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
>>> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.
>> No, it's not (and again - there's no bit named AVX in XCR0):
> (and again - Intel disagree) The Intel manual uniformly refers to XCR0.AVX (bit 2). AMD uses XCR0.YMM.

Interesting. I'm sure early documentation was different, which would be in line with the bits named YMM and ZMM etc in our code base. But anyway.

>> Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).
> In hardware, all these bits are almost certainly hardwired on or off. Part of the issue here is that with virtualisation, there are a whole lot more combinations than exist on real hardware.
>
> Whether right or wrong, the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset. This is based on my assumption that that's how real hardware actually works, and prevents the possibility of them getting out of sync.

Which I agree with. In this context, however, I wonder how you mean to do what you say above. I don't recall having seen any related code, but of course this may well be in one of the patches I didn't yet get around to looking at.

>>>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
>>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
>> Are you sure? Did you try?
> Yes
>
> void test_main(void)
> {
>     printk("AVX Testing\n");
>
>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>
>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>
>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

Here you clear the bit in XCR0, which wasn't the question. The question was whether VFMADD... would fault when the CPUID AVX bit is clear.

Jan
On 16/02/16 09:54, Jan Beulich wrote:
>>>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 16:27, Jan Beulich wrote:
>>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>>>
>>>>>>>> +    deps = {
>>>>>>>> +        XSAVE:
>>>>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>>>> +
>>>>>>>> +        AVX:
>>>>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>>>>> I am sure we have had this argument before.
>>>>> Indeed, hence the "I continue to ...".
>>>>>
>>>>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>>>>
>>>>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>>>>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
>>>> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.
>>> No, it's not (and again - there's no bit named AVX in XCR0):
>> (and again - Intel disagree) The Intel manual uniformly refers to XCR0.AVX (bit 2). AMD uses XCR0.YMM.
> Interesting. I'm sure early documentation was different, which would be in line with the bits named YMM and ZMM etc in our code base. But anyway.
>
>>> Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).
>> In hardware, all these bits are almost certainly hardwired on or off. Part of the issue here is that with virtualisation, there are a whole lot more combinations than exist on real hardware.
>>
>> Whether right or wrong, the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset. This is based on my assumption that that's how real hardware actually works, and prevents the possibility of them getting out of sync.
> Which I agree with. In this context, however, I wonder how you mean to do what you say above. I don't recall having seen any related code, but of course this may well be in one of the patches I didn't yet get around to looking at.
>
>>>>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
>>>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
>>> Are you sure? Did you try?
>> Yes
>>
>> void test_main(void)
>> {
>>     printk("AVX Testing\n");
>>
>>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>>
>>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>>
>>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
> Here you clear the bit in XCR0, which wasn't the question. The question was whether VFMADD... would fault when the CPUID AVX bit is clear.

How will the pipeline know whether the guest was offered the AVX flag in cpuid?

Hiding feature bits does not prevent the functionality from working, i.e. 1GB superpages don't stop working if you hide the feature bit. No amount of levelling can stop a guest from manually probing for features/instruction sets.

~Andrew
>>> On 17.02.16 at 11:25, <andrew.cooper3@citrix.com> wrote:
> On 16/02/16 09:54, Jan Beulich wrote:
>>>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 16:27, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>>>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
>>>> Are you sure? Did you try?
>>> Yes
>>>
>>> void test_main(void)
>>> {
>>>     printk("AVX Testing\n");
>>>
>>>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>>>
>>>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>>>
>>>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>> Here you clear the bit in XCR0, which wasn't the question. The question was whether VFMADD... would fault when the CPUID AVX bit is clear.
>
> How will the pipeline know whether the guest was offered the AVX flag in cpuid?
>
> Hiding feature bits does not prevent the functionality from working, i.e. 1GB superpages don't stop working if you hide the feature bit. No amount of levelling can stop a guest from manually probing for features/instruction sets.

That wasn't my point. I was intentionally referring to the CPUID bits, knowing that us masking them in software doesn't affect hardware in any way. Hence the "Are you sure? Did you try?", which were really meant in a rhetorical way, as the answer can only possibly be "no" here (unless you had control over the hardware).

Jan
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 1af0e6c..25dcd0e 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -12,6 +12,7 @@ const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
 static const uint32_t pv_featuremask[] = INIT_PV_FEATURES;
 static const uint32_t hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
 static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
+static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
 
 uint32_t __read_mostly raw_featureset[FSCAPINTS];
 uint32_t __read_mostly host_featureset[FSCAPINTS];
@@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
 
 static void sanitise_featureset(uint32_t *fs)
 {
+    uint32_t disabled_features[FSCAPINTS];
     unsigned int i;
 
     for ( i = 0; i < FSCAPINTS; ++i )
     {
         /* Clamp to known mask. */
         fs[i] &= known_features[i];
+
+        /*
+         * Identify which features with deep dependencies have been
+         * disabled.
+         */
+        disabled_features[i] = ~fs[i] & deep_features[i];
+    }
+
+    for_each_set_bit(i, (void *)disabled_features,
+                     sizeof(disabled_features) * 8)
+    {
+        const uint32_t *dfs = lookup_deep_deps(i);
+        unsigned int j;
+
+        ASSERT(dfs); /* deep_features[] should guarantee this. */
+
+        for ( j = 0; j < FSCAPINTS; ++j )
+        {
+            fs[j] &= ~dfs[j];
+            disabled_features[j] &= ~dfs[j];
+        }
     }
 
     switch ( boot_cpu_data.x86_vendor )
@@ -153,6 +176,36 @@ void calculate_featuresets(void)
     calculate_hvm_featureset();
 }
 
+const uint32_t *lookup_deep_deps(uint32_t feature)
+{
+    static const struct {
+        uint32_t feature;
+        uint32_t fs[FSCAPINTS];
+    } deep_deps[] = INIT_DEEP_DEPS;
+    unsigned int start = 0, end = ARRAY_SIZE(deep_deps);
+
+    BUILD_BUG_ON(ARRAY_SIZE(deep_deps) != NR_DEEP_DEPS);
+
+    /* Fast early exit. */
+    if ( !test_bit(feature, deep_features) )
+        return NULL;
+
+    /* deep_deps[] is sorted.  Perform a binary search. */
+    while ( start < end )
+    {
+        unsigned int mid = start + ((end - start) / 2);
+
+        if ( deep_deps[mid].feature > feature )
+            end = mid;
+        else if ( deep_deps[mid].feature < feature )
+            start = mid + 1;
+        else
+            return deep_deps[mid].fs;
+    }
+
+    return NULL;
+}
+
 static void __maybe_unused build_assertions(void)
 {
     BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
@@ -160,6 +213,7 @@ static void __maybe_unused build_assertions(void)
     BUILD_BUG_ON(ARRAY_SIZE(pv_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_hap_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(deep_features) != FSCAPINTS);
 }
 
 /*
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 18ba95b..cd7fa90 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -28,6 +28,8 @@ extern uint32_t hvm_featureset[FSCAPINTS];
 
 void calculate_featuresets(void);
 
+const uint32_t *lookup_deep_deps(uint32_t feature);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* !__X86_CPUID_H__ */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 5f0f892..c44f124 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -138,6 +138,61 @@ def crunch_numbers(state):
     state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
     state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
 
+    deps = {
+        XSAVE:
+            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
+
+        AVX:
+            (FMA, FMA4, F16C, AVX2, XOP),
+
+        PAE:
+            (LM, ),
+
+        LM:
+            (CX16, LAHF_LM, PAGE1GB),
+
+        XMM:
+            (LM, ),
+
+        XMM2:
+            (LM, ),
+
+        XMM3:
+            (LM, ),
+
+        APIC:
+            (X2APIC, ),
+
+        PSE:
+            (PSE36, ),
+    }
+
+    deep_features = tuple(sorted(deps.keys()))
+    state.deep_deps = {}
+
+    for feat in deep_features:
+
+        seen = [feat]
+        to_process = list(deps[feat])
+
+        while len(to_process):
+            f = to_process.pop(0)
+
+            if f in seen:
+                raise Fail("ERROR: Cycle found with %s when processing %s"
+                           % (state.names[f], state.names[feat]))
+
+            seen.append(f)
+            to_process.extend(deps.get(f, []))
+
+        state.deep_deps[feat] = seen[1:]
+
+    state.deep_features = featureset_to_uint32s(deps.keys(), nr_entries)
+    state.nr_deep_deps = len(state.deep_deps.keys())
+
+    for k, v in state.deep_deps.iteritems():
+        state.deep_deps[k] = featureset_to_uint32s(v, nr_entries)
+
 
 def write_results(state):
     state.output.write(
@@ -164,6 +219,12 @@ def write_results(state):
 
 #define INIT_HVM_SHADOW_FEATURES { \\\n%s\n}
 
 #define INIT_HVM_HAP_FEATURES { \\\n%s\n}
+
+#define NR_DEEP_DEPS %s
+
+#define INIT_DEEP_FEATURES { \\\n%s\n}
+
+#define INIT_DEEP_DEPS { \\
 """ % (state.nr_entries,
        state.common,
        format_uint32s(state.known, 4),
        format_uint32s(state.inverted, 4),
        format_uint32s(state.pv, 4),
        format_uint32s(state.hvm_shadow, 4),
        format_uint32s(state.hvm_hap, 4),
+       state.nr_deep_deps,
+       format_uint32s(state.deep_features, 4),
        ))
 
+    for dep in sorted(state.deep_deps.keys()):
+        state.output.write(
+            "    { %#xU, /* %s */ { \\\n%s\n    }, }, \\\n"
+            % (dep, state.names[dep],
+               format_uint32s(state.deep_deps[dep], 8)
+               ))
+
     state.output.write(
-"""
+"""}
+
 #endif /* __XEN_X86__FEATURESET_DATA__ */
 """)
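To tie the patch back to Jan's opening question, here is a toy Python model of the two sanitise_featureset() loops, using a single featureset word (i.e. FSCAPINTS == 1) and the hypothetical bit numbers from the earlier sketch (FMA = 0, AVX = 1, XSAVE = 2); it illustrates the algorithm, not the hypervisor code itself:

    deep_features = 0b110                 # XSAVE and AVX have deep deps.
    deep_deps = {
        2: 0b011,                         # XSAVE's flattened deps: AVX, FMA.
        1: 0b001,                         # AVX's flattened deps: FMA.
    }

    def sanitise(fs):
        # First loop: record which deep features start out disabled.
        disabled = ~fs & deep_features

        # Second loop: walk the set bits low to high, clearing each
        # feature's entire flattened dependency set in one shot.
        bit = 0
        while disabled >> bit:
            if disabled & (1 << bit):
                dfs = deep_deps[bit]      # lookup_deep_deps(bit)
                fs &= ~dfs
                disabled &= ~dfs          # Deps already covered; skip them.
            bit += 1
        return fs & 0xffffffff

    # AVX and FMA set, XSAVE clear: processing XSAVE (bit 2) clears the
    # lower-numbered AVX bit, and FMA with it, because XSAVE's flattened
    # entry already includes AVX's own dependency -- no restart needed.
    assert sanitise(0b011) == 0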
Some features depend on other features. Working out and maintaining the exact dependency tree is complicated, so it is expressed in the automatic generation script, and flattened for faster runtime use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

For all intents and purposes, new in v2.
---
 xen/arch/x86/cpuid.c        | 54 +++++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpuid.h |  2 ++
 xen/tools/gen-cpuid.py      | 73 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 128 insertions(+), 1 deletion(-)