Message ID | 1454679743-18133-13-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---
State | New, archived
>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>
>  static void sanitise_featureset(uint32_t *fs)
>  {
> +    uint32_t disabled_features[FSCAPINTS];
>      unsigned int i;
>
>      for ( i = 0; i < FSCAPINTS; ++i )
>      {
>          /* Clamp to known mask. */
>          fs[i] &= known_features[i];
> +
> +        /*
> +         * Identify which features with deep dependencies have been
> +         * disabled.
> +         */
> +        disabled_features[i] = ~fs[i] & deep_features[i];
> +    }
> +
> +    for_each_set_bit(i, (void *)disabled_features,
> +                     sizeof(disabled_features) * 8)
> +    {
> +        const uint32_t *dfs = lookup_deep_deps(i);
> +        unsigned int j;
> +
> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
> +
> +        for ( j = 0; j < FSCAPINTS; ++j )
> +        {
> +            fs[j] &= ~dfs[j];
> +            disabled_features[j] &= ~dfs[j];
> +        }
>      }

Am I getting the logic in the Python script right that it is indeed unnecessary for this loop to be restarted even when a feature at a higher numbered bit position results in a lower numbered bit getting cleared?

> @@ -153,6 +176,36 @@ void calculate_featuresets(void)
>      calculate_hvm_featureset();
>  }
>
> +const uint32_t *lookup_deep_deps(uint32_t feature)

Do you really mean this rather internal function to be non-static? Even if there was a later use in this series, it would seem suspicious to me; in fact I'd have expected both it and ...

> +{
> +    static const struct {
> +        uint32_t feature;
> +        uint32_t fs[FSCAPINTS];
> +    } deep_deps[] = INIT_DEEP_DEPS;

... this data to again be placed in .init.* sections.

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>
> +    deps = {
> +        XSAVE:
> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
> +
> +        AVX:
> +            (FMA, FMA4, F16C, AVX2, XOP),

I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.

> +        PAE:
> +            (LM, ),
> +
> +        LM:
> +            (CX16, LAHF_LM, PAGE1GB),
> +
> +        XMM:
> +            (LM, ),
> +
> +        XMM2:
> +            (LM, ),
> +
> +        XMM3:
> +            (LM, ),

I don't think so - SSE2 is a commonly implied prereq for long mode, but not SSE3. Instead what I'm missing is a dependency of SSEn, AES, SHA and maybe more on SSE (or maybe directly FXSR as per above), and of SSE on FXSR. And there may be more, like MMX really ought to be dependent on FPU or FXSR (not currently expressible, as it seems).

Jan
On 15/02/16 14:06, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>
>>  static void sanitise_featureset(uint32_t *fs)
>>  {
>> +    uint32_t disabled_features[FSCAPINTS];
>>      unsigned int i;
>>
>>      for ( i = 0; i < FSCAPINTS; ++i )
>>      {
>>          /* Clamp to known mask. */
>>          fs[i] &= known_features[i];
>> +
>> +        /*
>> +         * Identify which features with deep dependencies have been
>> +         * disabled.
>> +         */
>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>> +    }
>> +
>> +    for_each_set_bit(i, (void *)disabled_features,
>> +                     sizeof(disabled_features) * 8)
>> +    {
>> +        const uint32_t *dfs = lookup_deep_deps(i);
>> +        unsigned int j;
>> +
>> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
>> +
>> +        for ( j = 0; j < FSCAPINTS; ++j )
>> +        {
>> +            fs[j] &= ~dfs[j];
>> +            disabled_features[j] &= ~dfs[j];
>> +        }
>>      }
> Am I getting the logic in the Python script right that it is indeed unnecessary for this loop to be restarted even when a feature at a higher numbered bit position results in a lower numbered bit getting cleared?

Correct - the python logic flattens the dependency tree, so an individual lookup_deep_deps() will get you all features eventually influenced by feature i. It might be that the deep features for i include a feature numerically lower than i, but because dfs[] contains all features on the eventual chain, we don't need to start back from 0 again.

I felt this was far more productive to compute at build time, rather than at runtime.

>> @@ -153,6 +176,36 @@ void calculate_featuresets(void)
>>      calculate_hvm_featureset();
>>  }
>>
>> +const uint32_t *lookup_deep_deps(uint32_t feature)
> Do you really mean this rather internal function to be non-static?

(You have found the answer already)

> Even if there was a later use in this series, it would seem suspicious to me; in fact I'd have expected both it and ...
>
>> +{
>> +    static const struct {
>> +        uint32_t feature;
>> +        uint32_t fs[FSCAPINTS];
>> +    } deep_deps[] = INIT_DEEP_DEPS;
> ... this data to again be placed in .init.* sections.

For this series, I think it can be .init. For the future cpuid changes, I am going to need to use it for validation during the set_cpuid hypercall, so it will have to move out of .init for that.

>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>
>> +    deps = {
>> +        XSAVE:
>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>> +
>> +        AVX:
>> +            (FMA, FMA4, F16C, AVX2, XOP),
> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.

I am sure we have had this argument before.

All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.

This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.

>> +        PAE:
>> +            (LM, ),
>> +
>> +        LM:
>> +            (CX16, LAHF_LM, PAGE1GB),
>> +
>> +        XMM:
>> +            (LM, ),
>> +
>> +        XMM2:
>> +            (LM, ),
>> +
>> +        XMM3:
>> +            (LM, ),
> I don't think so - SSE2 is a commonly implied prereq for long mode, but not SSE3. Instead what I'm missing is a dependency of SSEn, AES, SHA and maybe more on SSE (or maybe directly FXSR as per above), and of SSE on FXSR. And there may be more, like MMX really ought to be dependent on FPU or FXSR (not currently expressible, as it seems).

I will double check all of these.

~Andrew
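The flattening Andrew describes is easy to see in miniature. Below is a condensed Python sketch of the gen-cpuid.py logic (the real hunk appears with the patch at the end of this page); the bit numbers are hypothetical, chosen so that a high-numbered feature's dependency chain reaches a lower-numbered one:

    # Hypothetical bit numbers: FMA = 0, AVX = 1, XSAVE = 2.
    deps = {2: [1], 1: [0]}          # XSAVE -> AVX, AVX -> FMA

    deep_deps = {}
    for feat in sorted(deps.keys()):
        seen = [feat]                # Breadth-first walk of the tree...
        to_process = list(deps[feat])
        while to_process:
            f = to_process.pop(0)
            if f in seen:            # ...which must be cycle-free.
                raise Exception("Cycle found via %u" % f)
            seen.append(f)
            to_process.extend(deps.get(f, []))
        deep_deps[feat] = seen[1:]   # Everything reachable from feat.

    # XSAVE's entry already contains FMA, a transitive dependency
    # through AVX, so a single lookup covers the whole chain:
    assert deep_deps == {1: [0], 2: [1, 0]}

Because every entry is the transitive closure, clearing a feature's deep dependencies in one shot can never expose a dependency that still needs processing, which is why the runtime loop needs no restart.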
>>> On 15.02.16 at 16:28, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 14:06, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
>>>
>>>  static void sanitise_featureset(uint32_t *fs)
>>>  {
>>> +    uint32_t disabled_features[FSCAPINTS];
>>>      unsigned int i;
>>>
>>>      for ( i = 0; i < FSCAPINTS; ++i )
>>>      {
>>>          /* Clamp to known mask. */
>>>          fs[i] &= known_features[i];
>>> +
>>> +        /*
>>> +         * Identify which features with deep dependencies have been
>>> +         * disabled.
>>> +         */
>>> +        disabled_features[i] = ~fs[i] & deep_features[i];
>>> +    }
>>> +
>>> +    for_each_set_bit(i, (void *)disabled_features,
>>> +                     sizeof(disabled_features) * 8)
>>> +    {
>>> +        const uint32_t *dfs = lookup_deep_deps(i);
>>> +        unsigned int j;
>>> +
>>> +        ASSERT(dfs); /* deep_features[] should guarantee this. */
>>> +
>>> +        for ( j = 0; j < FSCAPINTS; ++j )
>>> +        {
>>> +            fs[j] &= ~dfs[j];
>>> +            disabled_features[j] &= ~dfs[j];
>>> +        }
>>>      }
>> Am I getting the logic in the Python script right that it is indeed unnecessary for this loop to be restarted even when a feature at a higher numbered bit position results in a lower numbered bit getting cleared?
>
> Correct - the python logic flattens the dependency tree, so an individual lookup_deep_deps() will get you all features eventually influenced by feature i. It might be that the deep features for i include a feature numerically lower than i, but because dfs[] contains all features on the eventual chain, we don't need to start back from 0 again.
>
> I felt this was far more productive to compute at build time, rather than at runtime.

Oh, definitely.

>>> --- a/xen/tools/gen-cpuid.py
>>> +++ b/xen/tools/gen-cpuid.py
>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>
>>> +    deps = {
>>> +        XSAVE:
>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>> +
>>> +        AVX:
>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
> I am sure we have had this argument before.

Indeed, hence the "I continue to ...".

> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>
> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.

The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.) Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).

Jan
On 15/02/16 15:52, Jan Beulich wrote:
>
>>>> --- a/xen/tools/gen-cpuid.py
>>>> +++ b/xen/tools/gen-cpuid.py
>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>
>>>> +    deps = {
>>>> +        XSAVE:
>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>> +
>>>> +        AVX:
>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>> I am sure we have had this argument before.
> Indeed, hence the "I continue to ...".
>
>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>
>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)

The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.

> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).

The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.

~Andrew
>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 15:52, Jan Beulich wrote:
>>
>>>>> --- a/xen/tools/gen-cpuid.py
>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>
>>>>> +    deps = {
>>>>> +        XSAVE:
>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>> +
>>>>> +        AVX:
>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>> I am sure we have had this argument before.
>> Indeed, hence the "I continue to ...".
>>
>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>
>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.

No, it's not (and again - there's no bit named AVX in XCR0): Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).

>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.

Are you sure? Did you try? Those instructions may not be very useful without other AVX instructions, but I don't think there's any coupling. And if I, as an example, look at one of the 3-operand vfmadd instructions, I also don't see any #UD resulting from the AVX bit being clear (as opposed to various of the AVX-512 extensions, which clearly document that AVX512F needs to always be checked). It's only in the textual description of e.g. FMA or AVX2 detection where such a connection is being made.

In any event, please don't misunderstand my bringing up of this as objection to the way you handle things. I merely wanted to point out again that this is not the only way the (often self-contradictory) SDM can be understood.

Jan
On 15/02/16 16:27, Jan Beulich wrote:
>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>
>>>>>> +    deps = {
>>>>>> +        XSAVE:
>>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>> +
>>>>>> +        AVX:
>>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>>> I am sure we have had this argument before.
>>> Indeed, hence the "I continue to ...".
>>>
>>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>>
>>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
>> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.
> No, it's not (and again - there's no bit named AVX in XCR0):

(and again - Intel disagree) The Intel manual uniformly refers to XCR0.AVX (bit 2). AMD uses XCR0.YMM.

> Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).

In hardware, all these bits are almost certainly hardwired on or off. Part of the issue here is that with virtualisation, there are a whole lot more combinations than exist on real hardware.

Whether right or wrong, the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset. This is based on my assumption that that's how real hardware actually works, and prevents the possibility of them getting out of sync.

>>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
> Are you sure? Did you try?

Yes

void test_main(void)
{
    printk("AVX Testing\n");

    write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);

    asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
    asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

    xtf_success();
}

with disassembly:

00104000 <test_main>:
  104000:  48 83 ec 08           sub    $0x8,%rsp
  104004:  bf b0 4c 10 00        mov    $0x104cb0,%edi
  104009:  31 c0                 xor    %eax,%eax
  10400b:  e8 b0 c2 ff ff        callq  1002c0 <printk>
  104010:  0f 20 e0              mov    %cr4,%rax
  104013:  48 0d 00 06 04 00     or     $0x40600,%rax
  104019:  0f 22 e0              mov    %rax,%cr4
  10401c:  31 c9                 xor    %ecx,%ecx
  10401e:  31 d2                 xor    %edx,%edx
  104020:  b8 07 00 00 00        mov    $0x7,%eax
  104025:  0f 01 d1              xsetbv
  104028:  c4 e2 f1 98 d0        vfmadd132pd %xmm0,%xmm1,%xmm2
  10402d:  b0 03                 mov    $0x3,%al
  10402f:  0f 01 d1              xsetbv
  104032:  c4 e2 f1 98 d0        vfmadd132pd %xmm0,%xmm1,%xmm2
  104037:  48 83 c4 08           add    $0x8,%rsp
  10403b:  e9 60 d6 ff ff        jmpq   1016a0 <xtf_success>

causes a #UD exception on the second FMA instruction only:

(d3) [ 357.071427] --- Xen Test Framework ---
(d3) [ 357.094556] Environment: HVM 64bit (Long mode 4 levels)
(d3) [ 357.094709] AVX Testing
(d3) [ 357.094867] ******************************
(d3) [ 357.095050] PANIC: Unhandled exception: vec 6 at 0008:0000000000104032
(d3) [ 357.095160] ******************************

> Those instructions may not be very useful without other AVX instructions, but I don't think there's any coupling. And if I, as an example, look at one of the 3-operand vfmadd instructions, I also don't see any #UD resulting from the AVX bit being clear (as opposed to various of the AVX-512 extensions, which clearly document that AVX512F needs to always be checked). It's only in the textual description of e.g. FMA or AVX2 detection where such a connection is being made.
>
> In any event, please don't misunderstand my bringing up of this as objection to the way you handle things. I merely wanted to point out again that this is not the only way the (often self-contradictory) SDM can be understood.

The fact that there is ambiguity means that we must be even more careful when making changes like this. After all, if there are multiple ways to interpret the text, you can probably bet that different software takes contrary interpretations.

~Andrew
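As an aside, the generation Andrew mentions ("the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset") can be sketched in Python as follows. This is only a model of the scheme being described: guest_xstate_mask() is a hypothetical helper, not Xen's actual code, though the XCR0 bit positions are architectural.

    XSTATE_FP  = 1 << 0   # x87 state: must always be set in XCR0.
    XSTATE_SSE = 1 << 1   # %xmm state.
    XSTATE_YMM = 1 << 2   # %ymm upper halves: Intel's XCR0.AVX, AMD's XCR0.YMM.

    def guest_xstate_mask(has_xsave, has_avx):
        """Valid-XCR0-bits mask to advertise in guest CPUID[0xd].EDX:EAX."""
        if not has_xsave:
            return 0                    # No XSAVE: the whole leaf is zero.
        mask = XSTATE_FP | XSTATE_SSE   # Always enumerated alongside XSAVE.
        if has_avx:
            mask |= XSTATE_YMM          # The AVX feature bit gates YMM state.
        return mask

    # XSAVE offered without AVX: the guest may not set XCR0 bit 2, so
    # VEX-encoded instructions fault, as the XTF test above shows.
    assert guest_xstate_mask(True, False) == 0b011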
>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 16:27, Jan Beulich wrote:
>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>>
>>>>>>> +    deps = {
>>>>>>> +        XSAVE:
>>>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>>> +
>>>>>>> +        AVX:
>>>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>>>> I am sure we have had this argument before.
>>>> Indeed, hence the "I continue to ...".
>>>>
>>>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>>>
>>>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>>>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
>>> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.
>> No, it's not (and again - there's no bit named AVX in XCR0):
> (and again - Intel disagree) The Intel manual uniformly refers to XCR0.AVX (bit 2). AMD uses XCR0.YMM.

Interesting. I'm sure early documentation was different, which would be in line with the bits named YMM and ZMM etc in our code base. But anyway.

>> Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).
> In hardware, all these bits are almost certainly hardwired on or off. Part of the issue here is that with virtualisation, there are a whole lot more combinations than exist on real hardware.
>
> Whether right or wrong, the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset. This is based on my assumption that that's how real hardware actually works, and prevents the possibility of them getting out of sync.

Which I agree with. In this context, however, I wonder how you mean to do what you say above. I don't recall having seen any related code, but of course this may well be in one of the patches I didn't yet get around to looking at.

>>>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
>>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
>> Are you sure? Did you try?
> Yes
>
> void test_main(void)
> {
>     printk("AVX Testing\n");
>
>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>
>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>
>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");

Here you clear the bit in XCR0, which wasn't the question. The question was whether VFMADD... would fault when the CPUID AVX bit is clear.

Jan
On 16/02/16 09:54, Jan Beulich wrote:
>>>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 16:27, Jan Beulich wrote:
>>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 15:52, Jan Beulich wrote:
>>>>>>>> --- a/xen/tools/gen-cpuid.py
>>>>>>>> +++ b/xen/tools/gen-cpuid.py
>>>>>>>> @@ -138,6 +138,61 @@ def crunch_numbers(state):
>>>>>>>>      state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
>>>>>>>>      state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
>>>>>>>>
>>>>>>>> +    deps = {
>>>>>>>> +        XSAVE:
>>>>>>>> +            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
>>>>>>>> +
>>>>>>>> +        AVX:
>>>>>>>> +            (FMA, FMA4, F16C, AVX2, XOP),
>>>>>>> I continue to question whether namely XOP, but perhaps also the others here except maybe AVX2, really depends on AVX, and not just on XSAVE.
>>>>>> I am sure we have had this argument before.
>>>>> Indeed, hence the "I continue to ...".
>>>>>
>>>>>> All VEX encoded SIMD instructions (including XOP, which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in XCR0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register.
>>>>>>
>>>>>> This is very clearly a dependency on AVX, even if it isn't written in one clear, concise statement in the instruction manuals.
>>>>> The question is what AVX actually means: To me it's an instruction set extension, not one of machine state. The machine state extension to me is tied to XSAVE (or more precisely to XSAVE's YMM state). (But I intentionally say "to me", because I can certainly see why this may be viewed differently.)
>>>> The AVX feature bit is also the indicator that the AVX bit may be set in XCR0, which links it to machine state and not just instruction sets.
>>> No, it's not (and again - there's no bit named AVX in XCR0):
>> (and again - Intel disagree) The Intel manual uniformly refers to XCR0.AVX (bit 2). AMD uses XCR0.YMM.
> Interesting. I'm sure early documentation was different, which would be in line with the bits named YMM and ZMM etc in our code base. But anyway.
>
>>> Which bits can be set in XCR0 is enumerated by CPUID[0xd].EDX:EAX, which is - surprise, surprise - the so-called XSTATE leaf (i.e. related to XSAVE, and not to AVX).
>> In hardware, all these bits are almost certainly hardwired on or off. Part of the issue here is that with virtualisation, there are a whole lot more combinations than exist on real hardware.
>>
>> Whether right or wrong, the guest's values for CPUID[0xd].EDX:EAX are now generated from the guest featureset. This is based on my assumption that that's how real hardware actually works, and prevents the possibility of them getting out of sync.
> Which I agree with. In this context, however, I wonder how you mean to do what you say above. I don't recall having seen any related code, but of course this may well be in one of the patches I didn't yet get around to looking at.
>
>>>>> Note how you yourself have recourse to XCR0, which is very clearly tied to XSAVE and not AVX, above (and note also that there's nothing called AVX to be enabled in XCR0, it's YMM that you talk about).
>>>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
>>> Are you sure? Did you try?
>> Yes
>>
>> void test_main(void)
>> {
>>     printk("AVX Testing\n");
>>
>>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>>
>>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>>
>>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
> Here you clear the bit in XCR0, which wasn't the question. The question was whether VFMADD... would fault when the CPUID AVX bit is clear.

How will the pipeline know whether the guest was offered the AVX flag in cpuid?

Hiding feature bits does not prevent the functionality from working, i.e. 1GB superpages don't stop working if you hide the feature bit. No amount of levelling can stop a guest from manually probing for features/instruction sets.

~Andrew
>>> On 17.02.16 at 11:25, <andrew.cooper3@citrix.com> wrote:
> On 16/02/16 09:54, Jan Beulich wrote:
>>>>> On 15.02.16 at 20:07, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 16:27, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 17:09, <andrew.cooper3@citrix.com> wrote:
>>>>> The key point is this. If I choose to enable XSAVE and disable AVX for a domain, that domain is unable to use FMA/FMA4/F16C instructions. It therefore shouldn't see the features.
>>>> Are you sure? Did you try?
>>> Yes
>>>
>>> void test_main(void)
>>> {
>>>     printk("AVX Testing\n");
>>>
>>>     write_cr4(read_cr4() | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | X86_CR4_OSXSAVE);
>>>
>>>     asm volatile ("xsetbv" :: "a" (0x7), "d" (0), "c" (0));
>>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>>>
>>>     asm volatile ("xsetbv" :: "a" (0x3), "d" (0), "c" (0));
>>>     asm volatile ("vfmadd132pd %xmm0, %xmm1, %xmm2");
>> Here you clear the bit in XCR0, which wasn't the question. The question was whether VFMADD... would fault when the CPUID AVX bit is clear.
>
> How will the pipeline know whether the guest was offered the AVX flag in cpuid?
>
> Hiding feature bits does not prevent the functionality from working, i.e. 1GB superpages don't stop working if you hide the feature bit. No amount of levelling can stop a guest from manually probing for features/instruction sets.

That wasn't my point. I was intentionally referring to the CPUID bits, knowing that us masking them in software doesn't affect hardware in any way. Hence the "Are you sure? Did you try?", which were really meant in a rhetorical way, as the answer can only possibly be "no" here (unless you had control over the hardware).

Jan
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 1af0e6c..25dcd0e 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -12,6 +12,7 @@ const uint32_t inverted_features[] = INIT_INVERTED_FEATURES;
 static const uint32_t pv_featuremask[] = INIT_PV_FEATURES;
 static const uint32_t hvm_shadow_featuremask[] = INIT_HVM_SHADOW_FEATURES;
 static const uint32_t hvm_hap_featuremask[] = INIT_HVM_HAP_FEATURES;
+static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
 
 uint32_t __read_mostly raw_featureset[FSCAPINTS];
 uint32_t __read_mostly host_featureset[FSCAPINTS];
@@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS];
 
 static void sanitise_featureset(uint32_t *fs)
 {
+    uint32_t disabled_features[FSCAPINTS];
     unsigned int i;
 
     for ( i = 0; i < FSCAPINTS; ++i )
     {
         /* Clamp to known mask. */
         fs[i] &= known_features[i];
+
+        /*
+         * Identify which features with deep dependencies have been
+         * disabled.
+         */
+        disabled_features[i] = ~fs[i] & deep_features[i];
+    }
+
+    for_each_set_bit(i, (void *)disabled_features,
+                     sizeof(disabled_features) * 8)
+    {
+        const uint32_t *dfs = lookup_deep_deps(i);
+        unsigned int j;
+
+        ASSERT(dfs); /* deep_features[] should guarantee this. */
+
+        for ( j = 0; j < FSCAPINTS; ++j )
+        {
+            fs[j] &= ~dfs[j];
+            disabled_features[j] &= ~dfs[j];
+        }
     }
 
     switch ( boot_cpu_data.x86_vendor )
@@ -153,6 +176,36 @@ void calculate_featuresets(void)
     calculate_hvm_featureset();
 }
 
+const uint32_t *lookup_deep_deps(uint32_t feature)
+{
+    static const struct {
+        uint32_t feature;
+        uint32_t fs[FSCAPINTS];
+    } deep_deps[] = INIT_DEEP_DEPS;
+    unsigned int start = 0, end = ARRAY_SIZE(deep_deps);
+
+    BUILD_BUG_ON(ARRAY_SIZE(deep_deps) != NR_DEEP_DEPS);
+
+    /* Fast early exit. */
+    if ( !test_bit(feature, deep_features) )
+        return NULL;
+
+    /* deep_deps[] is sorted.  Perform a binary search. */
+    while ( start < end )
+    {
+        unsigned int mid = start + ((end - start) / 2);
+
+        if ( deep_deps[mid].feature > feature )
+            end = mid;
+        else if ( deep_deps[mid].feature < feature )
+            start = mid + 1;
+        else
+            return deep_deps[mid].fs;
+    }
+
+    return NULL;
+}
+
 static void __maybe_unused build_assertions(void)
 {
     BUILD_BUG_ON(ARRAY_SIZE(known_features) != FSCAPINTS);
@@ -160,6 +213,7 @@ static void __maybe_unused build_assertions(void)
     BUILD_BUG_ON(ARRAY_SIZE(pv_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_shadow_featuremask) != FSCAPINTS);
     BUILD_BUG_ON(ARRAY_SIZE(hvm_hap_featuremask) != FSCAPINTS);
+    BUILD_BUG_ON(ARRAY_SIZE(deep_features) != FSCAPINTS);
 }
 
 /*
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 18ba95b..cd7fa90 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -28,6 +28,8 @@ extern uint32_t hvm_featureset[FSCAPINTS];
 
 void calculate_featuresets(void);
 
+const uint32_t *lookup_deep_deps(uint32_t feature);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* !__X86_CPUID_H__ */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 5f0f892..c44f124 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -138,6 +138,61 @@ def crunch_numbers(state):
     state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
     state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
 
+    deps = {
+        XSAVE:
+            (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX),
+
+        AVX:
+            (FMA, FMA4, F16C, AVX2, XOP),
+
+        PAE:
+            (LM, ),
+
+        LM:
+            (CX16, LAHF_LM, PAGE1GB),
+
+        XMM:
+            (LM, ),
+
+        XMM2:
+            (LM, ),
+
+        XMM3:
+            (LM, ),
+
+        APIC:
+            (X2APIC, ),
+
+        PSE:
+            (PSE36, ),
+    }
+
+    deep_features = tuple(sorted(deps.keys()))
+    state.deep_deps = {}
+
+    for feat in deep_features:
+
+        seen = [feat]
+        to_process = list(deps[feat])
+
+        while len(to_process):
+            f = to_process.pop(0)
+
+            if f in seen:
+                raise Fail("ERROR: Cycle found with %s when processing %s"
+                           % (state.names[f], state.names[feat]))
+
+            seen.append(f)
+            to_process.extend(deps.get(f, []))
+
+        state.deep_deps[feat] = seen[1:]
+
+    state.deep_features = featureset_to_uint32s(deps.keys(), nr_entries)
+    state.nr_deep_deps = len(state.deep_deps.keys())
+
+    for k, v in state.deep_deps.iteritems():
+        state.deep_deps[k] = featureset_to_uint32s(v, nr_entries)
+
 
 def write_results(state):
     state.output.write(
@@ -164,6 +219,12 @@ def write_results(state):
 
 #define INIT_HVM_SHADOW_FEATURES { \\\n%s\n}
 
 #define INIT_HVM_HAP_FEATURES { \\\n%s\n}
+
+#define NR_DEEP_DEPS %s
+
+#define INIT_DEEP_FEATURES { \\\n%s\n}
+
+#define INIT_DEEP_DEPS { \\
 """ % (state.nr_entries,
        state.common,
        format_uint32s(state.known, 4),
        format_uint32s(state.inverted, 4),
        format_uint32s(state.pv, 4),
        format_uint32s(state.hvm_shadow, 4),
        format_uint32s(state.hvm_hap, 4),
+       state.nr_deep_deps,
+       format_uint32s(state.deep_features, 4),
        ))
 
+    for dep in sorted(state.deep_deps.keys()):
+        state.output.write(
+            "    { %#xU, /* %s */ { \\\n%s\n    }, }, \\\n"
+            % (dep, state.names[dep],
+               format_uint32s(state.deep_deps[dep], 8)
+               ))
+
     state.output.write(
-"""
+"""}
+
 #endif /* __XEN_X86__FEATURESET_DATA__ */
 """)
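To tie the patch back to Jan's opening question, here is a toy Python model of the two sanitise_featureset() loops, using a single featureset word (i.e. FSCAPINTS == 1) and the hypothetical bit numbers from the earlier sketch (FMA = 0, AVX = 1, XSAVE = 2); it illustrates the algorithm, not the hypervisor code itself:

    deep_features = 0b110                 # XSAVE and AVX have deep deps.
    deep_deps = {
        2: 0b011,                         # XSAVE's flattened deps: AVX, FMA.
        1: 0b001,                         # AVX's flattened deps: FMA.
    }

    def sanitise(fs):
        # First loop: record which deep features start out disabled.
        disabled = ~fs & deep_features

        # Second loop: walk the set bits low to high, clearing each
        # feature's entire flattened dependency set in one shot.
        bit = 0
        while disabled >> bit:
            if disabled & (1 << bit):
                dfs = deep_deps[bit]      # lookup_deep_deps(bit)
                fs &= ~dfs
                disabled &= ~dfs          # Deps already covered; skip them.
            bit += 1
        return fs & 0xffffffff

    # AVX and FMA set, XSAVE clear: processing XSAVE (bit 2) clears the
    # lower-numbered AVX bit, and FMA with it, because XSAVE's flattened
    # entry already includes AVX's own dependency -- no restart needed.
    assert sanitise(0b011) == 0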
Some features depend on other features. Working out and maintaining the exact dependency tree is complicated, so it is expressed in the automatic generation script, and flattened for faster runtime use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

For all intents and purposes, new in v2.
---
 xen/arch/x86/cpuid.c        | 54 +++++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpuid.h |  2 ++
 xen/tools/gen-cpuid.py      | 73 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 128 insertions(+), 1 deletion(-)