diff mbox series

[v2,3/4] x86/cpu-policy: Simplify recalculate_xstate()

Message ID 20240429182823.1130436-4-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series x86/xstate: Fixes to size calculations | expand

Commit Message

Andrew Cooper April 29, 2024, 6:28 p.m. UTC
Make use of the new xstate_uncompressed_size() helper rather than maintaining
the running calculation while accumulating feature components.

The rest of the CPUID data can come direct from the raw cpu policy.  All
per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
instructions.

Use for_each_set_bit() rather than opencoding a slightly awkward version of
it.  Mask the attributes in ecx down based on the visible features.  This
isn't actually necessary for any components or attributes defined at the time
of writing (up to AMX), but is added out of an abundance of caution.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

v2:
 * Tie ALIGN64 to xsavec rather than xsaves.
---
 xen/arch/x86/cpu-policy.c         | 55 +++++++++++--------------------
 xen/arch/x86/include/asm/xstate.h |  1 +
 2 files changed, 21 insertions(+), 35 deletions(-)

Comments

Jan Beulich May 2, 2024, 12:39 p.m. UTC | #1
On 29.04.2024 20:28, Andrew Cooper wrote:
> Make use of the new xstate_uncompressed_size() helper rather than maintaining
> the running calculation while accumulating feature components.

xstate_uncompressed_size() isn't really a new function, but the re-work of
an earlier one. That, aiui, could have been used here, too, just that it
would have been inefficient to do so. IOW perhaps drop "the new"?

> The rest of the CPUID data can come direct from the raw cpu policy.  All
> per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
> instructions.
> 
> Use for_each_set_bit() rather than opencoding a slightly awkward version of
> it.  Mask the attributes in ecx down based on the visible features.  This
> isn't actually necessary for any components or attributes defined at the time
> of writing (up to AMX), but is added out of an abundance of caution.

As to this, ...

> @@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p)
>          return;
>  
>      if ( p->basic.avx )
> -    {
>          xstates |= X86_XCR0_YMM;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_YMM_POS] +
> -                          xstate_sizes[X86_XCR0_YMM_POS]);
> -    }
>  
>      if ( p->feat.mpx )
> -    {
>          xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_BNDCSR_POS] +
> -                          xstate_sizes[X86_XCR0_BNDCSR_POS]);
> -    }
>  
>      if ( p->feat.avx512f )
> -    {
>          xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_HI_ZMM_POS] +
> -                          xstate_sizes[X86_XCR0_HI_ZMM_POS]);
> -    }
>  
>      if ( p->feat.pku )
> -    {
>          xstates |= X86_XCR0_PKRU;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_PKRU_POS] +
> -                          xstate_sizes[X86_XCR0_PKRU_POS]);
> -    }
>  
> -    p->xstate.max_size  =  xstate_size;
> +    /* Subleaf 0 */
> +    p->xstate.max_size =
> +        xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
>      p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
>      p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
>  
> +    /* Subleaf 1 */
>      p->xstate.Da1 = Da1;
> +    if ( p->xstate.xsavec )
> +        ecx_mask |= XSTATE_ALIGN64;
> +
>      if ( p->xstate.xsaves )
>      {
> +        ecx_mask |= XSTATE_XSS;
>          p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
>          p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
>      }
> -    else
> -        xstates &= ~XSTATE_XSAVES_ONLY;
>  
> -    for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i )
> +    /* Subleafs 2+ */
> +    xstates &= ~XSTATE_FP_SSE;
> +    BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
> +    for_each_set_bit ( i, &xstates, 63 )
>      {
> -        uint64_t curr_xstate = 1UL << i;
> -
> -        if ( !(xstates & curr_xstate) )
> -            continue;
> -
> -        p->xstate.comp[i].size   = xstate_sizes[i];
> -        p->xstate.comp[i].offset = xstate_offsets[i];
> -        p->xstate.comp[i].xss    = curr_xstate & XSTATE_XSAVES_ONLY;
> -        p->xstate.comp[i].align  = curr_xstate & xstate_align;

... for this bit, isn't the move from this ...

> +        /*
> +         * Pass through size (eax) and offset (ebx) directly.  Visbility of
> +         * attributes in ecx limited by visible features in Da1.
> +         */
> +        p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a;
> +        p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b;
> +        p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask;

... to this changing what guests get to see, i.e. (mildly?) incompatible?
(For .xss there's no issue because XSTATE_XSAVES_ONLY is still 0.)

Jan
Andrew Cooper May 2, 2024, 1:24 p.m. UTC | #2
On 02/05/2024 1:39 pm, Jan Beulich wrote:
> On 29.04.2024 20:28, Andrew Cooper wrote:
>> Make use of the new xstate_uncompressed_size() helper rather than maintaining
>> the running calculation while accumulating feature components.
> xstate_uncompressed_size() isn't really a new function, but the re-work of
> an earlier one. That, aiui, could have been used here, too, just that it
> would have been inefficient to do so. IOW perhaps drop "the new"?

Ok.

>
>> The rest of the CPUID data can come direct from the raw cpu policy.  All
>> per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
>> instructions.
>>
>> Use for_each_set_bit() rather than opencoding a slightly awkward version of
>> it.  Mask the attributes in ecx down based on the visible features.  This
>> isn't actually necessary for any components or attributes defined at the time
>> of writing (up to AMX), but is added out of an abundance of caution.
> As to this, ...
>
>> @@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p)
>>          return;
>>  
>>      if ( p->basic.avx )
>> -    {
>>          xstates |= X86_XCR0_YMM;
>> -        xstate_size = max(xstate_size,
>> -                          xstate_offsets[X86_XCR0_YMM_POS] +
>> -                          xstate_sizes[X86_XCR0_YMM_POS]);
>> -    }
>>  
>>      if ( p->feat.mpx )
>> -    {
>>          xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
>> -        xstate_size = max(xstate_size,
>> -                          xstate_offsets[X86_XCR0_BNDCSR_POS] +
>> -                          xstate_sizes[X86_XCR0_BNDCSR_POS]);
>> -    }
>>  
>>      if ( p->feat.avx512f )
>> -    {
>>          xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
>> -        xstate_size = max(xstate_size,
>> -                          xstate_offsets[X86_XCR0_HI_ZMM_POS] +
>> -                          xstate_sizes[X86_XCR0_HI_ZMM_POS]);
>> -    }
>>  
>>      if ( p->feat.pku )
>> -    {
>>          xstates |= X86_XCR0_PKRU;
>> -        xstate_size = max(xstate_size,
>> -                          xstate_offsets[X86_XCR0_PKRU_POS] +
>> -                          xstate_sizes[X86_XCR0_PKRU_POS]);
>> -    }
>>  
>> -    p->xstate.max_size  =  xstate_size;
>> +    /* Subleaf 0 */
>> +    p->xstate.max_size =
>> +        xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
>>      p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
>>      p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
>>  
>> +    /* Subleaf 1 */
>>      p->xstate.Da1 = Da1;
>> +    if ( p->xstate.xsavec )
>> +        ecx_mask |= XSTATE_ALIGN64;
>> +
>>      if ( p->xstate.xsaves )
>>      {
>> +        ecx_mask |= XSTATE_XSS;
>>          p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
>>          p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
>>      }
>> -    else
>> -        xstates &= ~XSTATE_XSAVES_ONLY;
>>  
>> -    for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i )
>> +    /* Subleafs 2+ */
>> +    xstates &= ~XSTATE_FP_SSE;
>> +    BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
>> +    for_each_set_bit ( i, &xstates, 63 )
>>      {
>> -        uint64_t curr_xstate = 1UL << i;
>> -
>> -        if ( !(xstates & curr_xstate) )
>> -            continue;
>> -
>> -        p->xstate.comp[i].size   = xstate_sizes[i];
>> -        p->xstate.comp[i].offset = xstate_offsets[i];
>> -        p->xstate.comp[i].xss    = curr_xstate & XSTATE_XSAVES_ONLY;
>> -        p->xstate.comp[i].align  = curr_xstate & xstate_align;
> ... for this bit, isn't the move from this ...
>
>> +        /*
>> +         * Pass through size (eax) and offset (ebx) directly.  Visbility of
>> +         * attributes in ecx limited by visible features in Da1.
>> +         */
>> +        p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a;
>> +        p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b;
>> +        p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask;
> ... to this changing what guests get to see, i.e. (mildly?) incompatible?

No.

The only "rows" in leaf 0xd we expose to guests are AVX, MPX, AVX512 and
PKU  (higher up in this hunk, selecting valid bits in xstates).  None of
these have a non-zero value in ecx.

This is a latent bug until we offer AMX or CET, hence why I wanted to
complete this series before your AMX series goes in.

~Andrew
Jan Beulich May 2, 2024, 2:33 p.m. UTC | #3
On 02.05.2024 15:24, Andrew Cooper wrote:
> On 02/05/2024 1:39 pm, Jan Beulich wrote:
>> On 29.04.2024 20:28, Andrew Cooper wrote:
>>> Make use of the new xstate_uncompressed_size() helper rather than maintaining
>>> the running calculation while accumulating feature components.
>> xstate_uncompressed_size() isn't really a new function, but the re-work of
>> an earlier one. That, aiui, could have been used here, too, just that it
>> would have been inefficient to do so. IOW perhaps drop "the new"?
> 
> Ok.
> 
>>
>>> The rest of the CPUID data can come direct from the raw cpu policy.  All
>>> per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
>>> instructions.
>>>
>>> Use for_each_set_bit() rather than opencoding a slightly awkward version of
>>> it.  Mask the attributes in ecx down based on the visible features.  This
>>> isn't actually necessary for any components or attributes defined at the time
>>> of writing (up to AMX), but is added out of an abundance of caution.
>> As to this, ...
>>
>>> @@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p)
>>>          return;
>>>  
>>>      if ( p->basic.avx )
>>> -    {
>>>          xstates |= X86_XCR0_YMM;
>>> -        xstate_size = max(xstate_size,
>>> -                          xstate_offsets[X86_XCR0_YMM_POS] +
>>> -                          xstate_sizes[X86_XCR0_YMM_POS]);
>>> -    }
>>>  
>>>      if ( p->feat.mpx )
>>> -    {
>>>          xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
>>> -        xstate_size = max(xstate_size,
>>> -                          xstate_offsets[X86_XCR0_BNDCSR_POS] +
>>> -                          xstate_sizes[X86_XCR0_BNDCSR_POS]);
>>> -    }
>>>  
>>>      if ( p->feat.avx512f )
>>> -    {
>>>          xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
>>> -        xstate_size = max(xstate_size,
>>> -                          xstate_offsets[X86_XCR0_HI_ZMM_POS] +
>>> -                          xstate_sizes[X86_XCR0_HI_ZMM_POS]);
>>> -    }
>>>  
>>>      if ( p->feat.pku )
>>> -    {
>>>          xstates |= X86_XCR0_PKRU;
>>> -        xstate_size = max(xstate_size,
>>> -                          xstate_offsets[X86_XCR0_PKRU_POS] +
>>> -                          xstate_sizes[X86_XCR0_PKRU_POS]);
>>> -    }
>>>  
>>> -    p->xstate.max_size  =  xstate_size;
>>> +    /* Subleaf 0 */
>>> +    p->xstate.max_size =
>>> +        xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
>>>      p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
>>>      p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
>>>  
>>> +    /* Subleaf 1 */
>>>      p->xstate.Da1 = Da1;
>>> +    if ( p->xstate.xsavec )
>>> +        ecx_mask |= XSTATE_ALIGN64;
>>> +
>>>      if ( p->xstate.xsaves )
>>>      {
>>> +        ecx_mask |= XSTATE_XSS;
>>>          p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
>>>          p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
>>>      }
>>> -    else
>>> -        xstates &= ~XSTATE_XSAVES_ONLY;
>>>  
>>> -    for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i )
>>> +    /* Subleafs 2+ */
>>> +    xstates &= ~XSTATE_FP_SSE;
>>> +    BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
>>> +    for_each_set_bit ( i, &xstates, 63 )
>>>      {
>>> -        uint64_t curr_xstate = 1UL << i;
>>> -
>>> -        if ( !(xstates & curr_xstate) )
>>> -            continue;
>>> -
>>> -        p->xstate.comp[i].size   = xstate_sizes[i];
>>> -        p->xstate.comp[i].offset = xstate_offsets[i];
>>> -        p->xstate.comp[i].xss    = curr_xstate & XSTATE_XSAVES_ONLY;
>>> -        p->xstate.comp[i].align  = curr_xstate & xstate_align;
>> ... for this bit, isn't the move from this ...
>>
>>> +        /*
>>> +         * Pass through size (eax) and offset (ebx) directly.  Visbility of
>>> +         * attributes in ecx limited by visible features in Da1.
>>> +         */
>>> +        p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a;
>>> +        p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b;
>>> +        p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask;
>> ... to this changing what guests get to see, i.e. (mildly?) incompatible?
> 
> No.
> 
> The only "rows" in leaf 0xd we expose to guests are AVX, MPX, AVX512 and
> PKU  (higher up in this hunk, selecting valid bits in xstates).  None of
> these have a non-zero value in ecx.
> 
> This is a latent bug until we offer AMX or CET, hence why I wanted to
> complete this series before your AMX series goes in.

Oh, so finally a description of that very issue you kept mentioning. With
the small tweak to the description then
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 4b6d96276399..fc7933be8577 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -193,8 +193,7 @@  static void sanitise_featureset(uint32_t *fs)
 static void recalculate_xstate(struct cpu_policy *p)
 {
     uint64_t xstates = XSTATE_FP_SSE;
-    uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
-    unsigned int i, Da1 = p->xstate.Da1;
+    unsigned int i, ecx_mask = 0, Da1 = p->xstate.Da1;
 
     /*
      * The Da1 leaf is the only piece of information preserved in the common
@@ -206,61 +205,47 @@  static void recalculate_xstate(struct cpu_policy *p)
         return;
 
     if ( p->basic.avx )
-    {
         xstates |= X86_XCR0_YMM;
-        xstate_size = max(xstate_size,
-                          xstate_offsets[X86_XCR0_YMM_POS] +
-                          xstate_sizes[X86_XCR0_YMM_POS]);
-    }
 
     if ( p->feat.mpx )
-    {
         xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
-        xstate_size = max(xstate_size,
-                          xstate_offsets[X86_XCR0_BNDCSR_POS] +
-                          xstate_sizes[X86_XCR0_BNDCSR_POS]);
-    }
 
     if ( p->feat.avx512f )
-    {
         xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
-        xstate_size = max(xstate_size,
-                          xstate_offsets[X86_XCR0_HI_ZMM_POS] +
-                          xstate_sizes[X86_XCR0_HI_ZMM_POS]);
-    }
 
     if ( p->feat.pku )
-    {
         xstates |= X86_XCR0_PKRU;
-        xstate_size = max(xstate_size,
-                          xstate_offsets[X86_XCR0_PKRU_POS] +
-                          xstate_sizes[X86_XCR0_PKRU_POS]);
-    }
 
-    p->xstate.max_size  =  xstate_size;
+    /* Subleaf 0 */
+    p->xstate.max_size =
+        xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
     p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
     p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
 
+    /* Subleaf 1 */
     p->xstate.Da1 = Da1;
+    if ( p->xstate.xsavec )
+        ecx_mask |= XSTATE_ALIGN64;
+
     if ( p->xstate.xsaves )
     {
+        ecx_mask |= XSTATE_XSS;
         p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
         p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
     }
-    else
-        xstates &= ~XSTATE_XSAVES_ONLY;
 
-    for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i )
+    /* Subleafs 2+ */
+    xstates &= ~XSTATE_FP_SSE;
+    BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
+    for_each_set_bit ( i, &xstates, 63 )
     {
-        uint64_t curr_xstate = 1UL << i;
-
-        if ( !(xstates & curr_xstate) )
-            continue;
-
-        p->xstate.comp[i].size   = xstate_sizes[i];
-        p->xstate.comp[i].offset = xstate_offsets[i];
-        p->xstate.comp[i].xss    = curr_xstate & XSTATE_XSAVES_ONLY;
-        p->xstate.comp[i].align  = curr_xstate & xstate_align;
+        /*
+         * Pass through size (eax) and offset (ebx) directly.  Visbility of
+         * attributes in ecx limited by visible features in Da1.
+         */
+        p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a;
+        p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b;
+        p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask;
     }
 }
 
diff --git a/xen/arch/x86/include/asm/xstate.h b/xen/arch/x86/include/asm/xstate.h
index f5115199d4f9..bfb66dd766b6 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -40,6 +40,7 @@  extern uint32_t mxcsr_mask;
 #define XSTATE_XSAVES_ONLY         0
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
+#define XSTATE_XSS     (1U << 0)
 #define XSTATE_ALIGN64 (1U << 1)
 
 extern u64 xfeature_mask;