Message ID | 20241001123807.605-9-alejandro.vallejo@cloud.com (mailing list archive)
---|---
State | Superseded
Series | x86: Expose consistent topology to guests
On 01.10.2024 14:38, Alejandro Vallejo wrote:
> --- a/xen/include/xen/lib/x86/cpu-policy.h
> +++ b/xen/include/xen/lib/x86/cpu-policy.h
> @@ -542,6 +542,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>                                      const struct cpu_policy *guest,
>                                      struct cpu_policy_errors *err);
>
> +/**
> + * Synthesise topology information in `p` given high-level constraints
> + *
> + * Topology is given in various fields accross several leaves, some of
> + * which are vendor-specific. This function uses the policy itself to
> + * derive such leaves from threads/core and cores/package.

Isn't it more like s/uses/fills/ (and the rest of the sentence then
possibly adjust some to match)? The policy looks to be purely an output
here (except for the vendor field).

> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,6 +2,94 @@
>
>  #include <xen/lib/x86/cpu-policy.h>
>
> +static unsigned int order(unsigned int n)
> +{
> +    ASSERT(n); /* clz(0) is UB */
> +
> +    return 8 * sizeof(n) - __builtin_clz(n);
> +}
> +
> +int x86_topo_from_parts(struct cpu_policy *p,
> +                        unsigned int threads_per_core,
> +                        unsigned int cores_per_pkg)
> +{
> +    unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;

What about the (admittedly absurd) case of this overflowing?

> +    unsigned int apic_id_size;
> +
> +    if ( !p || !threads_per_core || !cores_per_pkg )
> +        return -EINVAL;
> +
> +    p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);

Better use the type-safe max() (and min() further down)?

> +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
> +
> +    /* thread level */
> +    p->topo.subleaf[0].nr_logical = threads_per_core;
> +    p->topo.subleaf[0].id_shift = 0;
> +    p->topo.subleaf[0].level = 0;
> +    p->topo.subleaf[0].type = 1;
> +    if ( threads_per_core > 1 )
> +        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
> +
> +    /* core level */
> +    p->topo.subleaf[1].nr_logical = cores_per_pkg;
> +    if ( p->x86_vendor == X86_VENDOR_INTEL )
> +        p->topo.subleaf[1].nr_logical = threads_per_pkg;
> +    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
> +    p->topo.subleaf[1].level = 1;
> +    p->topo.subleaf[1].type = 2;
> +    if ( cores_per_pkg > 1 )
> +        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
> +
> +    apic_id_size = p->topo.subleaf[1].id_shift;
> +
> +    /*
> +     * Contrary to what the name might seem to imply. HTT is an enabler for
> +     * SMP and there's no harm in setting it even with a single vCPU.
> +     */
> +    p->basic.htt = true;
> +    p->basic.lppp = MIN(0xff, threads_per_pkg);
> +
> +    switch ( p->x86_vendor )
> +    {
> +    case X86_VENDOR_INTEL: {
> +        struct cpuid_cache_leaf *sl = p->cache.subleaf;
> +
> +        for ( size_t i = 0; sl->type &&
> +              i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
> +        {
> +            sl->cores_per_package = cores_per_pkg - 1;
> +            sl->threads_per_cache = threads_per_core - 1;
> +            if ( sl->type == 3 /* unified cache */ )
> +                sl->threads_per_cache = threads_per_pkg - 1;

I wasn't able to find documentation for this, well, anomaly. Can you please
point me at where this is spelled out?

Jan
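For readers following along, here is a minimal standalone sketch of the `order()` helper quoted above, plus a hypothetical overflow guard of the kind the review asks about; the guard and the example counts are illustrative, not code from the patch:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Same computation as the patch's order(): the bit width of n. */
static unsigned int order(unsigned int n)
{
    assert(n); /* __builtin_clz(0) is undefined behaviour */
    return 8 * sizeof(n) - __builtin_clz(n);
}

int main(void)
{
    unsigned int threads_per_core = 7, cores_per_pkg = 5;

    /* Hypothetical guard for the multiplication queried above. */
    if ( threads_per_core > UINT_MAX / cores_per_pkg )
        return 1; /* threads_per_pkg would overflow */

    /* IDs 0..6 need 3 bits: order(7 - 1) == 3. */
    printf("thread id_shift = %u\n", order(threads_per_core - 1));

    return 0;
}
```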
On Wed Oct 9, 2024 at 3:45 PM BST, Jan Beulich wrote:
> On 01.10.2024 14:38, Alejandro Vallejo wrote:
> > --- a/xen/include/xen/lib/x86/cpu-policy.h
> > +++ b/xen/include/xen/lib/x86/cpu-policy.h
> > @@ -542,6 +542,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
> >                                      const struct cpu_policy *guest,
> >                                      struct cpu_policy_errors *err);
> >
> > +/**
> > + * Synthesise topology information in `p` given high-level constraints
> > + *
> > + * Topology is given in various fields accross several leaves, some of
> > + * which are vendor-specific. This function uses the policy itself to
> > + * derive such leaves from threads/core and cores/package.
>
> Isn't it more like s/uses/fills/ (and the rest of the sentence then
> possibly adjust some to match)? The policy looks to be purely an output
> here (except for the vendor field).

Sure.

>
> > --- a/xen/lib/x86/policy.c
> > +++ b/xen/lib/x86/policy.c
> > @@ -2,6 +2,94 @@
> >
> >  #include <xen/lib/x86/cpu-policy.h>
> >
> > +static unsigned int order(unsigned int n)
> > +{
> > +    ASSERT(n); /* clz(0) is UB */
> > +
> > +    return 8 * sizeof(n) - __builtin_clz(n);
> > +}
> > +
> > +int x86_topo_from_parts(struct cpu_policy *p,
> > +                        unsigned int threads_per_core,
> > +                        unsigned int cores_per_pkg)
> > +{
> > +    unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
>
> What about the (admittedly absurd) case of this overflowing?

Each of them individually could overflow the fields in which they are used.

Does returning EINVAL if either threads_per_core or cores_per_pkg overflow the
INTEL structure j

>
> > +    unsigned int apic_id_size;
> > +
> > +    if ( !p || !threads_per_core || !cores_per_pkg )
> > +        return -EINVAL;
> > +
> > +    p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
>
> Better use the type-safe max() (and min() further down)?

Sure.

>
> > +    memset(p->topo.raw, 0, sizeof(p->topo.raw));
> > +
> > +    /* thread level */
> > +    p->topo.subleaf[0].nr_logical = threads_per_core;
> > +    p->topo.subleaf[0].id_shift = 0;
> > +    p->topo.subleaf[0].level = 0;
> > +    p->topo.subleaf[0].type = 1;
> > +    if ( threads_per_core > 1 )
> > +        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
> > +
> > +    /* core level */
> > +    p->topo.subleaf[1].nr_logical = cores_per_pkg;
> > +    if ( p->x86_vendor == X86_VENDOR_INTEL )
> > +        p->topo.subleaf[1].nr_logical = threads_per_pkg;
> > +    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
> > +    p->topo.subleaf[1].level = 1;
> > +    p->topo.subleaf[1].type = 2;
> > +    if ( cores_per_pkg > 1 )
> > +        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
> > +
> > +    apic_id_size = p->topo.subleaf[1].id_shift;
> > +
> > +    /*
> > +     * Contrary to what the name might seem to imply. HTT is an enabler for
> > +     * SMP and there's no harm in setting it even with a single vCPU.
> > +     */
> > +    p->basic.htt = true;
> > +    p->basic.lppp = MIN(0xff, threads_per_pkg);
> > +
> > +    switch ( p->x86_vendor )
> > +    {
> > +    case X86_VENDOR_INTEL: {
> > +        struct cpuid_cache_leaf *sl = p->cache.subleaf;
> > +
> > +        for ( size_t i = 0; sl->type &&
> > +              i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
> > +        {
> > +            sl->cores_per_package = cores_per_pkg - 1;
> > +            sl->threads_per_cache = threads_per_core - 1;
> > +            if ( sl->type == 3 /* unified cache */ )
> > +                sl->threads_per_cache = threads_per_pkg - 1;
>
> I wasn't able to find documentation for this, well, anomaly. Can you please
> point me at where this is spelled out?

That's showing all unified caches as caches covering the whole package. We
could do it the other way around (but I don't want to reverse engineer what the
host policy says because that's irrelevant). There's nothing in the SDM (AFAIK)
forcing L2 or L3 to behave one way or another, so we get to choose. I thought
it more helpful to make all unified caches unified across the package, to give
more information in the leaf.

My own system exposes 2 unified caches (data trimmed for space):

``` cpuid

deterministic cache parameters (4):
   --- cache 0 ---
      cache type                         = data cache (1)
      cache level                        = 0x1 (1)
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0xf (15)
   --- cache 1 ---
      cache type                         = instruction cache (2)
      cache level                        = 0x1 (1)
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0xf (15)
   --- cache 2 ---
      cache type                         = unified cache (3)
      cache level                        = 0x2 (2)
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0xf (15)
   --- cache 3 ---
      cache type                         = unified cache (3)
      cache level                        = 0x3 (3)
      maximum IDs for CPUs sharing cache = 0x1f (31)
      maximum IDs for cores in pkg       = 0xf (15)
   --- cache 4 ---
      cache type                         = no more caches (0)
```

>
> Jan

Cheers,
Alejandro
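For context, the fields being discussed live in CPUID leaf 4's EAX. Below is a small decoder matching the cpuid(1) dump above; the bit positions are the architecturally defined ones, while the helper and the sample value are only illustrative:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * CPUID.(EAX=4) EAX layout:
 *   [4:0]   cache type (1 = data, 2 = instruction, 3 = unified)
 *   [7:5]   cache level
 *   [25:14] max addressable IDs of logical CPUs sharing the cache, minus 1
 *   [31:26] max addressable IDs of cores in the package, minus 1
 * The "minus 1" encoding is why the patch writes count - 1 into the fields.
 */
static void decode_leaf4_eax(uint32_t eax)
{
    unsigned int type = eax & 0x1f;
    unsigned int level = (eax >> 5) & 0x7;
    unsigned int threads_per_cache = ((eax >> 14) & 0xfff) + 1;
    unsigned int cores_per_pkg = ((eax >> 26) & 0x3f) + 1;

    printf("type=%u level=%u threads/cache=%u cores/pkg=%u\n",
           type, level, threads_per_cache, cores_per_pkg);
}

int main(void)
{
    /* Sample value mirroring "cache 2" above: unified L2, 2 threads, 16 cores. */
    decode_leaf4_eax(0x3c004143);
    return 0;
}
```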
On 09.10.2024 19:57, Alejandro Vallejo wrote:
> On Wed Oct 9, 2024 at 3:45 PM BST, Jan Beulich wrote:
>> On 01.10.2024 14:38, Alejandro Vallejo wrote:
>>> --- a/xen/lib/x86/policy.c
>>> +++ b/xen/lib/x86/policy.c
>>> @@ -2,6 +2,94 @@
>>>
>>>  #include <xen/lib/x86/cpu-policy.h>
>>>
>>> +static unsigned int order(unsigned int n)
>>> +{
>>> +    ASSERT(n); /* clz(0) is UB */
>>> +
>>> +    return 8 * sizeof(n) - __builtin_clz(n);
>>> +}
>>> +
>>> +int x86_topo_from_parts(struct cpu_policy *p,
>>> +                        unsigned int threads_per_core,
>>> +                        unsigned int cores_per_pkg)
>>> +{
>>> +    unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
>>
>> What about the (admittedly absurd) case of this overflowing?
>
> Each of them individually could overflow the fields in which they are used.
>
> Does returning EINVAL if either threads_per_core or cores_per_pkg overflow the
> INTEL structure j

The sentence looks unfinished, so I can only vaguely say that my answer to
the question would likely be "yes".

>>> +    switch ( p->x86_vendor )
>>> +    {
>>> +    case X86_VENDOR_INTEL: {
>>> +        struct cpuid_cache_leaf *sl = p->cache.subleaf;
>>> +
>>> +        for ( size_t i = 0; sl->type &&
>>> +              i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
>>> +        {
>>> +            sl->cores_per_package = cores_per_pkg - 1;
>>> +            sl->threads_per_cache = threads_per_core - 1;
>>> +            if ( sl->type == 3 /* unified cache */ )
>>> +                sl->threads_per_cache = threads_per_pkg - 1;
>>
>> I wasn't able to find documentation for this, well, anomaly. Can you please
>> point me at where this is spelled out?
>
> That's showing all unified caches as caches covering the whole package. We
> could do it the other way around (but I don't want to reverse engineer what the
> host policy says because that's irrelevant). There's nothing in the SDM (AFAIK)
> forcing L2 or L3 to behave one way or another, so we get to choose. I thought
> it more helpful to make all unified caches unified across the package, to give
> more information in the leaf.
>
> My own system exposes 2 unified caches (data trimmed for space):
>
> ``` cpuid
>
> deterministic cache parameters (4):
>    --- cache 0 ---
>       cache type                         = data cache (1)
>       cache level                        = 0x1 (1)
>       maximum IDs for CPUs sharing cache = 0x1 (1)
>       maximum IDs for cores in pkg       = 0xf (15)
>    --- cache 1 ---
>       cache type                         = instruction cache (2)
>       cache level                        = 0x1 (1)
>       maximum IDs for CPUs sharing cache = 0x1 (1)
>       maximum IDs for cores in pkg       = 0xf (15)
>    --- cache 2 ---
>       cache type                         = unified cache (3)
>       cache level                        = 0x2 (2)
>       maximum IDs for CPUs sharing cache = 0x1 (1)

Note how this is different ...

>       maximum IDs for cores in pkg       = 0xf (15)
>    --- cache 3 ---
>       cache type                         = unified cache (3)
>       cache level                        = 0x3 (3)
>       maximum IDs for CPUs sharing cache = 0x1f (31)

... from this, whereas your code would make it the same.

Especially if this is something you do beyond / outside the spec, it imo
needs reasoning about in fair detail in the description.

Jan
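To make the quoted discrepancy concrete: on the host above (16 cores, 2 threads/core, hence 32 threads per package) L2 and L3 report different sharing, while the posted hunk would write the package-wide count into both unified caches. A small illustration using the values from the dump:

```c
#include <stdio.h>

int main(void)
{
    /* The fields encode count - 1; raw values taken from the dump above. */
    unsigned int host_l2 = 0x01 + 1;  /* 2 threads share each L2 */
    unsigned int host_l3 = 0x1f + 1;  /* 32 threads share the L3 */

    /* The posted patch would synthesise threads_per_pkg for both. */
    unsigned int threads_per_pkg = 2 * 16;

    printf("host: L2=%u L3=%u; synthesised: both=%u\n",
           host_l2, host_l3, threads_per_pkg);
    return 0;
}
```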
On Thu Oct 10, 2024 at 8:54 AM BST, Jan Beulich wrote:
> On 09.10.2024 19:57, Alejandro Vallejo wrote:
> > On Wed Oct 9, 2024 at 3:45 PM BST, Jan Beulich wrote:
> >> On 01.10.2024 14:38, Alejandro Vallejo wrote:
> >>> --- a/xen/lib/x86/policy.c
> >>> +++ b/xen/lib/x86/policy.c
> >>> @@ -2,6 +2,94 @@
> >>>
> >>>  #include <xen/lib/x86/cpu-policy.h>
> >>>
> >>> +static unsigned int order(unsigned int n)
> >>> +{
> >>> +    ASSERT(n); /* clz(0) is UB */
> >>> +
> >>> +    return 8 * sizeof(n) - __builtin_clz(n);
> >>> +}
> >>> +
> >>> +int x86_topo_from_parts(struct cpu_policy *p,
> >>> +                        unsigned int threads_per_core,
> >>> +                        unsigned int cores_per_pkg)
> >>> +{
> >>> +    unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
> >>
> >> What about the (admittedly absurd) case of this overflowing?
> >
> > Each of them individually could overflow the fields in which they are used.
> >
> > Does returning EINVAL if either threads_per_core or cores_per_pkg overflow the
> > INTEL structure j
>
> The sentence looks unfinished, so I can only vaguely say that my answer to
> the question would likely be "yes".

It was indeed. Regardless, the number of bits available in Intel's cache
subleaves is rather limited, so I'll be clipping those to the maximum on
overflow and...

>
> >>> +    switch ( p->x86_vendor )
> >>> +    {
> >>> +    case X86_VENDOR_INTEL: {
> >>> +        struct cpuid_cache_leaf *sl = p->cache.subleaf;
> >>> +
> >>> +        for ( size_t i = 0; sl->type &&
> >>> +              i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
> >>> +        {
> >>> +            sl->cores_per_package = cores_per_pkg - 1;
> >>> +            sl->threads_per_cache = threads_per_core - 1;
> >>> +            if ( sl->type == 3 /* unified cache */ )
> >>> +                sl->threads_per_cache = threads_per_pkg - 1;
> >>
> >> I wasn't able to find documentation for this, well, anomaly. Can you please
> >> point me at where this is spelled out?
> >
> > That's showing all unified caches as caches covering the whole package. We
> > could do it the other way around (but I don't want to reverse engineer what the
> > host policy says because that's irrelevant). There's nothing in the SDM (AFAIK)
> > forcing L2 or L3 to behave one way or another, so we get to choose. I thought
> > it more helpful to make all unified caches unified across the package, to give
> > more information in the leaf.
> >
> > My own system exposes 2 unified caches (data trimmed for space):
> >
> > ``` cpuid
> >
> > deterministic cache parameters (4):
> >    --- cache 0 ---
> >       cache type                         = data cache (1)
> >       cache level                        = 0x1 (1)
> >       maximum IDs for CPUs sharing cache = 0x1 (1)
> >       maximum IDs for cores in pkg       = 0xf (15)
> >    --- cache 1 ---
> >       cache type                         = instruction cache (2)
> >       cache level                        = 0x1 (1)
> >       maximum IDs for CPUs sharing cache = 0x1 (1)
> >       maximum IDs for cores in pkg       = 0xf (15)
> >    --- cache 2 ---
> >       cache type                         = unified cache (3)
> >       cache level                        = 0x2 (2)
> >       maximum IDs for CPUs sharing cache = 0x1 (1)
>
> Note how this is different ...
>
> >       maximum IDs for cores in pkg       = 0xf (15)
> >    --- cache 3 ---
> >       cache type                         = unified cache (3)
> >       cache level                        = 0x3 (3)
> >       maximum IDs for CPUs sharing cache = 0x1f (31)
>
> ... from this, whereas your code would make it the same.
>
> Especially if this is something you do beyond / outside the spec, it imo
> needs reasoning about in fair detail in the description.

... given the risk of clipping, I'll get rid of that conditional too to make
it easier for a non-clipped number to be reported. I'll write in the commit
message the behaviour on overflow for these leaves.

>
> Jan

Cheers,
Alejandro
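As a bridge to the patch below, here is a worked example of the id_shift arithmetic for the Intel "7 threads/core, 5 cores/pkg" case that the new unit tests exercise; the standalone copy of order() is for illustration only:

```c
#include <assert.h>

/* Copy of the patch's order() for this example; n is non-zero here. */
static unsigned int order(unsigned int n)
{
    return 8 * sizeof(n) - __builtin_clz(n);
}

int main(void)
{
    unsigned int tpc = 7, cpp = 5;

    unsigned int thread_shift = order(tpc - 1);              /* order(6) == 3 */
    unsigned int core_shift = thread_shift + order(cpp - 1); /* 3 + 3 == 6 */

    /* Matches the Intel test case: subleaf[0].id_shift == 3, subleaf[1].id_shift == 6. */
    assert(thread_shift == 3);
    assert(core_shift == 6);

    /* On Intel, subleaf[1].nr_logical counts threads per package: 7 * 5 == 35. */
    assert(tpc * cpp == 35);

    return 0;
}
```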
diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 9216010b1c5d..870f7ecee0e5 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -654,6 +654,137 @@ static void test_is_compatible_failure(void)
     }
 }
 
+static void test_topo_from_parts(void)
+{
+    static const struct test {
+        unsigned int threads_per_core;
+        unsigned int cores_per_pkg;
+        struct cpu_policy policy;
+    } tests[] = {
+        {
+            .threads_per_core = 3, .cores_per_pkg = 1,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 1, .cores_per_pkg = 3,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+                    { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 7, .cores_per_pkg = 5,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+                    { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 2, .cores_per_pkg = 128,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+                    { .nr_logical = 128, .level = 1, .type = 2,
+                      .id_shift = 8, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 3, .cores_per_pkg = 1,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 1, .cores_per_pkg = 3,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+                    { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 7, .cores_per_pkg = 5,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+                    { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 2, .cores_per_pkg = 128,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+                    { .nr_logical = 256, .level = 1, .type = 2,
+                      .id_shift = 8, },
+                },
+            },
+        },
+    };
+
+    printf("Testing topology synthesis from parts:\n");
+
+    for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+    {
+        const struct test *t = &tests[i];
+        struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor };
+        int rc = x86_topo_from_parts(&actual, t->threads_per_core,
+                                     t->cores_per_pkg);
+
+        if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) )
+        {
+#define TOPO(n, f) t->policy.topo.subleaf[(n)].f, actual.topo.subleaf[(n)].f
+            fail("FAIL[%d] - '%s %u t/c, %u c/p'\n",
+                 rc,
+                 x86_cpuid_vendor_to_str(t->policy.x86_vendor),
+                 t->threads_per_core, t->cores_per_pkg);
+            printf("  subleaf=%u expected_n=%u actual_n=%u\n"
+                   "    expected_lvl=%u actual_lvl=%u\n"
+                   "    expected_type=%u actual_type=%u\n"
+                   "    expected_shift=%u actual_shift=%u\n",
+                   0,
+                   TOPO(0, nr_logical),
+                   TOPO(0, level),
+                   TOPO(0, type),
+                   TOPO(0, id_shift));
+
+            printf("  subleaf=%u expected_n=%u actual_n=%u\n"
+                   "    expected_lvl=%u actual_lvl=%u\n"
+                   "    expected_type=%u actual_type=%u\n"
+                   "    expected_shift=%u actual_shift=%u\n",
+                   1,
+                   TOPO(1, nr_logical),
+                   TOPO(1, level),
+                   TOPO(1, type),
+                   TOPO(1, id_shift));
+#undef TOPO
+        }
+    }
+}
+
 int main(int argc, char **argv)
 {
     printf("CPU Policy unit tests\n");
@@ -671,6 +802,8 @@ int main(int argc, char **argv)
     test_is_compatible_success();
     test_is_compatible_failure();
 
+    test_topo_from_parts();
+
     if ( nr_failures )
         printf("Done: %u failures\n", nr_failures);
     else
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index f43e1a3b21e9..116b305a1d7f 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -542,6 +542,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err);
 
+/**
+ * Synthesise topology information in `p` given high-level constraints
+ *
+ * Topology is given in various fields accross several leaves, some of
+ * which are vendor-specific. This function uses the policy itself to
+ * derive such leaves from threads/core and cores/package.
+ *
+ * @param p                 CPU policy of the domain.
+ * @param threads_per_core  threads/core. Doesn't need to be a power of 2.
+ * @param cores_per_package cores/package. Doesn't need to be a power of 2.
+ * @return 0 on success; -errno on failure
+ */
+int x86_topo_from_parts(struct cpu_policy *p,
+                        unsigned int threads_per_core,
+                        unsigned int cores_per_pkg);
+
 #endif /* !XEN_LIB_X86_POLICIES_H */
 
 /*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index 63bc96451d2c..16b09a427841 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,6 +2,94 @@
 
 #include <xen/lib/x86/cpu-policy.h>
 
+static unsigned int order(unsigned int n)
+{
+    ASSERT(n); /* clz(0) is UB */
+
+    return 8 * sizeof(n) - __builtin_clz(n);
+}
+
+int x86_topo_from_parts(struct cpu_policy *p,
+                        unsigned int threads_per_core,
+                        unsigned int cores_per_pkg)
+{
+    unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
+    unsigned int apic_id_size;
+
+    if ( !p || !threads_per_core || !cores_per_pkg )
+        return -EINVAL;
+
+    p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
+
+    memset(p->topo.raw, 0, sizeof(p->topo.raw));
+
+    /* thread level */
+    p->topo.subleaf[0].nr_logical = threads_per_core;
+    p->topo.subleaf[0].id_shift = 0;
+    p->topo.subleaf[0].level = 0;
+    p->topo.subleaf[0].type = 1;
+    if ( threads_per_core > 1 )
+        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
+
+    /* core level */
+    p->topo.subleaf[1].nr_logical = cores_per_pkg;
+    if ( p->x86_vendor == X86_VENDOR_INTEL )
+        p->topo.subleaf[1].nr_logical = threads_per_pkg;
+    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
+    p->topo.subleaf[1].level = 1;
+    p->topo.subleaf[1].type = 2;
+    if ( cores_per_pkg > 1 )
+        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
+
+    apic_id_size = p->topo.subleaf[1].id_shift;
+
+    /*
+     * Contrary to what the name might seem to imply. HTT is an enabler for
+     * SMP and there's no harm in setting it even with a single vCPU.
+     */
+    p->basic.htt = true;
+    p->basic.lppp = MIN(0xff, threads_per_pkg);
+
+    switch ( p->x86_vendor )
+    {
+    case X86_VENDOR_INTEL: {
+        struct cpuid_cache_leaf *sl = p->cache.subleaf;
+
+        for ( size_t i = 0; sl->type &&
+              i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
+        {
+            sl->cores_per_package = cores_per_pkg - 1;
+            sl->threads_per_cache = threads_per_core - 1;
+            if ( sl->type == 3 /* unified cache */ )
+                sl->threads_per_cache = threads_per_pkg - 1;
+        }
+        break;
+    }
+
+    case X86_VENDOR_AMD:
+    case X86_VENDOR_HYGON:
+        /* Expose p->basic.lppp */
+        p->extd.cmp_legacy = true;
+
+        /* Clip NC to the maximum value it can hold */
+        p->extd.nc = MIN(0xff, threads_per_pkg - 1);
+
+        /* TODO: Expose leaf e1E */
+        p->extd.topoext = false;
+
+        /*
+         * Clip APIC ID to 8 bits, as that's what high core-count machines do.
+         *
+         * That's what AMD EPYC 9654 does with >256 CPUs.
+         */
+        p->extd.apic_id_size = MIN(8, apic_id_size);
+
+        break;
+    }
+
+    return 0;
+}
+
 int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err)
diff --git a/xen/lib/x86/private.h b/xen/lib/x86/private.h
index 60bb82a400b7..2ec9dbee33c2 100644
--- a/xen/lib/x86/private.h
+++ b/xen/lib/x86/private.h
@@ -4,6 +4,7 @@
 #ifdef __XEN__
 
 #include <xen/bitops.h>
+#include <xen/bug.h>
 #include <xen/guest_access.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -17,6 +18,7 @@
 
 #else
 
+#include <assert.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <stdbool.h>
@@ -28,6 +30,8 @@
 
 #include <xen-tools/common-macros.h>
 
+#define ASSERT(x) assert(x)
+
 static inline bool test_bit(unsigned int bit, const void *vaddr)
 {
     const char *addr = vaddr;
Add a helper to populate topology leaves in the cpu policy from
threads/core and cores/package counts.

It's unit-tested in test-cpu-policy.c, but it's not connected to the
rest of the code yet.

Add the ASSERT() macro to xen/lib/x86/private.h, as it was missing.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 tools/tests/cpu-policy/test-cpu-policy.c | 133 +++++++++++++++++++++++
 xen/include/xen/lib/x86/cpu-policy.h     |  16 +++
 xen/lib/x86/policy.c                     |  88 +++++++++++++++
 xen/lib/x86/private.h                    |   4 +
 4 files changed, 241 insertions(+)
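Finally, a minimal sketch of how a caller might use the new helper once it is wired up to the rest of the code; the wrapper function and the chosen counts are hypothetical:

```c
#include <xen/lib/x86/cpu-policy.h>

/* Hypothetical toolstack-side caller. */
static int fill_guest_topology(struct cpu_policy *p)
{
    /* 2 threads/core, 8 cores/package; neither needs to be a power of 2. */
    int rc = x86_topo_from_parts(p, 2, 8);

    if ( rc )
        return rc; /* -EINVAL for a NULL policy or zero counts */

    /* p->topo, p->basic.htt/lppp and the vendor-specific leaves are now set. */
    return 0;
}
```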