diff mbox series

[17/21] libs/guest: introduce helper set cpu topology in cpu policy

Message ID 20210323095849.37858-18-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series libs/guest: new CPUID/MSR interface | expand

Commit Message

Roger Pau Monne March 23, 2021, 9:58 a.m. UTC
This logic is pulled out from xc_cpuid_apply_policy and placed into a
separate helper.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 tools/include/xenctrl.h         |   4 +
 tools/libs/guest/xg_cpuid_x86.c | 181 +++++++++++++++++---------------
 2 files changed, 102 insertions(+), 83 deletions(-)

Comments

Andrew Cooper April 1, 2021, 5:22 p.m. UTC | #1
On 23/03/2021 09:58, Roger Pau Monne wrote:
> This logic is pulled out from xc_cpuid_apply_policy and placed into a
> separate helper.
>
> No functional change intended.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  tools/include/xenctrl.h         |   4 +
>  tools/libs/guest/xg_cpuid_x86.c | 181 +++++++++++++++++---------------
>  2 files changed, 102 insertions(+), 83 deletions(-)
>
> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
> index 6f7158156fa..9f94e61523e 100644
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -2631,6 +2631,10 @@ int xc_cpu_policy_calc_compatible(xc_interface *xch,
>  int xc_cpu_policy_make_compatible(xc_interface *xch, xc_cpu_policy_t policy,
>                                    bool hvm);
>  
> +/* Setup the policy topology. */
> +int xc_cpu_policy_topology(xc_interface *xch, xc_cpu_policy_t policy,
> +                           bool hvm);

I'm not sure how I feel about this.  It's repeating the mistake we've
currently got with topology handling.

One part of it needs to be part of "compatible".  We need to run the
below logic, *in this form* as part of magic-ing a policy out of thin
air for the incoming VM with no data.

However, for any non-broken logic, the caller needs to specify the
topology which wishes to be expressed.

Do we want SMT at all?  Do we want 1, 2, 4 or other threads/core.
How many cores per socket?  Its very common these days for
non-power-of-2 numbers.  Our default case ought to be to match the host
topology.

Do we want to support 3 threads/core?  Sure - its weird to think about,
but its semantically equivalent to using non-power-of-2 numbers at other
levels, and would certainly be useful to express for testing purposes.

What about Intel's leaf 0x1f with the SMT > Core > Module > Tile > Die
topology layout?

The answers to these questions also need to fix Xen so that APIC_ID
isn't vcpu_id * 2 (which is horribly broken on non-Intel or Intel
Knight* hardware).  It also needs to change how the MADT is written for
guests, and how the IO-APIC IDs are assigned (matters for the AMD
topology algorithms).

There are further implications.  Should we prohibit creating a 4-vcpu VM
with cores/socket=128?  A regular kernel will demand an IOMMU for this
configuration as we end up with APIC IDs above 255.  OTOH, there are
also virtualisation schemes now to support 32k vcpus without an IOMMU,
which KVM and HyperV now speak.

Fixing our topology problems is a monumental can of worms.  While we
should keep it in mind, we should try not to conflate it with "make
libxl/libxc's CPUID logic more sane, and include MSRs", which is large
enough task on its own.

What I suspect we want in the short term is
xc_cpu_policy_legacy_adjust() or equivalent, which is very clear that it
is a transitional API only, which for now can be used everywhere where
xc_cpuid_apply_policy() is used.  As we pull various logical areas out
of this, we'll adjust the callers appropriately, and eventually delete
this function.

~Andrew
diff mbox series

Patch

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 6f7158156fa..9f94e61523e 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2631,6 +2631,10 @@  int xc_cpu_policy_calc_compatible(xc_interface *xch,
 int xc_cpu_policy_make_compatible(xc_interface *xch, xc_cpu_policy_t policy,
                                   bool hvm);
 
+/* Setup the policy topology. */
+int xc_cpu_policy_topology(xc_interface *xch, xc_cpu_policy_t policy,
+                           bool hvm);
+
 int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps);
 int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
                           uint32_t *nr_features, uint32_t *featureset);
diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 2abaf400a2b..d50822c0abb 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -433,13 +433,11 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
 {
     int rc;
     xc_dominfo_t di;
-    unsigned int i, nr_leaves, nr_msrs;
+    unsigned int nr_leaves, nr_msrs;
     xen_cpuid_leaf_t *leaves = NULL;
     struct cpuid_policy *p = NULL;
     struct cpu_policy policy = { };
     uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
-    uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
-    uint32_t len = ARRAY_SIZE(host_featureset);
 
     if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
          di.domid != domid )
@@ -462,22 +460,6 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
          (p = calloc(1, sizeof(*p))) == NULL )
         goto out;
 
-    /* Get the host policy. */
-    rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
-                               &len, host_featureset);
-    if ( rc )
-    {
-        /* Tolerate "buffer too small", as we've got the bits we need. */
-        if ( errno == ENOBUFS )
-            rc = 0;
-        else
-        {
-            PERROR("Failed to obtain host featureset");
-            rc = -errno;
-            goto out;
-        }
-    }
-
     /* Get the domain's default policy. */
     nr_msrs = 0;
     rc = get_system_cpu_policy(xch, di.hvm ? XEN_SYSCTL_cpu_policy_hvm_default
@@ -564,70 +546,10 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
         }
     }
 
-    if ( !di.hvm )
-    {
-        /*
-         * On hardware without CPUID Faulting, PV guests see real topology.
-         * As a consequence, they also need to see the host htt/cmp fields.
-         */
-        p->basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
-        p->extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
-    }
-    else
-    {
-        /*
-         * Topology for HVM guests is entirely controlled by Xen.  For now, we
-         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
-         */
-        p->basic.htt = true;
-        p->extd.cmp_legacy = false;
-
-        /*
-         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
-         * overflow.
-         */
-        if ( !(p->basic.lppp & 0x80) )
-            p->basic.lppp *= 2;
-
-        switch ( p->x86_vendor )
-        {
-        case X86_VENDOR_INTEL:
-            for ( i = 0; (p->cache.subleaf[i].type &&
-                          i < ARRAY_SIZE(p->cache.raw)); ++i )
-            {
-                p->cache.subleaf[i].cores_per_package =
-                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
-                p->cache.subleaf[i].threads_per_cache = 0;
-            }
-            break;
-
-        case X86_VENDOR_AMD:
-        case X86_VENDOR_HYGON:
-            /*
-             * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
-             * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
-             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
-             * - overflow,
-             * - going out of sync with leaf 1 EBX[23:16],
-             * - incrementing ApicIdCoreSize when it's zero (which changes the
-             *   meaning of bits 7:0).
-             *
-             * UPDATE: I addition to avoiding overflow, some
-             * proprietary operating systems have trouble with
-             * apic_id_size values greater than 7.  Limit the value to
-             * 7 for now.
-             */
-            if ( p->extd.nc < 0x7f )
-            {
-                if ( p->extd.apic_id_size != 0 && p->extd.apic_id_size < 0x7 )
-                    p->extd.apic_id_size++;
-
-                p->extd.nc = (p->extd.nc << 1) | 1;
-            }
-            break;
-        }
-    }
+    policy.cpuid = p;
+    rc = xc_cpu_policy_topology(xch, &policy, di.hvm);
+    if ( rc )
+        goto out;
 
     rc = x86_cpuid_copy_to_buffer(p, leaves, &nr_leaves);
     if ( rc )
@@ -1257,3 +1179,96 @@  int xc_cpu_policy_make_compatible(xc_interface *xch, xc_cpu_policy_t policy,
     xc_cpu_policy_destroy(host);
     return rc;
 }
+
+int xc_cpu_policy_topology(xc_interface *xch, xc_cpu_policy_t policy,
+                           bool hvm)
+{
+    if ( !hvm )
+    {
+        xc_cpu_policy_t host;
+        int rc;
+
+        host = xc_cpu_policy_init();
+        if ( !host )
+        {
+            errno = ENOMEM;
+            return -1;
+        }
+
+        rc = xc_cpu_policy_get_system(xch, XEN_SYSCTL_cpu_policy_host, host);
+        if ( rc )
+        {
+            ERROR("Failed to get host policy");
+            xc_cpu_policy_destroy(host);
+            return rc;
+        }
+
+
+        /*
+         * On hardware without CPUID Faulting, PV guests see real topology.
+         * As a consequence, they also need to see the host htt/cmp fields.
+         */
+        policy->cpuid->basic.htt = host->cpuid->basic.htt;
+        policy->cpuid->extd.cmp_legacy = host->cpuid->extd.cmp_legacy;
+    }
+    else
+    {
+        unsigned int i;
+
+        /*
+         * Topology for HVM guests is entirely controlled by Xen.  For now, we
+         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
+         */
+        policy->cpuid->basic.htt = true;
+        policy->cpuid->extd.cmp_legacy = false;
+
+        /*
+         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
+         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
+         * overflow.
+         */
+        if ( !(policy->cpuid->basic.lppp & 0x80) )
+            policy->cpuid->basic.lppp *= 2;
+
+        switch ( policy->cpuid->x86_vendor )
+        {
+        case X86_VENDOR_INTEL:
+            for ( i = 0; (policy->cpuid->cache.subleaf[i].type &&
+                          i < ARRAY_SIZE(policy->cpuid->cache.raw)); ++i )
+            {
+                policy->cpuid->cache.subleaf[i].cores_per_package =
+                  (policy->cpuid->cache.subleaf[i].cores_per_package << 1) | 1;
+                policy->cpuid->cache.subleaf[i].threads_per_cache = 0;
+            }
+            break;
+
+        case X86_VENDOR_AMD:
+        case X86_VENDOR_HYGON:
+            /*
+             * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
+             * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
+             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
+             * - overflow,
+             * - going out of sync with leaf 1 EBX[23:16],
+             * - incrementing ApicIdCoreSize when it's zero (which changes the
+             *   meaning of bits 7:0).
+             *
+             * UPDATE: I addition to avoiding overflow, some
+             * proprietary operating systems have trouble with
+             * apic_id_size values greater than 7.  Limit the value to
+             * 7 for now.
+             */
+            if ( policy->cpuid->extd.nc < 0x7f )
+            {
+                if ( policy->cpuid->extd.apic_id_size != 0 &&
+                     policy->cpuid->extd.apic_id_size < 0x7 )
+                    policy->cpuid->extd.apic_id_size++;
+
+                policy->cpuid->extd.nc = (policy->cpuid->extd.nc << 1) | 1;
+            }
+            break;
+        }
+    }
+
+    return 0;
+}