Message ID | 9912423b866ed696c375e0a51954d363c3706470.1716976271.git.alejandro.vallejo@cloud.com (mailing list archive) |
---|---|
State | Superseded |
Series | x86: Expose consistent topology to guests |
On Wed, May 29, 2024 at 03:32:31PM +0100, Alejandro Vallejo wrote:
> This allows the initial x2APIC ID to be sent on the migration stream. The
> hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
> Given the vlapic data is zero-extended on restore, fix up migrations from
> hosts without the field by setting it to the old convention if zero.
>
> x2APIC IDs are calculated from the CPU policy where the guest topology is
> defined. For the time being, the function simply returns the old
> relationship, but will eventually return results consistent with the
> topology.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

On 29/05/2024 3:32 pm, Alejandro Vallejo wrote:
> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> index f033d22785be..b70b22d55fcf 100644
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,6 +2,17 @@
>
>  #include <xen/lib/x86/cpu-policy.h>
>
> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> +{
> +    /*
> +     * TODO: Derive x2APIC ID from the topology information inside `p`
> +     *       rather than from the vCPU ID alone. This bodge is a temporary
> +     *       measure until all infra is in place to retrieve or derive the
> +     *       initial x2APIC ID from migrated domains.
> +     */
> +    return id * 2;
> +}
> +

I'm afraid it's nonsensical to try and derive x2APIC ID from a
policy+vcpu_id.

Take a step back, and think the data through.

A VM has:
* A unique APIC_ID for each vCPU
* Info in CPUID describing how to decompose the APIC_ID into topology

Right now, because this is all completely broken, we have:
* Hardcoded APIC_ID = vCPU_ID * 2
* Total nonsense in CPUID

When constructing a VM, the toolstack (given suitable admin
guidance/defaults) *must* choose both:
* The APIC_IDs themselves
* The CPUID topo data to match

i.e. this series should be editing the toolstack's call to
xc_domain_hvm_setcontext().

It's not, because AFAICT you're depending on the migration compatibility
logic and inserting a new hardcoded assumption about symmetry of the layout.

The data flows we need are:

(New) create:
* Toolstack chooses both parts of topo information
* Xen needs a default, which reasonably can be APIC_ID=vCPU_ID when the
  rest of the data flow has been cleaned up. But this needs to be
  explicit in vcpu_create() and without reference to the policy.

And to be clear, it's fine for now for the toolstack to choose a
symmetric layout and pick appropriate APIC_IDs+CPUID for this, but it
needs to be the toolstack making this decision, not Xen inventing state
out of thin air based on the toolstack only giving half the information.

(New) migrate:
* Data from the stream, exactly as presented

(Compat) migrate:
* Synthesize the missing xapic_id field in LAPIC_REGs
  as APIC_ID=vCPU_ID * 2.

I'm pretty sure this will be a net reduction in complexity in this
series. It definitely reduces the Xen complexity.

~Andrew
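
The three data flows listed in the mail above can be sketched roughly as follows. This is only an illustration: `enum apic_id_flow`, `initial_apic_id()` and the `provided` pointer are hypothetical names that appear nowhere in the series, and the default of APIC_ID=vCPU_ID for a new create is the one the mail proposes for a later cleanup, not current behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three flows described in the mail. */
enum apic_id_flow {
    FLOW_NEW_CREATE,     /* toolstack may supply the APIC ID explicitly */
    FLOW_NEW_MIGRATE,    /* stream carries the APIC ID; use it verbatim */
    FLOW_COMPAT_MIGRATE, /* old stream without the field; synthesize it */
};

/*
 * Pick the initial APIC ID of one vCPU.  `provided` points at the value
 * given by the toolstack or the stream, or is NULL when there is none.
 */
uint32_t initial_apic_id(enum apic_id_flow flow, uint32_t vcpu_id,
                         const uint32_t *provided)
{
    switch ( flow )
    {
    case FLOW_NEW_CREATE:
        /* Assumed default when the toolstack stays silent. */
        return provided ? *provided : vcpu_id;

    case FLOW_NEW_MIGRATE:
        /* New streams must carry the field. */
        assert(provided != NULL);
        return *provided;

    case FLOW_COMPAT_MIGRATE:
        /* Legacy convention kept for old streams. */
        return vcpu_id * 2;
    }

    assert(0 && "unknown flow");
    return 0;
}
```

In this sketch no flow consults the CPU policy, which is the property the review asks for.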

On 30/05/2024 12:08, Andrew Cooper wrote:
> On 29/05/2024 3:32 pm, Alejandro Vallejo wrote:
>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>> index f033d22785be..b70b22d55fcf 100644
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -2,6 +2,17 @@
>>
>>  #include <xen/lib/x86/cpu-policy.h>
>>
>> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
>> +{
>> +    /*
>> +     * TODO: Derive x2APIC ID from the topology information inside `p`
>> +     *       rather than from the vCPU ID alone. This bodge is a temporary
>> +     *       measure until all infra is in place to retrieve or derive the
>> +     *       initial x2APIC ID from migrated domains.
>> +     */
>> +    return id * 2;
>> +}
>> +
>
> I'm afraid it's nonsensical to try and derive x2APIC ID from a
> policy+vcpu_id.

That's debatable, and we clearly have different views, however...

>
> Take a step back, and think the data through.
>
> A VM has:
> * A unique APIC_ID for each vCPU
> * Info in CPUID describing how to decompose the APIC_ID into topology
>
> Right now, because this is all completely broken, we have:
> * Hardcoded APIC_ID = vCPU_ID * 2
> * Total nonsense in CPUID
>
>
> When constructing a VM, the toolstack (given suitable admin
> guidance/defaults) *must* choose both:
> * The APIC_IDs themselves
> * The CPUID topo data to match
>
> i.e. this series should be editing the toolstack's call to
> xc_domain_hvm_setcontext().
>
> It's not, because AFAICT you're depending on the migration compatibility
> logic and inserting a new hardcoded assumption about symmetry of the layout.
>
>
> The data flows we need are:
>
> (New) create:
> * Toolstack chooses both parts of topo information
> * Xen needs a default, which reasonably can be APIC_ID=vCPU_ID when the
>   rest of the data flow has been cleaned up. But this needs to be
>   explicit in vcpu_create() and without reference to the policy.
>
> And to be clear, it's fine for now for the toolstack to choose a
> symmetric layout and pick appropriate APIC_IDs+CPUID for this, but it
> needs to be the toolstack making this decision, not Xen inventing state
> out of thin air based on the toolstack only giving half the information.
>
> (New) migrate:
> * Data from the stream, exactly as presented
>
> (Compat) migrate:
> * Synthesize the missing xapic_id field in LAPIC_REGs
>   as APIC_ID=vCPU_ID * 2.
>
> I'm pretty sure this will be a net reduction in complexity in this
> series. It definitely reduces the Xen complexity.
>
> ~Andrew

... I didn't know the toolstack could send hvmcontexts during non-migrated
domain creation. That's neat!

I was going to defend my approach (because it does make sense), but there's
an extra benefit from yours you didn't seem to notice. With the x2apicid in
the migration stream (patches 1 and parts of 2) it's not only possible to
set the APIC ID from the toolstack per vCPU with the contexts, but it would
also allow the toolstack to be responsible for preinitializing all the APICs
in x2apic mode when any of the APIC IDs is 255 or more.

I'll try to do that soon-ish. I suspect the pain points are going to be
making it work nicely as well on 1vCPU systems with no APIC (are
those expected to work?).

I'm not looking forward to re-testing all of this again...

Cheers,
Alejandro
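
The "255 or more" condition mentioned above follows from xAPIC mode having only 8-bit APIC IDs, with 0xff serving as the broadcast destination. A minimal sketch of the check a toolstack might perform, assuming it already holds the chosen per-vCPU APIC IDs in an array (the function name and the array are hypothetical, not an existing libxc or Xen interface):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* xAPIC IDs are 8 bits wide and 0xff is the broadcast destination. */
#define XAPIC_ID_LIMIT 255u

/*
 * Return true when at least one chosen APIC ID cannot be expressed in
 * xAPIC mode, i.e. the APICs would have to start out in x2APIC mode.
 */
bool needs_x2apic_mode(const uint32_t *apic_ids, size_t nr)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( apic_ids[i] >= XAPIC_ID_LIMIT )
            return true;

    return false;
}
```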

On Thu, May 30, 2024 at 02:48:10PM +0100, Alejandro Vallejo wrote:
> I'll try to do that soon-ish. I suspect the pain points are going to be
> making it work nicely as well on 1vCPU systems with no APIC (are
> those expected to work?).

We do not allow creation of PVH/HVM domains without an emulated local
APIC, and I don't think we ever want to allow doing so (see
emulation_flags_ok()).

Thanks, Roger.

On Thu, May 30, 2024 at 12:08:26PM +0100, Andrew Cooper wrote:
> On 29/05/2024 3:32 pm, Alejandro Vallejo wrote:
> > diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> > index f033d22785be..b70b22d55fcf 100644
> > --- a/xen/lib/x86/policy.c
> > +++ b/xen/lib/x86/policy.c
> > @@ -2,6 +2,17 @@
> >
> >  #include <xen/lib/x86/cpu-policy.h>
> >
> > +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> > +{
> > +    /*
> > +     * TODO: Derive x2APIC ID from the topology information inside `p`
> > +     *       rather than from the vCPU ID alone. This bodge is a temporary
> > +     *       measure until all infra is in place to retrieve or derive the
> > +     *       initial x2APIC ID from migrated domains.
> > +     */
> > +    return id * 2;
> > +}
> > +
>
> I'm afraid it's nonsensical to try and derive x2APIC ID from a
> policy+vcpu_id.
>
> Take a step back, and think the data through.
>
> A VM has:
> * A unique APIC_ID for each vCPU
> * Info in CPUID describing how to decompose the APIC_ID into topology
>
> Right now, because this is all completely broken, we have:
> * Hardcoded APIC_ID = vCPU_ID * 2
> * Total nonsense in CPUID
>
>
> When constructing a VM, the toolstack (given suitable admin
> guidance/defaults) *must* choose both:
> * The APIC_IDs themselves
> * The CPUID topo data to match
>
> i.e. this series should be editing the toolstack's call to
> xc_domain_hvm_setcontext().
>
> It's not, because AFAICT you're depending on the migration compatibility
> logic and inserting a new hardcoded assumption about symmetry of the layout.
>
>
> The data flows we need are:
>
> (New) create:
> * Toolstack chooses both parts of topo information
> * Xen needs a default, which reasonably can be APIC_ID=vCPU_ID when the
>   rest of the data flow has been cleaned up. But this needs to be
>   explicit in vcpu_create() and without reference to the policy.

Doesn't using APIC_ID=vCPU_ID limit us to only being able to expose
certain topologies (as vCPU IDs are contiguous)? For example, exposing a
topology with 3 cores per package won't be possible?

Not saying it's a bad move to start this way, but if we want to support
exposing more exotic topologies, sooner or later we will need some kind of
logic that assigns the APIC IDs based on the knowledge of the expected
topology. Whether it gets such knowledge from the CPU policy or directly
from the toolstack is another question.

Thanks, Roger.
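
The limitation raised here can be made concrete with a small sketch. It assumes the usual x86 convention in which every topology level occupies a power-of-two-wide bit field of the APIC ID, and a symmetric layout with vCPUs filling packages in order. The helper names and parameters are hypothetical, and this is not the algorithm implemented by the series.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to encode `n` distinct values: ceil(log2(n)). */
unsigned int field_width(uint32_t n)
{
    unsigned int bits = 0;

    while ( (UINT32_C(1) << bits) < n )
        bits++;

    return bits;
}

/*
 * Hypothetical mapping from a vCPU ID to an APIC ID for a symmetric
 * topology of `cores_per_pkg` cores with `threads_per_core` threads each.
 */
uint32_t apic_id_from_topology(uint32_t vcpu_id, uint32_t threads_per_core,
                               uint32_t cores_per_pkg)
{
    assert(threads_per_core > 0 && cores_per_pkg > 0);

    uint32_t thread = vcpu_id % threads_per_core;
    uint32_t core = (vcpu_id / threads_per_core) % cores_per_pkg;
    uint32_t pkg = vcpu_id / (threads_per_core * cores_per_pkg);

    unsigned int thread_bits = field_width(threads_per_core);
    unsigned int core_bits = field_width(cores_per_pkg);

    return (pkg << (thread_bits + core_bits)) | (core << thread_bits) | thread;
}
```

With 3 cores per package and 1 thread per core, the core field is 2 bits wide, so vCPUs 0 to 5 map to APIC IDs 0, 1, 2, 4, 5 and 6. APIC ID 3 is skipped, which contiguous APIC_ID=vCPU_ID cannot express.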

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 7a38e032146a..ebcdbc5cbc5d 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
         const struct cpu_user_regs *regs;
 
     case 0x1:
-        /* TODO: Rework topology logic. */
         res->b &= 0x00ffffffu;
         if ( is_hvm_domain(d) )
-            res->b |= (v->vcpu_id * 2) << 24;
+            res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;
 
         /* TODO: Rework vPMU control in terms of toolstack choices. */
         if ( vpmu_available(v) &&
@@ -312,18 +311,15 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
 
     case 0xb:
         /*
-         * In principle, this leaf is Intel-only.  In practice, it is tightly
-         * coupled with x2apic, and we offer an x2apic-capable APIC emulation
-         * to guests on AMD hardware as well.
-         *
-         * TODO: Rework topology logic.
+         * Don't expose topology information to PV guests. Exposed on HVM
+         * along with x2APIC because they are tightly coupled.
          */
-        if ( p->basic.x2apic )
+        if ( is_hvm_domain(d) && p->basic.x2apic )
         {
             *(uint8_t *)&res->c = subleaf;
 
             /* Fix the x2APIC identifier. */
-            res->d = v->vcpu_id * 2;
+            res->d = vlapic_x2apic_id(vcpu_vlapic(v));
         }
         break;
 
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index a0df62b5ec0a..626a6258a4d4 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1072,7 +1072,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
 static void set_x2apic_id(struct vlapic *vlapic)
 {
     const struct vcpu *v = vlapic_vcpu(vlapic);
-    uint32_t apic_id = v->vcpu_id * 2;
+    uint32_t apic_id = vlapic->hw.x2apic_id;
     uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
 
     /*
@@ -1086,6 +1086,26 @@ static void set_x2apic_id(struct vlapic *vlapic)
     vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
 }
 
+void vlapic_cpu_policy_changed(struct vcpu *v)
+{
+    struct vlapic *vlapic = vcpu_vlapic(v);
+    const struct cpu_policy *cp = v->domain->arch.cpu_policy;
+
+    /*
+     * Don't override the initial x2APIC ID if we have migrated it or
+     * if the domain doesn't have vLAPIC at all.
+     */
+    if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
+        return;
+
+    vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
+
+    if ( vlapic_x2apic_mode(vlapic) )
+        set_x2apic_id(vlapic); /* Set the APIC ID _and_ the LDR */
+    else
+        vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
+}
+
 int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
 {
     const struct cpu_policy *cp = v->domain->arch.cpu_policy;
@@ -1452,7 +1472,7 @@ void vlapic_reset(struct vlapic *vlapic)
     if ( v->vcpu_id == 0 )
         vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
 
-    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
+    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
     vlapic_do_init(vlapic);
 }
 
@@ -1520,6 +1540,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
     const struct vcpu *v = vlapic_vcpu(vlapic);
     uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
 
+    /*
+     * Loading record without hw.x2apic_id in the save stream, calculate using
+     * the traditional "vcpu_id * 2" relation. There's an implicit assumption
+     * that vCPU0 always has x2APIC0, which is true for the old relation, and
+     * still holds under the new x2APIC generation algorithm. While that case
+     * goes through the conditional it's benign because it still maps to zero.
+     */
+    if ( !vlapic->hw.x2apic_id )
+        vlapic->hw.x2apic_id = v->vcpu_id * 2;
+
     /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
     if ( !vlapic_x2apic_mode(vlapic) ||
          (vlapic->loaded.ldr == good_ldr) )
@@ -1588,6 +1618,13 @@ static int cf_check lapic_check_hidden(const struct domain *d,
          APIC_BASE_EXTD )
         return -EINVAL;
 
+    /*
+     * Fail migrations from newer versions of Xen where
+     * rsvd_zero is interpreted as something else.
+     */
+    if ( s.rsvd_zero )
+        return -EINVAL;
+
     return 0;
 }
 
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 1c01e22c8e62..746b4739f53f 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -16,6 +16,7 @@
 #include <asm/current.h>
 #include <asm/x86_emulate.h>
 #include <asm/hvm/asid.h>
+#include <asm/hvm/vlapic.h>
 
 struct pirq; /* needed by pi_update_irte */
 
@@ -448,6 +449,7 @@ static inline void hvm_update_guest_efer(struct vcpu *v)
 static inline void hvm_cpuid_policy_changed(struct vcpu *v)
 {
     alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
+    vlapic_cpu_policy_changed(v);
 }
 
 static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index 2c4ff94ae7a8..34f23cd38a20 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -44,6 +44,7 @@
 #define vlapic_xapic_mode(vlapic) \
     (!vlapic_hw_disabled(vlapic) && \
      !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
+#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
 
 /*
  * Generic APIC bitmap vector update & search routines.
@@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
 
 int vlapic_init(struct vcpu *v);
 void vlapic_destroy(struct vcpu *v);
+void vlapic_cpu_policy_changed(struct vcpu *v);
 
 void vlapic_reset(struct vlapic *vlapic);
 
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 7ecacadde165..1c2ec669ffc9 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,8 @@ struct hvm_hw_lapic {
     uint32_t disabled; /* VLAPIC_xx_DISABLED */
     uint32_t timer_divisor;
     uint64_t tdt_msr;
+    uint32_t x2apic_id;
+    uint32_t rsvd_zero;
 };
 
 DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index d5e447e9dc06..392320b9adbe 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err);
 
+/**
+ * Calculates the x2APIC ID of a vCPU given a CPU policy
+ *
+ * @param p          CPU policy of the domain.
+ * @param id         vCPU ID of the vCPU.
+ * @returns x2APIC ID of the vCPU.
+ */
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
+
 #endif /* !XEN_LIB_X86_POLICIES_H */
 
 /*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index f033d22785be..b70b22d55fcf 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,6 +2,17 @@
 
 #include <xen/lib/x86/cpu-policy.h>
 
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
+{
+    /*
+     * TODO: Derive x2APIC ID from the topology information inside `p`
+     *       rather than from the vCPU ID alone. This bodge is a temporary
+     *       measure until all infra is in place to retrieve or derive the
+     *       initial x2APIC ID from migrated domains.
+     */
+    return id * 2;
+}
+
 int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err)

This allows the initial x2APIC ID to be sent on the migration stream. The
hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
Given the vlapic data is zero-extended on restore, fix up migrations from
hosts without the field by setting it to the old convention if zero.

x2APIC IDs are calculated from the CPU policy where the guest topology is
defined. For the time being, the function simply returns the old
relationship, but will eventually return results consistent with the
topology.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
  * Added rsvd_zero check to the check hook (introduced in v3/patch1).
  * Set APIC ID properly during policy update if the APIC is already in
    x2apic mode, ensuring its LDR is updated too in that case.
  * Fixed typo in variable for x86_x2apic_id_from_vcpu_id().
  * Missed due to being mid-series.
  * Rewrote the comment on CPUID leaf 0xb.
  * Rewrote the comment on x86_x2apic_id_from_vcpu_id()
---
 xen/arch/x86/cpuid.c                   | 14 ++++-----
 xen/arch/x86/hvm/vlapic.c              | 41 ++++++++++++++++++++++++--
 xen/arch/x86/include/asm/hvm/hvm.h     |  2 ++
 xen/arch/x86/include/asm/hvm/vlapic.h  |  2 ++
 xen/include/public/arch-x86/hvm/save.h |  2 ++
 xen/include/xen/lib/x86/cpu-policy.h   |  9 ++++++
 xen/lib/x86/policy.c                   | 11 +++++++
 7 files changed, 70 insertions(+), 11 deletions(-)
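
The restore-side behaviour described in the commit message can be illustrated in isolation. The sketch below is a simplified stand-in, not the Xen code: `struct lapic_tail` models only the two fields the patch appends to `struct hvm_hw_lapic`, and both function names are hypothetical. It assumes, as the commit message states, that a record from an older host arrives with the new fields zero-extended.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields appended to the LAPIC record. */
struct lapic_tail {
    uint32_t x2apic_id;
    uint32_t rsvd_zero;
};

/* Reject records where the reserved field carries data we don't understand. */
int check_lapic_tail(const struct lapic_tail *t)
{
    return t->rsvd_zero ? -EINVAL : 0;
}

/*
 * A zero x2apic_id means either "old stream without the field" or vCPU0.
 * Both cases are covered by the legacy relation, since 0 * 2 is still 0.
 */
void fixup_lapic_tail(struct lapic_tail *t, uint32_t vcpu_id)
{
    if ( !t->x2apic_id )
        t->x2apic_id = vcpu_id * 2;
}
```

One consequence visible in the sketch is that a zero ID can never be migrated for any vCPU other than vCPU0, because it is indistinguishable from a missing field.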