Message ID | 20231121162604.19405-1-alejandro.vallejo@cloud.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] xen/x86: On x2APIC mode, derive LDR from APIC_ID | expand |
On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote: > Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID > registers are derivable from each other through a fixed formula. > > Xen uses that formula, but applies it to vCPU IDs (which are sequential) > rather than x2APIC_IDs (which are not, at the moment). As I understand it, > this is an attempt to tightly pack vCPUs into clusters so each cluster has > 16 vCPUs rather than 8, but this is problematic for OSs that might read the > x2APIC_ID and internally derive LDR (or the other way around) Ugh, forgot about Roger's commit message request >I would replace the underscore from x2APIC ID with a space instead. Happy for that to happen on commit if the rest looks ok. Cheers, Alejandro
On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote: > Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID > registers are derivable from each other through a fixed formula. > > Xen uses that formula, but applies it to vCPU IDs (which are sequential) > rather than x2APIC_IDs (which are not, at the moment). As I understand it, > this is an attempt to tightly pack vCPUs into clusters so each cluster has > 16 vCPUs rather than 8, but this is problematic for OSs that might read the > x2APIC_ID and internally derive LDR (or the other way around) > > This patch fixes the implementation so we follow the rules in the x2APIC > spec(s). > > The patch also covers migrations from broken hypervisors, so LDRs are > preserved even for hotppluggable CPUs and across APIC resets. > > Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation") > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> LGTM, just a couple of style comments. > --- > I tested this by creating 3 checkpoints. > 1. One with pre-4.4 broken state (every LDR=1, by hacking save_regs) > 2. One with 4.4 onwards broken state (LDRs packed in their clusters) > 3. One with correct LDR values > > (1) and (3) restores to the same thing. Consistent APIC_ID+LDR > (2) restores to what it previously had and hotplugs follow the same logic > --- > xen/arch/x86/hvm/vlapic.c | 81 +++++++++++++++++++-------- > xen/arch/x86/include/asm/hvm/domain.h | 13 +++++ > 2 files changed, 72 insertions(+), 22 deletions(-) > > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c > index a8e87c4446..7f169f1e5f 100644 > --- a/xen/arch/x86/hvm/vlapic.c > +++ b/xen/arch/x86/hvm/vlapic.c > @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = { > .write = vlapic_mmio_write, > }; > > +static uint32_t x2apic_ldr_from_id(uint32_t id) > +{ > + return ((id & ~0xF) << 12) | (1 << (id & 0xF)); Seeing other usages in vlapic.c I think the preference is to use lower case for hex numbers. > +} > + > static void set_x2apic_id(struct vlapic *vlapic) > { > - u32 id = vlapic_vcpu(vlapic)->vcpu_id; > - u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf)); > + uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id; > + uint32_t apic_id = vcpu_id * 2; > + uint32_t apic_ldr = x2apic_ldr_from_id(apic_id); > > - vlapic_set_reg(vlapic, APIC_ID, id * 2); > - vlapic_set_reg(vlapic, APIC_LDR, ldr); > + /* This is a migration bug workaround. See wall of text in lapic_load_fixup() */ > + if ( vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug ) > + apic_ldr = x2apic_ldr_from_id(vcpu_id); > + > + vlapic_set_reg(vlapic, APIC_ID, apic_id); > + vlapic_set_reg(vlapic, APIC_LDR, apic_ldr); > } > > int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val) > @@ -1495,30 +1505,57 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h) > /* > * Following lapic_load_hidden()/lapic_load_regs() we may need to > * correct ID and LDR when they come from an old, broken hypervisor. > + * > + * Xen <= 4.4 had a bug by which all the APICs configured in x2APIC mode > + * got LDR = 1. This was fixed back then, but another bug was introduced > + * causing APIC ID and LDR to break the consistency they are meant to have > + * according to the specs (LDR was derived from vCPU ID, rather than APIC > + * ID) > + * > + * Long story short, we can detect both cases here. For the LDR=1 case we > + * want to fix it up on migrate, as IPIs just don't work on non-physical > + * mode otherwise. For the other case we actually want to preserve previous > + * behaviour so that existing running instances that may have already read > + * the LDR at the source host aren't surprised when IPIs stop working as > + * they did at the other end. > + * > + * Note that "x2apic_id == 0" has always been correct and can't be used to > + * discriminate these cases. > + * I think it's best if this big comment was split between the relevant parts of the if below used to detect the broken states, as that makes the comments more in-place with the code. > + * Yuck! > */ > static void lapic_load_fixup(struct vlapic *vlapic) > { > - uint32_t id = vlapic->loaded.id; > + /* > + * This LDR would be present in broken versions of Xen 4.4 through 4.18. > + * It's correct for the cpu with x2apic_id=0 (vcpu0), but invalid for > + * any other. > + */ > + uint32_t bad_ldr = x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id); > > - if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 ) > - { > + /* > + * No need to perform fixups in non-x2apic mode, and x2apic_id == 0 has > + * always been correct. > + */ > + if ( !vlapic_x2apic_mode(vlapic) || !vlapic->loaded.id ) You could replace the !vlapic->loaded.id check and instead use: vlapic->loaded.ldr == x2apic_ldr_from_id(vlapic->loaded.id) As that will allow returning early from the function if the LDR is correct. Then if none of the fixups below apply we could print a warning message that the LDR is incorrect, but cannot be fixed up. > + return; > + > + if ( vlapic->loaded.ldr == 1 ) > + /* > + * Migration from a broken Xen 4.4 or earlier. We can't leave it > + * as-is because it assigned the same LDR to every CPU. We'll fix > + * the bug now and assign LDR values consistent with the APIC ID. > + */ > + set_x2apic_id(vlapic); Previous code also did some checks here related to APIC ID sanity, which are now dropped? Might be worth mentioning in the commit message, if that was intended. > + else if ( bad_ldr == vlapic->loaded.ldr ) > /* > - * This is optional: ID != 0 contradicts LDR == 1. It's being added > - * to aid in eventual debugging of issues arising from the fixup done > - * here, but can be dropped as soon as it is found to conflict with > - * other (future) changes. > + * This is a migration from a broken Xen between 4.4 and 4.18 and we > + * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In > + * this case we set this domain boolean so future CPU hotplugs > + * derive an LDR consistent with the older Xen's broken idea of > + * consistency. > */ > - if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 || > - id != SET_xAPIC_ID(GET_xAPIC_ID(id)) ) > - printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n", > - vlapic_vcpu(vlapic), id); > - set_x2apic_id(vlapic); > - } > - else /* Undo an eventual earlier fixup. */ > - { > - vlapic_set_reg(vlapic, APIC_ID, id); > - vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr); > - } > + vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug = true; > } > > static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) > diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h > index 6e53ce4449..a42a6e99bb 100644 > --- a/xen/arch/x86/include/asm/hvm/domain.h > +++ b/xen/arch/x86/include/asm/hvm/domain.h > @@ -61,6 +61,19 @@ struct hvm_domain { > /* Cached CF8 for guest PCI config cycles */ > uint32_t pci_cf8; > > + /* > + * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was > + * derived from the vcpu_id rather than the x2APIC ID. This is wrong, > + * but changing this behaviour is tricky because guests might have > + * already read the LDR and used it accordingly. In the interest of not > + * breaking migrations from those hypervisors we track here whether > + * this domain suffers from this or not so a hotplugged vCPU or an APIC > + * reset can recover the same LDR it would've had on the older host. > + * > + * Yuck! > + */ > + bool has_inconsistent_x2apic_ldr_bug; Could you place the new field after the existing boolean fields in the struct? (AFAICT there's plenty of padding left there) I also think the field name is too long, I would rather use x2apic_ldr_vcpu_id for example (to note that LDR is calculated from vCPU ID rather than APIC ID). I think it would be good if we could trim a bit the comments, as I get the impression it's a bit repetitive. So I would leave the big explanation in lapic_load_fixup(), and just comment here: /* Compatibility setting for a bug in x2APIC LDR format. */ Thanks, Roger.
On Wed, Nov 22, 2023 at 02:40:02PM +0100, Roger Pau Monné wrote: > On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote: > > Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID > > registers are derivable from each other through a fixed formula. > > > > Xen uses that formula, but applies it to vCPU IDs (which are sequential) > > rather than x2APIC_IDs (which are not, at the moment). As I understand it, > > this is an attempt to tightly pack vCPUs into clusters so each cluster has > > 16 vCPUs rather than 8, but this is problematic for OSs that might read the > > x2APIC_ID and internally derive LDR (or the other way around) > > > > This patch fixes the implementation so we follow the rules in the x2APIC > > spec(s). > > > > The patch also covers migrations from broken hypervisors, so LDRs are > > preserved even for hotppluggable CPUs and across APIC resets. > > > > Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation") > > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> > > LGTM, just a couple of style comments. > > > --- > > I tested this by creating 3 checkpoints. > > 1. One with pre-4.4 broken state (every LDR=1, by hacking save_regs) > > 2. One with 4.4 onwards broken state (LDRs packed in their clusters) > > 3. One with correct LDR values > > > > (1) and (3) restores to the same thing. Consistent APIC_ID+LDR > > (2) restores to what it previously had and hotplugs follow the same logic > > --- > > xen/arch/x86/hvm/vlapic.c | 81 +++++++++++++++++++-------- > > xen/arch/x86/include/asm/hvm/domain.h | 13 +++++ > > 2 files changed, 72 insertions(+), 22 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c > > index a8e87c4446..7f169f1e5f 100644 > > --- a/xen/arch/x86/hvm/vlapic.c > > +++ b/xen/arch/x86/hvm/vlapic.c > > @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = { > > .write = vlapic_mmio_write, > > }; > > > > +static uint32_t x2apic_ldr_from_id(uint32_t id) > > +{ > > + return ((id & ~0xF) << 12) | (1 << (id & 0xF)); > > Seeing other usages in vlapic.c I think the preference is to use lower > case for hex numbers. I thought it was covered by a MISRA rule, but it seems to apply only to the literal suffixes. Fair enough, I'll revert to lowercase. > > > +} > > + > > static void set_x2apic_id(struct vlapic *vlapic) > > { > > - u32 id = vlapic_vcpu(vlapic)->vcpu_id; > > - u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf)); > > + uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id; > > + uint32_t apic_id = vcpu_id * 2; > > + uint32_t apic_ldr = x2apic_ldr_from_id(apic_id); > > > > - vlapic_set_reg(vlapic, APIC_ID, id * 2); > > - vlapic_set_reg(vlapic, APIC_LDR, ldr); > > + /* This is a migration bug workaround. See wall of text in lapic_load_fixup() */ > > + if ( vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug ) > > + apic_ldr = x2apic_ldr_from_id(vcpu_id); > > + > > + vlapic_set_reg(vlapic, APIC_ID, apic_id); > > + vlapic_set_reg(vlapic, APIC_LDR, apic_ldr); > > } > > > > int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val) > > @@ -1495,30 +1505,57 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h) > > /* > > * Following lapic_load_hidden()/lapic_load_regs() we may need to > > * correct ID and LDR when they come from an old, broken hypervisor. > > + * > > + * Xen <= 4.4 had a bug by which all the APICs configured in x2APIC mode > > + * got LDR = 1. This was fixed back then, but another bug was introduced > > + * causing APIC ID and LDR to break the consistency they are meant to have > > + * according to the specs (LDR was derived from vCPU ID, rather than APIC > > + * ID) > > + * > > + * Long story short, we can detect both cases here. For the LDR=1 case we > > + * want to fix it up on migrate, as IPIs just don't work on non-physical > > + * mode otherwise. For the other case we actually want to preserve previous > > + * behaviour so that existing running instances that may have already read > > + * the LDR at the source host aren't surprised when IPIs stop working as > > + * they did at the other end. > > + * > > + * Note that "x2apic_id == 0" has always been correct and can't be used to > > + * discriminate these cases. > > + * > > I think it's best if this big comment was split between the relevant > parts of the if below used to detect the broken states, as that makes > the comments more in-place with the code. Ack > > > + * Yuck! > > */ > > static void lapic_load_fixup(struct vlapic *vlapic) > > { > > - uint32_t id = vlapic->loaded.id; > > + /* > > + * This LDR would be present in broken versions of Xen 4.4 through 4.18. > > + * It's correct for the cpu with x2apic_id=0 (vcpu0), but invalid for > > + * any other. > > + */ > > + uint32_t bad_ldr = x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id); > > > > - if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 ) > > - { > > + /* > > + * No need to perform fixups in non-x2apic mode, and x2apic_id == 0 has > > + * always been correct. > > + */ > > + if ( !vlapic_x2apic_mode(vlapic) || !vlapic->loaded.id ) > > You could replace the !vlapic->loaded.id check and instead use: > > vlapic->loaded.ldr == x2apic_ldr_from_id(vlapic->loaded.id) > > As that will allow returning early from the function if the LDR is > correct. Then if none of the fixups below apply we could print a > warning message that the LDR is incorrect, but cannot be fixed up. Sounds good. > > > + return; > > + > > + if ( vlapic->loaded.ldr == 1 ) > > + /* > > + * Migration from a broken Xen 4.4 or earlier. We can't leave it > > + * as-is because it assigned the same LDR to every CPU. We'll fix > > + * the bug now and assign LDR values consistent with the APIC ID. > > + */ > > + set_x2apic_id(vlapic); > > Previous code also did some checks here related to APIC ID sanity, > which are now dropped? > > Might be worth mentioning in the commit message, if that was intended. > It was intentional, yes. And it's true it warrants something in the commit message. For reference, the checks previously done where... > > + else if ( bad_ldr == vlapic->loaded.ldr ) > > /* > > - * This is optional: ID != 0 contradicts LDR == 1. It's being added > > - * to aid in eventual debugging of issues arising from the fixup done > > - * here, but can be dropped as soon as it is found to conflict with > > - * other (future) changes. > > + * This is a migration from a broken Xen between 4.4 and 4.18 and we > > + * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In > > + * this case we set this domain boolean so future CPU hotplugs > > + * derive an LDR consistent with the older Xen's broken idea of > > + * consistency. > > */ > > - if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 || > > - id != SET_xAPIC_ID(GET_xAPIC_ID(id)) ) > > - printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n", > > - vlapic_vcpu(vlapic), id); ... these. But they _seem_ bogus for several reasons. And I just got rid of them on these grounds: * It's using the GET_xAPIC_ID() macro on the register, but the LAPIC is not in xAPIC mode (due to the previous check), so the legacy APIC must be derived from the lowest octet, not the highest. That macro is meant to be used on the MMIO register, not the MSR one. * The printk (wants to be) triggered when the ID field is not "canonical" OR the x2APIC ID is not a pure xAPIC ID. Neither of these things are problems per se, the former just happens to be true at the moment, but the latter may very well not be, and that shouldn't trigger a scary printk. * The error message seems to imply the effective APIC ID eventually left loaded is the bogus one, but that's not the case. Actually, I might just move the printk into a separate else in line with your previous suggestion. > > - set_x2apic_id(vlapic); > > - } > > - else /* Undo an eventual earlier fixup. */ > > - { > > - vlapic_set_reg(vlapic, APIC_ID, id); > > - vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr); > > - } > > + vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug = true; > > } > > > > static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) > > diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h > > index 6e53ce4449..a42a6e99bb 100644 > > --- a/xen/arch/x86/include/asm/hvm/domain.h > > +++ b/xen/arch/x86/include/asm/hvm/domain.h > > @@ -61,6 +61,19 @@ struct hvm_domain { > > /* Cached CF8 for guest PCI config cycles */ > > uint32_t pci_cf8; > > > > + /* > > + * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was > > + * derived from the vcpu_id rather than the x2APIC ID. This is wrong, > > + * but changing this behaviour is tricky because guests might have > > + * already read the LDR and used it accordingly. In the interest of not > > + * breaking migrations from those hypervisors we track here whether > > + * this domain suffers from this or not so a hotplugged vCPU or an APIC > > + * reset can recover the same LDR it would've had on the older host. > > + * > > + * Yuck! > > + */ > > + bool has_inconsistent_x2apic_ldr_bug; > > Could you place the new field after the existing boolean fields in the > struct? (AFAICT there's plenty of padding left there) Sure. I placed it somewhere with unused padding. (that u32 is sandwiched between an "unsigned long" and a pointer), but I'm happy to stash it with the other booleans. > > I also think the field name is too long, I would rather use > x2apic_ldr_vcpu_id for example (to note that LDR is calculated from > vCPU ID rather than APIC ID). I made it intentionally long so it can't be missed that it's a workaround and not architectural behaviour. Would you consider "x2apic_ldr_bug_vcpu_id" acceptable? I'd rather keep at least the "bug" part of the identifier around so it's not lost to time that this is a workaround marker and nothing else. > > I think it would be good if we could trim a bit the comments, as I get > the impression it's a bit repetitive. > > So I would leave the big explanation in lapic_load_fixup(), and just > comment here: > > /* Compatibility setting for a bug in x2APIC LDR format. */ Sure, I'll put that comment on a diet > > Thanks, Roger. Cheers, Alejandro
On Wed, Nov 22, 2023 at 03:11:52PM +0000, Alejandro Vallejo wrote: > On Wed, Nov 22, 2023 at 02:40:02PM +0100, Roger Pau Monné wrote: > > On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote: > > > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c > > > index a8e87c4446..7f169f1e5f 100644 > > > --- a/xen/arch/x86/hvm/vlapic.c > > > +++ b/xen/arch/x86/hvm/vlapic.c > > > @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = { > > > .write = vlapic_mmio_write, > > > }; > > > > > > +static uint32_t x2apic_ldr_from_id(uint32_t id) > > > +{ > > > + return ((id & ~0xF) << 12) | (1 << (id & 0xF)); > > > > Seeing other usages in vlapic.c I think the preference is to use lower > > case for hex numbers. > I thought it was covered by a MISRA rule, but it seems to apply only to the > literal suffixes. Fair enough, I'll revert to lowercase. FWIW, I'm thinking that I want to move x2apic_ldr_from_id() to a header file, so that it can also be used by the native APIC driver in order to detect when the LDR field is not configured according to the spec (so adding a consistency check in init_apic_ldr_x2apic_cluster()). Anyway, might look into doing that after this patch is in. > > > + else if ( bad_ldr == vlapic->loaded.ldr ) > > > /* > > > - * This is optional: ID != 0 contradicts LDR == 1. It's being added > > > - * to aid in eventual debugging of issues arising from the fixup done > > > - * here, but can be dropped as soon as it is found to conflict with > > > - * other (future) changes. > > > + * This is a migration from a broken Xen between 4.4 and 4.18 and we > > > + * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In > > > + * this case we set this domain boolean so future CPU hotplugs > > > + * derive an LDR consistent with the older Xen's broken idea of > > > + * consistency. > > > */ > > > - if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 || > > > - id != SET_xAPIC_ID(GET_xAPIC_ID(id)) ) > > > - printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n", > > > - vlapic_vcpu(vlapic), id); > ... these. But they _seem_ bogus for several reasons. And I just got rid of > them on these grounds: > > * It's using the GET_xAPIC_ID() macro on the register, but the LAPIC is > not in xAPIC mode (due to the previous check), so the legacy APIC must > be derived from the lowest octet, not the highest. That macro is meant > to be used on the MMIO register, not the MSR one. > * The printk (wants to be) triggered when the ID field is not "canonical" > OR the x2APIC ID is not a pure xAPIC ID. Neither of these things are > problems per se, the former just happens to be true at the moment, but > the latter may very well not be, and that shouldn't trigger a scary > printk. > * The error message seems to imply the effective APIC ID eventually left > loaded is the bogus one, but that's not the case. > > Actually, I might just move the printk into a separate else in line with > your previous suggestion. Sounds good. And I agree that using {GET,SET}_xAPIC_ID() on the x2APIC 32bit width IDs looks to be wrong. > > > static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) > > > diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h > > > index 6e53ce4449..a42a6e99bb 100644 > > > --- a/xen/arch/x86/include/asm/hvm/domain.h > > > +++ b/xen/arch/x86/include/asm/hvm/domain.h > > > @@ -61,6 +61,19 @@ struct hvm_domain { > > > /* Cached CF8 for guest PCI config cycles */ > > > uint32_t pci_cf8; > > > > > > + /* > > > + * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was > > > + * derived from the vcpu_id rather than the x2APIC ID. This is wrong, > > > + * but changing this behaviour is tricky because guests might have > > > + * already read the LDR and used it accordingly. In the interest of not > > > + * breaking migrations from those hypervisors we track here whether > > > + * this domain suffers from this or not so a hotplugged vCPU or an APIC > > > + * reset can recover the same LDR it would've had on the older host. > > > + * > > > + * Yuck! > > > + */ > > > + bool has_inconsistent_x2apic_ldr_bug; > > > > Could you place the new field after the existing boolean fields in the > > struct? (AFAICT there's plenty of padding left there) > Sure. I placed it somewhere with unused padding. (that u32 is sandwiched > between an "unsigned long" and a pointer), but I'm happy to stash it with > the other booleans. Yes, there's plenty of padding but I feel it's better to place it with the rest of the booleans, as there's also padding there. > > > > I also think the field name is too long, I would rather use > > x2apic_ldr_vcpu_id for example (to note that LDR is calculated from > > vCPU ID rather than APIC ID). > I made it intentionally long so it can't be missed that it's a workaround > and not architectural behaviour. Would you consider > "x2apic_ldr_bug_vcpu_id" acceptable? I'd rather keep at least the "bug" > part of the identifier around so it's not lost to time that this is a > workaround marker and nothing else OK, if you think mentioning `bug` is helpful, I think it would be best placed as a prefix: bug_x2apic_ldr_vcpu_id. Having it in the middle of the field name makes it harder to spot. Thanks, Roger.
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index a8e87c4446..7f169f1e5f 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = { .write = vlapic_mmio_write, }; +static uint32_t x2apic_ldr_from_id(uint32_t id) +{ + return ((id & ~0xF) << 12) | (1 << (id & 0xF)); +} + static void set_x2apic_id(struct vlapic *vlapic) { - u32 id = vlapic_vcpu(vlapic)->vcpu_id; - u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf)); + uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id; + uint32_t apic_id = vcpu_id * 2; + uint32_t apic_ldr = x2apic_ldr_from_id(apic_id); - vlapic_set_reg(vlapic, APIC_ID, id * 2); - vlapic_set_reg(vlapic, APIC_LDR, ldr); + /* This is a migration bug workaround. See wall of text in lapic_load_fixup() */ + if ( vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug ) + apic_ldr = x2apic_ldr_from_id(vcpu_id); + + vlapic_set_reg(vlapic, APIC_ID, apic_id); + vlapic_set_reg(vlapic, APIC_LDR, apic_ldr); } int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val) @@ -1495,30 +1505,57 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h) /* * Following lapic_load_hidden()/lapic_load_regs() we may need to * correct ID and LDR when they come from an old, broken hypervisor. + * + * Xen <= 4.4 had a bug by which all the APICs configured in x2APIC mode + * got LDR = 1. This was fixed back then, but another bug was introduced + * causing APIC ID and LDR to break the consistency they are meant to have + * according to the specs (LDR was derived from vCPU ID, rather than APIC + * ID) + * + * Long story short, we can detect both cases here. For the LDR=1 case we + * want to fix it up on migrate, as IPIs just don't work on non-physical + * mode otherwise. For the other case we actually want to preserve previous + * behaviour so that existing running instances that may have already read + * the LDR at the source host aren't surprised when IPIs stop working as + * they did at the other end. + * + * Note that "x2apic_id == 0" has always been correct and can't be used to + * discriminate these cases. + * + * Yuck! */ static void lapic_load_fixup(struct vlapic *vlapic) { - uint32_t id = vlapic->loaded.id; + /* + * This LDR would be present in broken versions of Xen 4.4 through 4.18. + * It's correct for the cpu with x2apic_id=0 (vcpu0), but invalid for + * any other. + */ + uint32_t bad_ldr = x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id); - if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 ) - { + /* + * No need to perform fixups in non-x2apic mode, and x2apic_id == 0 has + * always been correct. + */ + if ( !vlapic_x2apic_mode(vlapic) || !vlapic->loaded.id ) + return; + + if ( vlapic->loaded.ldr == 1 ) + /* + * Migration from a broken Xen 4.4 or earlier. We can't leave it + * as-is because it assigned the same LDR to every CPU. We'll fix + * the bug now and assign LDR values consistent with the APIC ID. + */ + set_x2apic_id(vlapic); + else if ( bad_ldr == vlapic->loaded.ldr ) /* - * This is optional: ID != 0 contradicts LDR == 1. It's being added - * to aid in eventual debugging of issues arising from the fixup done - * here, but can be dropped as soon as it is found to conflict with - * other (future) changes. + * This is a migration from a broken Xen between 4.4 and 4.18 and we + * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In + * this case we set this domain boolean so future CPU hotplugs + * derive an LDR consistent with the older Xen's broken idea of + * consistency. */ - if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 || - id != SET_xAPIC_ID(GET_xAPIC_ID(id)) ) - printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n", - vlapic_vcpu(vlapic), id); - set_x2apic_id(vlapic); - } - else /* Undo an eventual earlier fixup. */ - { - vlapic_set_reg(vlapic, APIC_ID, id); - vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr); - } + vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug = true; } static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h index 6e53ce4449..a42a6e99bb 100644 --- a/xen/arch/x86/include/asm/hvm/domain.h +++ b/xen/arch/x86/include/asm/hvm/domain.h @@ -61,6 +61,19 @@ struct hvm_domain { /* Cached CF8 for guest PCI config cycles */ uint32_t pci_cf8; + /* + * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was + * derived from the vcpu_id rather than the x2APIC ID. This is wrong, + * but changing this behaviour is tricky because guests might have + * already read the LDR and used it accordingly. In the interest of not + * breaking migrations from those hypervisors we track here whether + * this domain suffers from this or not so a hotplugged vCPU or an APIC + * reset can recover the same LDR it would've had on the older host. + * + * Yuck! + */ + bool has_inconsistent_x2apic_ldr_bug; + struct pl_time *pl_time; struct hvm_io_handler *io_handler;
Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID registers are derivable from each other through a fixed formula. Xen uses that formula, but applies it to vCPU IDs (which are sequential) rather than x2APIC_IDs (which are not, at the moment). As I understand it, this is an attempt to tightly pack vCPUs into clusters so each cluster has 16 vCPUs rather than 8, but this is problematic for OSs that might read the x2APIC_ID and internally derive LDR (or the other way around) This patch fixes the implementation so we follow the rules in the x2APIC spec(s). The patch also covers migrations from broken hypervisors, so LDRs are preserved even for hotppluggable CPUs and across APIC resets. Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation") Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- I tested this by creating 3 checkpoints. 1. One with pre-4.4 broken state (every LDR=1, by hacking save_regs) 2. One with 4.4 onwards broken state (LDRs packed in their clusters) 3. One with correct LDR values (1) and (3) restores to the same thing. Consistent APIC_ID+LDR (2) restores to what it previously had and hotplugs follow the same logic --- xen/arch/x86/hvm/vlapic.c | 81 +++++++++++++++++++-------- xen/arch/x86/include/asm/hvm/domain.h | 13 +++++ 2 files changed, 72 insertions(+), 22 deletions(-)