diff mbox series

[v2] xen/x86: On x2APIC mode, derive LDR from APIC_ID

Message ID 20231121162604.19405-1-alejandro.vallejo@cloud.com (mailing list archive)
State New, archived
Headers show
Series [v2] xen/x86: On x2APIC mode, derive LDR from APIC_ID | expand

Commit Message

Alejandro Vallejo Nov. 21, 2023, 4:26 p.m. UTC
Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID
registers are derivable from each other through a fixed formula.

Xen uses that formula, but applies it to vCPU IDs (which are sequential)
rather than x2APIC_IDs (which are not, at the moment). As I understand it,
this is an attempt to tightly pack vCPUs into clusters so each cluster has
16 vCPUs rather than 8, but this is problematic for OSs that might read the
x2APIC_ID and internally derive LDR (or the other way around)

This patch fixes the implementation so we follow the rules in the x2APIC
spec(s).

The patch also covers migrations from broken hypervisors, so LDRs are
preserved even for hotppluggable CPUs and across APIC resets.

Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
I tested this by creating 3 checkpoints.
 1. One with pre-4.4 broken state (every LDR=1, by hacking save_regs)
 2. One with 4.4 onwards broken state (LDRs packed in their clusters)
 3. One with correct LDR values

(1) and (3) restores to the same thing. Consistent APIC_ID+LDR
(2) restores to what it previously had and hotplugs follow the same logic
---
 xen/arch/x86/hvm/vlapic.c             | 81 +++++++++++++++++++--------
 xen/arch/x86/include/asm/hvm/domain.h | 13 +++++
 2 files changed, 72 insertions(+), 22 deletions(-)

Comments

Alejandro Vallejo Nov. 21, 2023, 4:29 p.m. UTC | #1
On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote:
> Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID
> registers are derivable from each other through a fixed formula.
> 
> Xen uses that formula, but applies it to vCPU IDs (which are sequential)
> rather than x2APIC_IDs (which are not, at the moment). As I understand it,
> this is an attempt to tightly pack vCPUs into clusters so each cluster has
> 16 vCPUs rather than 8, but this is problematic for OSs that might read the
> x2APIC_ID and internally derive LDR (or the other way around)


Ugh, forgot about Roger's commit message request
>I would replace the underscore from x2APIC ID with a space instead.
Happy for that to happen on commit if the rest looks ok.

Cheers,
Alejandro
Roger Pau Monné Nov. 22, 2023, 1:40 p.m. UTC | #2
On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote:
> Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID
> registers are derivable from each other through a fixed formula.
> 
> Xen uses that formula, but applies it to vCPU IDs (which are sequential)
> rather than x2APIC_IDs (which are not, at the moment). As I understand it,
> this is an attempt to tightly pack vCPUs into clusters so each cluster has
> 16 vCPUs rather than 8, but this is problematic for OSs that might read the
> x2APIC_ID and internally derive LDR (or the other way around)
> 
> This patch fixes the implementation so we follow the rules in the x2APIC
> spec(s).
> 
> The patch also covers migrations from broken hypervisors, so LDRs are
> preserved even for hotppluggable CPUs and across APIC resets.
> 
> Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation")
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>

LGTM, just a couple of style comments.

> ---
> I tested this by creating 3 checkpoints.
>  1. One with pre-4.4 broken state (every LDR=1, by hacking save_regs)
>  2. One with 4.4 onwards broken state (LDRs packed in their clusters)
>  3. One with correct LDR values
> 
> (1) and (3) restores to the same thing. Consistent APIC_ID+LDR
> (2) restores to what it previously had and hotplugs follow the same logic
> ---
>  xen/arch/x86/hvm/vlapic.c             | 81 +++++++++++++++++++--------
>  xen/arch/x86/include/asm/hvm/domain.h | 13 +++++
>  2 files changed, 72 insertions(+), 22 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index a8e87c4446..7f169f1e5f 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = {
>      .write = vlapic_mmio_write,
>  };
>  
> +static uint32_t x2apic_ldr_from_id(uint32_t id)
> +{
> +    return ((id & ~0xF) << 12) | (1 << (id & 0xF));

Seeing other usages in vlapic.c I think the preference is to use lower
case for hex numbers.

> +}
> +
>  static void set_x2apic_id(struct vlapic *vlapic)
>  {
> -    u32 id = vlapic_vcpu(vlapic)->vcpu_id;
> -    u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf));
> +    uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id;
> +    uint32_t apic_id = vcpu_id * 2;
> +    uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
>  
> -    vlapic_set_reg(vlapic, APIC_ID, id * 2);
> -    vlapic_set_reg(vlapic, APIC_LDR, ldr);
> +    /* This is a migration bug workaround. See wall of text in lapic_load_fixup() */
> +    if ( vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug )
> +        apic_ldr = x2apic_ldr_from_id(vcpu_id);
> +
> +    vlapic_set_reg(vlapic, APIC_ID, apic_id);
> +    vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
>  }
>  
>  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> @@ -1495,30 +1505,57 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
>  /*
>   * Following lapic_load_hidden()/lapic_load_regs() we may need to
>   * correct ID and LDR when they come from an old, broken hypervisor.
> + *
> + * Xen <= 4.4 had a bug by which all the APICs configured in x2APIC mode
> + * got LDR = 1. This was fixed back then, but another bug was introduced
> + * causing APIC ID and LDR to break the consistency they are meant to have
> + * according to the specs (LDR was derived from vCPU ID, rather than APIC
> + * ID)
> + *
> + * Long story short, we can detect both cases here. For the LDR=1 case we
> + * want to fix it up on migrate, as IPIs just don't work on non-physical
> + * mode otherwise. For the other case we actually want to preserve previous
> + * behaviour so that existing running instances that may have already read
> + * the LDR at the source host aren't surprised when IPIs stop working as
> + * they did at the other end.
> + *
> + * Note that "x2apic_id == 0" has always been correct and can't be used to
> + * discriminate these cases.
> + *

I think it's best if this big comment was split between the relevant
parts of the if below used to detect the broken states, as that makes
the comments more in-place with the code.

> + * Yuck!
>   */
>  static void lapic_load_fixup(struct vlapic *vlapic)
>  {
> -    uint32_t id = vlapic->loaded.id;
> +    /*
> +     * This LDR would be present in broken versions of Xen 4.4 through 4.18.
> +     * It's correct for the cpu with x2apic_id=0 (vcpu0), but invalid for
> +     * any other.
> +     */
> +    uint32_t bad_ldr = x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id);
>  
> -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
> -    {
> +    /*
> +     * No need to perform fixups in non-x2apic mode, and x2apic_id == 0 has
> +     * always been correct.
> +     */
> +    if ( !vlapic_x2apic_mode(vlapic) || !vlapic->loaded.id )

You could replace the !vlapic->loaded.id check and instead use:

vlapic->loaded.ldr == x2apic_ldr_from_id(vlapic->loaded.id)

As that will allow returning early from the function if the LDR is
correct.  Then if none of the fixups below apply we could print a
warning message that the LDR is incorrect, but cannot be fixed up.

> +        return;
> +
> +    if ( vlapic->loaded.ldr == 1 )
> +       /*
> +        * Migration from a broken Xen 4.4 or earlier. We can't leave it
> +        * as-is because it assigned the same LDR to every CPU. We'll fix
> +        * the bug now and assign LDR values consistent with the APIC ID.
> +        */
> +        set_x2apic_id(vlapic);

Previous code also did some checks here related to APIC ID sanity,
which are now dropped?

Might be worth mentioning in the commit message, if that was intended.

> +    else if ( bad_ldr == vlapic->loaded.ldr )
>          /*
> -         * This is optional: ID != 0 contradicts LDR == 1. It's being added
> -         * to aid in eventual debugging of issues arising from the fixup done
> -         * here, but can be dropped as soon as it is found to conflict with
> -         * other (future) changes.
> +         * This is a migration from a broken Xen between 4.4 and 4.18 and we
> +         * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In
> +         * this case we set this domain boolean so future CPU hotplugs
> +         * derive an LDR consistent with the older Xen's broken idea of
> +         * consistency.
>           */
> -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
> -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
> -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
> -                   vlapic_vcpu(vlapic), id);
> -        set_x2apic_id(vlapic);
> -    }
> -    else /* Undo an eventual earlier fixup. */
> -    {
> -        vlapic_set_reg(vlapic, APIC_ID, id);
> -        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
> -    }
> +        vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug = true;
>  }
>  
>  static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> index 6e53ce4449..a42a6e99bb 100644
> --- a/xen/arch/x86/include/asm/hvm/domain.h
> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> @@ -61,6 +61,19 @@ struct hvm_domain {
>      /* Cached CF8 for guest PCI config cycles */
>      uint32_t                pci_cf8;
>  
> +    /*
> +     * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was
> +     * derived from the vcpu_id rather than the x2APIC ID. This is wrong,
> +     * but changing this behaviour is tricky because guests might have
> +     * already read the LDR and used it accordingly. In the interest of not
> +     * breaking migrations from those hypervisors we track here whether
> +     * this domain suffers from this or not so a hotplugged vCPU or an APIC
> +     * reset can recover the same LDR it would've had on the older host.
> +     *
> +     * Yuck!
> +     */
> +    bool has_inconsistent_x2apic_ldr_bug;

Could you place the new field after the existing boolean fields in the
struct? (AFAICT there's plenty of padding left there)

I also think the field name is too long, I would rather use
x2apic_ldr_vcpu_id for example (to note that LDR is calculated from
vCPU ID rather than APIC ID).

I think it would be good if we could trim a bit the comments, as I get
the impression it's a bit repetitive.

So I would leave the big explanation in lapic_load_fixup(), and just
comment here:

/* Compatibility setting for a bug in x2APIC LDR format. */

Thanks, Roger.
Alejandro Vallejo Nov. 22, 2023, 3:11 p.m. UTC | #3
On Wed, Nov 22, 2023 at 02:40:02PM +0100, Roger Pau Monné wrote:
> On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote:
> > Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID
> > registers are derivable from each other through a fixed formula.
> > 
> > Xen uses that formula, but applies it to vCPU IDs (which are sequential)
> > rather than x2APIC_IDs (which are not, at the moment). As I understand it,
> > this is an attempt to tightly pack vCPUs into clusters so each cluster has
> > 16 vCPUs rather than 8, but this is problematic for OSs that might read the
> > x2APIC_ID and internally derive LDR (or the other way around)
> > 
> > This patch fixes the implementation so we follow the rules in the x2APIC
> > spec(s).
> > 
> > The patch also covers migrations from broken hypervisors, so LDRs are
> > preserved even for hotppluggable CPUs and across APIC resets.
> > 
> > Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation")
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> 
> LGTM, just a couple of style comments.
> 
> > ---
> > I tested this by creating 3 checkpoints.
> >  1. One with pre-4.4 broken state (every LDR=1, by hacking save_regs)
> >  2. One with 4.4 onwards broken state (LDRs packed in their clusters)
> >  3. One with correct LDR values
> > 
> > (1) and (3) restores to the same thing. Consistent APIC_ID+LDR
> > (2) restores to what it previously had and hotplugs follow the same logic
> > ---
> >  xen/arch/x86/hvm/vlapic.c             | 81 +++++++++++++++++++--------
> >  xen/arch/x86/include/asm/hvm/domain.h | 13 +++++
> >  2 files changed, 72 insertions(+), 22 deletions(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> > index a8e87c4446..7f169f1e5f 100644
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = {
> >      .write = vlapic_mmio_write,
> >  };
> >  
> > +static uint32_t x2apic_ldr_from_id(uint32_t id)
> > +{
> > +    return ((id & ~0xF) << 12) | (1 << (id & 0xF));
> 
> Seeing other usages in vlapic.c I think the preference is to use lower
> case for hex numbers.
I thought it was covered by a MISRA rule, but it seems to apply only to the
literal suffixes. Fair enough, I'll revert to lowercase.

> 
> > +}
> > +
> >  static void set_x2apic_id(struct vlapic *vlapic)
> >  {
> > -    u32 id = vlapic_vcpu(vlapic)->vcpu_id;
> > -    u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf));
> > +    uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id;
> > +    uint32_t apic_id = vcpu_id * 2;
> > +    uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
> >  
> > -    vlapic_set_reg(vlapic, APIC_ID, id * 2);
> > -    vlapic_set_reg(vlapic, APIC_LDR, ldr);
> > +    /* This is a migration bug workaround. See wall of text in lapic_load_fixup() */
> > +    if ( vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug )
> > +        apic_ldr = x2apic_ldr_from_id(vcpu_id);
> > +
> > +    vlapic_set_reg(vlapic, APIC_ID, apic_id);
> > +    vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
> >  }
> >  
> >  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> > @@ -1495,30 +1505,57 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
> >  /*
> >   * Following lapic_load_hidden()/lapic_load_regs() we may need to
> >   * correct ID and LDR when they come from an old, broken hypervisor.
> > + *
> > + * Xen <= 4.4 had a bug by which all the APICs configured in x2APIC mode
> > + * got LDR = 1. This was fixed back then, but another bug was introduced
> > + * causing APIC ID and LDR to break the consistency they are meant to have
> > + * according to the specs (LDR was derived from vCPU ID, rather than APIC
> > + * ID)
> > + *
> > + * Long story short, we can detect both cases here. For the LDR=1 case we
> > + * want to fix it up on migrate, as IPIs just don't work on non-physical
> > + * mode otherwise. For the other case we actually want to preserve previous
> > + * behaviour so that existing running instances that may have already read
> > + * the LDR at the source host aren't surprised when IPIs stop working as
> > + * they did at the other end.
> > + *
> > + * Note that "x2apic_id == 0" has always been correct and can't be used to
> > + * discriminate these cases.
> > + *
> 
> I think it's best if this big comment was split between the relevant
> parts of the if below used to detect the broken states, as that makes
> the comments more in-place with the code.
Ack
> 
> > + * Yuck!
> >   */
> >  static void lapic_load_fixup(struct vlapic *vlapic)
> >  {
> > -    uint32_t id = vlapic->loaded.id;
> > +    /*
> > +     * This LDR would be present in broken versions of Xen 4.4 through 4.18.
> > +     * It's correct for the cpu with x2apic_id=0 (vcpu0), but invalid for
> > +     * any other.
> > +     */
> > +    uint32_t bad_ldr = x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id);
> >  
> > -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
> > -    {
> > +    /*
> > +     * No need to perform fixups in non-x2apic mode, and x2apic_id == 0 has
> > +     * always been correct.
> > +     */
> > +    if ( !vlapic_x2apic_mode(vlapic) || !vlapic->loaded.id )
> 
> You could replace the !vlapic->loaded.id check and instead use:
> 
> vlapic->loaded.ldr == x2apic_ldr_from_id(vlapic->loaded.id)
> 
> As that will allow returning early from the function if the LDR is
> correct.  Then if none of the fixups below apply we could print a
> warning message that the LDR is incorrect, but cannot be fixed up.
Sounds good.

> 
> > +        return;
> > +
> > +    if ( vlapic->loaded.ldr == 1 )
> > +       /*
> > +        * Migration from a broken Xen 4.4 or earlier. We can't leave it
> > +        * as-is because it assigned the same LDR to every CPU. We'll fix
> > +        * the bug now and assign LDR values consistent with the APIC ID.
> > +        */
> > +        set_x2apic_id(vlapic);
> 
> Previous code also did some checks here related to APIC ID sanity,
> which are now dropped?
> 
> Might be worth mentioning in the commit message, if that was intended.
> 
It was intentional, yes. And it's true it warrants something in the commit
message. For reference, the checks previously done where...

> > +    else if ( bad_ldr == vlapic->loaded.ldr )
> >          /*
> > -         * This is optional: ID != 0 contradicts LDR == 1. It's being added
> > -         * to aid in eventual debugging of issues arising from the fixup done
> > -         * here, but can be dropped as soon as it is found to conflict with
> > -         * other (future) changes.
> > +         * This is a migration from a broken Xen between 4.4 and 4.18 and we
> > +         * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In
> > +         * this case we set this domain boolean so future CPU hotplugs
> > +         * derive an LDR consistent with the older Xen's broken idea of
> > +         * consistency.
> >           */
> > -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
> > -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
> > -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
> > -                   vlapic_vcpu(vlapic), id);
... these. But they _seem_ bogus for several reasons. And I just got rid of
them on these grounds:

  * It's using the GET_xAPIC_ID() macro on the register, but the LAPIC is
    not in xAPIC mode (due to the previous check), so the legacy APIC must
    be derived from the lowest octet, not the highest. That macro is meant
    to be used on the MMIO register, not the MSR one.
  * The printk (wants to be) triggered when the ID field is not "canonical"
    OR the x2APIC ID is not a pure xAPIC ID. Neither of these things are
    problems per se, the former just happens to be true at the moment, but
    the latter may very well not be, and that shouldn't trigger a scary
    printk.
  * The error message seems to imply the effective APIC ID eventually left
    loaded is the bogus one, but that's not the case.

Actually, I might just move the printk into a separate else in line with
your previous suggestion.

> > -        set_x2apic_id(vlapic);
> > -    }
> > -    else /* Undo an eventual earlier fixup. */
> > -    {
> > -        vlapic_set_reg(vlapic, APIC_ID, id);
> > -        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
> > -    }
> > +        vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug = true;
> >  }
> >  
> >  static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> > diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> > index 6e53ce4449..a42a6e99bb 100644
> > --- a/xen/arch/x86/include/asm/hvm/domain.h
> > +++ b/xen/arch/x86/include/asm/hvm/domain.h
> > @@ -61,6 +61,19 @@ struct hvm_domain {
> >      /* Cached CF8 for guest PCI config cycles */
> >      uint32_t                pci_cf8;
> >  
> > +    /*
> > +     * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was
> > +     * derived from the vcpu_id rather than the x2APIC ID. This is wrong,
> > +     * but changing this behaviour is tricky because guests might have
> > +     * already read the LDR and used it accordingly. In the interest of not
> > +     * breaking migrations from those hypervisors we track here whether
> > +     * this domain suffers from this or not so a hotplugged vCPU or an APIC
> > +     * reset can recover the same LDR it would've had on the older host.
> > +     *
> > +     * Yuck!
> > +     */
> > +    bool has_inconsistent_x2apic_ldr_bug;
> 
> Could you place the new field after the existing boolean fields in the
> struct? (AFAICT there's plenty of padding left there)
Sure. I placed it somewhere with unused padding. (that u32 is sandwiched
between an "unsigned long" and a pointer), but I'm happy to stash it with
the other booleans.

> 
> I also think the field name is too long, I would rather use
> x2apic_ldr_vcpu_id for example (to note that LDR is calculated from
> vCPU ID rather than APIC ID).
I made it intentionally long so it can't be missed that it's a workaround
and not architectural behaviour. Would you consider
"x2apic_ldr_bug_vcpu_id" acceptable? I'd rather keep at least the "bug"
part of the identifier around so it's not lost to time that this is a
workaround marker and nothing else.

> 
> I think it would be good if we could trim a bit the comments, as I get
> the impression it's a bit repetitive.
> 
> So I would leave the big explanation in lapic_load_fixup(), and just
> comment here:
> 
> /* Compatibility setting for a bug in x2APIC LDR format. */
Sure, I'll put that comment on a diet

> 
> Thanks, Roger.

Cheers,
Alejandro
Roger Pau Monné Nov. 22, 2023, 5:04 p.m. UTC | #4
On Wed, Nov 22, 2023 at 03:11:52PM +0000, Alejandro Vallejo wrote:
> On Wed, Nov 22, 2023 at 02:40:02PM +0100, Roger Pau Monné wrote:
> > On Tue, Nov 21, 2023 at 04:26:04PM +0000, Alejandro Vallejo wrote:
> > > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> > > index a8e87c4446..7f169f1e5f 100644
> > > --- a/xen/arch/x86/hvm/vlapic.c
> > > +++ b/xen/arch/x86/hvm/vlapic.c
> > > @@ -1061,13 +1061,23 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = {
> > >      .write = vlapic_mmio_write,
> > >  };
> > >  
> > > +static uint32_t x2apic_ldr_from_id(uint32_t id)
> > > +{
> > > +    return ((id & ~0xF) << 12) | (1 << (id & 0xF));
> > 
> > Seeing other usages in vlapic.c I think the preference is to use lower
> > case for hex numbers.
> I thought it was covered by a MISRA rule, but it seems to apply only to the
> literal suffixes. Fair enough, I'll revert to lowercase.

FWIW, I'm thinking that I want to move x2apic_ldr_from_id() to a
header file, so that it can also be used by the native APIC driver in
order to detect when the LDR field is not configured according to the
spec (so adding a consistency check in
init_apic_ldr_x2apic_cluster()).

Anyway, might look into doing that after this patch is in.

> > > +    else if ( bad_ldr == vlapic->loaded.ldr )
> > >          /*
> > > -         * This is optional: ID != 0 contradicts LDR == 1. It's being added
> > > -         * to aid in eventual debugging of issues arising from the fixup done
> > > -         * here, but can be dropped as soon as it is found to conflict with
> > > -         * other (future) changes.
> > > +         * This is a migration from a broken Xen between 4.4 and 4.18 and we
> > > +         * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In
> > > +         * this case we set this domain boolean so future CPU hotplugs
> > > +         * derive an LDR consistent with the older Xen's broken idea of
> > > +         * consistency.
> > >           */
> > > -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
> > > -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
> > > -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
> > > -                   vlapic_vcpu(vlapic), id);
> ... these. But they _seem_ bogus for several reasons. And I just got rid of
> them on these grounds:
> 
>   * It's using the GET_xAPIC_ID() macro on the register, but the LAPIC is
>     not in xAPIC mode (due to the previous check), so the legacy APIC must
>     be derived from the lowest octet, not the highest. That macro is meant
>     to be used on the MMIO register, not the MSR one.
>   * The printk (wants to be) triggered when the ID field is not "canonical"
>     OR the x2APIC ID is not a pure xAPIC ID. Neither of these things are
>     problems per se, the former just happens to be true at the moment, but
>     the latter may very well not be, and that shouldn't trigger a scary
>     printk.
>   * The error message seems to imply the effective APIC ID eventually left
>     loaded is the bogus one, but that's not the case.
> 
> Actually, I might just move the printk into a separate else in line with
> your previous suggestion.

Sounds good.  And I agree that using {GET,SET}_xAPIC_ID() on the
x2APIC 32bit width IDs looks to be wrong.

> > >  static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> > > diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> > > index 6e53ce4449..a42a6e99bb 100644
> > > --- a/xen/arch/x86/include/asm/hvm/domain.h
> > > +++ b/xen/arch/x86/include/asm/hvm/domain.h
> > > @@ -61,6 +61,19 @@ struct hvm_domain {
> > >      /* Cached CF8 for guest PCI config cycles */
> > >      uint32_t                pci_cf8;
> > >  
> > > +    /*
> > > +     * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was
> > > +     * derived from the vcpu_id rather than the x2APIC ID. This is wrong,
> > > +     * but changing this behaviour is tricky because guests might have
> > > +     * already read the LDR and used it accordingly. In the interest of not
> > > +     * breaking migrations from those hypervisors we track here whether
> > > +     * this domain suffers from this or not so a hotplugged vCPU or an APIC
> > > +     * reset can recover the same LDR it would've had on the older host.
> > > +     *
> > > +     * Yuck!
> > > +     */
> > > +    bool has_inconsistent_x2apic_ldr_bug;
> > 
> > Could you place the new field after the existing boolean fields in the
> > struct? (AFAICT there's plenty of padding left there)
> Sure. I placed it somewhere with unused padding. (that u32 is sandwiched
> between an "unsigned long" and a pointer), but I'm happy to stash it with
> the other booleans.

Yes, there's plenty of padding but I feel it's better to place it with
the rest of the booleans, as there's also padding there.

> > 
> > I also think the field name is too long, I would rather use
> > x2apic_ldr_vcpu_id for example (to note that LDR is calculated from
> > vCPU ID rather than APIC ID).
> I made it intentionally long so it can't be missed that it's a workaround
> and not architectural behaviour. Would you consider
> "x2apic_ldr_bug_vcpu_id" acceptable? I'd rather keep at least the "bug"
> part of the identifier around so it's not lost to time that this is a
> workaround marker and nothing else

OK, if you think mentioning `bug` is helpful, I think it would be best
placed as a prefix: bug_x2apic_ldr_vcpu_id.  Having it in the middle
of the field name makes it harder to spot.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index a8e87c4446..7f169f1e5f 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1061,13 +1061,23 @@  static const struct hvm_mmio_ops vlapic_mmio_ops = {
     .write = vlapic_mmio_write,
 };
 
+static uint32_t x2apic_ldr_from_id(uint32_t id)
+{
+    return ((id & ~0xF) << 12) | (1 << (id & 0xF));
+}
+
 static void set_x2apic_id(struct vlapic *vlapic)
 {
-    u32 id = vlapic_vcpu(vlapic)->vcpu_id;
-    u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf));
+    uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id;
+    uint32_t apic_id = vcpu_id * 2;
+    uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
 
-    vlapic_set_reg(vlapic, APIC_ID, id * 2);
-    vlapic_set_reg(vlapic, APIC_LDR, ldr);
+    /* This is a migration bug workaround. See wall of text in lapic_load_fixup() */
+    if ( vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug )
+        apic_ldr = x2apic_ldr_from_id(vcpu_id);
+
+    vlapic_set_reg(vlapic, APIC_ID, apic_id);
+    vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
 }
 
 int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
@@ -1495,30 +1505,57 @@  static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
 /*
  * Following lapic_load_hidden()/lapic_load_regs() we may need to
  * correct ID and LDR when they come from an old, broken hypervisor.
+ *
+ * Xen <= 4.4 had a bug by which all the APICs configured in x2APIC mode
+ * got LDR = 1. This was fixed back then, but another bug was introduced
+ * causing APIC ID and LDR to break the consistency they are meant to have
+ * according to the specs (LDR was derived from vCPU ID, rather than APIC
+ * ID)
+ *
+ * Long story short, we can detect both cases here. For the LDR=1 case we
+ * want to fix it up on migrate, as IPIs just don't work on non-physical
+ * mode otherwise. For the other case we actually want to preserve previous
+ * behaviour so that existing running instances that may have already read
+ * the LDR at the source host aren't surprised when IPIs stop working as
+ * they did at the other end.
+ *
+ * Note that "x2apic_id == 0" has always been correct and can't be used to
+ * discriminate these cases.
+ *
+ * Yuck!
  */
 static void lapic_load_fixup(struct vlapic *vlapic)
 {
-    uint32_t id = vlapic->loaded.id;
+    /*
+     * This LDR would be present in broken versions of Xen 4.4 through 4.18.
+     * It's correct for the cpu with x2apic_id=0 (vcpu0), but invalid for
+     * any other.
+     */
+    uint32_t bad_ldr = x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id);
 
-    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
-    {
+    /*
+     * No need to perform fixups in non-x2apic mode, and x2apic_id == 0 has
+     * always been correct.
+     */
+    if ( !vlapic_x2apic_mode(vlapic) || !vlapic->loaded.id )
+        return;
+
+    if ( vlapic->loaded.ldr == 1 )
+       /*
+        * Migration from a broken Xen 4.4 or earlier. We can't leave it
+        * as-is because it assigned the same LDR to every CPU. We'll fix
+        * the bug now and assign LDR values consistent with the APIC ID.
+        */
+        set_x2apic_id(vlapic);
+    else if ( bad_ldr == vlapic->loaded.ldr )
         /*
-         * This is optional: ID != 0 contradicts LDR == 1. It's being added
-         * to aid in eventual debugging of issues arising from the fixup done
-         * here, but can be dropped as soon as it is found to conflict with
-         * other (future) changes.
+         * This is a migration from a broken Xen between 4.4 and 4.18 and we
+         * must _PRESERVE_ LDRs so new vCPUs use consistent derivations. In
+         * this case we set this domain boolean so future CPU hotplugs
+         * derive an LDR consistent with the older Xen's broken idea of
+         * consistency.
          */
-        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
-             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
-            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
-                   vlapic_vcpu(vlapic), id);
-        set_x2apic_id(vlapic);
-    }
-    else /* Undo an eventual earlier fixup. */
-    {
-        vlapic_set_reg(vlapic, APIC_ID, id);
-        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
-    }
+        vlapic_domain(vlapic)->arch.hvm.has_inconsistent_x2apic_ldr_bug = true;
 }
 
 static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index 6e53ce4449..a42a6e99bb 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -61,6 +61,19 @@  struct hvm_domain {
     /* Cached CF8 for guest PCI config cycles */
     uint32_t                pci_cf8;
 
+    /*
+     * Xen had a bug between 4.4 and 4.18 by which the x2APIC LDR was
+     * derived from the vcpu_id rather than the x2APIC ID. This is wrong,
+     * but changing this behaviour is tricky because guests might have
+     * already read the LDR and used it accordingly. In the interest of not
+     * breaking migrations from those hypervisors we track here whether
+     * this domain suffers from this or not so a hotplugged vCPU or an APIC
+     * reset can recover the same LDR it would've had on the older host.
+     *
+     * Yuck!
+     */
+    bool has_inconsistent_x2apic_ldr_bug;
+
     struct pl_time         *pl_time;
 
     struct hvm_io_handler *io_handler;