diff mbox series

[v2,2/3] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

Message ID 20230601144845.1554589-3-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Headers show
Series x86: RSBA and RRSBA handling | expand

Commit Message

Andrew Cooper June 1, 2023, 2:48 p.m. UTC
In order to level a VM safely for migration, the toolstack needs to know the
RSBA/RRSBA properties of the CPU, whether or not they happen to be enumerated.

See the code comment for details.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

v2:
 * Rewrite almost from scratch.
---
 xen/arch/x86/include/asm/cpufeature.h |  1 +
 xen/arch/x86/spec_ctrl.c              | 92 +++++++++++++++++++++++++--
 2 files changed, 88 insertions(+), 5 deletions(-)

Comments

Jan Beulich June 2, 2023, 9:56 a.m. UTC | #1
On 01.06.2023 16:48, Andrew Cooper wrote:
> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
>          return false;
>  
>      /*
> -     * RSBA may be set by a hypervisor to indicate that we may move to a
> -     * processor which isn't retpoline-safe.
> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
> +     * agreed upon meaning at the time of writing (May 2023) is thus:
> +     *
> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
> +     *   alternative predictor on underflow.  Skylake uarch and later all have
> +     *   this property.  Broadwell too, when running microcode versions prior
> +     *   to Jan 2018.
> +     *
> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
> +     *   tagging of predictions with the mode in which they were learned.  So
> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
> +     *
> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
> +     *
> +     * Some parts (Broadwell) are not expected to ever enumerate this
> +     * behaviour directly.  Other parts have differing enumeration with
> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
> +     * to guests, and so toolstacks can level a VM safety for migration.
> +     *
> +     * The following states exist:
> +     *
> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
> +     * |---+------+-------+-------+--------------------+---------------|
> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
> +     * | 4 |    0 |     1 |     1 | OK                 |               |
> +     * | 5 |    1 |     0 |     0 | OK                 |               |
> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>       *
> +     * However, we doesn't need perfect adherence to the spec.  Identify the

Nit: "don't" or "it"?

> +     * broken cases (so we stand a chance of spotting violated assumptions),
> +     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
> +     * "alternative predictors potentially in use".

Considering that it's rows 2, 6, 7, and 8 which are broken, I find this
comment a little misleading. To me it doesn't become clear whether them
subsequently being left alone (and merely a log message issued) is
intentional.

> +     */
> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
> +        printk(XENLOG_ERR
> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
> +               boot_cpu_data.x86_mask, ucode_rev,
> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
> +
> +    /*
>       * Processors offering Enhanced IBRS are not guarenteed to be
>       * repoline-safe.
>       */
> -    if ( cpu_has_rsba || cpu_has_eibrs )
> +    if ( cpu_has_eibrs )
> +    {
> +        /*
> +         * Prior to the August 2023 microcode, many eIBRS-capable parts did
> +         * not enumerate RRSBA.
> +         */
> +        if ( !cpu_has_rrsba )
> +            setup_force_cpu_cap(X86_FEATURE_RRSBA);
> +
> +        return false;
> +    }

No clearing of RSBA in this case? I fear we may end up with misbehavior if
our own records aren't kept consistent with our assumptions. (This then
extends to the "just a log message" above as well.)

Also the inner conditional continues to strike me as odd; could you add
half a sentence to the comment (or description) as to meaning to leave
is_forced_cpu_cap() function correctly (which in turn raises the question
whether - down the road - this is actually going to matter)?

> +    /*
> +     * RSBA is explicitly enumerated in some cases, but may also be set by a
> +     * hypervisor to indicate that we may move to a processor which isn't
> +     * retpoline-safe.
> +     */
> +    if ( cpu_has_rsba )
>          return false;
>  
> +    /*
> +     * At this point, we've filtered all the legal RSBA || RRSBA cases (or the
> +     * known non-ideal cases).  If ARCH_CAPS is visible, trust the absence of
> +     * RSBA || RRSBA.  There's no known microcode which advertises ARCH_CAPS
> +     * without RSBA or EIBRS, and if we're virtualised we can't rely the model
> +     * check anyway.
> +     */

I think "no known" wants further qualification: When IBRSB was first
introduced, EIBRS and RSBA weren't even known about. I also didn't
think all hardware (i.e. sufficiently old one) did get newer ucode
when these started to be known. Possibly you mean "... which wrongly
advertises ..."?

> @@ -689,6 +762,15 @@ static bool __init retpoline_calculations(void)
>          break;
>      }
>  
> +    if ( !safe )
> +    {
> +        /*
> +         * Note: the eIBRS-capable parts are filtered out earlier, so the
> +         * remainder here are the ones which suffer only RSBA behaviour.
> +         */

As I think I had mentioned already, I find "only" here odd, when RSBA
is more severe than RRSBA. Maybe the "only" could move earlier, e.g.
between "are" and "the"? Then again this may be another non-native-
speaker issue of mine ...

Jan
Andrew Cooper June 2, 2023, 1:57 p.m. UTC | #2
On 02/06/2023 10:56 am, Jan Beulich wrote:
> On 01.06.2023 16:48, Andrew Cooper wrote:
>> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
>>          return false;
>>  
>>      /*
>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>> -     * processor which isn't retpoline-safe.
>> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>> +     *
>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>> +     *   this property.  Broadwell too, when running microcode versions prior
>> +     *   to Jan 2018.
>> +     *
>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>> +     *   tagging of predictions with the mode in which they were learned.  So
>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>> +     *
>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>> +     *
>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>> +     * behaviour directly.  Other parts have differing enumeration with
>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>> +     * to guests, and so toolstacks can level a VM safety for migration.
>> +     *
>> +     * The following states exist:
>> +     *
>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>> +     * |---+------+-------+-------+--------------------+---------------|
>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>>       *
>> +     * However, we doesn't need perfect adherence to the spec.  Identify the
> Nit: "don't" or "it"?

Oops.  This used to read "Xen doesn't need".  So much for last minute
changes.

>
>> +     * broken cases (so we stand a chance of spotting violated assumptions),
>> +     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
>> +     * "alternative predictors potentially in use".
> Considering that it's rows 2, 6, 7, and 8 which are broken, I find this
> comment a little misleading. To me it doesn't become clear whether them
> subsequently being left alone (and merely a log message issued) is
> intentional.

It is intentional.

I don't know if these combinations exist in practice anywhere or not. 
Intel think they oughtn't to, and it's quite possible that the printk()
is unreachable, but given the complexity and shifting meanings over time
here, I think it would be unwise to simply assume this to be true.

But at the same time, if it is an unreachable code, it would be equally
unwise to have a load of fixup code which we can't test.  I've still got
the fixup code in a separate patch incase we need to put it back in.

I have checked that this printk() doesn't trigger for any of the CPUs
and microcode combinations I have easily to hand, but it's not an
exhaustive test.

>
>> +     */
>> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
>> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
>> +        printk(XENLOG_ERR
>> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
>> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
>> +               boot_cpu_data.x86_mask, ucode_rev,
>> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
>> +
>> +    /*
>>       * Processors offering Enhanced IBRS are not guarenteed to be
>>       * repoline-safe.
>>       */
>> -    if ( cpu_has_rsba || cpu_has_eibrs )
>> +    if ( cpu_has_eibrs )
>> +    {
>> +        /*
>> +         * Prior to the August 2023 microcode, many eIBRS-capable parts did
>> +         * not enumerate RRSBA.
>> +         */
>> +        if ( !cpu_has_rrsba )
>> +            setup_force_cpu_cap(X86_FEATURE_RRSBA);
>> +
>> +        return false;
>> +    }
> No clearing of RSBA in this case? I fear we may end up with misbehavior if
> our own records aren't kept consistent with our assumptions. (This then
> extends to the "just a log message" above as well.)

Well quite, which is why I've gone to lengths to state what our
assumptions are.

Right now, there is nothing in Xen itself where RSBA vs RRSBA matters. 
Until this patch, we don't even have cpu_has_rrsba, and remember that
Xen was not vulnerable to CVE-2022-29901 (Intel Retbleed) because we
chose to use the microcode IBRS implementation on early Skylake, rather
than hope that Retpoline was safe enough and go with the faster option.


In v1, having RSBA and RRSBA (working as I thought they were supposed to
work) *did* matter for the default cpu-policy derivation to work nicely.

But that was invalidated by the clarification to say that RSBA and RRSBA
should never be seen together, which in turn completely changed the
derivation logic.

In v2, it doesn't matter if Xen ends up seeing both RSBA and RRSBA.  It
explicitly can cope (by treating them the same WRT Retpoline), and the
derivation logic now calculates both completely from scratch (and based
on RSBA || RRSBA).

If Xen's assumptions change, then of course we could end up with
misbehaviour, but I think it's unlikely, and I don't think this code is
any more liable to misbehave than anything else in spec-ctrl.c.

> Also the inner conditional continues to strike me as odd; could you add
> half a sentence to the comment (or description) as to meaning to leave
> is_forced_cpu_cap() function correctly (which in turn raises the question
> whether - down the road - this is actually going to matter)?

Look at the single user of is_forced_cpu_cap().

I am not micro-optimising a single branch out of the init section on the
blind hope that the contradictory behaviour it creates won't matter in
the future.  Every forced cap is an abnormal case, and it's almost
certainly my future time which will be spent unravelling the
contradictory behaviour when it comes back to bite.

>> +    /*
>> +     * RSBA is explicitly enumerated in some cases, but may also be set by a
>> +     * hypervisor to indicate that we may move to a processor which isn't
>> +     * retpoline-safe.
>> +     */
>> +    if ( cpu_has_rsba )
>>          return false;
>>  
>> +    /*
>> +     * At this point, we've filtered all the legal RSBA || RRSBA cases (or the
>> +     * known non-ideal cases).  If ARCH_CAPS is visible, trust the absence of
>> +     * RSBA || RRSBA.  There's no known microcode which advertises ARCH_CAPS
>> +     * without RSBA or EIBRS, and if we're virtualised we can't rely the model
>> +     * check anyway.
>> +     */
> I think "no known" wants further qualification: When IBRSB was first
> introduced, EIBRS and RSBA weren't even known about. I also didn't
> think all hardware (i.e. sufficiently old one) did get newer ucode
> when these started to be known. Possibly you mean "... which wrongly
> advertises ..."?

ARCH_CAPS equally didn't exit originally.  ARCH_CAPS, RSBA and EIBRS all
appeared together - see how they're bits 0 and 1 in the MSR.  RRSBA on
the other hand is bit 19, which gives you some idea of how recent it is.

The original intention (AIUI) was that ARCH_CAPS would only exist in
CLX/CFL-R and later which had EIBRS.  But it had to be retrofitted to
older parts in order to enumerate energy-filtering to fix the RAPL
attack against SGX.

The guidance (again, AIUI) was always that if you can see ARCH_CAPS you
should trust the value, if for no other reason than "your hypervisor
will want you not to use a model check".  And this is also why it's
taken so long for us to ARCH_CAPS advertised - advertising ARCH_CAPS and
getting RSBA wrong is worse than "sorry, you're on your own".


None of this is perfect - it was put together in reaction to emergency
situations, where "doing the best we can, urgently" is far more
important than missing the deadline.

Personally, I don't think the RRSBA bit is useful, and I argued against
introducing it.  It literally means "RSBA, with EIBRS restricting which
alternative predictions to select from", and IMO adds complexity with no
benefit.  But others wanted it, and the rest is history.

>
>> @@ -689,6 +762,15 @@ static bool __init retpoline_calculations(void)
>>          break;
>>      }
>>  
>> +    if ( !safe )
>> +    {
>> +        /*
>> +         * Note: the eIBRS-capable parts are filtered out earlier, so the
>> +         * remainder here are the ones which suffer only RSBA behaviour.
>> +         */
> As I think I had mentioned already, I find "only" here odd, when RSBA
> is more severe than RRSBA. Maybe the "only" could move earlier, e.g.
> between "are" and "the"? Then again this may be another non-native-
> speaker issue of mine ...

Well, that is something which has arguably changed between v1 and v2.

Originally, this was really just the Broadwell and Skylake case, and the
point was to explain why we weren't adjusting RRSBA too.

But yeah, I think the "only" can be dropped given the other
rearrangements in v2.

~Andrew
Jan Beulich June 5, 2023, 7:48 a.m. UTC | #3
On 02.06.2023 15:57, Andrew Cooper wrote:
> On 02/06/2023 10:56 am, Jan Beulich wrote:
>> On 01.06.2023 16:48, Andrew Cooper wrote:
>>> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
>>>          return false;
>>>  
>>>      /*
>>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>>> -     * processor which isn't retpoline-safe.
>>> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>>> +     *
>>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>>> +     *   this property.  Broadwell too, when running microcode versions prior
>>> +     *   to Jan 2018.
>>> +     *
>>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>>> +     *   tagging of predictions with the mode in which they were learned.  So
>>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>>> +     *
>>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>>> +     *
>>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>>> +     * behaviour directly.  Other parts have differing enumeration with
>>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>>> +     * to guests, and so toolstacks can level a VM safety for migration.
>>> +     *
>>> +     * The following states exist:
>>> +     *
>>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>>> +     * |---+------+-------+-------+--------------------+---------------|
>>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>>>       *
>>> +     * However, we doesn't need perfect adherence to the spec.  Identify the
>>> +     * broken cases (so we stand a chance of spotting violated assumptions),
>>> +     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
>>> +     * "alternative predictors potentially in use".
>> Considering that it's rows 2, 6, 7, and 8 which are broken, I find this
>> comment a little misleading. To me it doesn't become clear whether them
>> subsequently being left alone (and merely a log message issued) is
>> intentional.
> 
> It is intentional.
> 
> I don't know if these combinations exist in practice anywhere or not. 
> Intel think they oughtn't to, and it's quite possible that the printk()
> is unreachable, but given the complexity and shifting meanings over time
> here, I think it would be unwise to simply assume this to be true.

I agree.

> But at the same time, if it is an unreachable code, it would be equally
> unwise to have a load of fixup code which we can't test.  I've still got
> the fixup code in a separate patch incase we need to put it back in.

Iirc the fixup code you had wasn't really "a load of code". Thing though
is: If such a combination did exist, according to our history we'd be at
least on the edge of needing to issue an XSA along with adding the
missing fixup code. From all I can tell that risk would be lower if we
had that fixup code: It might well be correct.

Nevertheless, if you decide to leave out any fixup, may I ask that you
say so very explicitly in the comment?

>>> +     */
>>> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
>>> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
>>> +        printk(XENLOG_ERR
>>> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
>>> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
>>> +               boot_cpu_data.x86_mask, ucode_rev,
>>> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
>>> +
>>> +    /*
>>>       * Processors offering Enhanced IBRS are not guarenteed to be
>>>       * repoline-safe.
>>>       */
>>> -    if ( cpu_has_rsba || cpu_has_eibrs )
>>> +    if ( cpu_has_eibrs )
>>> +    {
>>> +        /*
>>> +         * Prior to the August 2023 microcode, many eIBRS-capable parts did
>>> +         * not enumerate RRSBA.
>>> +         */
>>> +        if ( !cpu_has_rrsba )
>>> +            setup_force_cpu_cap(X86_FEATURE_RRSBA);
>>> +
>>> +        return false;
>>> +    }
>> No clearing of RSBA in this case? I fear we may end up with misbehavior if
>> our own records aren't kept consistent with our assumptions. (This then
>> extends to the "just a log message" above as well.)
> 
> Well quite, which is why I've gone to lengths to state what our
> assumptions are.
> 
> Right now, there is nothing in Xen itself where RSBA vs RRSBA matters. 
> Until this patch, we don't even have cpu_has_rrsba, and remember that
> Xen was not vulnerable to CVE-2022-29901 (Intel Retbleed) because we
> chose to use the microcode IBRS implementation on early Skylake, rather
> than hope that Retpoline was safe enough and go with the faster option.
> 
> 
> In v1, having RSBA and RRSBA (working as I thought they were supposed to
> work) *did* matter for the default cpu-policy derivation to work nicely.
> 
> But that was invalidated by the clarification to say that RSBA and RRSBA
> should never be seen together, which in turn completely changed the
> derivation logic.
> 
> In v2, it doesn't matter if Xen ends up seeing both RSBA and RRSBA.  It
> explicitly can cope (by treating them the same WRT Retpoline), and the
> derivation logic now calculates both completely from scratch (and based
> on RSBA || RRSBA).

Like above, may I ask that you say so explicitly in the / a comment right
here?

>> Also the inner conditional continues to strike me as odd; could you add
>> half a sentence to the comment (or description) as to meaning to leave
>> is_forced_cpu_cap() function correctly (which in turn raises the question
>> whether - down the road - this is actually going to matter)?
> 
> Look at the single user of is_forced_cpu_cap().
> 
> I am not micro-optimising a single branch out of the init section on the
> blind hope that the contradictory behaviour it creates won't matter in
> the future.  Every forced cap is an abnormal case, and it's almost
> certainly my future time which will be spent unravelling the
> contradictory behaviour when it comes back to bite.

My request isn't about optimization at all, but about an apparent pattern
of unnecessary redundancy (which only as a side effect leads to the
elimination of a branch and hence some tiny bit of optimization). But if
you're sure this is going to be obvious to everyone but me, I'm not going
to insist.

>>> +    /*
>>> +     * RSBA is explicitly enumerated in some cases, but may also be set by a
>>> +     * hypervisor to indicate that we may move to a processor which isn't
>>> +     * retpoline-safe.
>>> +     */
>>> +    if ( cpu_has_rsba )
>>>          return false;
>>>  
>>> +    /*
>>> +     * At this point, we've filtered all the legal RSBA || RRSBA cases (or the
>>> +     * known non-ideal cases).  If ARCH_CAPS is visible, trust the absence of
>>> +     * RSBA || RRSBA.  There's no known microcode which advertises ARCH_CAPS
>>> +     * without RSBA or EIBRS, and if we're virtualised we can't rely the model
>>> +     * check anyway.
>>> +     */
>> I think "no known" wants further qualification: When IBRSB was first
>> introduced, EIBRS and RSBA weren't even known about. I also didn't
>> think all hardware (i.e. sufficiently old one) did get newer ucode
>> when these started to be known. Possibly you mean "... which wrongly
>> advertises ..."?
> 
> ARCH_CAPS equally didn't exit originally.  ARCH_CAPS, RSBA and EIBRS all
> appeared together - see how they're bits 0 and 1 in the MSR.  RRSBA on
> the other hand is bit 19, which gives you some idea of how recent it is.

Hmm, yes, I see.

Jan
Jan Beulich June 5, 2023, 7:50 a.m. UTC | #4
On 05.06.2023 09:48, Jan Beulich wrote:
> On 02.06.2023 15:57, Andrew Cooper wrote:
>> On 02/06/2023 10:56 am, Jan Beulich wrote:
>>> On 01.06.2023 16:48, Andrew Cooper wrote:
>>>> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
>>>>          return false;
>>>>  
>>>>      /*
>>>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>>>> -     * processor which isn't retpoline-safe.
>>>> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>>>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>>>> +     *
>>>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>>>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>>>> +     *   this property.  Broadwell too, when running microcode versions prior
>>>> +     *   to Jan 2018.
>>>> +     *
>>>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>>>> +     *   tagging of predictions with the mode in which they were learned.  So
>>>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>>>> +     *
>>>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>>>> +     *
>>>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>>>> +     * behaviour directly.  Other parts have differing enumeration with
>>>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>>>> +     * to guests, and so toolstacks can level a VM safety for migration.
>>>> +     *
>>>> +     * The following states exist:
>>>> +     *
>>>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>>>> +     * |---+------+-------+-------+--------------------+---------------|
>>>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>>>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>>>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>>>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>>>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>>>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>>>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>>>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>>>>       *
>>>> +     * However, we doesn't need perfect adherence to the spec.  Identify the
>>>> +     * broken cases (so we stand a chance of spotting violated assumptions),
>>>> +     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
>>>> +     * "alternative predictors potentially in use".
>>> Considering that it's rows 2, 6, 7, and 8 which are broken, I find this
>>> comment a little misleading. To me it doesn't become clear whether them
>>> subsequently being left alone (and merely a log message issued) is
>>> intentional.
>>
>> It is intentional.
>>
>> I don't know if these combinations exist in practice anywhere or not. 
>> Intel think they oughtn't to, and it's quite possible that the printk()
>> is unreachable, but given the complexity and shifting meanings over time
>> here, I think it would be unwise to simply assume this to be true.
> 
> I agree.

Thinking of it - would we perhaps want to go a step further an taint the
system in such a case? I would then view this as kind of "Xen not
(security) supported on this hardware." Until we manage to fix (or work
around) the issue.

Jan
Andrew Cooper June 5, 2023, 9:44 a.m. UTC | #5
On 05/06/2023 8:50 am, Jan Beulich wrote:
> On 05.06.2023 09:48, Jan Beulich wrote:
>> On 02.06.2023 15:57, Andrew Cooper wrote:
>>> On 02/06/2023 10:56 am, Jan Beulich wrote:
>>>> On 01.06.2023 16:48, Andrew Cooper wrote:
>>>>> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
>>>>>          return false;
>>>>>  
>>>>>      /*
>>>>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>>>>> -     * processor which isn't retpoline-safe.
>>>>> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>>>>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>>>>> +     *
>>>>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>>>>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>>>>> +     *   this property.  Broadwell too, when running microcode versions prior
>>>>> +     *   to Jan 2018.
>>>>> +     *
>>>>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>>>>> +     *   tagging of predictions with the mode in which they were learned.  So
>>>>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>>>>> +     *
>>>>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>>>>> +     *
>>>>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>>>>> +     * behaviour directly.  Other parts have differing enumeration with
>>>>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>>>>> +     * to guests, and so toolstacks can level a VM safety for migration.
>>>>> +     *
>>>>> +     * The following states exist:
>>>>> +     *
>>>>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>>>>> +     * |---+------+-------+-------+--------------------+---------------|
>>>>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>>>>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>>>>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>>>>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>>>>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>>>>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>>>>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>>>>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>>>>>       *
>>>>> +     * However, we doesn't need perfect adherence to the spec.  Identify the
>>>>> +     * broken cases (so we stand a chance of spotting violated assumptions),
>>>>> +     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
>>>>> +     * "alternative predictors potentially in use".
>>>> Considering that it's rows 2, 6, 7, and 8 which are broken, I find this
>>>> comment a little misleading. To me it doesn't become clear whether them
>>>> subsequently being left alone (and merely a log message issued) is
>>>> intentional.
>>> It is intentional.
>>>
>>> I don't know if these combinations exist in practice anywhere or not. 
>>> Intel think they oughtn't to, and it's quite possible that the printk()
>>> is unreachable, but given the complexity and shifting meanings over time
>>> here, I think it would be unwise to simply assume this to be true.
>> I agree.
> Thinking of it - would we perhaps want to go a step further an taint the
> system in such a case? I would then view this as kind of "Xen not
> (security) supported on this hardware." Until we manage to fix (or work
> around) the issue.

'S' for out-of-spec seems like it fits the bill.

In fact, that can also be used for the CET vs MSR_SPEC_CTRL cross-check
at the start of init_speculation_mitigations().

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index ace31e3b1f1a..e2cb8f3cc728 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -193,6 +193,7 @@  static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_tsx_ctrl        boot_cpu_has(X86_FEATURE_TSX_CTRL)
 #define cpu_has_taa_no          boot_cpu_has(X86_FEATURE_TAA_NO)
 #define cpu_has_fb_clear        boot_cpu_has(X86_FEATURE_FB_CLEAR)
+#define cpu_has_rrsba           boot_cpu_has(X86_FEATURE_RRSBA)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index daee61900afa..29ed410da47a 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -579,7 +579,10 @@  static bool __init check_smt_enabled(void)
     return false;
 }
 
-/* Calculate whether Retpoline is known-safe on this CPU. */
+/*
+ * Calculate whether Retpoline is known-safe on this CPU.  Fix up the
+ * RSBA/RRSBA bits as necessary.
+ */
 static bool __init retpoline_calculations(void)
 {
     unsigned int ucode_rev = this_cpu(cpu_sig).rev;
@@ -593,15 +596,85 @@  static bool __init retpoline_calculations(void)
         return false;
 
     /*
-     * RSBA may be set by a hypervisor to indicate that we may move to a
-     * processor which isn't retpoline-safe.
+     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
+     * agreed upon meaning at the time of writing (May 2023) is thus:
+     *
+     * - RSBA (RSB Alternative) means that an RSB may fall back to an
+     *   alternative predictor on underflow.  Skylake uarch and later all have
+     *   this property.  Broadwell too, when running microcode versions prior
+     *   to Jan 2018.
+     *
+     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
+     *   tagging of predictions with the mode in which they were learned.  So
+     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
+     *
+     * - CPUs are not expected to enumerate both RSBA and RRSBA.
+     *
+     * Some parts (Broadwell) are not expected to ever enumerate this
+     * behaviour directly.  Other parts have differing enumeration with
+     * microcode version.  Fix up Xen's idea, so we can advertise them safely
+     * to guests, and so toolstacks can level a VM safety for migration.
+     *
+     * The following states exist:
+     *
+     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
+     * |---+------+-------+-------+--------------------+---------------|
+     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
+     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
+     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
+     * | 4 |    0 |     1 |     1 | OK                 |               |
+     * | 5 |    1 |     0 |     0 | OK                 |               |
+     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
+     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
+     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
      *
+     * However, we doesn't need perfect adherence to the spec.  Identify the
+     * broken cases (so we stand a chance of spotting violated assumptions),
+     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
+     * "alternative predictors potentially in use".
+     */
+    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
+                       : cpu_has_rrsba /* Rows 2, 6 */ )
+        printk(XENLOG_ERR
+               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
+               boot_cpu_data.x86, boot_cpu_data.x86_model,
+               boot_cpu_data.x86_mask, ucode_rev,
+               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
+
+    /*
      * Processors offering Enhanced IBRS are not guarenteed to be
      * repoline-safe.
      */
-    if ( cpu_has_rsba || cpu_has_eibrs )
+    if ( cpu_has_eibrs )
+    {
+        /*
+         * Prior to the August 2023 microcode, many eIBRS-capable parts did
+         * not enumerate RRSBA.
+         */
+        if ( !cpu_has_rrsba )
+            setup_force_cpu_cap(X86_FEATURE_RRSBA);
+
+        return false;
+    }
+
+    /*
+     * RSBA is explicitly enumerated in some cases, but may also be set by a
+     * hypervisor to indicate that we may move to a processor which isn't
+     * retpoline-safe.
+     */
+    if ( cpu_has_rsba )
         return false;
 
+    /*
+     * At this point, we've filtered all the legal RSBA || RRSBA cases (or the
+     * known non-ideal cases).  If ARCH_CAPS is visible, trust the absence of
+     * RSBA || RRSBA.  There's no known microcode which advertises ARCH_CAPS
+     * without RSBA or EIBRS, and if we're virtualised we can't rely the model
+     * check anyway.
+     */
+    if ( cpu_has_arch_caps )
+        return true;
+
     switch ( boot_cpu_data.x86_model )
     {
     case 0x17: /* Penryn */
@@ -689,6 +762,15 @@  static bool __init retpoline_calculations(void)
         break;
     }
 
+    if ( !safe )
+    {
+        /*
+         * Note: the eIBRS-capable parts are filtered out earlier, so the
+         * remainder here are the ones which suffer only RSBA behaviour.
+         */
+        setup_force_cpu_cap(X86_FEATURE_RSBA);
+    }
+
     return safe;
 }
 
@@ -1148,7 +1230,7 @@  void __init init_speculation_mitigations(void)
             thunk = THUNK_JMP;
     }
 
-    /* Determine if retpoline is safe on this CPU. */
+    /* Determine if retpoline is safe on this CPU.  Fix up RSBA/RRSBA enumerations. */
     retpoline_safe = retpoline_calculations();
 
     /*