diff mbox series

x86/ucode: Further fixes to identify "ucode already up to date"

Message ID 20240516113103.3018940-1-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series x86/ucode: Further fixes to identify "ucode already up to date" | expand

Commit Message

Andrew Cooper May 16, 2024, 11:31 a.m. UTC
When the revision in hardware is newer than anything Xen has to hand,
'microcode_cache' isn't set up.  Then, `xen-ucode` initiates the update
because it doesn't know whether the revisions across the system are symmetric
or not.  This involves the patch getting all the way into the
apply_microcode() hooks before being found to be too old.

This is all a giant mess and needs an overhaul, but in the short term simply
adjust the apply_microcode() to return -EEXIST.

Also, unconditionally print the preexisting microcode revision on boot.  It's
relevant information which is otherwise unavailable if Xen doesn't find new
microcode to use.

Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Fouad Hilly <fouad.hilly@cloud.com>

Sorry Fouad, but this collides with your `--force` series once again.
Hopefully it might make things fractionally easier.

Background: For 06-55-04 (Skylake server, stepping 4 specifically), there's a
recent production firmware update which has a newer microcode revision than
exists in the Intel public microcode repository.  It's causing a mess in our
automated testing, although it is finding good bugs...
---
 xen/arch/x86/cpu/microcode/amd.c   | 7 +++++--
 xen/arch/x86/cpu/microcode/core.c  | 2 ++
 xen/arch/x86/cpu/microcode/intel.c | 7 +++++--
 3 files changed, 12 insertions(+), 4 deletions(-)

Comments

Jan Beulich May 16, 2024, 11:44 a.m. UTC | #1
On 16.05.2024 13:31, Andrew Cooper wrote:
> When the revision in hardware is newer than anything Xen has to hand,
> 'microcode_cache' isn't set up.  Then, `xen-ucode` initiates the update
> because it doesn't know whether the revisions across the system are symmetric
> or not.  This involves the patch getting all the way into the
> apply_microcode() hooks before being found to be too old.
> 
> This is all a giant mess and needs an overhaul, but in the short term simply
> adjust the apply_microcode() to return -EEXIST.
> 
> Also, unconditionally print the preexisting microcode revision on boot.  It's
> relevant information which is otherwise unavailable if Xen doesn't find new
> microcode to use.

Since you do this for the BSP only, I'm okay with that. Doing this for all
CPUs would have added too much verbosity imo, and I would then have asked
to log the pre-existing revision only when no update would be done by us.

> Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one small request related to the remark above:

> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -881,6 +881,8 @@ int __init early_microcode_init(unsigned long *module_map,
>  
>      ucode_ops.collect_cpu_info();
>  
> +    printk(XENLOG_INFO "Boot microcode revision: 0x%08x\n", this_cpu(cpu_sig).rev);

Can this please be "BSP" or "Boot CPU" instead of just "Boot", to clarify
which CPU's information this is? I'm pretty sure you too have hit systems
where firmware doesn't update all cores.

Jan
Roger Pau Monné May 16, 2024, 11:50 a.m. UTC | #2
On Thu, May 16, 2024 at 12:31:03PM +0100, Andrew Cooper wrote:
> When the revision in hardware is newer than anything Xen has to hand,
> 'microcode_cache' isn't set up.  Then, `xen-ucode` initiates the update
> because it doesn't know whether the revisions across the system are symmetric
> or not.  This involves the patch getting all the way into the
> apply_microcode() hooks before being found to be too old.
> 
> This is all a giant mess and needs an overhaul, but in the short term simply
> adjust the apply_microcode() to return -EEXIST.
> 
> Also, unconditionally print the preexisting microcode revision on boot.  It's
> relevant information which is otherwise unavailable if Xen doesn't find new
> microcode to use.
> 
> Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Fouad Hilly <fouad.hilly@cloud.com>
> 
> Sorry Fouad, but this collides with your `--force` series once again.
> Hopefully it might make things fractionally easier.
> 
> Background: For 06-55-04 (Skylake server, stepping 4 specifically), there's a
> recent production firmware update which has a newer microcode revision than
> exists in the Intel public microcode repository.  It's causing a mess in our
> automated testing, although it is finding good bugs...
> ---
>  xen/arch/x86/cpu/microcode/amd.c   | 7 +++++--
>  xen/arch/x86/cpu/microcode/core.c  | 2 ++
>  xen/arch/x86/cpu/microcode/intel.c | 7 +++++--
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
> index 17e68697d5bf..f76a563c8b84 100644
> --- a/xen/arch/x86/cpu/microcode/amd.c
> +++ b/xen/arch/x86/cpu/microcode/amd.c
> @@ -222,12 +222,15 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
>      uint32_t rev, old_rev = sig->rev;
>      enum microcode_match_result result = microcode_fits(patch);
>  
> +    if ( result == MIS_UCODE )
> +        return -EINVAL;
> +
>      /*
>       * Allow application of the same revision to pick up SMT-specific changes
>       * even if the revision of the other SMT thread is already up-to-date.
>       */
> -    if ( result != NEW_UCODE && result != SAME_UCODE )
> -        return -EINVAL;
> +    if ( result == OLD_UCODE )
> +        return -EEXIST;

Won't it be simpler to just add this check ahead of the existing one,
so that you can leave the code as-is, iow:

    if ( result == OLD_UCODE )
        return -EEXIST;

    /*
     * Allow application of the same revision to pick up SMT-specific changes
     * even if the revision of the other SMT thread is already up-to-date.
     */
    if ( result != NEW_UCODE && result != SAME_UCODE )
        return -EINVAL;

Thanks, Roger.
Andrew Cooper May 16, 2024, 12:30 p.m. UTC | #3
On 16/05/2024 12:50 pm, Roger Pau Monné wrote:
> On Thu, May 16, 2024 at 12:31:03PM +0100, Andrew Cooper wrote:
>> When the revision in hardware is newer than anything Xen has to hand,
>> 'microcode_cache' isn't set up.  Then, `xen-ucode` initiates the update
>> because it doesn't know whether the revisions across the system are symmetric
>> or not.  This involves the patch getting all the way into the
>> apply_microcode() hooks before being found to be too old.
>>
>> This is all a giant mess and needs an overhaul, but in the short term simply
>> adjust the apply_microcode() to return -EEXIST.
>>
>> Also, unconditionally print the preexisting microcode revision on boot.  It's
>> relevant information which is otherwise unavailable if Xen doesn't find new
>> microcode to use.
>>
>> Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Fouad Hilly <fouad.hilly@cloud.com>
>>
>> Sorry Fouad, but this collides with your `--force` series once again.
>> Hopefully it might make things fractionally easier.
>>
>> Background: For 06-55-04 (Skylake server, stepping 4 specifically), there's a
>> recent production firmware update which has a newer microcode revision than
>> exists in the Intel public microcode repository.  It's causing a mess in our
>> automated testing, although it is finding good bugs...
>> ---
>>  xen/arch/x86/cpu/microcode/amd.c   | 7 +++++--
>>  xen/arch/x86/cpu/microcode/core.c  | 2 ++
>>  xen/arch/x86/cpu/microcode/intel.c | 7 +++++--
>>  3 files changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
>> index 17e68697d5bf..f76a563c8b84 100644
>> --- a/xen/arch/x86/cpu/microcode/amd.c
>> +++ b/xen/arch/x86/cpu/microcode/amd.c
>> @@ -222,12 +222,15 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
>>      uint32_t rev, old_rev = sig->rev;
>>      enum microcode_match_result result = microcode_fits(patch);
>>  
>> +    if ( result == MIS_UCODE )
>> +        return -EINVAL;
>> +
>>      /*
>>       * Allow application of the same revision to pick up SMT-specific changes
>>       * even if the revision of the other SMT thread is already up-to-date.
>>       */
>> -    if ( result != NEW_UCODE && result != SAME_UCODE )
>> -        return -EINVAL;
>> +    if ( result == OLD_UCODE )
>> +        return -EEXIST;
> Won't it be simpler to just add this check ahead of the existing one,
> so that you can leave the code as-is, iow:
>
>     if ( result == OLD_UCODE )
>         return -EEXIST;
>
>     /*
>      * Allow application of the same revision to pick up SMT-specific changes
>      * even if the revision of the other SMT thread is already up-to-date.
>      */
>     if ( result != NEW_UCODE && result != SAME_UCODE )
>         return -EINVAL;
>
> Thanks, Roger.

Not really, no.  That still leaves this piece of logic which is
misleading IMO.

MIS_UCODE is the only -EINVAL worthy case.

Every other *_UCODE constant needs to be 0 or -EEXIST, depending on
allow-same/--force.

~Andrew
Andrew Cooper May 16, 2024, 12:31 p.m. UTC | #4
On 16/05/2024 12:44 pm, Jan Beulich wrote:
> On 16.05.2024 13:31, Andrew Cooper wrote:
>> When the revision in hardware is newer than anything Xen has to hand,
>> 'microcode_cache' isn't set up.  Then, `xen-ucode` initiates the update
>> because it doesn't know whether the revisions across the system are symmetric
>> or not.  This involves the patch getting all the way into the
>> apply_microcode() hooks before being found to be too old.
>>
>> This is all a giant mess and needs an overhaul, but in the short term simply
>> adjust the apply_microcode() to return -EEXIST.
>>
>> Also, unconditionally print the preexisting microcode revision on boot.  It's
>> relevant information which is otherwise unavailable if Xen doesn't find new
>> microcode to use.
> Since you do this for the BSP only, I'm okay with that. Doing this for all
> CPUs would have added too much verbosity imo, and I would then have asked
> to log the pre-existing revision only when no update would be done by us.
>
>> Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> with one small request related to the remark above:
>
>> --- a/xen/arch/x86/cpu/microcode/core.c
>> +++ b/xen/arch/x86/cpu/microcode/core.c
>> @@ -881,6 +881,8 @@ int __init early_microcode_init(unsigned long *module_map,
>>  
>>      ucode_ops.collect_cpu_info();
>>  
>> +    printk(XENLOG_INFO "Boot microcode revision: 0x%08x\n", this_cpu(cpu_sig).rev);
> Can this please be "BSP" or "Boot CPU" instead of just "Boot", to clarify
> which CPU's information this is? I'm pretty sure you too have hit systems
> where firmware doesn't update all cores.

I'll switch to BSP.

I have further plans (4.20 at this point) to reduce logspam.

* The AP boot path should warn if it finds a revision which isn't the
bsp_orig version, cached version, or thread-0 revision.  Most of what is
printed right now is expected and normal in the system.

* Late load should print once, not once per CPU.

~Andrew
Roger Pau Monné May 16, 2024, 12:45 p.m. UTC | #5
On Thu, May 16, 2024 at 01:30:21PM +0100, Andrew Cooper wrote:
> On 16/05/2024 12:50 pm, Roger Pau Monné wrote:
> > On Thu, May 16, 2024 at 12:31:03PM +0100, Andrew Cooper wrote:
> >> When the revision in hardware is newer than anything Xen has to hand,
> >> 'microcode_cache' isn't set up.  Then, `xen-ucode` initiates the update
> >> because it doesn't know whether the revisions across the system are symmetric
> >> or not.  This involves the patch getting all the way into the
> >> apply_microcode() hooks before being found to be too old.
> >>
> >> This is all a giant mess and needs an overhaul, but in the short term simply
> >> adjust the apply_microcode() to return -EEXIST.
> >>
> >> Also, unconditionally print the preexisting microcode revision on boot.  It's
> >> relevant information which is otherwise unavailable if Xen doesn't find new
> >> microcode to use.
> >>
> >> Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Roger Pau Monné <roger.pau@citrix.com>
> >> CC: Fouad Hilly <fouad.hilly@cloud.com>
> >>
> >> Sorry Fouad, but this collides with your `--force` series once again.
> >> Hopefully it might make things fractionally easier.
> >>
> >> Background: For 06-55-04 (Skylake server, stepping 4 specifically), there's a
> >> recent production firmware update which has a newer microcode revision than
> >> exists in the Intel public microcode repository.  It's causing a mess in our
> >> automated testing, although it is finding good bugs...
> >> ---
> >>  xen/arch/x86/cpu/microcode/amd.c   | 7 +++++--
> >>  xen/arch/x86/cpu/microcode/core.c  | 2 ++
> >>  xen/arch/x86/cpu/microcode/intel.c | 7 +++++--
> >>  3 files changed, 12 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
> >> index 17e68697d5bf..f76a563c8b84 100644
> >> --- a/xen/arch/x86/cpu/microcode/amd.c
> >> +++ b/xen/arch/x86/cpu/microcode/amd.c
> >> @@ -222,12 +222,15 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
> >>      uint32_t rev, old_rev = sig->rev;
> >>      enum microcode_match_result result = microcode_fits(patch);
> >>  
> >> +    if ( result == MIS_UCODE )
> >> +        return -EINVAL;
> >> +
> >>      /*
> >>       * Allow application of the same revision to pick up SMT-specific changes
> >>       * even if the revision of the other SMT thread is already up-to-date.
> >>       */
> >> -    if ( result != NEW_UCODE && result != SAME_UCODE )
> >> -        return -EINVAL;
> >> +    if ( result == OLD_UCODE )
> >> +        return -EEXIST;
> > Won't it be simpler to just add this check ahead of the existing one,
> > so that you can leave the code as-is, iow:
> >
> >     if ( result == OLD_UCODE )
> >         return -EEXIST;
> >
> >     /*
> >      * Allow application of the same revision to pick up SMT-specific changes
> >      * even if the revision of the other SMT thread is already up-to-date.
> >      */
> >     if ( result != NEW_UCODE && result != SAME_UCODE )
> >         return -EINVAL;
> >
> > Thanks, Roger.
> 
> Not really, no.  That still leaves this piece of logic which is
> misleading IMO.
> 
> MIS_UCODE is the only -EINVAL worthy case.
> 
> Every other *_UCODE constant needs to be 0 or -EEXIST, depending on
> allow-same/--force.

OK, my main concern was the previous logic wouldn't allow a newly
introduced state to get past the return -EINVAL, while the new logic
could possibly allow it to pass through.

I don't think adding states is that common, and if you prefer it that
way it's fine.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 17e68697d5bf..f76a563c8b84 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -222,12 +222,15 @@  static int cf_check apply_microcode(const struct microcode_patch *patch)
     uint32_t rev, old_rev = sig->rev;
     enum microcode_match_result result = microcode_fits(patch);
 
+    if ( result == MIS_UCODE )
+        return -EINVAL;
+
     /*
      * Allow application of the same revision to pick up SMT-specific changes
      * even if the revision of the other SMT thread is already up-to-date.
      */
-    if ( result != NEW_UCODE && result != SAME_UCODE )
-        return -EINVAL;
+    if ( result == OLD_UCODE )
+        return -EEXIST;
 
     if ( check_final_patch_levels(sig) )
     {
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index 762111838f5f..519798dca4af 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -881,6 +881,8 @@  int __init early_microcode_init(unsigned long *module_map,
 
     ucode_ops.collect_cpu_info();
 
+    printk(XENLOG_INFO "Boot microcode revision: 0x%08x\n", this_cpu(cpu_sig).rev);
+
     /*
      * Some hypervisors deliberately report a microcode revision of -1 to
      * mean that they will not accept microcode updates.
diff --git a/xen/arch/x86/cpu/microcode/intel.c b/xen/arch/x86/cpu/microcode/intel.c
index 96f34b336b21..f505aa1b7888 100644
--- a/xen/arch/x86/cpu/microcode/intel.c
+++ b/xen/arch/x86/cpu/microcode/intel.c
@@ -294,10 +294,13 @@  static int cf_check apply_microcode(const struct microcode_patch *patch)
 
     result = microcode_update_match(patch);
 
-    if ( result != NEW_UCODE &&
-         !(opt_ucode_allow_same && result == SAME_UCODE) )
+    if ( result == MIS_UCODE )
         return -EINVAL;
 
+    if ( result == OLD_UCODE ||
+         (result == SAME_UCODE && !opt_ucode_allow_same) )
+        return -EEXIST;
+
     wbinvd();
 
     wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)patch->data);