diff mbox series

[v1,2/2] microcode: reject late ucode loading if any core is parked

Message ID 1574291155-26032-2-git-send-email-chao.gao@intel.com (mailing list archive)
State New, archived
Headers show
Series [v1,1/2] x86/cpu: maintain a parked CPU bitmap | expand

Commit Message

Chao Gao Nov. 20, 2019, 11:05 p.m. UTC
If a core with all of its thread being parked, late ucode loading
which currently only loads ucode on online threads would lead to
differing ucode revisions in the system. In general, keeping ucode
revision consistent would be less error-prone. To this end, if there
is a parked thread doesn't have an online sibling thread, late ucode
loading is rejected.

Two threads are on the same core or computing unit iff they have
the same phys_proc_id and cpu_core_id/compute_unit_id. Based on
phys_proc_id and cpu_core_id/compute_unit_id, an unique core id
is generated for each thread. And use a bitmap to reduce the
number of comparison.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes:
 - traverse the new parked cpu bitmap to find a parked core. It avoids
 access uninitialized cpu_data of a hot-added CPU.
 - use bitmap_empty() rather than find_first_bit() to check whether a
 bitmap is empty.
---
 xen/arch/x86/microcode.c        | 63 +++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/processor.h |  1 +
 2 files changed, 64 insertions(+)

Comments

Jan Beulich Nov. 21, 2019, 10:21 a.m. UTC | #1
On 21.11.2019 00:05, Chao Gao wrote:
> If a core with all of its thread being parked, late ucode loading
> which currently only loads ucode on online threads would lead to
> differing ucode revisions in the system. In general, keeping ucode
> revision consistent would be less error-prone. To this end, if there
> is a parked thread doesn't have an online sibling thread, late ucode
> loading is rejected.

I'm confused. I thought we had agreed that the long term solution
would be to briefly bring online a thread of cores with all their
threads parked. What you do here was meant to be a temporary step
only for 4.13, for which it is too late now (unless Jürgen
indicates otherwise).

> --- a/xen/arch/x86/microcode.c
> +++ b/xen/arch/x86/microcode.c
> @@ -584,6 +584,51 @@ static int do_microcode_update(void *patch)
>      return ret;
>  }
>  
> +static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift)
> +{
> +    unsigned int core_id = cpu_to_cu(cpu);
> +
> +    if ( core_id == INVALID_CUID )
> +        core_id = cpu_to_core(cpu);
> +
> +    return (cpu_to_socket(cpu) << socket_shift) + core_id;
> +}
> +
> +static int has_parked_core(void)
> +{
> +    int ret;
> +    unsigned int cpu, max_bits, core_width;
> +    unsigned int max_sockets = 1, max_cores = 1;
> +    unsigned long *bitmap;
> +
> +    if ( !park_offline_cpus )
> +        return 0;
> +
> +    for_each_parked_cpu(cpu)
> +    {
> +        /* Note that cpu_to_socket() get an ID starting from 0. */
> +        max_sockets = max(max_sockets, cpu_to_socket(cpu) + 1);
> +        max_cores = max(max_cores, cpu_data[cpu].x86_max_cores);
> +    }
> +
> +    core_width = fls(max_cores);
> +    max_bits = max_sockets << core_width;

Isn't this off by one? If max_cores is 1, you don't need to shift
max_sockets (or the cpu_to_socket() value in unique_core_id()) at
all, for example.

With this in mind, instead of the park_offline_cpus check at the
top of the function, wouldn't it make sense to check here whether
max_sockets and max_cores are both still 1, in which case at
least one thread of the only core of the only socket in the system
is obviously still online (the one we're running on)?

Jan
Chao Gao Nov. 21, 2019, 11:51 a.m. UTC | #2
On Thu, Nov 21, 2019 at 11:21:13AM +0100, Jan Beulich wrote:
>On 21.11.2019 00:05, Chao Gao wrote:
>> If a core with all of its thread being parked, late ucode loading
>> which currently only loads ucode on online threads would lead to
>> differing ucode revisions in the system. In general, keeping ucode
>> revision consistent would be less error-prone. To this end, if there
>> is a parked thread doesn't have an online sibling thread, late ucode
>> loading is rejected.
>
>I'm confused. I thought we had agreed that the long term solution
>would be to briefly bring online a thread of cores with all their
>threads parked.

I don't remeber that we reached such an aggrement. But if it happened,
I am really sorry for forgeting it.

Actually, I think Dom0 has the information (cpu topology and each cpu's
offline/online status) to decide if there is a parked core or not.
IMO, rejecting late loading in this case is just a defense check. Dom0
is able to correct the situation by bringing up some cpus.

>What you do here was meant to be a temporary step
>only for 4.13, for which it is too late now (unless Jürgen
>indicates otherwise).
>
>> --- a/xen/arch/x86/microcode.c
>> +++ b/xen/arch/x86/microcode.c
>> @@ -584,6 +584,51 @@ static int do_microcode_update(void *patch)
>>      return ret;
>>  }
>>  
>> +static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift)
>> +{
>> +    unsigned int core_id = cpu_to_cu(cpu);
>> +
>> +    if ( core_id == INVALID_CUID )
>> +        core_id = cpu_to_core(cpu);
>> +
>> +    return (cpu_to_socket(cpu) << socket_shift) + core_id;
>> +}
>> +
>> +static int has_parked_core(void)
>> +{
>> +    int ret;
>> +    unsigned int cpu, max_bits, core_width;
>> +    unsigned int max_sockets = 1, max_cores = 1;
>> +    unsigned long *bitmap;
>> +
>> +    if ( !park_offline_cpus )
>> +        return 0;
>> +
>> +    for_each_parked_cpu(cpu)
>> +    {
>> +        /* Note that cpu_to_socket() get an ID starting from 0. */
>> +        max_sockets = max(max_sockets, cpu_to_socket(cpu) + 1);
>> +        max_cores = max(max_cores, cpu_data[cpu].x86_max_cores);
>> +    }
>> +
>> +    core_width = fls(max_cores);
>> +    max_bits = max_sockets << core_width;
>
>Isn't this off by one? If max_cores is 1, you don't need to shift
>max_sockets (or the cpu_to_socket() value in unique_core_id()) at
>all, for example.
>
>With this in mind, instead of the park_offline_cpus check at the
>top of the function, wouldn't it make sense to check here whether
>max_sockets and max_cores are both still 1, in which case at
>least one thread of the only core of the only socket in the system
>is obviously still online (the one we're running on)?

Agree. Will follow your suggestion.

Thanks
Chao
Jürgen Groß Nov. 21, 2019, 11:51 a.m. UTC | #3
On 21.11.19 11:21, Jan Beulich wrote:
> On 21.11.2019 00:05, Chao Gao wrote:
>> If a core with all of its thread being parked, late ucode loading
>> which currently only loads ucode on online threads would lead to
>> differing ucode revisions in the system. In general, keeping ucode
>> revision consistent would be less error-prone. To this end, if there
>> is a parked thread doesn't have an online sibling thread, late ucode
>> loading is rejected.
> 
> I'm confused. I thought we had agreed that the long term solution
> would be to briefly bring online a thread of cores with all their
> threads parked. What you do here was meant to be a temporary step
> only for 4.13, for which it is too late now (unless Jürgen
> indicates otherwise).

Which I don't intend to do.


Juergen
Jan Beulich Nov. 21, 2019, 5:13 p.m. UTC | #4
On 21.11.2019 12:51, Chao Gao wrote:
> On Thu, Nov 21, 2019 at 11:21:13AM +0100, Jan Beulich wrote:
>> On 21.11.2019 00:05, Chao Gao wrote:
>>> If a core with all of its thread being parked, late ucode loading
>>> which currently only loads ucode on online threads would lead to
>>> differing ucode revisions in the system. In general, keeping ucode
>>> revision consistent would be less error-prone. To this end, if there
>>> is a parked thread doesn't have an online sibling thread, late ucode
>>> loading is rejected.
>>
>> I'm confused. I thought we had agreed that the long term solution
>> would be to briefly bring online a thread of cores with all their
>> threads parked.
> 
> I don't remeber that we reached such an aggrement. But if it happened,
> I am really sorry for forgeting it.
> 
> Actually, I think Dom0 has the information (cpu topology and each cpu's
> offline/online status) to decide if there is a parked core or not.
> IMO, rejecting late loading in this case is just a defense check. Dom0
> is able to correct the situation by bringing up some cpus.

Dom0 can _fully_ bring up CPUs, but not with the intent of _just_
getting their ucode updated.

>> What you do here was meant to be a temporary step
>> only for 4.13, for which it is too late now (unless Jürgen
>> indicates otherwise).
>>
>>> --- a/xen/arch/x86/microcode.c
>>> +++ b/xen/arch/x86/microcode.c
>>> @@ -584,6 +584,51 @@ static int do_microcode_update(void *patch)
>>>      return ret;
>>>  }
>>>  
>>> +static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift)
>>> +{
>>> +    unsigned int core_id = cpu_to_cu(cpu);
>>> +
>>> +    if ( core_id == INVALID_CUID )
>>> +        core_id = cpu_to_core(cpu);
>>> +
>>> +    return (cpu_to_socket(cpu) << socket_shift) + core_id;
>>> +}
>>> +
>>> +static int has_parked_core(void)
>>> +{
>>> +    int ret;
>>> +    unsigned int cpu, max_bits, core_width;
>>> +    unsigned int max_sockets = 1, max_cores = 1;
>>> +    unsigned long *bitmap;
>>> +
>>> +    if ( !park_offline_cpus )
>>> +        return 0;
>>> +
>>> +    for_each_parked_cpu(cpu)
>>> +    {
>>> +        /* Note that cpu_to_socket() get an ID starting from 0. */
>>> +        max_sockets = max(max_sockets, cpu_to_socket(cpu) + 1);
>>> +        max_cores = max(max_cores, cpu_data[cpu].x86_max_cores);
>>> +    }
>>> +
>>> +    core_width = fls(max_cores);
>>> +    max_bits = max_sockets << core_width;
>>
>> Isn't this off by one? If max_cores is 1, you don't need to shift
>> max_sockets (or the cpu_to_socket() value in unique_core_id()) at
>> all, for example.
>>
>> With this in mind, instead of the park_offline_cpus check at the
>> top of the function, wouldn't it make sense to check here whether
>> max_sockets and max_cores are both still 1, in which case at
>> least one thread of the only core of the only socket in the system
>> is obviously still online (the one we're running on)?
> 
> Agree. Will follow your suggestion.

Aiui it'll be correct again only if the parked map gets allocated
unconditionally.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 65d1f41..dcc8e4b 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -584,6 +584,51 @@  static int do_microcode_update(void *patch)
     return ret;
 }
 
+static unsigned int unique_core_id(unsigned int cpu, unsigned int socket_shift)
+{
+    unsigned int core_id = cpu_to_cu(cpu);
+
+    if ( core_id == INVALID_CUID )
+        core_id = cpu_to_core(cpu);
+
+    return (cpu_to_socket(cpu) << socket_shift) + core_id;
+}
+
+static int has_parked_core(void)
+{
+    int ret;
+    unsigned int cpu, max_bits, core_width;
+    unsigned int max_sockets = 1, max_cores = 1;
+    unsigned long *bitmap;
+
+    if ( !park_offline_cpus )
+        return 0;
+
+    for_each_parked_cpu(cpu)
+    {
+        /* Note that cpu_to_socket() get an ID starting from 0. */
+        max_sockets = max(max_sockets, cpu_to_socket(cpu) + 1);
+        max_cores = max(max_cores, cpu_data[cpu].x86_max_cores);
+    }
+
+    core_width = fls(max_cores);
+    max_bits = max_sockets << core_width;
+    bitmap = xzalloc_array(unsigned long, BITS_TO_LONGS(max_bits));
+    if ( !bitmap )
+        return -ENOMEM;
+
+    for_each_parked_cpu(cpu)
+        __set_bit(unique_core_id(cpu, core_width), bitmap);
+
+    for_each_online_cpu(cpu)
+        __clear_bit(unique_core_id(cpu, core_width), bitmap);
+
+    ret = !bitmap_empty(bitmap, max_bits);
+    xfree(bitmap);
+
+    return ret;
+}
+
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 {
     int ret;
@@ -629,6 +674,24 @@  int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
         return -EPERM;
     }
 
+    /*
+     * If there is a core with all of its threads parked, late loading may
+     * cause differing ucode revisions in the system. Refuse this operation.
+     */
+    ret = has_parked_core();
+    if ( ret )
+    {
+        if ( ret > 0 )
+        {
+            printk(XENLOG_WARNING
+                   "Aborted: found a parked core (parked CPU bitmap: %*pbl)\n",
+                   CPUMASK_PR(cpu_parked_map));
+            ret = -EPERM;
+        }
+        xfree(buffer);
+        goto put;
+    }
+
     patch = parse_blob(buffer, len);
     xfree(buffer);
     if ( IS_ERR(patch) )
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 557f9b6..f8a9e93 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -171,6 +171,7 @@  extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 
 #define cpu_to_core(_cpu)   (cpu_data[_cpu].cpu_core_id)
 #define cpu_to_socket(_cpu) (cpu_data[_cpu].phys_proc_id)
+#define cpu_to_cu(_cpu)     (cpu_data[_cpu].compute_unit_id)
 
 unsigned int apicid_to_socket(unsigned int);