Message ID | 1574291155-26032-1-git-send-email-chao.gao@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v1,1/2] x86/cpu: maintain a parked CPU bitmap | expand |
Hi, On 20/11/2019 23:05, Chao Gao wrote: > It helps to distinguish parked CPUs from those are really offlined or > hot-added. We need to know the parked CPUs in order to do a special > check against them to ensure that all CPUs, except those are really > offlined or hot-added, have the same ucode version. > > Signed-off-by: Chao Gao <chao.gao@intel.com> > --- > Note that changes on ARM side are untested. > --- > xen/arch/arm/smpboot.c | 1 + > xen/arch/x86/cpu/common.c | 4 ++++ > xen/arch/x86/smpboot.c | 1 + > xen/common/cpu.c | 4 ++++ > xen/include/asm-arm/smp.h | 1 + > xen/include/asm-x86/smp.h | 1 + > xen/include/xen/cpumask.h | 1 + > 7 files changed, 13 insertions(+) > > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c > index 00b64c3..1b57ba4 100644 > --- a/xen/arch/arm/smpboot.c > +++ b/xen/arch/arm/smpboot.c > @@ -39,6 +39,7 @@ > cpumask_t cpu_online_map; > cpumask_t cpu_present_map; > cpumask_t cpu_possible_map; > +cpumask_var_t cpu_parked_map; You define cpu_parked_map but AFAIK it will never get allocated. The risk here is any access to that variable will result to a fault. Looking at the changes below, it looks like access in common code will be protected by park_offline_cpus. This is always false on Arm, so the compiler should remove any access to cpu_parked_map. With that in mind, I think it would be best to only provide a prototype for cpu_parked_map and so the linker can warn if someone used it. > diff --git a/xen/include/asm-arm/smp.h b/xen/include/asm-arm/smp.h > index fdbcefa..4b392fa 100644 > --- a/xen/include/asm-arm/smp.h > +++ b/xen/include/asm-arm/smp.h > @@ -19,6 +19,7 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask); > * would otherwise prefer them to be off? > */ > #define park_offline_cpus false > +extern cpumask_var_t cpu_parked_map; The prototype should be the same for all architectures. So is there any reason to duplicate it? Cheers,
On 21.11.2019 10:47, Julien Grall wrote: > On 20/11/2019 23:05, Chao Gao wrote: >> --- a/xen/arch/arm/smpboot.c >> +++ b/xen/arch/arm/smpboot.c >> @@ -39,6 +39,7 @@ >> cpumask_t cpu_online_map; >> cpumask_t cpu_present_map; >> cpumask_t cpu_possible_map; >> +cpumask_var_t cpu_parked_map; > > You define cpu_parked_map but AFAIK it will never get allocated. The > risk here is any access to that variable will result to a fault. > > Looking at the changes below, it looks like access in common code will > be protected by park_offline_cpus. This is always false on Arm, so the > compiler should remove any access to cpu_parked_map. > > With that in mind, I think it would be best to only provide a prototype > for cpu_parked_map and so the linker can warn if someone used it. +1 In fact I wonder whether the maintenance of the map should live in common code in the first place. While clearing the respective bit in cpu_up() looks correct (and could be done without any if()), I'm not convinced the setting of the bit in cpu_down() is going to be correct in all cases. I'd rather leave this to the relevant callers of cpu_down(). To deal with cpu_add_remove_lock not being held perhaps it would be best to set the flag _before_ calling cpu_down(); consumers of the map then would need to double check that a CPU is not _also_ (still) online. Jan
On 21.11.2019 00:05, Chao Gao wrote: > --- a/xen/arch/x86/cpu/common.c > +++ b/xen/arch/x86/cpu/common.c > @@ -337,7 +337,11 @@ void __init early_cpu_init(void) > } > > if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) > + { > park_offline_cpus = opt_mce; > + if (park_offline_cpus && !zalloc_cpumask_var(&cpu_parked_map)) > + panic("No memory for CPU parked map\n"); > + } Maybe shorter as if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) && opt_mce) { if (!zalloc_cpumask_var(&cpu_parked_map)) panic("No memory for CPU parked map\n"); park_offline_cpus = true; } ? Jan
On Thu, Nov 21, 2019 at 11:02:10AM +0100, Jan Beulich wrote: >On 21.11.2019 10:47, Julien Grall wrote: >> On 20/11/2019 23:05, Chao Gao wrote: >>> --- a/xen/arch/arm/smpboot.c >>> +++ b/xen/arch/arm/smpboot.c >>> @@ -39,6 +39,7 @@ >>> cpumask_t cpu_online_map; >>> cpumask_t cpu_present_map; >>> cpumask_t cpu_possible_map; >>> +cpumask_var_t cpu_parked_map; >> >> You define cpu_parked_map but AFAIK it will never get allocated. The >> risk here is any access to that variable will result to a fault. >> >> Looking at the changes below, it looks like access in common code will >> be protected by park_offline_cpus. This is always false on Arm, so the >> compiler should remove any access to cpu_parked_map. >> >> With that in mind, I think it would be best to only provide a prototype >> for cpu_parked_map and so the linker can warn if someone used it. > >+1 Will do. I added this because I am not sure all compilers would omit such access. > >In fact I wonder whether the maintenance of the map should live in >common code in the first place. While clearing the respective bit >in cpu_up() looks correct (and could be done without any if()), But when park_offline_cpus() is false, the map isn't allocated. I don't think it is safe to access the map in this case. >I'm not convinced the setting of the bit in cpu_down() is going to >be correct in all cases. Do you mean in some cases, cpu_down() is to really offline a CPU even park_offline_cpus is set? And in this case, setting the bit isn't correct. If yes, one thing confuses me is that cpu_down() would call cpu_notifier_call_chain() several times unconditionally. And registered callbacks take actions depending on the value of park_offline_cpus. I expected that callbacks would do more check to avoid parking a CPU in some cases. Thanks Chao
On 21.11.2019 12:43, Chao Gao wrote: > On Thu, Nov 21, 2019 at 11:02:10AM +0100, Jan Beulich wrote: >> On 21.11.2019 10:47, Julien Grall wrote: >>> On 20/11/2019 23:05, Chao Gao wrote: >>>> --- a/xen/arch/arm/smpboot.c >>>> +++ b/xen/arch/arm/smpboot.c >>>> @@ -39,6 +39,7 @@ >>>> cpumask_t cpu_online_map; >>>> cpumask_t cpu_present_map; >>>> cpumask_t cpu_possible_map; >>>> +cpumask_var_t cpu_parked_map; >>> >>> You define cpu_parked_map but AFAIK it will never get allocated. The >>> risk here is any access to that variable will result to a fault. >>> >>> Looking at the changes below, it looks like access in common code will >>> be protected by park_offline_cpus. This is always false on Arm, so the >>> compiler should remove any access to cpu_parked_map. >>> >>> With that in mind, I think it would be best to only provide a prototype >>> for cpu_parked_map and so the linker can warn if someone used it. >> >> +1 > > Will do. I added this because I am not sure all compilers would omit > such access. > >> >> In fact I wonder whether the maintenance of the map should live in >> common code in the first place. While clearing the respective bit >> in cpu_up() looks correct (and could be done without any if()), > > But when park_offline_cpus() is false, the map isn't allocated. I don't > think it is safe to access the map in this case. Oh, you're right of course. Unless the map was allocated unconditionally ... >> I'm not convinced the setting of the bit in cpu_down() is going to >> be correct in all cases. > > Do you mean in some cases, cpu_down() is to really offline a CPU even > park_offline_cpus is set? And in this case, setting the bit isn't > correct. The purposes of cpu_down() calls _may_ be different. Plus whether there's parking wanted/necessary for an architecture should remain - as much as possible - an architecture thing to deal with. I.e. despite park_offline_cpus being used in common code, I think we should strive to avoid adding more there when it can be avoided. Jan
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c index 00b64c3..1b57ba4 100644 --- a/xen/arch/arm/smpboot.c +++ b/xen/arch/arm/smpboot.c @@ -39,6 +39,7 @@ cpumask_t cpu_online_map; cpumask_t cpu_present_map; cpumask_t cpu_possible_map; +cpumask_var_t cpu_parked_map; struct cpuinfo_arm cpu_data[NR_CPUS]; diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c index 6c6bd63..fbb961d 100644 --- a/xen/arch/x86/cpu/common.c +++ b/xen/arch/x86/cpu/common.c @@ -337,7 +337,11 @@ void __init early_cpu_init(void) } if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) + { park_offline_cpus = opt_mce; + if (park_offline_cpus && !zalloc_cpumask_var(&cpu_parked_map)) + panic("No memory for CPU parked map\n"); + } initialize_cpu_data(0); } diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index fa691b6..f66de8d 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -60,6 +60,7 @@ cpumask_t cpu_online_map __read_mostly; EXPORT_SYMBOL(cpu_online_map); bool __read_mostly park_offline_cpus; +cpumask_var_t cpu_parked_map; unsigned int __read_mostly nr_sockets; cpumask_t **__read_mostly socket_cpumask; diff --git a/xen/common/cpu.c b/xen/common/cpu.c index 66c855c..0090a19 100644 --- a/xen/common/cpu.c +++ b/xen/common/cpu.c @@ -117,6 +117,8 @@ int cpu_down(unsigned int cpu) cpu_notifier_call_chain(cpu, CPU_DEAD, NULL, true); send_global_virq(VIRQ_PCPU_STATE); + if ( park_offline_cpus ) + cpumask_set_cpu(cpu, cpu_parked_map); cpu_hotplug_done(); return 0; @@ -154,6 +156,8 @@ int cpu_up(unsigned int cpu) cpu_notifier_call_chain(cpu, CPU_ONLINE, NULL, true); send_global_virq(VIRQ_PCPU_STATE); + if ( park_offline_cpus ) + cpumask_clear_cpu(cpu, cpu_parked_map); cpu_hotplug_done(); return 0; diff --git a/xen/include/asm-arm/smp.h b/xen/include/asm-arm/smp.h index fdbcefa..4b392fa 100644 --- a/xen/include/asm-arm/smp.h +++ b/xen/include/asm-arm/smp.h @@ -19,6 +19,7 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask); * would otherwise prefer them to be off? */ #define park_offline_cpus false +extern cpumask_var_t cpu_parked_map; extern void noreturn stop_cpu(void); diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h index dbeed2f..886737d 100644 --- a/xen/include/asm-x86/smp.h +++ b/xen/include/asm-x86/smp.h @@ -31,6 +31,7 @@ DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask); * would otherwise prefer them to be off? */ extern bool park_offline_cpus; +extern cpumask_var_t cpu_parked_map; void smp_send_nmi_allbutself(void); diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h index 256b60b..543cec5 100644 --- a/xen/include/xen/cpumask.h +++ b/xen/include/xen/cpumask.h @@ -457,6 +457,7 @@ extern cpumask_t cpu_present_map; #define for_each_possible_cpu(cpu) for_each_cpu(cpu, &cpu_possible_map) #define for_each_online_cpu(cpu) for_each_cpu(cpu, &cpu_online_map) #define for_each_present_cpu(cpu) for_each_cpu(cpu, &cpu_present_map) +#define for_each_parked_cpu(cpu) for_each_cpu(cpu, cpu_parked_map) /* Copy to/from cpumap provided by control tools. */ struct xenctl_bitmap;
It helps to distinguish parked CPUs from those are really offlined or hot-added. We need to know the parked CPUs in order to do a special check against them to ensure that all CPUs, except those are really offlined or hot-added, have the same ucode version. Signed-off-by: Chao Gao <chao.gao@intel.com> --- Note that changes on ARM side are untested. --- xen/arch/arm/smpboot.c | 1 + xen/arch/x86/cpu/common.c | 4 ++++ xen/arch/x86/smpboot.c | 1 + xen/common/cpu.c | 4 ++++ xen/include/asm-arm/smp.h | 1 + xen/include/asm-x86/smp.h | 1 + xen/include/xen/cpumask.h | 1 + 7 files changed, 13 insertions(+)