Message ID | 20230504185938.393373946@linutronix.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | cpu/hotplug, x86: Reworked parallel CPU bringup | expand |
From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, May 4, 2023 12:03 PM > > Implement the validation function which tells the core code whether > parallel bringup is possible. > > The only condition for now is that the kernel does not run in an encrypted > guest as these will trap the RDMSR via #VC, which cannot be handled at that > point in early startup. > > There was an earlier variant for AMD-SEV which used the GHBC protocol for > retrieving the APIC ID via CPUID, but there is no guarantee that the > initial APIC ID in CPUID is the same as the real APIC ID. There is no > enforcement from the secure firmware and the hypervisor can assign APIC IDs > as it sees fit as long as the ACPI/MADT table is consistent with that > assignment. > > Unfortunately there is no RDMSR GHCB protocol at the moment, so enabling > AMD-SEV guests for parallel startup needs some more thought. > > Intel-TDX provides a secure RDMSR hypercall, but supporting that is outside > the scope of this change. > > Fixup announce_cpu() as e.g. on Hyper-V CPU1 is the secondary sibling of > CPU0, which makes the @cpu == 1 logic in announce_cpu() fall apart. > > [ mikelley: Reported the announce_cpu() fallout > > Originally-by: David Woodhouse <dwmw@amazon.co.uk> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > --- > V2: Fixup announce_cpu() - Michael Kelley > --- > arch/x86/Kconfig | 3 + > arch/x86/kernel/cpu/common.c | 6 --- > arch/x86/kernel/smpboot.c | 83 ++++++++++++++++++++++++++++++++++++---- > --- [snip] > @@ -934,10 +961,10 @@ static void announce_cpu(int cpu, int ap > if (!node_width) > node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */ > > - if (cpu == 1) > - printk(KERN_INFO "x86: Booting SMP configuration:\n"); > - > if (system_state < SYSTEM_RUNNING) { > + if (num_online_cpus() == 1) Unfortunately, this new check doesn't work. Here's the output I get: [ 0.721384] smp: Bringing up secondary CPUs ... [ 0.725359] smpboot: x86: Booting SMP configuration: [ 0.729249] .... node #0, CPUs: #2 [ 0.729654] smpboot: x86: Booting SMP configuration: [ 0.737247] #4 [ 0.737511] smpboot: x86: Booting SMP configuration: [ 0.741246] #6 [ 0.741508] smpboot: x86: Booting SMP configuration: [ 0.745248] #8 [ 0.745507] smpboot: x86: Booting SMP configuration: [ 0.749250] #10 [ 0.749514] smpboot: x86: Booting SMP configuration: [ 0.753248] #12 [ 0.753492] smpboot: x86: Booting SMP configuration: [ 0.757249] #14 #1 #3 #5 #7 #9 #11 #13 #15 [ 0.769317] smp: Brought up 1 node, 16 CPUs [ 0.773246] smpboot: Max logical packages: 1 [ 0.777257] smpboot: Total of 16 processors activated (78253.79 BogoMIPS) Evidently num_online_cpus() isn't updated until after all the primary siblings get started. When booting with cpuhp.parallel=0, the output is good. Michael > + pr_info("x86: Booting SMP configuration:\n"); > + > if (node != current_node) { > if (current_node > (-1)) > pr_cont("\n"); > @@ -948,7 +975,7 @@ static void announce_cpu(int cpu, int ap > } > > /* Add padding for the BSP */ > - if (cpu == 1) > + if (num_online_cpus() == 1) > pr_cont("%*s", width + 1, " "); > > pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu);
On Sat, May 06 2023 at 00:53, Michael Kelley wrote: > From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, May 4, 2023 12:03 PM > [snip] > >> @@ -934,10 +961,10 @@ static void announce_cpu(int cpu, int ap >> if (!node_width) >> node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */ >> >> - if (cpu == 1) >> - printk(KERN_INFO "x86: Booting SMP configuration:\n"); >> - >> if (system_state < SYSTEM_RUNNING) { >> + if (num_online_cpus() == 1) > > Unfortunately, this new check doesn't work. Here's the output I get: > > [ 0.721384] smp: Bringing up secondary CPUs ... > [ 0.725359] smpboot: x86: Booting SMP configuration: > [ 0.729249] .... node #0, CPUs: #2 > [ 0.729654] smpboot: x86: Booting SMP configuration: > [ 0.737247] #4 > > Evidently num_online_cpus() isn't updated until after all the primary > siblings get started. Duh. Where is that brown paperbag? > When booting with cpuhp.parallel=0, the output is good. Exactly that was on the command line when I quickly booted that change :( The below should fix it for real. Thanks, tglx --- --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -951,9 +951,9 @@ static int wakeup_secondary_cpu_via_init /* reduce the number of lines printed when booting a large cpu count system */ static void announce_cpu(int cpu, int apicid) { + static int width, node_width, first = 1; static int current_node = NUMA_NO_NODE; int node = early_cpu_to_node(cpu); - static int width, node_width; if (!width) width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */ @@ -962,7 +962,7 @@ static void announce_cpu(int cpu, int ap node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */ if (system_state < SYSTEM_RUNNING) { - if (num_online_cpus() == 1) + if (first) pr_info("x86: Booting SMP configuration:\n"); if (node != current_node) { @@ -975,11 +975,11 @@ static void announce_cpu(int cpu, int ap } /* Add padding for the BSP */ - if (num_online_cpus() == 1) + if (first) pr_cont("%*s", width + 1, " "); + first = 0; pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu); - } else pr_info("Booting Node %d Processor %d APIC 0x%x\n", node, cpu, apicid);
From: Thomas Gleixner <tglx@linutronix.de> Sent: Saturday, May 6, 2023 9:23 AM > > On Sat, May 06 2023 at 00:53, Michael Kelley wrote: > > From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, May 4, 2023 12:03 PM > > [snip] > > > >> @@ -934,10 +961,10 @@ static void announce_cpu(int cpu, int ap > >> if (!node_width) > >> node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */ > >> > >> - if (cpu == 1) > >> - printk(KERN_INFO "x86: Booting SMP configuration:\n"); > >> - > >> if (system_state < SYSTEM_RUNNING) { > >> + if (num_online_cpus() == 1) > > > > Unfortunately, this new check doesn't work. Here's the output I get: > > > > [ 0.721384] smp: Bringing up secondary CPUs ... > > [ 0.725359] smpboot: x86: Booting SMP configuration: > > [ 0.729249] .... node #0, CPUs: #2 > > [ 0.729654] smpboot: x86: Booting SMP configuration: > > [ 0.737247] #4 > > > > Evidently num_online_cpus() isn't updated until after all the primary > > siblings get started. > > Duh. Where is that brown paperbag? > > > When booting with cpuhp.parallel=0, the output is good. > > Exactly that was on the command line when I quickly booted that change :( > > The below should fix it for real. > > Thanks, > > tglx > --- > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -951,9 +951,9 @@ static int wakeup_secondary_cpu_via_init > /* reduce the number of lines printed when booting a large cpu count system */ > static void announce_cpu(int cpu, int apicid) > { > + static int width, node_width, first = 1; > static int current_node = NUMA_NO_NODE; > int node = early_cpu_to_node(cpu); > - static int width, node_width; > > if (!width) > width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */ > @@ -962,7 +962,7 @@ static void announce_cpu(int cpu, int ap > node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */ > > if (system_state < SYSTEM_RUNNING) { > - if (num_online_cpus() == 1) > + if (first) > pr_info("x86: Booting SMP configuration:\n"); > > if (node != current_node) { > @@ -975,11 +975,11 @@ static void announce_cpu(int cpu, int ap > } > > /* Add padding for the BSP */ > - if (num_online_cpus() == 1) > + if (first) > pr_cont("%*s", width + 1, " "); > + first = 0; > > pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu); > - > } else > pr_info("Booting Node %d Processor %d APIC 0x%x\n", > node, cpu, apicid); This works. dmesg output is clean for these guest VM combinations on Hyper-V that I tested: * Normal VM: 16 vCPUs in 1 NUMA node and 32 vCPUs in 2 NUMA nodes * Same configs for a SEV-SNP Confidential VM with paravisor Tested with and without cpuhp.parallel=0 For the entire series: Tested-by: Michael Kelley <mikelley@microsoft.com>
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -274,8 +274,9 @@ config X86 select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER select HAVE_GENERIC_VDSO + select HOTPLUG_PARALLEL if SMP && X86_64 select HOTPLUG_SMT if SMP - select HOTPLUG_SPLIT_STARTUP if SMP + select HOTPLUG_SPLIT_STARTUP if SMP && X86_32 select IRQ_FORCED_THREADING select NEED_PER_CPU_EMBED_FIRST_CHUNK select NEED_PER_CPU_PAGE_FIRST_CHUNK --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2128,11 +2128,7 @@ static inline void setup_getcpu(int cpu) } #ifdef CONFIG_X86_64 -static inline void ucode_cpu_init(int cpu) -{ - if (cpu) - load_ucode_ap(); -} +static inline void ucode_cpu_init(int cpu) { } static inline void tss_setup_ist(struct tss_struct *tss) { --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -58,6 +58,7 @@ #include <linux/overflow.h> #include <linux/stackprotector.h> #include <linux/cpuhotplug.h> +#include <linux/mc146818rtc.h> #include <asm/acpi.h> #include <asm/cacheinfo.h> @@ -75,7 +76,7 @@ #include <asm/fpu/api.h> #include <asm/setup.h> #include <asm/uv/uv.h> -#include <linux/mc146818rtc.h> +#include <asm/microcode.h> #include <asm/i8259.h> #include <asm/misc.h> #include <asm/qspinlock.h> @@ -128,7 +129,6 @@ int arch_update_cpu_topology(void) return retval; } - static unsigned int smpboot_warm_reset_vector_count; static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip) @@ -229,16 +229,43 @@ static void notrace start_secondary(void */ cr4_init(); -#ifdef CONFIG_X86_32 - /* switch away from the initial page table */ - load_cr3(swapper_pg_dir); - __flush_tlb_all(); -#endif + /* + * 32-bit specific. 64-bit reaches this code with the correct page + * table established. Yet another historical divergence. + */ + if (IS_ENABLED(CONFIG_X86_32)) { + /* switch away from the initial page table */ + load_cr3(swapper_pg_dir); + __flush_tlb_all(); + } + cpu_init_exception_handling(); /* - * Synchronization point with the hotplug core. Sets the - * synchronization state to ALIVE and waits for the control CPU to + * 32-bit systems load the microcode from the ASM startup code for + * historical reasons. + * + * On 64-bit systems load it before reaching the AP alive + * synchronization point below so it is not part of the full per + * CPU serialized bringup part when "parallel" bringup is enabled. + * + * That's even safe when hyperthreading is enabled in the CPU as + * the core code starts the primary threads first and leaves the + * secondary threads waiting for SIPI. Loading microcode on + * physical cores concurrently is a safe operation. + * + * This covers both the Intel specific issue that concurrent + * microcode loading on SMT siblings must be prohibited and the + * vendor independent issue`that microcode loading which changes + * CPUID, MSRs etc. must be strictly serialized to maintain + * software state correctness. + */ + if (IS_ENABLED(CONFIG_X86_64)) + load_ucode_ap(); + + /* + * Synchronization point with the hotplug core. Sets this CPUs + * synchronization state to ALIVE and spin-waits for the control CPU to * release this CPU for further bringup. */ cpuhp_ap_sync_alive(); @@ -934,10 +961,10 @@ static void announce_cpu(int cpu, int ap if (!node_width) node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */ - if (cpu == 1) - printk(KERN_INFO "x86: Booting SMP configuration:\n"); - if (system_state < SYSTEM_RUNNING) { + if (num_online_cpus() == 1) + pr_info("x86: Booting SMP configuration:\n"); + if (node != current_node) { if (current_node > (-1)) pr_cont("\n"); @@ -948,7 +975,7 @@ static void announce_cpu(int cpu, int ap } /* Add padding for the BSP */ - if (cpu == 1) + if (num_online_cpus() == 1) pr_cont("%*s", width + 1, " "); pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu); @@ -1242,6 +1269,36 @@ void __init smp_prepare_cpus_common(void set_cpu_sibling_map(0); } +#ifdef CONFIG_X86_64 +/* Establish whether parallel bringup can be supported. */ +bool __init arch_cpuhp_init_parallel_bringup(void) +{ + /* + * Encrypted guests require special handling. They enforce X2APIC + * mode but the RDMSR to read the APIC ID is intercepted and raises + * #VC or #VE which cannot be handled in the early startup code. + * + * AMD-SEV does not provide a RDMSR GHCB protocol so the early + * startup code cannot directly communicate with the secure + * firmware. The alternative solution to retrieve the APIC ID via + * CPUID(0xb), which is covered by the GHCB protocol, is not viable + * either because there is no enforcement of the CPUID(0xb) + * provided "initial" APIC ID to be the same as the real APIC ID. + * + * Intel-TDX has a secure RDMSR hypercall, but that needs to be + * implemented seperately in the low level startup ASM code. + */ + if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) { + pr_info("Parallel CPU startup disabled due to guest state encryption\n"); + return false; + } + + smpboot_control = STARTUP_READ_APICID; + pr_debug("Parallel CPU startup enabled: 0x%08x\n", smpboot_control); + return true; +} +#endif + /* * Prepare for SMP bootup. * @max_cpus: configured maximum number of CPUs, It is a legacy parameter