Message ID | 20220816012701.561435-3-guoren@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | riscv: kexec: Support crash_save percpu and machine_kexec_mask_interrupts |
Hi,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.0-rc1 next-20220818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/guoren-kernel-org/riscv-kexec-Support-crash_save-percpu-and-machine_kexec_mask_interrupts/20220816-144442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
config: riscv-randconfig-r035-20220818 (https://download.01.org/0day-ci/archive/20220818/202208181520.fYQOePu6-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project aed5e3bea138ce581d682158eb61c27b3cfdd6ec)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/0abdaf7e1f44634e1cee484e3cf01b7e8c851950
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review guoren-kernel-org/riscv-kexec-Support-crash_save-percpu-and-machine_kexec_mask_interrupts/20220816-144442
        git checkout 0abdaf7e1f44634e1cee484e3cf01b7e8c851950
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash arch/riscv/kernel/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/riscv/kernel/machine_kexec.c:217:7: error: call to undeclared function 'smp_crash_stop_failed'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           WARN(smp_crash_stop_failed(),
                ^
   1 error generated.


vim +/smp_crash_stop_failed +217 arch/riscv/kernel/machine_kexec.c

   193	
   194	/*
   195	 * machine_kexec - Jump to the loaded kimage
   196	 *
   197	 * This function is called by kernel_kexec which is called by the
   198	 * reboot system call when the reboot cmd is LINUX_REBOOT_CMD_KEXEC,
   199	 * or by crash_kernel which is called by the kernel's arch-specific
   200	 * trap handler in case of a kernel panic. It's the final stage of
   201	 * the kexec process where the pre-loaded kimage is ready to be
   202	 * executed. We assume at this point that all other harts are
   203	 * suspended and this hart will be the new boot hart.
   204	 */
   205	void __noreturn
   206	machine_kexec(struct kimage *image)
   207	{
   208		struct kimage_arch *internal = &image->arch;
   209		unsigned long jump_addr = (unsigned long) image->start;
   210		unsigned long first_ind_entry = (unsigned long) &image->head;
   211		unsigned long this_cpu_id = __smp_processor_id();
   212		unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
   213		unsigned long fdt_addr = internal->fdt_addr;
   214		void *control_code_buffer = page_address(image->control_code_page);
   215		riscv_kexec_method kexec_method = NULL;
   216	
 > 217		WARN(smp_crash_stop_failed(),
Hi,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.0-rc1 next-20220818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/guoren-kernel-org/riscv-kexec-Support-crash_save-percpu-and-machine_kexec_mask_interrupts/20220816-144442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
config: riscv-buildonly-randconfig-r002-20220818 (https://download.01.org/0day-ci/archive/20220818/202208181655.NrkPo4lG-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/0abdaf7e1f44634e1cee484e3cf01b7e8c851950
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review guoren-kernel-org/riscv-kexec-Support-crash_save-percpu-and-machine_kexec_mask_interrupts/20220816-144442
        git checkout 0abdaf7e1f44634e1cee484e3cf01b7e8c851950
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/riscv/include/asm/bug.h:83,
                    from include/linux/bug.h:5,
                    from include/linux/elfcore.h:6,
                    from include/linux/crash_core.h:6,
                    from include/linux/kexec.h:18,
                    from arch/riscv/kernel/machine_kexec.c:7:
   arch/riscv/kernel/machine_kexec.c: In function 'machine_kexec':
>> arch/riscv/kernel/machine_kexec.c:217:14: error: implicit declaration of function 'smp_crash_stop_failed' [-Werror=implicit-function-declaration]
     217 |         WARN(smp_crash_stop_failed(),
         |              ^~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/bug.h:174:32: note: in definition of macro 'WARN'
     174 |         int __ret_warn_on = !!(condition);                              \
         |                                ^~~~~~~~~
   cc1: some warnings being treated as errors


vim +/smp_crash_stop_failed +217 arch/riscv/kernel/machine_kexec.c

   193	
   194	/*
   195	 * machine_kexec - Jump to the loaded kimage
   196	 *
   197	 * This function is called by kernel_kexec which is called by the
   198	 * reboot system call when the reboot cmd is LINUX_REBOOT_CMD_KEXEC,
   199	 * or by crash_kernel which is called by the kernel's arch-specific
   200	 * trap handler in case of a kernel panic. It's the final stage of
   201	 * the kexec process where the pre-loaded kimage is ready to be
   202	 * executed. We assume at this point that all other harts are
   203	 * suspended and this hart will be the new boot hart.
   204	 */
   205	void __noreturn
   206	machine_kexec(struct kimage *image)
   207	{
   208		struct kimage_arch *internal = &image->arch;
   209		unsigned long jump_addr = (unsigned long) image->start;
   210		unsigned long first_ind_entry = (unsigned long) &image->head;
   211		unsigned long this_cpu_id = __smp_processor_id();
   212		unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
   213		unsigned long fdt_addr = internal->fdt_addr;
   214		void *control_code_buffer = page_address(image->control_code_page);
   215		riscv_kexec_method kexec_method = NULL;
   216	
 > 217		WARN(smp_crash_stop_failed(),
Thanks, this is a bug in the !SMP configuration. I will fix it in the next version.

On Thu, Aug 18, 2022 at 3:58 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v6.0-rc1 next-20220818]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/guoren-kernel-org/riscv-kexec-Support-crash_save-percpu-and-machine_kexec_mask_interrupts/20220816-144442
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
> config: riscv-randconfig-r035-20220818 (https://download.01.org/0day-ci/archive/20220818/202208181520.fYQOePu6-lkp@intel.com/config)
> compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project aed5e3bea138ce581d682158eb61c27b3cfdd6ec)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install riscv cross compiling tool for clang build
>         # apt-get install binutils-riscv64-linux-gnu
>         # https://github.com/intel-lab-lkp/linux/commit/0abdaf7e1f44634e1cee484e3cf01b7e8c851950
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review guoren-kernel-org/riscv-kexec-Support-crash_save-percpu-and-machine_kexec_mask_interrupts/20220816-144442
>         git checkout 0abdaf7e1f44634e1cee484e3cf01b7e8c851950
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash arch/riscv/kernel/
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> >> arch/riscv/kernel/machine_kexec.c:217:7: error: call to undeclared function 'smp_crash_stop_failed'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>            WARN(smp_crash_stop_failed(),
>                 ^
>    1 error generated.
>
>
> vim +/smp_crash_stop_failed +217 arch/riscv/kernel/machine_kexec.c
>
>    193
>    194  /*
>    195   * machine_kexec - Jump to the loaded kimage
>    196   *
>    197   * This function is called by kernel_kexec which is called by the
>    198   * reboot system call when the reboot cmd is LINUX_REBOOT_CMD_KEXEC,
>    199   * or by crash_kernel which is called by the kernel's arch-specific
>    200   * trap handler in case of a kernel panic. It's the final stage of
>    201   * the kexec process where the pre-loaded kimage is ready to be
>    202   * executed. We assume at this point that all other harts are
>    203   * suspended and this hart will be the new boot hart.
>    204   */
>    205  void __noreturn
>    206  machine_kexec(struct kimage *image)
>    207  {
>    208          struct kimage_arch *internal = &image->arch;
>    209          unsigned long jump_addr = (unsigned long) image->start;
>    210          unsigned long first_ind_entry = (unsigned long) &image->head;
>    211          unsigned long this_cpu_id = __smp_processor_id();
>    212          unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
>    213          unsigned long fdt_addr = internal->fdt_addr;
>    214          void *control_code_buffer = page_address(image->control_code_page);
>    215          riscv_kexec_method kexec_method = NULL;
>    216
>  > 217          WARN(smp_crash_stop_failed(),
>
> --
> 0-DAY CI Kernel Test Service
> https://01.org/lkp
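The thread does not show what the fix in the next version looks like. One common kernel pattern for this class of !SMP build failure is to provide a static inline fallback when the real implementation is not compiled in. The userspace sketch below illustrates that pattern only; it is not the author's fix. `MODEL_SMP` is a hypothetical stand-in for `CONFIG_SMP`, and `check_before_kexec()` is a hypothetical stand-in for the `WARN()` call site in `machine_kexec()`.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * MODEL_SMP is a hypothetical macro standing in for the kernel's
 * CONFIG_SMP, so that this sketch builds as a plain userspace program.
 */
#ifndef MODEL_SMP
#define MODEL_SMP 0
#endif

#if MODEL_SMP
/* SMP build: the real implementation would live in another file. */
bool smp_crash_stop_failed(void);
#else
/* !SMP build: there are no other CPUs, so stopping them cannot fail. */
static inline bool smp_crash_stop_failed(void)
{
	return false;
}
#endif

/*
 * Hypothetical stand-in for the WARN() call site in machine_kexec().
 * Returns 1 if a warning was printed, 0 otherwise.
 */
static int check_before_kexec(void)
{
	if (smp_crash_stop_failed()) {
		fprintf(stderr,
			"Some CPUs may be stale, kdump will be unreliable.\n");
		return 1;
	}
	return 0;
}
```

With a fallback of this shape the caller needs no `#ifdef` of its own: in the uniprocessor build the check always evaluates to false and the warning path is never taken.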
diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index d3443be7eedc..e0ddbfcf7c43 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -50,6 +50,12 @@ void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops);
 /* Clear IPI for current CPU */
 void riscv_clear_ipi(void);
 
+/* stop and save status for other CPUs */
+void crash_smp_send_stop(void);
+
+/* Check other CPUs stop or not */
+extern bool smp_crash_stop_failed(void);
+
 /* Secondary hart entry */
 asmlinkage void smp_callin(void);
 
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index db41c676e5a2..34c86d337448 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -140,22 +140,6 @@ void machine_shutdown(void)
 #endif
 }
 
-/* Override the weak function in kernel/panic.c */
-void crash_smp_send_stop(void)
-{
-	static int cpus_stopped;
-
-	/*
-	 * This function can be called twice in panic path, but obviously
-	 * we execute this only once.
-	 */
-	if (cpus_stopped)
-		return;
-
-	smp_send_stop();
-	cpus_stopped = 1;
-}
-
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -230,6 +214,9 @@ machine_kexec(struct kimage *image)
 	void *control_code_buffer = page_address(image->control_code_page);
 	riscv_kexec_method kexec_method = NULL;
 
+	WARN(smp_crash_stop_failed(),
+		"Some CPUs may be stale, kdump will be unreliable.\n");
+
 	if (image->type != KEXEC_TYPE_CRASH)
 		kexec_method = control_code_buffer;
 	else
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 760a64518c58..a75ad9c373cd 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -12,6 +12,7 @@
 #include <linux/clockchips.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
+#include <linux/kexec.h>
 #include <linux/profile.h>
 #include <linux/smp.h>
 #include <linux/sched.h>
@@ -27,6 +28,7 @@ enum ipi_message_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_IRQ_WORK,
 	IPI_TIMER,
 	IPI_MAX
@@ -71,6 +73,22 @@ static void ipi_stop(void)
 		wait_for_interrupt();
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+	crash_save_cpu(regs, cpu);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+	while(1)
+		wait_for_interrupt();
+}
+#endif
+
 static const struct riscv_ipi_ops *ipi_ops __ro_after_init;
 
 void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops)
@@ -124,8 +142,9 @@ void arch_irq_work_raise(void)
 
 void handle_IPI(struct pt_regs *regs)
 {
-	unsigned long *pending_ipis = &ipi_data[smp_processor_id()].bits;
-	unsigned long *stats = ipi_data[smp_processor_id()].stats;
+	unsigned int cpu = smp_processor_id();
+	unsigned long *pending_ipis = &ipi_data[cpu].bits;
+	unsigned long *stats = ipi_data[cpu].stats;
 
 	riscv_clear_ipi();
 
@@ -154,6 +173,13 @@ void handle_IPI(struct pt_regs *regs)
 			ipi_stop();
 		}
 
+		if (ops & (1 << IPI_CPU_CRASH_STOP)) {
+#ifdef CONFIG_KEXEC_CORE
+			ipi_cpu_crash_stop(cpu, get_irq_regs());
+#endif
+			unreachable();
+		}
+
 		if (ops & (1 << IPI_IRQ_WORK)) {
 			stats[IPI_IRQ_WORK]++;
 			irq_work_run();
@@ -176,6 +202,7 @@ static const char * const ipi_names[] = {
 	[IPI_RESCHEDULE]	= "Rescheduling interrupts",
 	[IPI_CALL_FUNC]		= "Function call interrupts",
 	[IPI_CPU_STOP]		= "CPU stop interrupts",
+	[IPI_CPU_CRASH_STOP]	= "CPU stop (for crash dump) interrupts",
 	[IPI_IRQ_WORK]		= "IRQ work interrupts",
 	[IPI_TIMER]		= "Timer broadcast interrupts",
 };
@@ -235,6 +262,64 @@ void smp_send_stop(void)
 			   cpumask_pr_args(cpu_online_mask));
 }
 
+#ifdef CONFIG_KEXEC_CORE
+/*
+ * The number of CPUs online, not counting this CPU (which may not be
+ * fully online and so not counted in num_online_cpus()).
+ */
+static inline unsigned int num_other_online_cpus(void)
+{
+	unsigned int this_cpu_online = cpu_online(smp_processor_id());
+
+	return num_online_cpus() - this_cpu_online;
+}
+
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+	cpumask_t mask;
+	unsigned long timeout;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	cpus_stopped = 1;
+
+	/*
+	 * If this cpu is the only one alive at this point in time, online or
+	 * not, there are no stop messages to be sent around, so just back out.
+	 */
+	if (num_other_online_cpus() == 0)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_other_online_cpus());
+
+	pr_crit("SMP: stopping secondary CPUs\n");
+	send_ipi_mask(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
+			cpumask_pr_args(&mask));
+}
+
+bool smp_crash_stop_failed(void)
+{
+	return (atomic_read(&waiting_for_crash_ipi) > 0);
+}
+#endif
+
 void smp_send_reschedule(int cpu)
 {
 	send_ipi_single(cpu, IPI_RESCHEDULE);
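The handshake this patch introduces (set a counter to the number of other CPUs, have each stopped CPU decrement it, poll with a bounded timeout, then report failure if the counter is still positive) can be modelled in plain userspace C. The sketch below is a single-threaded simulation using C11 atomics, not kernel code: every `model_*` name is hypothetical, the IPI delivery is replaced by a `responders` argument, and `budget` stands in for the `USEC_PER_SEC` polling loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of CPUs that have not yet acknowledged the crash-stop request. */
static atomic_int waiting_for_crash_ipi;

/* Models ipi_cpu_crash_stop(): a stopped CPU acknowledges by decrementing. */
static void model_cpu_ack(void)
{
	atomic_fetch_sub(&waiting_for_crash_ipi, 1);
}

/*
 * Models crash_smp_send_stop(): expect `others` acknowledgements, of which
 * only `responders` ever arrive, and poll for at most `budget` iterations.
 */
static void model_crash_send_stop(int others, int responders,
				  unsigned long budget)
{
	/* Nothing to stop when this is the only CPU. */
	if (others == 0)
		return;

	atomic_store(&waiting_for_crash_ipi, others);

	/* Bounded wait: leave when everyone has acked or the budget is spent. */
	while (atomic_load(&waiting_for_crash_ipi) > 0 && budget--) {
		if (responders > 0) {
			model_cpu_ack();
			responders--;
		}
	}
}

/* Models smp_crash_stop_failed(): true if some CPU never acknowledged. */
static bool model_crash_stop_failed(void)
{
	return atomic_load(&waiting_for_crash_ipi) > 0;
}
```

Calling `model_crash_send_stop(3, 3, 1000)` leaves the counter at zero, so `model_crash_stop_failed()` returns false. With `model_crash_send_stop(3, 2, 1000)` one CPU never acknowledges, the budget runs out, and the failure check returns true, which is the condition the `WARN()` in `machine_kexec()` reports.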