From patchwork Tue Aug 16 01:27:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guo Ren
X-Patchwork-Id: 12944180
From: guoren@kernel.org
To: xianting.tian@linux.alibaba.com, palmer@dabbelt.com, heiko@sntech.de
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, liaochang1@huawei.com, mick@ics.forth.gr,
 jszhang@kernel.org, Guo Ren, Guo Ren, Will Deacon, AKASHI Takahiro
Subject: [PATCH 1/2] riscv: kexec: EOI active and mask all interrupts in kexec crash path
Date: Mon, 15 Aug 2022 21:27:00 -0400
Message-Id: <20220816012701.561435-2-guoren@kernel.org>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220816012701.561435-1-guoren@kernel.org>
References: <20220816012701.561435-1-guoren@kernel.org>

From: Guo Ren

If a crash happens on cpu3 while all interrupts are bound to cpu0, the
stale irq routing leaves the crash kernel unable to receive any irq,
because the start-up of the crash kernel does not clean up the PLIC
enable contexts of the other harts.

The patch is similar to 9141a003a491 ("ARM: 7316/1: kexec: EOI active
and mask all interrupts in kexec crash path") and 78fd584cdec0 ("arm64:
kdump: implement machine_crash_shutdown()"). PowerPC has the same
mechanism.

Signed-off-by: Guo Ren
Signed-off-by: Guo Ren
Cc: Will Deacon
Cc: AKASHI Takahiro
---
 arch/riscv/kernel/machine_kexec.c | 35 +++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index ee79e6839b86..db41c676e5a2 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -15,6 +15,8 @@
 #include /* For unreachable() */
 #include /* For cpu_down() */
 #include
+#include
+#include
 
 /*
  * kexec_image_info - Print received image details
@@ -154,6 +156,37 @@ void crash_smp_send_stop(void)
 	cpus_stopped = 1;
 }
 
+static void machine_kexec_mask_interrupts(void)
+{
+	unsigned int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		struct irq_chip *chip;
+		int ret;
+
+		chip = irq_desc_get_chip(desc);
+		if (!chip)
+			continue;
+
+		/*
+		 * First try to remove the active state. If this
+		 * fails, try to EOI the interrupt.
+		 */
+		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+
+		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+		    chip->irq_eoi)
+			chip->irq_eoi(&desc->irq_data);
+
+		if (chip->irq_mask)
+			chip->irq_mask(&desc->irq_data);
+
+		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+			chip->irq_disable(&desc->irq_data);
+	}
+}
+
 /*
  * machine_crash_shutdown - Prepare to kexec after a kernel crash
  *
@@ -169,6 +202,8 @@ machine_crash_shutdown(struct pt_regs *regs)
 	crash_smp_send_stop();
 	crash_save_cpu(regs, smp_processor_id());
 
+	machine_kexec_mask_interrupts();
+
 	pr_info("Starting crashdump kernel...\n");
 }
 

From patchwork Tue Aug 16 01:27:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guo Ren
X-Patchwork-Id: 12944181
From: guoren@kernel.org
To: xianting.tian@linux.alibaba.com, palmer@dabbelt.com, heiko@sntech.de
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, liaochang1@huawei.com, mick@ics.forth.gr,
 jszhang@kernel.org, Guo Ren, Guo Ren, AKASHI Takahiro
Subject: [PATCH 2/2] riscv: kexec: Implement crash_smp_send_stop with percpu crash_save_cpu
Date: Mon, 15 Aug 2022 21:27:01 -0400
Message-Id: <20220816012701.561435-3-guoren@kernel.org>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220816012701.561435-1-guoren@kernel.org>
References: <20220816012701.561435-1-guoren@kernel.org>

From: Guo Ren

The current crash_smp_send_stop is the same as the generic one in
kernel/panic.c and does not call crash_save_cpu on each CPU. This patch
is inspired by 78fd584cdec0 ("arm64: kdump: implement
machine_crash_shutdown()") and adds the same mechanism for riscv.

Signed-off-by: Guo Ren
Signed-off-by: Guo Ren
Cc: AKASHI Takahiro
Reported-by: kernel test robot
Reported-by: kernel test robot
---
 arch/riscv/include/asm/smp.h      |  6 +++
 arch/riscv/kernel/machine_kexec.c | 19 ++-----
 arch/riscv/kernel/smp.c           | 89 ++++++++++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 18 deletions(-)

diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index d3443be7eedc..e0ddbfcf7c43 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -50,6 +50,12 @@ void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops);
 /* Clear IPI for current CPU */
 void riscv_clear_ipi(void);
 
+/* Stop other CPUs and save their status */
+void crash_smp_send_stop(void);
+
+/* Check whether other CPUs failed to stop */
+extern bool smp_crash_stop_failed(void);
+
 /* Secondary hart entry */
 asmlinkage void smp_callin(void);
 
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index db41c676e5a2..34c86d337448 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -140,22 +140,6 @@ void machine_shutdown(void)
 #endif
 }
 
-/* Override the weak function in kernel/panic.c */
-void crash_smp_send_stop(void)
-{
-	static int cpus_stopped;
-
-	/*
-	 * This function can be called twice in panic path, but obviously
-	 * we execute this only once.
-	 */
-	if (cpus_stopped)
-		return;
-
-	smp_send_stop();
-	cpus_stopped = 1;
-}
-
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -230,6 +214,9 @@ machine_kexec(struct kimage *image)
 	void *control_code_buffer = page_address(image->control_code_page);
 	riscv_kexec_method kexec_method = NULL;
 
+	WARN(smp_crash_stop_failed(),
+		"Some CPUs may be stale, kdump will be unreliable.\n");
+
 	if (image->type != KEXEC_TYPE_CRASH)
 		kexec_method = control_code_buffer;
 	else
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 760a64518c58..a75ad9c373cd 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -27,6 +28,7 @@ enum ipi_message_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_CRASH_STOP,
 	IPI_IRQ_WORK,
 	IPI_TIMER,
 	IPI_MAX
@@ -71,6 +73,22 @@ static void ipi_stop(void)
 		wait_for_interrupt();
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
+
+static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+	crash_save_cpu(regs, cpu);
+
+	atomic_dec(&waiting_for_crash_ipi);
+
+	local_irq_disable();
+
+	while (1)
+		wait_for_interrupt();
+}
+#endif
+
 static const struct riscv_ipi_ops *ipi_ops __ro_after_init;
 
 void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops)
@@ -124,8 +142,9 @@ void arch_irq_work_raise(void)
 
 void handle_IPI(struct pt_regs *regs)
 {
-	unsigned long *pending_ipis = &ipi_data[smp_processor_id()].bits;
-	unsigned long *stats = ipi_data[smp_processor_id()].stats;
+	unsigned int cpu = smp_processor_id();
+	unsigned long *pending_ipis = &ipi_data[cpu].bits;
+	unsigned long *stats = ipi_data[cpu].stats;
 
 	riscv_clear_ipi();
 
@@ -154,6 +173,13 @@ void handle_IPI(struct pt_regs *regs)
 			ipi_stop();
 		}
 
+		if (ops & (1 << IPI_CPU_CRASH_STOP)) {
+#ifdef CONFIG_KEXEC_CORE
+			ipi_cpu_crash_stop(cpu, get_irq_regs());
+#endif
+			unreachable();
+		}
+
 		if (ops & (1 << IPI_IRQ_WORK)) {
 			stats[IPI_IRQ_WORK]++;
 			irq_work_run();
@@ -176,6 +202,7 @@ static const char * const ipi_names[] = {
 	[IPI_RESCHEDULE]	= "Rescheduling interrupts",
 	[IPI_CALL_FUNC]		= "Function call interrupts",
 	[IPI_CPU_STOP]		= "CPU stop interrupts",
+	[IPI_CPU_CRASH_STOP]	= "CPU stop (for crash dump) interrupts",
 	[IPI_IRQ_WORK]		= "IRQ work interrupts",
 	[IPI_TIMER]		= "Timer broadcast interrupts",
 };
@@ -235,6 +262,64 @@ void smp_send_stop(void)
 			   cpumask_pr_args(cpu_online_mask));
 }
 
+#ifdef CONFIG_KEXEC_CORE
+/*
+ * The number of CPUs online, not counting this CPU (which may not be
+ * fully online and so not counted in num_online_cpus()).
+ */
+static inline unsigned int num_other_online_cpus(void)
+{
+	unsigned int this_cpu_online = cpu_online(smp_processor_id());
+
+	return num_online_cpus() - this_cpu_online;
+}
+
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+	cpumask_t mask;
+	unsigned long timeout;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	cpus_stopped = 1;
+
+	/*
+	 * If this cpu is the only one alive at this point in time, online or
+	 * not, there are no stop messages to be sent around, so just back out.
+	 */
+	if (num_other_online_cpus() == 0)
+		return;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
+
+	atomic_set(&waiting_for_crash_ipi, num_other_online_cpus());
+
+	pr_crit("SMP: stopping secondary CPUs\n");
+	send_ipi_mask(&mask, IPI_CPU_CRASH_STOP);
+
+	/* Wait up to one second for other CPUs to stop */
+	timeout = USEC_PER_SEC;
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
+		udelay(1);
+
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
+			cpumask_pr_args(&mask));
+}
+
+bool smp_crash_stop_failed(void)
+{
+	return (atomic_read(&waiting_for_crash_ipi) > 0);
+}
+#endif
+
 void smp_send_reschedule(int cpu)
 {
 	send_ipi_single(cpu, IPI_RESCHEDULE);
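
Note on the stop handshake in patch 2: the user-space sketch below models the pattern that crash_smp_send_stop() relies on. It arms a counter with the number of other CPUs, lets each CPU save its state and decrement the counter, then polls with a bounded budget. This is not kernel code. Every sim_* name, the responsive[] array, and the synchronous "delivery" loop are assumptions made for illustration; only the counter-and-timeout logic mirrors the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for waiting_for_crash_ipi in the patch. */
static atomic_int sim_waiting_for_stop;

/* Models ipi_cpu_crash_stop(): save this CPU's state, then acknowledge. */
void sim_cpu_crash_stop(bool *state_saved)
{
	*state_saved = true;				/* crash_save_cpu() */
	atomic_fetch_sub(&sim_waiting_for_stop, 1);	/* atomic_dec() */
}

/*
 * Models crash_smp_send_stop(). responsive[i] is an assumption of this
 * sketch: it says whether simulated CPU i ever handles the stop request.
 * Returns how many CPUs had not acknowledged when the wait ended.
 */
int sim_crash_send_stop(const bool *responsive, bool *state_saved,
			size_t ncpus, unsigned long max_polls)
{
	unsigned long timeout = max_polls;
	size_t i;

	assert(responsive && state_saved);

	/* Arm the counter before any request goes out. */
	atomic_store(&sim_waiting_for_stop, (int)ncpus);

	/* "Send" the request; delivery is synchronous in this sketch. */
	for (i = 0; i < ncpus; i++)
		if (responsive[i])
			sim_cpu_crash_stop(&state_saved[i]);

	/* Bounded wait: stop when all CPUs acked or the budget runs out. */
	while (atomic_load(&sim_waiting_for_stop) > 0 && timeout--)
		;

	return atomic_load(&sim_waiting_for_stop);
}

/* Models smp_crash_stop_failed(). */
bool sim_crash_stop_failed(void)
{
	return atomic_load(&sim_waiting_for_stop) > 0;
}
```

With one unresponsive entry in responsive[], sim_crash_send_stop() returns 1 and sim_crash_stop_failed() then reports true. That is the condition the WARN() added to machine_kexec() is meant to flag.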