From patchwork Mon Dec 5 16:00:42 2022
X-Patchwork-Submitter: Xie XiuQi
X-Patchwork-Id: 13064712
From: Xie XiuQi
Subject: [PATCH v3 3/4] arm64: ghes: handle the case when memory_failure recovery failed
Date: Tue, 6 Dec 2022 00:00:42 +0800
Message-ID: <20221205160043.57465-4-xiexiuqi@huawei.com>
In-Reply-To: <20221205160043.57465-1-xiexiuqi@huawei.com>
References: <20221205160043.57465-1-xiexiuqi@huawei.com>

memory_failure() may not always recover successfully. In the synchronous
external data abort case, a failed recovery must be handled. If the
recovery fails, the common helper function arch_apei_do_recovery_failed()
is invoked; on the arm64 platform, we just send a SIGBUS.
Signed-off-by: Xie XiuQi
---
 drivers/acpi/apei/ghes.c |  3 ++-
 include/linux/mm.h       |  2 +-
 mm/memory-failure.c      | 24 +++++++++++++++++-------
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ba0631c54c52..ddc4da603215 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -435,7 +435,8 @@ static void ghes_kick_task_work(struct callback_head *head)
 	estatus_node = container_of(head, struct ghes_estatus_node, task_work);
 	if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
-		memory_failure_queue_kick(estatus_node->task_work_cpu);
+		if (memory_failure_queue_kick(estatus_node->task_work_cpu))
+			arch_apei_do_recovery_failed();
 
 	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 	node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus));
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 974ccca609d2..126d1395c208 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3290,7 +3290,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
-extern void memory_failure_queue_kick(int cpu);
+extern int memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bead6bccc7f2..b9398f67264a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2240,12 +2240,12 @@ void memory_failure_queue(unsigned long pfn, int flags)
 }
 EXPORT_SYMBOL_GPL(memory_failure_queue);
 
-static void memory_failure_work_func(struct work_struct *work)
+static int __memory_failure_work_func(struct work_struct *work)
 {
 	struct memory_failure_cpu *mf_cpu;
 	struct memory_failure_entry entry = { 0, };
 	unsigned long proc_flags;
-	int gotten;
+	int gotten, ret = 0, result;
 
 	mf_cpu = container_of(work, struct memory_failure_cpu, work);
 	for (;;) {
@@ -2254,24 +2254,34 @@ static void memory_failure_work_func(struct work_struct *work)
 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 		if (!gotten)
 			break;
-		if (entry.flags & MF_SOFT_OFFLINE)
+		if (entry.flags & MF_SOFT_OFFLINE) {
 			soft_offline_page(entry.pfn, entry.flags);
-		else
-			memory_failure(entry.pfn, entry.flags);
+		} else {
+			result = memory_failure(entry.pfn, entry.flags);
+			if (ret == 0 && result != 0)
+				ret = result;
+		}
 	}
+
+	return ret;
+}
+
+static void memory_failure_work_func(struct work_struct *work)
+{
+	__memory_failure_work_func(work);
 }
 
 /*
  * Process memory_failure work queued on the specified CPU.
  * Used to avoid return-to-userspace racing with the memory_failure workqueue.
  */
-void memory_failure_queue_kick(int cpu)
+int memory_failure_queue_kick(int cpu)
 {
 	struct memory_failure_cpu *mf_cpu;
 
 	mf_cpu = &per_cpu(memory_failure_cpu, cpu);
 	cancel_work_sync(&mf_cpu->work);
-	memory_failure_work_func(&mf_cpu->work);
+	return __memory_failure_work_func(&mf_cpu->work);
 }
 
 static int __init memory_failure_init(void)